Configuring model guardrails
Introduced 2.13
Guardrails can guide a large language model (LLM) toward desired behavior. They act as a filter, preventing the LLM from generating output that is harmful or violates ethical principles and facilitating safer use of AI. Guardrails can also help the LLM produce more focused and relevant output.
You can configure guardrails for your LLM using the following methods:
- Provide a list of words to be prohibited in the input or output of the model. Alternatively, you can provide a regular expression against which the model input or output will be matched. For more information, see Validating input/output using stopwords and regex.
- Configure a separate LLM whose purpose is to validate the user input and the LLM output.
Prerequisites
Before you start, make sure you have fulfilled the prerequisites for connecting to an externally hosted model.
Validating input/output using stopwords and regex
Introduced 2.13
A simple way to validate the user input and LLM output is to provide a set of prohibited words (stopwords) or a regular expression for validation.
Step 1: Create a guardrail index
To start, create an index that will store the excluded words (stopwords). In the index mappings, specify a title field, which will contain excluded words, and a query field of the percolator type. The percolator query will be used to match the LLM input or output:
PUT /words0
{
"mappings": {
"properties": {
"title": {
"type": "text"
},
"query": {
"type": "percolator"
}
}
}
}
Step 2: Index excluded words or phrases
Next, index one query string query for each excluded word or phrase. These queries will be used to match excluded words in the model input or output:
PUT /words0/_doc/1?refresh
{
"query": {
"query_string": {
"query": "title: blacklist"
}
}
}
PUT /words0/_doc/2?refresh
{
"query": {
"query_string": {
"query": "title: \"Master slave architecture\""
}
}
}
For more query string options, see Query string query.
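To check by hand which stored queries a given text would match, you can run a percolate query against the guardrail index. The following Python sketch only builds the search body; sending it to words0/_search is left to your HTTP client. The function name and its defaults are hypothetical, and this is a manual check, not necessarily the lookup that the guardrail performs internally:

```python
import json


def build_percolate_search(text, percolator_field="query", doc_field="title"):
    """Build a search body asking which stored percolator queries match `text`."""
    return {
        "query": {
            "percolate": {
                # The percolator-typed field created in step 1.
                "field": percolator_field,
                # The candidate text, placed in the field the stored queries target.
                "document": {doc_field: text},
            }
        }
    }


body = build_percolate_search("this is a test of Master slave architecture")
print(json.dumps(body, indent=2))
```

Each hit returned for such a search corresponds to one stored stopword query that matched the text.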
Step 3: Register a model group
To register a model group, send the following request:
POST /_plugins/_ml/model_groups/_register
{
"name": "bedrock",
"description": "This is a public model group."
}
The response contains the model group ID that you’ll use to register a model to this model group:
{
"model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
"status": "CREATED"
}
To learn more about model groups, see Model access control.
Step 4: Create a connector
Now you can create a connector for the model. In this example, you’ll create a connector to the Anthropic Claude model hosted on Amazon Bedrock:
POST /_plugins/_ml/connectors/_create
{
"name": "BedRock test claude Connector",
"description": "The connector to BedRock service for claude model",
"version": 1,
"protocol": "aws_sigv4",
"parameters": {
"region": "us-east-1",
"service_name": "bedrock",
"anthropic_version": "bedrock-2023-05-31",
"endpoint": "bedrock.us-east-1.amazonaws.com",
"auth": "Sig_V4",
"content_type": "application/json",
"max_tokens_to_sample": 8000,
"temperature": 0.0001,
"response_filter": "$.completion"
},
"credential": {
"access_key": "<YOUR_ACCESS_KEY>",
"secret_key": "<YOUR_SECRET_KEY>"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-v2/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature}, \"anthropic_version\":\"${parameters.anthropic_version}\" }"
}
]
}
The response contains the connector ID for the newly created connector:
{
"connector_id": "a1eMb4kBJ1eYAeTMAljY"
}
Step 5: Register and deploy the model with guardrails
To register an externally hosted model, provide the model group ID from step 3 and the connector ID from step 4 in the following request. To configure guardrails, include the guardrails object:
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "Bedrock Claude V2 model",
"function_name": "remote",
"model_group_id": "wlcnb4kBJ1eYAeTMHlV6",
"description": "test model",
"connector_id": "a1eMb4kBJ1eYAeTMAljY",
"guardrails": {
"type": "local_regex",
"input_guardrail": {
"stop_words": [
{
"index_name": "words0",
"source_fields": [
"title"
]
}
],
"regex": [
".*abort.*",
".*kill.*"
]
},
"output_guardrail": {
"stop_words": [
{
"index_name": "words0",
"source_fields": [
"title"
]
}
],
"regex": [
".*abort.*",
".*kill.*"
]
}
}
}
For more information, see The guardrails parameter.
OpenSearch returns the task ID of the register operation:
{
"task_id": "cVeMb4kBJ1eYAeTMFFgj",
"status": "CREATED"
}
To check the status of the operation, provide the task ID to the Tasks API:
GET /_plugins/_ml/tasks/cVeMb4kBJ1eYAeTMFFgj
When the operation is complete, the state changes to COMPLETED:
{
"model_id": "cleMb4kBJ1eYAeTMFFg4",
"task_type": "DEPLOY_MODEL",
"function_name": "REMOTE",
"state": "COMPLETED",
"worker_node": [
"n-72khvBTBi3bnIIR8FTTw"
],
"create_time": 1689793851077,
"last_update_time": 1689793851101,
"is_async": true
}
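If you script the registration, you may want to poll the Tasks API until the task completes. The following is a minimal Python sketch of such a loop. The fetch_task callable is hypothetical: it stands in for whatever client code sends GET /_plugins/_ml/tasks/<task_id> and returns the parsed JSON. Treating a FAILED state as terminal is an assumption that is not shown in this document:

```python
import time


def wait_for_task(fetch_task, task_id, interval_s=1.0, max_attempts=30):
    """Poll a task until its state is COMPLETED and return the task document."""
    for _ in range(max_attempts):
        task = fetch_task(task_id)
        state = task.get("state")
        if state == "COMPLETED":
            return task
        if state == "FAILED":  # assumption: FAILED is a terminal state
            raise RuntimeError(f"task {task_id} failed: {task}")
        time.sleep(interval_s)
    raise TimeoutError(f"task {task_id} did not complete after {max_attempts} attempts")


# Stand-in for a real HTTP call so that the sketch runs without a cluster.
_states = iter(["CREATED", "RUNNING", "COMPLETED"])


def fake_fetch(task_id):
    return {"state": next(_states), "model_id": "cleMb4kBJ1eYAeTMFFg4"}


result = wait_for_task(fake_fetch, "cVeMb4kBJ1eYAeTMFFgj", interval_s=0)
print(result["model_id"])
```

The model_id field of the completed task is the ID to use in subsequent predict requests.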
Step 6 (Optional): Test the model
To demonstrate how guardrails are applied, first run the predict operation that does not contain any excluded words:
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
"parameters": {
"prompt": "\n\nHuman:this is a test\n\nAssistant:"
}
}
The response contains the LLM answer:
{
"inference_results": [
{
"output": [
{
"name": "response",
"dataAsMap": {
"response": " Thank you for the test, I appreciate you taking the time to interact with me. I'm an AI assistant created by Anthropic to be helpful, harmless, and honest."
}
}
],
"status_code": 200
}
]
}
Then run the predict operation that contains excluded words:
POST /_plugins/_ml/models/cleMb4kBJ1eYAeTMFFg4/_predict
{
"parameters": {
"prompt": "\n\nHuman:this is a test of Master slave architecture\n\nAssistant:"
}
}
The response contains an error message because guardrails were triggered:
{
"error": {
"root_cause": [
{
"type": "illegal_argument_exception",
"reason": "guardrails triggered for user input"
}
],
"type": "illegal_argument_exception",
"reason": "guardrails triggered for user input"
},
"status": 400
}
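Client code therefore needs to handle two response shapes: a successful prediction and a guardrail error. The following Python sketch shows one way to distinguish them, based only on the two response bodies shown in this section. The extract_answer helper is hypothetical, and other error shapes may exist:

```python
def extract_answer(response):
    """Return the model's text, or raise ValueError if the body is an error."""
    if "error" in response:
        raise ValueError(response["error"].get("reason", "unknown error"))
    first_output = response["inference_results"][0]["output"][0]
    return first_output["dataAsMap"]["response"]


# Abbreviated copies of the two response bodies shown above.
ok = {
    "inference_results": [
        {
            "output": [{"name": "response", "dataAsMap": {"response": " Thank you for the test"}}],
            "status_code": 200,
        }
    ]
}
blocked = {
    "error": {
        "type": "illegal_argument_exception",
        "reason": "guardrails triggered for user input",
    },
    "status": 400,
}

answer = extract_answer(ok)
print(answer)
try:
    extract_answer(blocked)
except ValueError as err:
    blocked_reason = str(err)
    print("blocked:", blocked_reason)
```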
Guardrails are also triggered when a prompt matches the supplied regular expression.
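Patterns such as .*kill.* match any text that contains the letters kill, including harmless words. The following Python sketch illustrates this with the two patterns from step 5. It uses Python's re module with search-style matching on single-line strings, which is an assumption: OpenSearch evaluates regular expressions with its own engine, so matching details (for example, handling of line breaks) may differ:

```python
import re

# The two patterns configured in the guardrails object in step 5.
PATTERNS = [re.compile(p) for p in (r".*abort.*", r".*kill.*")]


def triggers_regex_guardrail(text):
    """Return True if any configured pattern is found in `text`."""
    return any(p.search(text) is not None for p in PATTERNS)


print(triggers_regex_guardrail("please abort the job"))      # True
print(triggers_regex_guardrail("list my technical skills"))  # True: "skills" contains "kill"
print(triggers_regex_guardrail("this is a test"))            # False
```

If such false positives matter for your use case, consider testing narrower patterns (for example, ones that use word boundaries) against representative prompts before registering the model.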
Validating input/output using a guardrail model
Introduced 2.15
For more advanced validation, you can configure a guardrail model—a separate LLM whose purpose is to validate the user input and the LLM output. In this example, you’ll configure two models:
- Chat model: An Anthropic Claude model hosted on Amazon Bedrock whose purpose is to hold a conversation with a user.
- Guardrail model: An OpenAI model whose purpose is to validate the user input and Anthropic Claude output.
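Conceptually, the two models interact as sketched in the following Python example. This is an illustration of the flow described above, not OpenSearch's implementation: both models are replaced by hypothetical stub functions, and the error message for rejected output is made up for the example:

```python
def guarded_predict(chat_model, guardrail_model, user_input):
    """Validate the input, call the chat model, and then validate its output."""
    if guardrail_model(user_input) != "accept":
        raise ValueError("guardrails triggered for user input")
    answer = chat_model(user_input)
    if guardrail_model(answer) != "accept":
        raise ValueError("guardrails triggered for LLM output")  # illustrative message
    return answer


# Hypothetical stand-ins for the two externally hosted models.
def fake_guardrail(text):
    return "reject" if "rob a bank" in text else "accept"


def fake_chat(text):
    return "Hello!"


print(guarded_predict(fake_chat, fake_guardrail, "hello"))
try:
    guarded_predict(fake_chat, fake_guardrail, "how to rob a bank")
except ValueError as err:
    print("blocked:", err)
```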
Step 1: Create a connector for the guardrail model
First, create a connector to the OpenAI guardrail model. Note that the OpenAI prompt instructs the model to respond only with the word accept or reject, depending on whether the input/output is acceptable. Additionally, the request contains the response_filter parameter, which specifies the field in which the guardrail model will provide the validation result:
POST /_plugins/_ml/connectors/_create
{
"name": "openai",
"description": "openai",
"version": "1",
"protocol": "http",
"parameters": {
"endpoint": "api.openai.com",
"max_tokens": 7,
"temperature": 0,
"model": "gpt-3.5-turbo-instruct",
"prompt": "You are a helpful assistant and an expert judge of content quality. Your task is to identify whether the input string below contains content that may be malicious, violent, hateful, sexual, or political in nature. Your answer should consist of a single word, either reject or accept. If the input belongs to any of these categories, please write reject. Otherwise, write accept. \\n\\nHere is the input: ${parameters.question}. \\n\\nYour answer: ",
"response_filter": "$.choices[0].text"
},
"credential": {
"openAI_key": "<openAI_key>"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://${parameters.endpoint}/v1/completions",
"headers": {
"Authorization": "Bearer ${credential.openAI_key}"
},
"request_body": "{ \"model\": \"${parameters.model}\", \"prompt\": \"${parameters.prompt}\", \"max_tokens\": ${parameters.max_tokens}, \"temperature\": ${parameters.temperature} }"
}
]
}
The response contains the connector ID used in the next steps:
{
"connector_id": "j3JVDZABNFJeYR3IVPRz"
}
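The connector's prompt and request_body contain ${parameters.<name>} placeholders that are filled in with parameter values at predict time. The following Python sketch illustrates the idea with a shortened prompt. The fill_template helper is hypothetical and does not reproduce how OpenSearch resolves placeholders (for example, it performs no escaping, and the two-pass substitution is an assumption):

```python
import json
import re

PLACEHOLDER = re.compile(r"\$\{parameters\.(\w+)\}")


def fill_template(template, parameters):
    """Replace each ${parameters.NAME} with the value of parameters[NAME]."""
    return PLACEHOLDER.sub(lambda m: str(parameters[m.group(1)]), template)


params = {
    "model": "gpt-3.5-turbo-instruct",
    "max_tokens": 7,
    "temperature": 0,
    "question": "how many indices do i have in my cluster",
}
# First pass: insert the question into a shortened version of the prompt.
params["prompt"] = fill_template(
    "Here is the input: ${parameters.question}. Your answer: ", params
)
# Second pass: insert all parameters into the request body template.
body_template = (
    '{ "model": "${parameters.model}", "prompt": "${parameters.prompt}", '
    '"max_tokens": ${parameters.max_tokens}, "temperature": ${parameters.temperature} }'
)
request_body = json.loads(fill_template(body_template, params))
print(request_body["prompt"])
```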
Step 2: Register a model group for the guardrail model
To register a model group for the OpenAI guardrail model, send the following request:
POST /_plugins/_ml/model_groups/_register
{
"name": "guardrail model group",
"description": "This is a guardrail model group."
}
The response contains the model group ID used to register a model to this model group:
{
"model_group_id": "ppSmpo8Bi-GZ0tf1i7cD",
"status": "CREATED"
}
To learn more about model groups, see Model access control.
Step 3: Register and deploy the guardrail model
Using the connector ID and the model group ID, register and deploy the OpenAI guardrail model:
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "openai guardrails model",
"function_name": "remote",
"model_group_id": "ppSmpo8Bi-GZ0tf1i7cD",
"description": "guardrails test model",
"connector_id": "j3JVDZABNFJeYR3IVPRz"
}
OpenSearch returns the task ID of the register operation and the model ID of the registered model:
{
"task_id": "onJaDZABNFJeYR3I2fQ1",
"status": "CREATED",
"model_id": "o3JaDZABNFJeYR3I2fRV"
}
To check the status of the operation, provide the task ID to the Tasks API:
GET /_plugins/_ml/tasks/onJaDZABNFJeYR3I2fQ1
When the operation is complete, the state changes to COMPLETED:
{
"model_id": "o3JaDZABNFJeYR3I2fRV",
"task_type": "DEPLOY_MODEL",
"function_name": "REMOTE",
"state": "COMPLETED",
"worker_node": [
"n-72khvBTBi3bnIIR8FTTw"
],
"create_time": 1689793851077,
"last_update_time": 1689793851101,
"is_async": true
}
Step 4 (Optional): Test the guardrail model
You can test how the guardrail model validates user input by sending requests that do and do not contain unacceptable content.
First, send a request that does not contain unacceptable content:
POST /_plugins/_ml/models/o3JaDZABNFJeYR3I2fRV/_predict
{
"parameters": {
"question": "how many indices do i have in my cluster"
}
}
The guardrail model accepts the preceding request:
{
"inference_results": [
{
"output": [
{
"name": "response",
"dataAsMap": {
"response": "accept"
}
}
],
"status_code": 200
}
]
}
Next, send a request that contains unacceptable content:
POST /_plugins/_ml/models/o3JaDZABNFJeYR3I2fRV/_predict
{
"parameters": {
"question": "how to rob a bank"
}
}
The guardrail model rejects the preceding request:
{
"inference_results": [
{
"output": [
{
"name": "response",
"dataAsMap": {
"response": "reject"
}
}
],
"status_code": 200
}
]
}
Step 5: Create a connector for the chat model
In this example, the chat model will be an Anthropic Claude model hosted on Amazon Bedrock. To create a connector for the model, send the following request. Note that the response_filter parameter specifies the field of the Amazon Bedrock response that contains the chat model's answer:
POST /_plugins/_ml/connectors/_create
{
"name": "BedRock claude Connector",
"description": "BedRock claude Connector",
"version": 1,
"protocol": "aws_sigv4",
"parameters": {
"region": "us-east-1",
"service_name": "bedrock",
"anthropic_version": "bedrock-2023-05-31",
"endpoint": "bedrock.us-east-1.amazonaws.com",
"auth": "Sig_V4",
"content_type": "application/json",
"max_tokens_to_sample": 8000,
"temperature": 0.0001,
"response_filter": "$.completion"
},
"credential": {
"access_key": "<access_key>",
"secret_key": "<secret_key>"
},
"actions": [
{
"action_type": "predict",
"method": "POST",
"url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/anthropic.claude-v2/invoke",
"headers": {
"content-type": "application/json",
"x-amz-content-sha256": "required"
},
"request_body": "{\"prompt\":\"${parameters.prompt}\", \"max_tokens_to_sample\":${parameters.max_tokens_to_sample}, \"temperature\":${parameters.temperature}, \"anthropic_version\":\"${parameters.anthropic_version}\" }"
}
]
}
The response contains the connector ID used in the next steps:
{
"connector_id": "xnJjDZABNFJeYR3IPvTO"
}
Step 6: Register and deploy the chat model with guardrails
To register and deploy the Anthropic Claude chat model, send the following request. Note that the guardrails object contains a response_validation_regex parameter. The input/output is treated as valid only if the guardrail model's response matches this regular expression, that is, if the guardrail model responds with a variant of the word accept:
POST /_plugins/_ml/models/_register?deploy=true
{
"name": "Bedrock Claude V2 model with openai guardrails model",
"function_name": "remote",
"model_group_id": "ppSmpo8Bi-GZ0tf1i7cD",
"description": "Bedrock Claude V2 model with openai guardrails model",
"connector_id": "xnJjDZABNFJeYR3IPvTO",
"guardrails": {
"input_guardrail": {
"model_id": "o3JaDZABNFJeYR3I2fRV",
"response_validation_regex": "^\\s*\"[Aa]ccept\"\\s*$"
},
"output_guardrail": {
"model_id": "o3JaDZABNFJeYR3I2fRV",
"response_validation_regex": "^\\s*\"[Aa]ccept\"\\s*$"
},
"type": "model"
}
}
OpenSearch returns the task ID of the register operation and the model ID of the registered model:
{
"task_id": "1nJnDZABNFJeYR3IvfRL",
"status": "CREATED",
"model_id": "43JqDZABNFJeYR3IQPQH"
}
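To see which guardrail responses the response_validation_regex accepts, you can try the pattern locally. The following Python sketch uses the pattern from the preceding request after JSON unescaping. Note that the pattern requires literal double quotes around the word, which suggests that the guardrail model's response is compared in its quoted form; this is an inference from the pattern, not something this document states:

```python
import re

# The JSON string "^\\s*\"[Aa]ccept\"\\s*$" decodes to this pattern.
VALIDATION = re.compile(r'^\s*"[Aa]ccept"\s*$')


def is_accepted(guardrail_response):
    """Return True if the guardrail response satisfies the validation regex."""
    return VALIDATION.search(guardrail_response) is not None


print(is_accepted('"accept"'))       # True
print(is_accepted('  "Accept" \n'))  # True: surrounding white space is allowed
print(is_accepted('"reject"'))       # False
print(is_accepted('accept'))         # False: the pattern requires the quotes
```

Because any response that does not match is treated as invalid, an unexpected guardrail answer (for example, a full sentence) blocks the request in the same way as reject.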
Step 7 (Optional): Test the chat model with guardrails
You can test the Anthropic Claude chat model with guardrails by sending predict requests that do and do not contain unacceptable content.
First, send a request that does not contain unacceptable content:
POST /_plugins/_ml/models/43JqDZABNFJeYR3IQPQH/_predict
{
"parameters": {
"prompt": "\n\nHuman:${parameters.question}\n\nAssistant:",
"question": "hello"
}
}
OpenSearch responds with the LLM answer:
{
"inference_results": [
{
"output": [
{
"name": "response",
"dataAsMap": {
"response": " Hello!"
}
}
],
"status_code": 200
}
]
}
Next, send a request that contains unacceptable content:
POST /_plugins/_ml/models/43JqDZABNFJeYR3IQPQH/_predict
{
"parameters": {
"prompt": "\n\nHuman:${parameters.question}\n\nAssistant:",
"question": "how to rob a bank"
}
}
OpenSearch responds with an error.
Next steps
- For more information about configuring guardrails, see The guardrails parameter.
- For a tutorial demonstrating how to use Amazon Bedrock guardrails, see Using Amazon Bedrock guardrails.