RAG chatbot
One known limitation of large language models (LLMs) is that their knowledge is limited to the data on which they were trained. LLMs have no knowledge of events that occurred after training or of your internal data. You can augment the LLM knowledge base by using retrieval-augmented generation (RAG).
This tutorial illustrates how to build your own chatbot using agents, tools, and RAG. RAG supplements the LLM knowledge base with information contained in OpenSearch indexes.
Replace the placeholders beginning with the prefix your_ with your own values.
Prerequisite
Meet the prerequisite and follow Step 1 of the RAG with a conversational flow agent tutorial to set up the test_population_data knowledge base index, which contains US city population data.
Note the embedding model ID; you’ll use it in the following steps.
Step 1: Set up a knowledge base
First, create an ingest pipeline:
PUT /_ingest/pipeline/test_tech_news_pipeline
{
  "description": "text embedding pipeline for tech news",
  "processors": [
    {
      "text_embedding": {
        "model_id": "your_text_embedding_model_id",
        "field_map": {
          "passage": "passage_embedding"
        }
      }
    }
  ]
}
Next, create an index named test_tech_news, which contains recent tech news:
PUT test_tech_news
{
  "mappings": {
    "properties": {
      "passage": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "knn_vector",
        "dimension": 384
      }
    }
  },
  "settings": {
    "index": {
      "knn.space_type": "cosinesimil",
      "default_pipeline": "test_tech_news_pipeline",
      "knn": "true"
    }
  }
}
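The pipeline and the index only work together if the target field named in the pipeline's field_map is mapped as a knn_vector field in the index. The following Python sketch checks that relationship offline. The two dictionaries are copied from the requests above; check_embedding_fields is a hypothetical helper written for this illustration, not part of OpenSearch. It does not verify that the dimension value (384) matches the output size of your embedding model, which you still need to confirm yourself.

```python
from typing import Dict, List

# Copied from the ingest pipeline request above.
PIPELINE = {
    "description": "text embedding pipeline for tech news",
    "processors": [
        {
            "text_embedding": {
                "model_id": "your_text_embedding_model_id",
                "field_map": {"passage": "passage_embedding"},
            }
        }
    ],
}

# Copied from the index creation request above.
INDEX_BODY = {
    "mappings": {
        "properties": {
            "passage": {"type": "text"},
            "passage_embedding": {"type": "knn_vector", "dimension": 384},
        }
    },
    "settings": {
        "index": {
            "knn.space_type": "cosinesimil",
            "default_pipeline": "test_tech_news_pipeline",
            "knn": "true",
        }
    },
}


def check_embedding_fields(pipeline: Dict, index_body: Dict) -> List[str]:
    """Hypothetical helper: list mismatches between field_map targets and the mapping."""
    problems = []
    properties = index_body.get("mappings", {}).get("properties", {})
    for processor in pipeline.get("processors", []):
        field_map = processor.get("text_embedding", {}).get("field_map", {})
        for source_field, vector_field in field_map.items():
            target = properties.get(vector_field)
            if target is None:
                problems.append(f"{vector_field} (from {source_field}) is not mapped")
            elif target.get("type") != "knn_vector":
                problems.append(
                    f"{vector_field} is mapped as {target.get('type')}, not knn_vector"
                )
    return problems


if __name__ == "__main__":
    # An empty list means the pipeline output field and the mapping line up.
    print(check_embedding_fields(PIPELINE, INDEX_BODY))
```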
Ingest data into the index:
POST _bulk
{"index":{"_index":"test_tech_news"}}
{"passage":"Apple Vision Pro is a mixed-reality headset developed by Apple Inc. It was announced on June 5, 2023, at Apple's Worldwide Developers Conference, and pre-orders began on January 19, 2024. It became available for purchase on February 2, 2024, in the United States.[10] A worldwide launch has yet to be scheduled. The Vision Pro is Apple's first new major product category since the release of the Apple Watch in 2015.[11]\n\nApple markets the Vision Pro as a \"spatial computer\" where digital media is integrated with the real world. Physical inputs—such as motion gestures, eye tracking, and speech recognition—can be used to interact with the system.[10] Apple has avoided marketing the device as a virtual reality headset, along with the use of the terms \"virtual reality\" and \"augmented reality\" when discussing the product in presentations and marketing.[12]\n\nThe device runs visionOS,[13] a mixed-reality operating system derived from iOS frameworks using a 3D user interface; it supports multitasking via windows that appear to float within the user's surroundings,[14] as seen by cameras built into the headset. A dial on the top of the headset can be used to mask the camera feed with a virtual environment to increase immersion. The OS supports avatars (officially called \"Personas\"), which are generated by scanning the user's face; a screen on the front of the headset displays a rendering of the avatar's eyes (\"EyeSight\"), which are used to indicate the user's level of immersion to bystanders, and assist in communication.[15]"}
{"index":{"_index":"test_tech_news"}}
{"passage":"LLaMA (Large Language Model Meta AI) is a family of autoregressive large language models (LLMs), released by Meta AI starting in February 2023.\n\nFor the first version of LLaMA, four model sizes were trained: 7, 13, 33, and 65 billion parameters. LLaMA's developers reported that the 13B parameter model's performance on most NLP benchmarks exceeded that of the much larger GPT-3 (with 175B parameters) and that the largest model was competitive with state of the art models such as PaLM and Chinchilla.[1] Whereas the most powerful LLMs have generally been accessible only through limited APIs (if at all), Meta released LLaMA's model weights to the research community under a noncommercial license.[2] Within a week of LLaMA's release, its weights were leaked to the public on 4chan via BitTorrent.[3]\n\nIn July 2023, Meta released several models as Llama 2, using 7, 13 and 70 billion parameters.\n\nLLaMA-2\n\nOn July 18, 2023, in partnership with Microsoft, Meta announced LLaMA-2, the next generation of LLaMA. Meta trained and released LLaMA-2 in three model sizes: 7, 13, and 70 billion parameters.[4] The model architecture remains largely unchanged from that of LLaMA-1 models, but 40% more data was used to train the foundational models.[5] The accompanying preprint[5] also mentions a model with 34B parameters that might be released in the future upon satisfying safety targets.\n\nLLaMA-2 includes both foundational models and models fine-tuned for dialog, called LLaMA-2 Chat. In further departure from LLaMA-1, all models are released with weights, and are free for many commercial use cases. However, due to some remaining restrictions, the description of LLaMA as open source has been disputed by the Open Source Initiative (known for maintaining the Open Source Definition).[6]\n\nIn November 2023, research conducted by Patronus AI, an artificial intelligence startup company, compared performance of LLaMA-2, OpenAI's GPT-4 and GPT-4-Turbo, and Anthropic's Claude2 on two versions of a 150-question test about information in SEC filings (e.g. Form 10-K, Form 10-Q, Form 8-K, earnings reports, earnings call transcripts) submitted by public companies to the agency where one version of the test required the generative AI models to use a retrieval system to locate the specific SEC filing to answer the questions while the other version provided the specific SEC filing to the models to answer the question (i.e. in a long context window). On the retrieval system version, GPT-4-Turbo and LLaMA-2 both failed to produce correct answers to 81% of the questions, while on the long context window version, GPT-4-Turbo and Claude-2 failed to produce correct answers to 21% and 24% of the questions respectively.[7][8]"}
{"index":{"_index":"test_tech_news"}}
{"passage":"Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don't have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with."}
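The Bulk API expects newline-delimited JSON: every action line and every document must occupy exactly one line, and the body must end with a newline. If you generate the request body from your own passages rather than pasting it, a small helper can guarantee that shape. The following Python sketch is one way to do it; build_bulk_body is a hypothetical name, and sending the body to your cluster is left out.

```python
import json
from typing import Iterable


def build_bulk_body(index_name: str, passages: Iterable[str]) -> str:
    """Build a newline-delimited _bulk body that indexes one document per passage."""
    lines = []
    for passage in passages:
        # Action line: tells the Bulk API which index receives the next document.
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # Document line: json.dumps escapes embedded newlines as \n,
        # so each document stays on a single physical line.
        lines.append(json.dumps({"passage": passage}))
    # The Bulk API requires the body to end with a newline.
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    body = build_bulk_body(
        "test_tech_news",
        ["First paragraph.\n\nSecond paragraph.", "Another passage."],
    )
    print(body, end="")
```

The document field is named passage because that is the source field in the pipeline's field_map; a document that uses a different field name is indexed without an embedding and cannot be found by the vector search.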
Step 2: Prepare an LLM
Follow Step 2 of the RAG with a conversational flow agent tutorial to configure the Amazon Bedrock Claude model.
Note the model ID; you’ll use it in the following steps.
Step 3: Create an agent
For this tutorial, you will create an agent of the conversational type.
Both the conversational_flow and conversational agents support conversation history.
The conversational_flow and conversational agents differ in the following ways:
- A conversational_flow agent runs tools sequentially, in a predefined order.
- A conversational agent dynamically chooses which tool to run next.
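The difference between the two agent types can be pictured with a toy model. The Python sketch below is only an illustration of the two control flows and makes no claim about how ML Commons implements them: run_flow, run_conversational, and the choose_next callback (which stands in for the LLM's decision) are all hypothetical names.

```python
from typing import Callable, Dict, List, Optional

Tool = Callable[[str], str]
Chooser = Callable[[str, List[str]], Optional[str]]


def run_flow(tools: Dict[str, Tool], order: List[str], question: str) -> List[str]:
    """Flow style: every tool runs once, in the order fixed ahead of time."""
    return [tools[name](question) for name in order]


def run_conversational(
    tools: Dict[str, Tool],
    choose_next: Chooser,
    question: str,
    max_iteration: int = 5,
) -> List[str]:
    """Conversational style: a chooser picks the next tool, or None to stop."""
    outputs: List[str] = []
    for _ in range(max_iteration):
        name = choose_next(question, outputs)
        if name is None:
            break  # The chooser decided it has enough context to answer.
        outputs.append(tools[name](question))
    return outputs


if __name__ == "__main__":
    tools = {
        "population_data_knowledge_base": lambda q: "population docs for: " + q,
        "tech_news_knowledge_base": lambda q: "tech news docs for: " + q,
    }

    def choose_next(question: str, outputs: List[str]) -> Optional[str]:
        # Stand-in for the LLM: call the news tool once, then stop.
        return "tech_news_knowledge_base" if not outputs else None

    print(run_flow(tools, list(tools), "What's vision pro"))
    print(run_conversational(tools, choose_next, "What's vision pro"))
```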
In this tutorial, the agent includes two tools: one provides recent population data, and the other provides recent tech news.
The agent has the following parameters:
- "max_iteration": 5: The agent runs the LLM a maximum of five times.
- "response_filter": "$.completion": Needed to retrieve the LLM answer from the Amazon Bedrock Claude model response.
- "doc_size": 3 (in population_data_knowledge_base): Specifies to return the top three documents.
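To make the role of response_filter concrete, the following Python sketch applies a path such as $.completion to a model response. It is a deliberately minimal stand-in that handles only dotted keys, not a full JSONPath implementation, and apply_simple_filter is a hypothetical helper. The sample response body is an assumed, abbreviated shape for a Claude text completion; check the actual response of your model before relying on it.

```python
import json
from typing import Any


def apply_simple_filter(response_body: str, path: str) -> Any:
    """Follow a '$.key.subkey' path into a JSON document (dotted keys only)."""
    if not path.startswith("$."):
        raise ValueError("this sketch only supports paths of the form $.key.subkey")
    value = json.loads(response_body)
    for key in path[2:].split("."):
        value = value[key]
    return value


# Assumed sample: the answer text sits in a top-level "completion" field.
SAMPLE_RESPONSE = json.dumps(
    {
        "completion": " Vision Pro is a mixed-reality headset developed by Apple.",
        "stop_reason": "stop_sequence",
    }
)

if __name__ == "__main__":
    print(apply_simple_filter(SAMPLE_RESPONSE, "$.completion"))
```

Without the filter, the agent would receive the whole response object rather than the answer text alone.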
Create an agent with the preceding specifications:
POST _plugins/_ml/agents/_register
{
  "name": "Chat Agent with RAG",
  "type": "conversational",
  "description": "this is a test agent",
  "llm": {
    "model_id": "your_llm_model_id",
    "parameters": {
      "max_iteration": 5,
      "response_filter": "$.completion"
    }
  },
  "memory": {
    "type": "conversation_index"
  },
  "tools": [
    {
      "type": "VectorDBTool",
      "name": "population_data_knowledge_base",
      "description": "This tool provides population data of US cities.",
      "parameters": {
        "input": "${parameters.question}",
        "index": "test_population_data",
        "source_field": [
          "population_description"
        ],
        "model_id": "your_text_embedding_model_id",
        "embedding_field": "population_description_embedding",
        "doc_size": 3
      }
    },
    {
      "type": "VectorDBTool",
      "name": "tech_news_knowledge_base",
      "description": "This tool provides recent tech news.",
      "parameters": {
        "input": "${parameters.question}",
        "index": "test_tech_news",
        "source_field": [
          "passage"
        ],
        "model_id": "your_text_embedding_model_id",
        "embedding_field": "passage_embedding",
        "doc_size": 2
      }
    }
  ],
  "app_type": "chat_with_rag"
}
Note the agent ID; you’ll use it in the next step.
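The two VectorDBTool entries in the registration request differ only in name, description, index, field names, and doc_size. If you register agents from a script, you may prefer to generate them from one function. The Python sketch below rebuilds the same request body as the one shown above; vector_db_tool and build_agent_body are hypothetical helper names, and the sketch only constructs the JSON without sending it.

```python
import json
from typing import Dict


def vector_db_tool(
    name: str,
    description: str,
    index: str,
    source_field: str,
    embedding_field: str,
    model_id: str,
    doc_size: int,
) -> Dict:
    """Build one VectorDBTool entry with the same keys as the request above."""
    return {
        "type": "VectorDBTool",
        "name": name,
        "description": description,
        "parameters": {
            "input": "${parameters.question}",
            "index": index,
            "source_field": [source_field],
            "model_id": model_id,
            "embedding_field": embedding_field,
            "doc_size": doc_size,
        },
    }


def build_agent_body(llm_model_id: str, embedding_model_id: str) -> Dict:
    """Assemble the registration body for the conversational agent."""
    return {
        "name": "Chat Agent with RAG",
        "type": "conversational",
        "description": "this is a test agent",
        "llm": {
            "model_id": llm_model_id,
            "parameters": {"max_iteration": 5, "response_filter": "$.completion"},
        },
        "memory": {"type": "conversation_index"},
        "tools": [
            vector_db_tool(
                "population_data_knowledge_base",
                "This tool provides population data of US cities.",
                "test_population_data",
                "population_description",
                "population_description_embedding",
                embedding_model_id,
                doc_size=3,
            ),
            vector_db_tool(
                "tech_news_knowledge_base",
                "This tool provides recent tech news.",
                "test_tech_news",
                "passage",
                "passage_embedding",
                embedding_model_id,
                doc_size=2,
            ),
        ],
        "app_type": "chat_with_rag",
    }


if __name__ == "__main__":
    body = build_agent_body("your_llm_model_id", "your_text_embedding_model_id")
    print(json.dumps(body, indent=2))
```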
Step 4: Test the agent
The conversational agent supports a verbose option. You can set verbose to true to obtain detailed steps.
Alternatively, you can call the Get Message Traces API:
GET _plugins/_ml/memory/message/message_id/traces
Start a conversation
Ask a question related to tech news:
POST _plugins/_ml/agents/your_agent_id/_execute
{
  "parameters": {
    "question": "What's vision pro",
    "verbose": true
  }
}
In the response, note that the agent runs the tech_news_knowledge_base tool to obtain the top two documents. The agent then passes these documents as context to the LLM. The LLM uses the context to produce the answer:
{
"inference_results": [
{
"output": [
{
"name": "memory_id",
"result": "eLVSxI0B8vrNLhb9nxto"
},
{
"name": "parent_interaction_id",
"result": "ebVSxI0B8vrNLhb9nxty"
},
{
"name": "response",
"result": """{
"thought": "I don't have enough context to answer the question directly. Let me check the tech_news_knowledge_base tool to see if it can provide more information.",
"action": "tech_news_knowledge_base",
"action_input": "{\"query\":\"What's vision pro\"}"
}"""
},
{
"name": "response",
"result": """{"_index":"test_tech_news","_source":{"passage":"Apple Vision Pro is a mixed-reality headset developed by Apple Inc. It was announced on June 5, 2023, at Apple\u0027s Worldwide Developers Conference, and pre-orders began on January 19, 2024. It became available for purchase on February 2, 2024, in the United States.[10] A worldwide launch has yet to be scheduled. The Vision Pro is Apple\u0027s first new major product category since the release of the Apple Watch in 2015.[11]\n\nApple markets the Vision Pro as a \"spatial computer\" where digital media is integrated with the real world. Physical inputs—such as motion gestures, eye tracking, and speech recognition—can be used to interact with the system.[10] Apple has avoided marketing the device as a virtual reality headset, along with the use of the terms \"virtual reality\" and \"augmented reality\" when discussing the product in presentations and marketing.[12]\n\nThe device runs visionOS,[13] a mixed-reality operating system derived from iOS frameworks using a 3D user interface; it supports multitasking via windows that appear to float within the user\u0027s surroundings,[14] as seen by cameras built into the headset. A dial on the top of the headset can be used to mask the camera feed with a virtual environment to increase immersion. The OS supports avatars (officially called \"Personas\"), which are generated by scanning the user\u0027s face; a screen on the front of the headset displays a rendering of the avatar\u0027s eyes (\"EyeSight\"), which are used to indicate the user\u0027s level of immersion to bystanders, and assist in communication.[15]"},"_id":"lrU8xI0B8vrNLhb9yBpV","_score":0.6700683}
{"_index":"test_tech_news","_source":{"passage":"Amazon Bedrock is a fully managed service that offers a choice of high-performing foundation models (FMs) from leading AI companies like AI21 Labs, Anthropic, Cohere, Meta, Stability AI, and Amazon via a single API, along with a broad set of capabilities you need to build generative AI applications with security, privacy, and responsible AI. Using Amazon Bedrock, you can easily experiment with and evaluate top FMs for your use case, privately customize them with your data using techniques such as fine-tuning and Retrieval Augmented Generation (RAG), and build agents that execute tasks using your enterprise systems and data sources. Since Amazon Bedrock is serverless, you don\u0027t have to manage any infrastructure, and you can securely integrate and deploy generative AI capabilities into your applications using the AWS services you are already familiar with."},"_id":"mLU8xI0B8vrNLhb9yBpV","_score":0.5604863}
"""
},
{
"name": "response",
"result": "Vision Pro is a mixed-reality headset developed by Apple that was announced in 2023. It uses cameras and sensors to overlay digital objects and information on the real world. The device runs an operating system called visionOS that allows users to interact with windows and apps in a 3D environment using gestures, eye tracking, and voice commands."
}
]
}
]
}
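A verbose response mixes identifiers, intermediate steps, and the final answer in a single output list. The following Python sketch separates them. It assumes, based only on the sample response shown here, that the last item named response holds the final answer; summarize_execution is a hypothetical helper, and the sample dictionary is an abbreviated copy of the response above.

```python
from typing import Dict, List


def summarize_execution(response: Dict) -> Dict:
    """Split a verbose agent response into IDs, intermediate steps, and the answer."""
    outputs = response["inference_results"][0]["output"]
    summary: Dict = {"memory_id": None, "parent_interaction_id": None}
    steps: List[str] = []
    for item in outputs:
        if item["name"] in ("memory_id", "parent_interaction_id"):
            summary[item["name"]] = item["result"]
        elif item["name"] == "response":
            steps.append(item["result"])
    # Assumption from the sample: the last "response" entry is the final answer,
    # and the entries before it are the agent's thoughts and tool outputs.
    summary["answer"] = steps[-1] if steps else None
    summary["steps"] = steps[:-1]
    return summary


# Abbreviated copy of the sample response; long strings are shortened.
SAMPLE = {
    "inference_results": [
        {
            "output": [
                {"name": "memory_id", "result": "eLVSxI0B8vrNLhb9nxto"},
                {"name": "parent_interaction_id", "result": "ebVSxI0B8vrNLhb9nxty"},
                {"name": "response", "result": "{\"action\": \"tech_news_knowledge_base\"}"},
                {"name": "response", "result": "(tool output, abbreviated)"},
                {"name": "response", "result": "Vision Pro is a mixed-reality headset developed by Apple."},
            ]
        }
    ]
}

if __name__ == "__main__":
    result = summarize_execution(SAMPLE)
    print(result["memory_id"], result["parent_interaction_id"])
    print(result["answer"])
```

Keep the memory_id if you plan to continue the conversation, and the parent_interaction_id if you want to fetch the traces for this message.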
You can trace the detailed steps by calling the Get Message Traces API, using the parent_interaction_id from the response as the message ID:
GET _plugins/_ml/memory/message/ebVSxI0B8vrNLhb9nxty/traces
Ask a question related to the population data:
POST _plugins/_ml/agents/your_agent_id/_execute
{
  "parameters": {
    "question": "What's the population of Seattle 2023",
    "verbose": true
  }
}
In the response, note that the agent runs the population_data_knowledge_base tool to obtain the top three documents. The agent then passes these documents as context to the LLM. The LLM uses the context to produce the answer:
{
"inference_results": [
{
"output": [
{
"name": "memory_id",
"result": "l7VUxI0B8vrNLhb9sRuQ"
},
{
"name": "parent_interaction_id",
"result": "mLVUxI0B8vrNLhb9sRub"
},
{
"name": "response",
"result": """{
"thought": "Let me check the population data tool to find the most recent population estimate for Seattle",
"action": "population_data_knowledge_base",
"action_input": "{\"city\":\"Seattle\"}"
}"""
},
{
"name": "response",
"result": """{"_index":"test_population_data","_source":{"population_description":"Chart and table of population level and growth rate for the Seattle metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Seattle in 2023 is 3,519,000, a 0.86% increase from 2022.\\nThe metro area population of Seattle in 2022 was 3,489,000, a 0.81% increase from 2021.\\nThe metro area population of Seattle in 2021 was 3,461,000, a 0.82% increase from 2020.\\nThe metro area population of Seattle in 2020 was 3,433,000, a 0.79% increase from 2019."},"_id":"BxF5vo0BubpYKX5ER0fT","_score":0.65775126}
{"_index":"test_population_data","_source":{"population_description":"Chart and table of population level and growth rate for the Seattle metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Seattle in 2023 is 3,519,000, a 0.86% increase from 2022.\\nThe metro area population of Seattle in 2022 was 3,489,000, a 0.81% increase from 2021.\\nThe metro area population of Seattle in 2021 was 3,461,000, a 0.82% increase from 2020.\\nThe metro area population of Seattle in 2020 was 3,433,000, a 0.79% increase from 2019."},"_id":"7DrZvo0BVR2NrurbRIAE","_score":0.65775126}
{"_index":"test_population_data","_source":{"population_description":"Chart and table of population level and growth rate for the New York City metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of New York City in 2023 is 18,937,000, a 0.37% increase from 2022.\\nThe metro area population of New York City in 2022 was 18,867,000, a 0.23% increase from 2021.\\nThe metro area population of New York City in 2021 was 18,823,000, a 0.1% increase from 2020.\\nThe metro area population of New York City in 2020 was 18,804,000, a 0.01% decline from 2019."},"_id":"AxF5vo0BubpYKX5ER0fT","_score":0.56461215}
"""
},
{
"name": "response",
"result": "According to the population data tool, the population of Seattle in 2023 is approximately 3,519,000 people, a 0.86% increase from 2022."
}
]
}
]
}
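In the sample responses, each tool result is a string containing one JSON search hit per line, and the same Seattle passage appears twice under different document IDs. The Python sketch below parses such a string and drops hits whose _source repeats. The line-per-hit layout is inferred from the samples in this tutorial; parse_tool_hits and unique_hits are hypothetical helpers, and the passages in the sample are shortened.

```python
import json
from typing import Dict, List


def parse_tool_hits(tool_output: str) -> List[Dict]:
    """Parse a tool result that holds one JSON hit per line."""
    hits = []
    for line in tool_output.split("\n"):
        line = line.strip()
        if line:
            hits.append(json.loads(line))
    return hits


def unique_hits(hits: List[Dict]) -> List[Dict]:
    """Keep the first hit for each distinct _source, preserving order."""
    seen = set()
    result = []
    for hit in hits:
        key = json.dumps(hit.get("_source"), sort_keys=True)
        if key not in seen:
            seen.add(key)
            result.append(hit)
    return result


# IDs and scores are taken from the sample response; passages are abbreviated.
SAMPLE_TOOL_OUTPUT = "\n".join(
    [
        json.dumps({"_index": "test_population_data", "_source": {"population_description": "Seattle metro area (abbreviated)"}, "_id": "BxF5vo0BubpYKX5ER0fT", "_score": 0.65775126}),
        json.dumps({"_index": "test_population_data", "_source": {"population_description": "Seattle metro area (abbreviated)"}, "_id": "7DrZvo0BVR2NrurbRIAE", "_score": 0.65775126}),
        json.dumps({"_index": "test_population_data", "_source": {"population_description": "New York City metro area (abbreviated)"}, "_id": "AxF5vo0BubpYKX5ER0fT", "_score": 0.56461215}),
    ]
) + "\n"

if __name__ == "__main__":
    for hit in unique_hits(parse_tool_hits(SAMPLE_TOOL_OUTPUT)):
        print(hit["_id"], hit["_score"])
```

Duplicate passages take up retrieval slots that could hold other documents, so it can be worth checking whether the source index was ingested more than once.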
Continue a conversation
To continue a previous conversation, provide its conversation ID in the memory_id parameter:
POST _plugins/_ml/agents/your_agent_id/_execute
{
  "parameters": {
    "question": "What's the population of Austin 2023, compared with Seattle",
    "memory_id": "l7VUxI0B8vrNLhb9sRuQ",
    "verbose": true
  }
}
In the response, note that the population_data_knowledge_base tool doesn’t return the population of Seattle. Instead, the agent learns the population of Seattle by referencing historical messages:
{
"inference_results": [
{
"output": [
{
"name": "memory_id",
"result": "l7VUxI0B8vrNLhb9sRuQ"
},
{
"name": "parent_interaction_id",
"result": "B7VkxI0B8vrNLhb9mxy0"
},
{
"name": "response",
"result": """{
"thought": "Let me check the population data tool first",
"action": "population_data_knowledge_base",
"action_input": "{\"city\":\"Austin\",\"year\":2023}"
}"""
},
{
"name": "response",
"result": """{"_index":"test_population_data","_source":{"population_description":"Chart and table of population level and growth rate for the Austin metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Austin in 2023 is 2,228,000, a 2.39% increase from 2022.\\nThe metro area population of Austin in 2022 was 2,176,000, a 2.79% increase from 2021.\\nThe metro area population of Austin in 2021 was 2,117,000, a 3.12% increase from 2020.\\nThe metro area population of Austin in 2020 was 2,053,000, a 3.43% increase from 2019."},"_id":"BhF5vo0BubpYKX5ER0fT","_score":0.69129956}
{"_index":"test_population_data","_source":{"population_description":"Chart and table of population level and growth rate for the Austin metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of Austin in 2023 is 2,228,000, a 2.39% increase from 2022.\\nThe metro area population of Austin in 2022 was 2,176,000, a 2.79% increase from 2021.\\nThe metro area population of Austin in 2021 was 2,117,000, a 3.12% increase from 2020.\\nThe metro area population of Austin in 2020 was 2,053,000, a 3.43% increase from 2019."},"_id":"6zrZvo0BVR2NrurbRIAE","_score":0.69129956}
{"_index":"test_population_data","_source":{"population_description":"Chart and table of population level and growth rate for the New York City metro area from 1950 to 2023. United Nations population projections are also included through the year 2035.\\nThe current metro area population of New York City in 2023 is 18,937,000, a 0.37% increase from 2022.\\nThe metro area population of New York City in 2022 was 18,867,000, a 0.23% increase from 2021.\\nThe metro area population of New York City in 2021 was 18,823,000, a 0.1% increase from 2020.\\nThe metro area population of New York City in 2020 was 18,804,000, a 0.01% decline from 2019."},"_id":"AxF5vo0BubpYKX5ER0fT","_score":0.61015373}
"""
},
{
"name": "response",
"result": "According to the population data tool, the population of Austin in 2023 is approximately 2,228,000 people, a 2.39% increase from 2022. This is lower than the population of Seattle in 2023 which is approximately 3,519,000 people, a 0.86% increase from 2022."
}
]
}
]
}
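The requests for starting and continuing a conversation differ only in the memory_id parameter. The following Python sketch builds both from one function. It only constructs the request path and body used in this tutorial; build_execute_request is a hypothetical helper, and sending the request to your cluster is left out.

```python
from typing import Dict, Optional, Tuple


def build_execute_request(
    agent_id: str,
    question: str,
    memory_id: Optional[str] = None,
    verbose: bool = True,
) -> Tuple[str, Dict]:
    """Return the path and JSON body for an agent _execute call."""
    parameters: Dict = {"question": question, "verbose": verbose}
    if memory_id is not None:
        # Including memory_id continues an existing conversation;
        # omitting it starts a new one.
        parameters["memory_id"] = memory_id
    path = f"/_plugins/_ml/agents/{agent_id}/_execute"
    return path, {"parameters": parameters}


if __name__ == "__main__":
    # First turn: no memory_id, so a new conversation is created.
    print(build_execute_request("your_agent_id", "What's the population of Seattle 2023"))
    # Follow-up turn: reuse the memory_id returned by the first response.
    print(
        build_execute_request(
            "your_agent_id",
            "What's the population of Austin 2023, compared with Seattle",
            memory_id="l7VUxI0B8vrNLhb9sRuQ",
        )
    )
```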