Using OpenSearch as a Vector Database

An open-source, all-in-one vector database for building flexible, scalable, and future-proof AI applications

Traditional lexical search, based on term frequency models like BM25, is widely used and effective for many search applications. However, tuning lexical search to account for the meaning or relevance of the terms searched requires significant investment in time and expertise. Today, more and more developers want to embed semantic understanding into their search applications. Enter machine learning embedding models, which encode the meaning and context of documents, images, and audio into vectors for similarity search. These vectors can, in turn, be searched using the k-nearest neighbors (k-NN) functionality provided by OpenSearch.
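To make similarity search concrete, the following Python sketch performs exact (brute-force) k-NN search: it scores every stored vector against a query vector using cosine similarity and keeps the k best matches. The three-dimensional vectors and document names are invented for illustration. Real embedding models emit hundreds of dimensions, and OpenSearch avoids scoring every vector by using approximate indexes such as HNSW.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def exact_knn(query: List[float], docs: Dict[str, List[float]], k: int) -> List[Tuple[str, float]]:
    """Score every stored vector against the query and keep the k most similar."""
    scored = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy "embeddings" (hypothetical values); related concepts point in similar directions.
docs = {
    "cat": [0.9, 0.1, 0.0],
    "kitten": [0.85, 0.15, 0.05],
    "car": [0.0, 0.2, 0.95],
}

if __name__ == "__main__":
    for doc_id, score in exact_knn([0.88, 0.12, 0.02], docs, k=2):
        print(doc_id, round(score, 4))
```

Brute-force search is exact, but its cost grows linearly with the number of stored vectors, which is why approximate indexing algorithms matter at scale.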

Using OpenSearch as a vector database brings together the power of traditional search, analytics, and vector search in one complete package. OpenSearch’s vector database capabilities can accelerate artificial intelligence (AI) application development by reducing the effort for builders to operationalize, manage, and integrate AI-generated assets. Bring your models, vectors, and metadata into OpenSearch to power vector, lexical, and hybrid search and analytics, with performance and scalability built in.

What is a vector database?

Information comes in many forms: unstructured data, like text documents, rich media, and audio, and structured data, like geospatial coordinates, tables, and graphs. Innovations in AI have enabled the use of embedding models to encode all types of data into vectors. These vectors are data points in a high-dimensional space that capture the meaning and context of an asset, allowing search tools to find similar assets by searching for neighboring data points.

Vector databases allow you to store and index vectors and metadata, unlocking the ability to use low-latency queries to discover assets by degree of similarity. Typically powered by k-NN indexes built using algorithms like Hierarchical Navigable Small Worlds (HNSW) and Inverted File (IVF), vector databases extend k-NN functionality with the foundation that applications need, such as data management, fault tolerance, resource access controls, and a query engine.
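As a minimal sketch of how this looks in OpenSearch, the Python snippet below only builds request bodies: an index definition with a `knn_vector` field backed by an HNSW graph plus a metadata field, and a `knn` query for the k nearest neighbors. The field names `embedding` and `title`, the index name `my-index`, the dimension of 3, and the choice of the `lucene` engine with the `cosinesimil` space are assumptions for illustration. Supported engine and space combinations vary by OpenSearch version, so check the k-NN documentation for your release.

```python
import json

# Hypothetical: must match the output size of whichever embedding model you use.
DIMENSION = 3

# Index body: enable k-NN and declare a knn_vector field indexed with HNSW.
# Field names "embedding" and "title" are hypothetical.
index_body = {
    "settings": {"index": {"knn": True}},
    "mappings": {
        "properties": {
            "embedding": {
                "type": "knn_vector",
                "dimension": DIMENSION,
                "method": {
                    "name": "hnsw",
                    "space_type": "cosinesimil",  # assumed; l2 and innerproduct are alternatives
                    "engine": "lucene",  # assumed engine choice
                },
            },
            "title": {"type": "text"},  # metadata stored alongside the vector
        }
    },
}


def knn_query(vector, k):
    """Build a query body asking for the k nearest neighbors of `vector`."""
    if len(vector) != DIMENSION:
        raise ValueError(f"expected {DIMENSION} dimensions, got {len(vector)}")
    return {"size": k, "query": {"knn": {"embedding": {"vector": vector, "k": k}}}}


if __name__ == "__main__":
    # These bodies could be sent as PUT /my-index and POST /my-index/_search
    # ("my-index" is a hypothetical index name).
    print(json.dumps(index_body, indent=2))
    print(json.dumps(knn_query([0.88, 0.12, 0.02], k=2)))
```

Keeping the vector and its metadata in the same document is what lets a single query combine similarity search with filters or lexical matching on fields like `title`.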

OpenSearch provides an integrated vector database that can support AI systems by serving as a knowledge base. This benefits AI applications like generative AI and natural language search by providing a long-term memory of AI-generated outputs. These outputs can be used to enhance information retrieval and analytics, improve efficiency and stability, and give generative AI models a broader and deeper pool of data from which to draw more accurate and truthful responses to queries.

Trusted in production
Power AI applications on a mature search and analytics engine trusted in production by tens of thousands of users.
Proven at scale
Build stable applications on a data platform proven to scale to tens of billions of vectors, with low latency and high availability.
Open and flexible
Choose open-source tools and take advantage of integrations with popular open frameworks, plus the option to use managed services from major cloud providers.
Build for the future
Future-proof your AI applications with vector, lexical, and hybrid search, analytics, and observability capabilities, all in one software suite.

Vector Database Use Cases

OpenSearch as a vector database supports a range of applications. Following are a few examples of solutions you can build.

Search
Visual search: Create applications that allow users to take a photograph and search for similar images without having to manually tag images.
Semantic search: Enhance search relevancy by powering vector search with text embedding models that capture semantic meaning, and use hybrid scoring to blend in term frequency models (BM25) for improved results.
Multimodal search: Use state-of-the-art models that can fuse and encode text, image, and audio inputs to generate more accurate digital fingerprints of rich media and enable more relevant search and insights.
Generative AI agents: Build intelligent agents with the power of generative AI while reducing hallucinations by using OpenSearch to power retrieval augmented generation (RAG) workflows with large language models (LLMs). Whether you call them chatbots, automated conversation entities, question answering bots, or something else, OpenSearch's vector database functionality can help them deliver better results.
Personalization
Recommendation engine: Generate product and user embeddings using collaborative filtering techniques and use OpenSearch to power your recommendation engine.
User-level content targeting: Personalize web pages by using OpenSearch to retrieve content ranked by user propensities using embeddings trained on user interactions.
Data Quality
Automate pattern matching and de-duplication: Use similarity search to automate pattern matching and duplicate detection in your data, streamlining data quality processes.
Vector database engine
Data and machine learning platforms: Build your platform with an integrated, Apache 2.0-licensed vector database that provides a reliable and scalable solution to operationalize embeddings and power vector search.
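Hybrid scoring, mentioned under semantic search, has to reconcile scores on different scales: BM25 scores are unbounded, while vector similarity scores usually fall in a narrow range. The Python sketch below shows one common approach, min-max normalizing each result list and then blending the two with a weighted arithmetic mean. OpenSearch offers this kind of combination through a normalization processor in a search pipeline, but this code is an illustration, not OpenSearch's implementation. The document IDs, scores, the 0.5 default weight, and the choice to score a document missing from one list as 0 are all assumptions.

```python
from typing import Dict


def min_max_normalize(scores: Dict[str, float]) -> Dict[str, float]:
    """Rescale one result list's scores into the range [0, 1]."""
    lo, hi = min(scores.values()), max(scores.values())
    if hi == lo:
        # All scores equal: treat every document as a top match.
        return {doc_id: 1.0 for doc_id in scores}
    return {doc_id: (s - lo) / (hi - lo) for doc_id, s in scores.items()}


def hybrid_scores(
    bm25: Dict[str, float], vector: Dict[str, float], vector_weight: float = 0.5
) -> Dict[str, float]:
    """Blend normalized BM25 and vector scores with a weighted arithmetic mean.

    Assumption: a document absent from one result list contributes 0 for that list.
    """
    bm25_n = min_max_normalize(bm25)
    vec_n = min_max_normalize(vector)
    combined = {}
    for doc_id in set(bm25_n) | set(vec_n):
        combined[doc_id] = (1 - vector_weight) * bm25_n.get(doc_id, 0.0) + vector_weight * vec_n.get(
            doc_id, 0.0
        )
    return combined


if __name__ == "__main__":
    # Hypothetical raw scores from a lexical query and a k-NN query.
    bm25 = {"a": 12.0, "b": 7.5, "c": 3.0}
    vector = {"b": 0.92, "c": 0.90, "d": 0.40}
    ranked = sorted(hybrid_scores(bm25, vector).items(), key=lambda p: p[1], reverse=True)
    for doc_id, score in ranked:
        print(doc_id, round(score, 3))
```

Raising `vector_weight` favors semantic matches, while lowering it favors exact term matches; the right balance is typically found by evaluating relevance on your own queries.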

Getting Started

You can begin exploring OpenSearch's vector database functionality by downloading your preferred distribution. To learn more or start a discussion, join the Slack channel or check out our user forum and follow our blog for the latest on OpenSearch tools.

Resources

Following are links to documents from users, application developers, and other members of the OpenSearch community that explore the ways OpenSearch can be deployed as a vector database solution.