Introducing OpenSearch 2.12

Tue, Feb 20, 2024 · James McIntyre

OpenSearch 2.12 is ready to download! This release increases performance for search and analytics applications and includes user experience enhancements and an array of upgrades to OpenSearch’s machine learning (ML) toolkit. The release also features OpenSearch’s integration with Apache Spark, unlocking new ways to analyze your operational data. You can find a comprehensive view of what’s new in the 2.12 release notes. To explore the latest release using OpenSearch’s visualization tools, check out OpenSearch Playground.

Analyze operational data with Apache Spark integration

The OpenSearch community makes great use of OpenSearch’s rich set of analytics tools, such as dashboards, anomaly detection, and geospatial functionality. However, due to cost constraints, users typically ingest only their primary, more frequently accessed operational and security data into OpenSearch while leaving their secondary, less frequently accessed data in object stores such as Amazon S3 and Azure Blob Storage. When users need to analyze that secondary data, they are left to either manually ingest it into OpenSearch for temporary analysis or use alternative, ad hoc query tools that don’t provide the full set of OpenSearch features. Now, with our new integration with Apache Spark, users can analyze all of their operational data in one place, using OpenSearch in combination with Apache Spark. This is a first step toward integrating OpenSearch with third-party sources of analytics data, and we hope that those who explore this functionality will share their feedback with the community.

Enhance search and analytics performance with new functionality

The OpenSearch community has delivered significant performance improvements since the launch of the project. This release continues that trend with a number of updates designed to enhance performance for search applications:

  • Introduced as an experimental feature in OpenSearch 2.10, concurrent segment search is generally available as of this release. By default, OpenSearch executes search requests sequentially across all segments on each shard. Now users have the option to query index segments in parallel at the shard level, which can reduce latency for many kinds of search queries, such as long-running requests that contain aggregations or large ranges. There are several ways to enable concurrent segment search, depending on installation type; for more information, please refer to the documentation, and see the settings sketch after this list.
  • Date histogram aggregations without sub-aggregations can now be transformed into and executed as range filters, delivering a remarkable 10–50 times speed boost on the nyc_taxis benchmark. This optimization is also applied to auto date histograms and composite aggregations using a single date histogram source.
  • Multi-terms aggregations are now significantly faster for high-cardinality search terms, offering improved performance for many prefix and wildcard queries. Composite aggregation is now supported within filters.
  • The new match-only text field type, a variant of the text field, can help reduce storage costs while maintaining term and multi-term query performance. On one of the OpenSearch benchmark workloads for full-text search, replacing the text field with match_only_text reduced storage requirements by up to 20 percent.
  • Queries on the keyword, numeric, and IP field types are also faster. These field types are now searchable using doc_values queries. Although doc_values queries are slightly slower than queries that use an index structure, they can decrease storage requirements for rarely accessed fields. Term matching for numeric searches is faster when terms are indexed and doc_values are enabled. The mapping sketch after this list shows both match_only_text and a doc_values-only field.
  • Another contributor to improved performance is an upgrade to Lucene version 9.9. The latest minor version of Lucene introduces optimizations that speed up a range of search queries. The enhancements to Lucene’s query performance can be observed in its nightly benchmarks, which are designed to reflect real-world query scenarios.
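
As a minimal sketch, here is how you might enable concurrent segment search dynamically across a whole cluster; search.concurrent_segment_search.enabled is the dynamic cluster setting covered in the documentation, and index-level options are also available:

```json
PUT /_cluster/settings
{
  "persistent": {
    "search.concurrent_segment_search.enabled": true
  }
}
```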
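
And here is an illustrative mapping that combines the two storage-saving options above: a match_only_text field and a keyword field that is searchable through doc_values alone. The index name and field names are hypothetical:

```json
PUT /logs
{
  "mappings": {
    "properties": {
      "message": {
        "type": "match_only_text"
      },
      "trace_id": {
        "type": "keyword",
        "index": false,
        "doc_values": true
      }
    }
  }
}
```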

Get real-time insights into query performance

In addition to making search more performant, this release introduces tracking of high-latency queries with the top N queries feature. A simple OpenSearch API (GET /_insights/top_queries) fetches the most recent high-latency queries, as sketched below. Future releases will add more types of performance metrics, such as CPU and JVM usage metrics.
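
As a sketch, using query insights involves first enabling latency tracking and then calling the API; the search.insights.top_queries.latency.enabled setting name follows the 2.12 documentation, so verify it against the docs for your version:

```json
PUT /_cluster/settings
{
  "persistent": {
    "search.insights.top_queries.latency.enabled": true
  }
}

GET /_insights/top_queries
```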

Build ML applications with an expanded set of tools

Last year included a long list of advancements in ML within OpenSearch. The first release of 2024 brings a host of powerful capabilities designed to make building ML applications with OpenSearch easier:

  • Enhance the search experience with conversational search: First released as experimental functionality in OpenSearch 2.11, conversational search transforms OpenSearch’s lexical, vector, and hybrid search features into conversational experiences without requiring custom middleware. It enables users to search through a series of interactions such as “what is OpenSearch” and “how do I use it with GenAI”. It includes a retrieval-augmented generation (RAG) search pipeline that uses AI connectors to send information to generative large language models (LLMs) like ChatGPT and Claude 2. The RAG pipeline processes a query by retrieving knowledge articles from your indexes and sending them to a generative LLM to elicit a conversational response. This method grounds the generative LLM in facts to minimize hallucinations, which can yield misinformation. Conversation history is also tracked and included in the request context sent to generative LLMs, providing them with long-term memory for ongoing conversations. Conversational search offers a more engaging search experience and can improve information retrieval performance. A pipeline sketch appears after this list.
  • Build semantic search applications with default data processors for Amazon Bedrock: Currently, the primary use case for AI connectors is powering semantic search on the neural search architecture, where the connector’s job is to provide secure data exchange between the external model and that architecture. Because the request and response formats of integrated models and services vary, data pre- and post-processing must be implemented for each one. To reduce this effort, we provide default pre- and post-processors for exchanging text embedding data, minimizing the need for custom implementations. In 2.12, we have released default processors for Amazon Bedrock text embedding connectors, as shown in the connector sketch after this list. You can learn how to build AI connectors in this blog post.
  • Simplify vector search applications with built-in chunking through nested vector fields: Users can now represent long documents as multiple vectors in a nested field. When users run k-NN queries, each nested field is treated as a single vector (an encoded long document). Previously, users had to implement custom processing logic in their applications to support querying documents represented as vector chunks. With this feature, users can simply run k-NN queries, as in the sketch after this list, making it easier for builders to create vector search applications.
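
Here is a hedged sketch of a conversational search setup: a search pipeline with the retrieval_augmented_generation response processor, followed by a query carrying generative_qa_parameters. The index, field, model ID, and memory ID are placeholders, and the parameter names follow the conversational search documentation, so check the docs for your deployment:

```json
PUT /_search/pipeline/rag_pipeline
{
  "response_processors": [
    {
      "retrieval_augmented_generation": {
        "model_id": "<llm_model_id>",
        "context_field_list": ["text"],
        "system_prompt": "You are a helpful assistant"
      }
    }
  ]
}

GET /knowledge_base/_search?search_pipeline=rag_pipeline
{
  "query": {
    "match": { "text": "what is OpenSearch" }
  },
  "ext": {
    "generative_qa_parameters": {
      "llm_model": "gpt-3.5-turbo",
      "llm_question": "what is OpenSearch",
      "memory_id": "<memory_id>",
      "context_size": 5
    }
  }
}
```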
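
The following connector sketch illustrates the new Amazon Bedrock defaults; the key lines are the pre_process_function and post_process_function entries, which select the built-in Bedrock embedding processors instead of custom ones. The region, model, and credentials are placeholder values:

```json
POST /_plugins/_ml/connectors/_create
{
  "name": "Amazon Bedrock embedding connector",
  "version": 1,
  "protocol": "aws_sigv4",
  "parameters": {
    "region": "us-east-1",
    "service_name": "bedrock"
  },
  "credential": {
    "access_key": "<access_key>",
    "secret_key": "<secret_key>"
  },
  "actions": [
    {
      "action_type": "predict",
      "method": "POST",
      "url": "https://bedrock-runtime.us-east-1.amazonaws.com/model/amazon.titan-embed-text-v1/invoke",
      "headers": { "content-type": "application/json" },
      "request_body": "{ \"inputText\": \"${parameters.inputText}\" }",
      "pre_process_function": "connector.pre_process.bedrock.embedding",
      "post_process_function": "connector.post_process.bedrock.embedding"
    }
  ]
}
```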
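
And here is an illustrative nested vector field setup: each document stores its chunks as nested objects with a knn_vector field, and a single k-NN query searches across them. The three-dimensional vectors, index name, and field names are toy values:

```json
PUT /long_documents
{
  "settings": { "index.knn": true },
  "mappings": {
    "properties": {
      "chunks": {
        "type": "nested",
        "properties": {
          "embedding": {
            "type": "knn_vector",
            "dimension": 3,
            "method": { "name": "hnsw", "engine": "lucene", "space_type": "l2" }
          }
        }
      }
    }
  }
}

GET /long_documents/_search
{
  "query": {
    "nested": {
      "path": "chunks",
      "query": {
        "knn": {
          "chunks.embedding": {
            "vector": [0.1, 0.2, 0.3],
            "k": 10
          }
        }
      }
    }
  }
}
```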

Explore improvements to the Discover experience

In OpenSearch 2.10, we released an updated version of the Discover tool in OpenSearch Dashboards, which included user interface changes and the removal of outdated dependencies. The goal of these improvements was to deliver a more intuitive and cohesive tool. In OpenSearch 2.11, we removed the option to revert to the older version of Discover. After receiving feedback from the community about usability gaps in the new Discover experience, we have made improvements for the 2.12 release, including updates to density, column order, and sorting controls. We also brought back elements of the previous experience that users said they missed, including improved table interactions and the ability to expand and collapse the field selection bar. In 2.12, we are reintroducing the ability to choose between the previously implemented Discover experience (in line with OpenSearch 2.9) and the new experience. This will allow more time to gather feedback and make improvements before moving to the new experience in a future release. We look forward to your feedback on the new Discover experience.

Experimental features

OpenSearch 2.12 includes experimental features designed to allow users to preview new tools before they are generally available. Experimental features should not be used in a production environment.

  • Build interactive experiences powered by generative AI: The OpenSearch Assistant Toolkit is now available as an experimental feature in OpenSearch 2.12. Released last November as a standalone set of ML skills and frameworks available for download, this toolkit helps developers build generative AI experiences in OpenSearch Dashboards. With integrated natural language processing and context-aware features, developers can apply generative AI to create interactive user experiences and extract insights from OpenSearch data. The toolkit incorporates several updates from the November release, including a new experimental flow framework designed to simplify development of complex workflows, such as RAG, with JSON templates. You can explore the experiences this toolkit supports by logging in to OpenSearch Playground and asking questions.
  • Simplify use of LLMs with a new agent framework and built-in tooling: The OpenSearch Assistant Toolkit also incorporates a new experimental agent framework added to ML Commons. The framework uses remote LLMs to reason through problems step by step and to coordinate ML tools. Designed to support different types of agents, this flexible framework currently includes a flow agent and a conversational agent. This release also adds a set of built-in tools to ML Commons, and users can build their own tools as well. A registration sketch appears after this list.
  • Automate configurations for ML Commons resources: A new experimental workflow engine allows users to automate configurations for OpenSearch resources. The initial release supports configuring ML Commons resources like models, connectors, and agent tools. Previously, users had to manually create resources or write custom scripts to assemble and manage the resources needed for various AI use cases on OpenSearch. This feature lets you define workflows that automate configurations and package them as portable templates that can be reused on any OpenSearch cluster. To learn more, refer to this tutorial, which provides an example of a workflow for creating model, connector, and agent tool resources to support generative AI agents; a compact workflow sketch also appears after this list.
  • Query multiple clusters with cross-cluster monitors: Previously, users were limited when performing a cross-cluster search call with a monitor through the Alerting plugin because they did not have the necessary permissions to access remote clusters. With this experimental feature, users can now perform alerting monitor searches seamlessly across multiple clusters, including remote clusters, and can create monitors based on a remote cluster index. This allows users to better monitor observability data across a range of data sources and indexes.
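
To give a flavor of the experimental agent framework, here is a hedged sketch of registering a flow agent that retrieves documents and then prompts an LLM. The tool types follow the ML Commons agent documentation, while the model IDs, index, and field names are placeholders; because the APIs are experimental, expect them to change:

```json
POST /_plugins/_ml/agents/_register
{
  "name": "rag_flow_agent",
  "type": "flow",
  "description": "Retrieves documents, then asks an LLM to answer",
  "tools": [
    {
      "type": "VectorDBTool",
      "parameters": {
        "model_id": "<embedding_model_id>",
        "index": "knowledge_base",
        "embedding_field": "embedding",
        "source_field": ["text"],
        "input": "${parameters.question}"
      }
    },
    {
      "type": "MLModelTool",
      "parameters": {
        "model_id": "<llm_model_id>",
        "prompt": "Context:\n${parameters.VectorDBTool.output}\n\nQuestion: ${parameters.question}\nAnswer:"
      }
    }
  ]
}
```

The register call returns an agent ID, after which the agent can be run with POST /_plugins/_ml/agents/<agent_id>/_execute, passing question as a parameter.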
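
And here is a compact, hedged sketch of the workflow engine: create a template, then provision it. The node types and input names are drawn from the flow framework documentation, but since the feature is experimental, treat them as assumptions and consult the tutorial for working templates:

```json
POST /_plugins/_flow_framework/workflow
{
  "name": "register_and_deploy_remote_model",
  "description": "Registers a remote model against an existing connector, then deploys it",
  "workflows": {
    "provision": {
      "nodes": [
        {
          "id": "register_model",
          "type": "register_remote_model",
          "user_inputs": {
            "name": "bedrock_embedding_model",
            "connector_id": "<connector_id>"
          }
        },
        {
          "id": "deploy_model",
          "type": "deploy_model",
          "previous_node_inputs": { "register_model": "model_id" }
        }
      ]
    }
  }
}

POST /_plugins/_flow_framework/workflow/<workflow_id>/_provision
```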

Get started with OpenSearch 2.12

You can access the latest version of OpenSearch on our downloads page. There’s more to learn about OpenSearch’s new and enhanced functionality in the release notes, documentation release notes, and documentation. OpenSearch Playground is a great way to explore the latest visualizations. As always, we invite your feedback on this release on the community forum!