OpenSearch Project Roadmap 2024–2025

Thu, Sep 12, 2024 · Pallavi Priyadarshini, Bryan Burkholder, Jon Handler, Yupeng Fu

OpenSearch is an open-source product suite comprising a search engine, an ingestion system, language clients, and a user interface for analytics. Our goal at the OpenSearch Project is to make OpenSearch the preferred open-source solution for search, vector databases, log analytics, and security analytics and to establish it as the preferred backend for generative AI applications. OpenSearch contributors and maintainers are innovating in all these areas at a fast pace. With more than 1,400 unique contributors working across 110+ public GitHub repositories on a daily basis, OpenSearch is a rapidly growing open-source project.

To steer the project’s development effectively, we have revamped the project roadmap to provide better transparency into both short- and long-term enhancements. This will help the community provide feedback more easily, assist with prioritization, foster collaboration, and ensure that contributor efforts align with the community’s needs. To achieve this, the OpenSearch Project recently introduced a new public process for developing a theme-based, community-driven OpenSearch roadmap board, which we are excited to share today. The roadmap board will provide the community with visibility into the project’s high-level technological direction and will facilitate the sharing of feedback.

In this blog post, we will outline the OpenSearch roadmap for 2024–2025, focusing on the key areas that foster innovation among OpenSearch contributors. These innovation areas are categorized into the following nine main themes:

  1. Vector Database and Generative AI
  2. Search
  3. Ease of Use
  4. Observability, Log Analytics, and Security Analytics
  5. Cost, Performance, and Scalability
  6. Stability, Availability, and Resiliency
  7. Security
  8. Modular Architecture
  9. Releases and Project Health

In the rest of this post, we will first summarize the key innovation areas in the context of the roadmap themes. For readers interested in a comprehensive understanding, we have a section dedicated to each theme containing information about key innovations and links to the relevant GitHub RFCs/METAs for the features.

Roadmap summary

As a technology, OpenSearch innovates in three main areas: search, streaming data, and vectors. Search use cases employ lexical and semantic means to match end user queries to the catalog of information, stored in indexes, that drives your application. Streaming data includes a wide range of real-time data types, such as raw log data, observability trace data, security event data, metric data, and other event data like Internet of Things (IoT) events. Vector data includes the outputs of embedding-generating large language models (LLMs), vectors produced by machine learning (ML) models, and encodings of media like audio and video.

OpenSearch’s roadmap is aligned vertically in some cases and horizontally in others, depending on the workloads it supports. Features relevant to search workloads are described in Theme 1 and Theme 2. Features relevant to vector workloads are described in Theme 1. Features relevant to streaming data workloads are described in Theme 4. Features relevant to all three workload types are described in Theme 3 and Themes 5–9.

  • Theme 1 (Vector Database and Generative AI) is centered on price performance and ease of use for vector workloads, creating new features that help reduce costs through quantization, disk storage, and GPU utilization. Ease-of-use features will make it easier to get started with and use embedding vectors to improve search results.
  • Theme 2 (Search) focuses on enhancing the query capabilities of core search: building a new query engine with query planning, integrating tightly with Lucene innovations, improving search relevance, and searching across external data sources with Data Prepper.
  • Theme 3 (Ease of Use) encompasses building a richer dashboard experience and serverless dashboards that feature simplified installation, migration, and multi-data-source support.
  • Theme 4 (Observability, Log Analytics, and Security Analytics) emphasizes integrating with industry standards, such as OpenTelemetry, to unify workflows across metrics, logs, and traces; providing a richer SQL-PPL experience; positioning Discover as the main entry point for analytical workflows; improving Data Prepper for various analytics use cases; and developing well-integrated security analytics workflows.
  • Theme 5 (Cost, Performance, and Scalability) includes improving core search engine performance, scaling shard management, providing context-aware templates for different workloads, moving to remote-store-backed tiered storage, and scaling cluster management.
  • Theme 6 (Stability, Availability, and Resiliency) includes features involving query visibility, query resiliency, workload management, and cluster management resilience.
  • Theme 7 (Security) centers on providing constructs that are secure by default and adopting a streamlined plugin security model as the plugin ecosystem grows.
  • Theme 8 (Modular Architecture) involves modularizing the OpenSearch codebase to suit different deployments and moving to a decoupled, service-oriented architecture.
  • Theme 9 (Releases and Project Health) dives into initiatives for faster automated releases, with streamlined continuous integration/continuous delivery (CI/CD) and metrics dashboards to measure community health and operations.

Roadmap details

In the following sections, we cover each theme in detail. You can find the associated RFCs and METAs on the new roadmap board. We would love for you to get involved with the OpenSearch community by contributing to innovation in these areas or by providing your feedback.

Roadmap Theme 1: Vector Database and Generative AI

The OpenSearch roadmap includes several innovations to OpenSearch’s vector database and ML functionality. These innovations focus on enhancing vector search and making ML-powered applications and integrations more flexible and easier to build. AI advancements are transforming the search experience for end users of all skill levels. By integrating AI models, OpenSearch delivers more relevant search results to all users. Experienced builders can apply additional techniques such as query rewriting, result reranking, personalization, semantic search, summarization, and retrieval-augmented generation (RAG) in order to further enhance search result accuracy. Many of these techniques rely on a vector database. With the current rise of generative AI, OpenSearch is gaining traction as a vector database solution powered by k-NN indexes. Our planned innovations will make OpenSearch vector database features easy to use and more efficient while lowering operational costs.

Vector search price performance: To further improve the price performance of vector search, we are planning several key initiatives. We will offer a disk-optimized approximate nearest neighbor (ANN) solution that uses quantized vectors to provide up to 32x compression and a 70% cost reduction while maintaining recall and requiring no pretraining. We are reducing the memory footprint of vector indexes using techniques like iterative product quantization (PQ) and data types like binary vectors. We are also implementing smart routing capabilities that organize indexes by semantic similarity, placing neighboring embeddings on the same node to improve query efficiency and double query throughput. Multi-tenancy and smart filtering will enable high-recall ANN search at the tenant level, catering to use cases that require granular filtering of large datasets with stringent recall targets while remaining efficient and cost effective. Additionally, we plan to use GPUs to significantly accelerate build times for k-NN indexes, with a 10–40x better price/performance ratio compared to CPU-based infrastructure, and to further lower costs by storing full-precision vectors on cold storage systems like Amazon Simple Storage Service (Amazon S3). OpenSearch already provides memory footprint reduction techniques, such as PQ using HNSWPQ and IVFPQ and scalar quantization (SQ) in byte and fp16 formats; we are now investing in additional techniques that further compress vectors while maintaining recall similar to that provided by full-precision vectors. Together, these innovations are expected to significantly improve the price performance of vector search, making it more accessible and cost effective for a wide range of applications.
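
As a concrete reference point for these quantization techniques, the following minimal sketch (using the opensearch-py client) creates a k-NN index with the existing faiss scalar quantization (fp16) encoder, one of the memory footprint reduction techniques mentioned above. The host, credentials, index name, and vector dimension are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Hypothetical k-NN index whose vectors are compressed with the existing
    # faiss fp16 scalar quantization encoder, roughly halving vector memory.
    client.indices.create(
        index="products-vectors",
        body={
            "settings": {"index": {"knn": True}},
            "mappings": {
                "properties": {
                    "embedding": {
                        "type": "knn_vector",
                        "dimension": 384,
                        "method": {
                            "name": "hnsw",
                            "engine": "faiss",
                            "space_type": "l2",
                            "parameters": {
                                "encoder": {"name": "sq",
                                            "parameters": {"type": "fp16"}},
                                "ef_construction": 256,
                                "m": 16,
                            },
                        },
                    }
                }
            },
        },
    )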

Out-of-the-box (OOB) experience: OpenSearch aims to enhance the OOB experience of vector search. While the community appreciates the wide variety of tools and algorithms provided for tuning clusters according to workloads, having too many options can make it challenging for users to choose the right configuration. To address this, OpenSearch’s AutoTune feature will recommend the optimal hyperparameter values for a given workload based on metrics such as recall, latency, and throughput. Additionally, we plan to introduce smarter defaults to automatically tune indexing threads and enable concurrent segment search based on traffic patterns and hardware resources. By simplifying the tuning process and providing intelligent defaults, OpenSearch will make it easier for users to achieve optimal performance without the need for extensive manual configuration.

Neural search: Ingestion performance has been a significant barrier to the adoption of neural search, especially for users who work with large-scale datasets. To address this, in version 2.16 we introduced online batch inference support that reduces communication overhead. We will further enhance ingestion performance by supporting offline batch inference. By using the offline batch processing capabilities of inference services like Amazon SageMaker, Amazon Bedrock, OpenAI, and Cohere, users will be able to directly process batch requests from preferred storage locations such as Amazon S3. This will significantly boost ingestion throughput while simultaneously reducing costs. Offline batch inference eliminates real-time communication with remote services, unlocking the full potential of neural search. We want to allow users to efficiently process large datasets and use advanced search capabilities at scale without compromising performance or incurring excessive costs.
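
For context, the sketch below shows the online embedding ingestion path that batch inference is designed to speed up: an ingest pipeline with the neural-search text_embedding processor generates embeddings at indexing time. The model ID, index name, and field names are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Ingest pipeline that calls a (placeholder) embedding model for each
    # document; offline batch inference targets the same workload in bulk.
    client.ingest.put_pipeline(
        id="embed-on-ingest",
        body={
            "processors": [{
                "text_embedding": {
                    "model_id": "my-embedding-model-id",  # placeholder
                    "field_map": {"title": "title_embedding"},
                }
            }]
        },
    )

    # Documents indexed through the pipeline get an embedding generated online.
    client.index(index="catalog",
                 body={"title": "trail running shoes"},
                 params={"pipeline": "embed-on-ingest"})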

Neural sparse search: Neural sparse search provides yet another semantic search option for builders. Sparse encoding models create a reduced token set in which related tokens have semantically similar weights. A neural sparse index uses Lucene’s inverted index to store tokens and weights, providing fast, token-based recall and fast scoring through dot products. The OpenSearch 2.13 release included sparse encoders pretrained by the OpenSearch team and published on Hugging Face. Further optimizations will enhance both model effectiveness and efficiency:

  • More powerful models: OpenSearch will continue tuning neural sparse models to boost both relevance and efficiency.
  • Weight quantization: Compressing the payload of sparse term weights will considerably reduce index sizes, providing an economical solution comparable to BM25.
  • Multilingual support: In addition to English, neural sparse models will support at least three more languages.
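
To ground this in the current API, a neural sparse query today looks roughly like the following sketch; the index name, field name, and model ID are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Neural sparse query: the sparse encoder expands the query text into
    # weighted tokens that are scored against the inverted index.
    response = client.search(
        index="docs-sparse",  # placeholder index with a sparse field
        body={
            "query": {
                "neural_sparse": {
                    "body_sparse": {  # placeholder field of token weights
                        "query_text": "how do I rotate TLS certificates",
                        "model_id": "my-sparse-encoder-id",  # placeholder
                    }
                }
            }
        },
    )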

Development process for ML-powered search: Enhancing the builder experience and streamlining the development process for ML-powered search are our top priorities. To achieve this, we will introduce a low-code search flow builder within OpenSearch Dashboards, enabling the creation and customization of AI-enhanced search capabilities with minimal coding effort. Additionally, we will extend both the model-serving framework in ML Commons and its search pipeline functionality, allowing users to seamlessly integrate various third-party models, such as OpenAI or Cohere embedding models. This will provide greater flexibility and enable builders to use the most suitable solution for their specific use case.

ML connector certification program: To keep up with the rapid evolution of ML and the emergence of new inference services, we are launching a self-service certification program through which the community and service providers can contribute blueprints for their preferred inference models. OpenSearch already provides OOB blueprints for popular services such as Cohere and OpenAI. However, adding a new blueprint requires a manual code review and merging process, as shown in this pull request for adding a blueprint for the Cohere chat model. The new certification program encourages users to submit blueprints for their favorite models and have them verified and approved through automated pipelines. Once approved, these blueprints will be distributed alongside OpenSearch version releases, benefiting the entire community and ensuring that OpenSearch remains current with the latest advancements in the field.
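
To illustrate what a blueprint describes, this hedged sketch registers a remote inference connector through the ML Commons connector API. The endpoint is the documented /_plugins/_ml/connectors/_create; the model URL, credential, and request template shown here are placeholders rather than a certified blueprint.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # A connector blueprint describes how ML Commons should call a remote
    # inference endpoint; everything below the endpoint path is illustrative.
    response = client.transport.perform_request(
        "POST", "/_plugins/_ml/connectors/_create",
        body={
            "name": "example-embedding-connector",
            "description": "Placeholder connector for a remote embedding model",
            "version": 1,
            "protocol": "http",
            "parameters": {"model": "example-embed-v1"},
            "credential": {"api_key": "<your API key>"},
            "actions": [{
                "action_type": "predict",
                "method": "POST",
                "url": "https://api.example.com/v1/embed",  # placeholder
                "headers": {"Authorization": "Bearer ${credential.api_key}"},
                "request_body": "{ \"texts\": ${parameters.texts}, \"model\": \"${parameters.model}\" }",
            }],
        },
    )
    print(response)  # includes the connector_id used when registering a model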

OpenSearch Assistant Toolkit: The OpenSearch Assistant Toolkit helps create AI-powered assistants for OpenSearch Dashboards. Its main goal is to simplify interactions with OpenSearch features and enhance their accessibility. For example, using natural language queries allows for interaction with OpenSearch without the need to learn a custom query language. The toolkit empowers OpenSearch users to build their own AI-powered applications tailored to their customized use cases. It contains built-in skills that will allow builders to use LLMs to create new visualizations based on their data, summarize their data, and help configure anomaly detectors. The OpenSearch Assistant will guide both novice and experienced users, simplifying complex tasks and making it easier to effectively navigate OpenSearch. For more information, see this video.

Roadmap Theme 2: Search

OpenSearch is designed to offer a highly scalable, reliable, and fast search experience, built to handle large-scale data environments while delivering accurate and relevant results. The community is committed to evolving OpenSearch’s core search capabilities to meet modern workload standards and business needs. As part of our ongoing investments in the core search engine, the roadmap focuses on the following key advancements.

Enhanced query capabilities: The OpenSearch community continues to push the boundaries of query capabilities. Features like derived fields, wildcard fields, and bitmap filtering offer greater flexibility in search queries, allowing users to extract more precise insights from their data. The adoption of new ranking techniques and algorithms such as combined_fields (BM25F) improves search result relevance, contributing to a more refined search experience. We plan to introduce query categorization and insights, providing fine-grained monitoring to identify problematic queries, diagnose bottlenecks, and optimize performance.
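
As a sketch of the derived fields capability, a search request can define a field computed at query time and immediately query it. The request shape below approximates the documented derived fields syntax; the index, script, and field names are placeholder assumptions, so consult the documentation for the authoritative form.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # A derived field is computed per document at query time and can be
    # queried like a regular field; syntax approximated from the docs.
    response = client.search(
        index="web-logs",  # placeholder
        body={
            "derived": {
                "status_class": {
                    "type": "keyword",
                    "script": {
                        "source": "emit(doc['status'].value >= 500 ? '5xx' : 'ok')"
                    },
                }
            },
            "query": {"term": {"status_class": "5xx"}},
        },
    )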

Sophisticated query engine: We are committed to further enhancing the core query engine, with plans to integrate advanced capabilities from the SQL plugin directly into OpenSearch. This effort is aimed at unifying query planning and distributed execution across different query languages, bringing OpenSearch query domain-specific language (DSL), SQL, and Piped Processing Language (PPL) into closer parity. This integration will support more sophisticated query optimizations and distributed executions, unlocking more efficient data processing at scale. The introduction of join support in the core engine will offer users a powerful method of combining and analyzing datasets. These capabilities are crucial for those dealing with relational-style data, enabling greater query complexity without sacrificing performance. A key step in improving the query engine is separating the search coordinator logic from the shard-level Lucene search logic. This separation will allow the search coordinator to focus on complex distributed logic (including joins) and process results from a variety of data sources (including future support for non-Lucene data sources like relational databases and Parquet files).

Query performance: In terms of broader query engine speed and scale, OpenSearch is moving toward writer/searcher separation, which will provide a more modular and adaptable framework for managing indexing and search processes. Efforts like the Star Tree index and the introduction of Protobuf for search execution and communication further reduce costs and improve performance, enabling the platform to efficiently handle even larger data volumes. The roadmap includes several key advancements in query processing, such as improving range query performance through approximation techniques, accelerating aggregations such as date histograms, enhancing concurrent segment search, developing multi-level request caching with tiered caching, and integrating Rust and SIMD operations.

Contributions to core dependencies: As part of our community-driven effort to optimize OpenSearch’s underlying architecture, we continue to contribute to the Lucene search library. A notable example includes ongoing work on BKD doc ID encoding, which will improve indexing and query performance. These contributions ensure that OpenSearch remains on the cutting edge of search technology, benefiting from the latest Lucene advancements.

Hybrid search enhancements: OpenSearch continues to enhance search relevance through hybrid search, which combines text and vector queries. In addition to the existing score-based normalization and combination techniques, OpenSearch plans to launch a rank-based approach called reciprocal rank fusion. This approach will combine search results based on their rank, allowing users to make informed choices by considering the score distribution. Moreover, hybrid search will be augmented with pagination and profiling capabilities, enabling users to debug scores at different stages of score normalization and combination. These enhancements will further improve the search experience, providing more accurate and insightful results while offering greater transparency into the ranking process.
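
For reference, today’s score-based normalization and combination is configured through a search pipeline, as in the minimal sketch below; the weights, field names, and model ID are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Search pipeline that min-max normalizes lexical and vector scores and
    # combines them with a weighted arithmetic mean (weights are placeholders).
    client.transport.perform_request(
        "PUT", "/_search/pipeline/hybrid-minmax",
        body={
            "phase_results_processors": [{
                "normalization-processor": {
                    "normalization": {"technique": "min_max"},
                    "combination": {
                        "technique": "arithmetic_mean",
                        "parameters": {"weights": [0.3, 0.7]},
                    },
                }
            }]
        },
    )

    # Hybrid query running a lexical match and a neural sub-query together.
    response = client.search(
        index="catalog",  # placeholder
        params={"search_pipeline": "hybrid-minmax"},
        body={
            "query": {
                "hybrid": {
                    "queries": [
                        {"match": {"title": "waterproof hiking boots"}},
                        {"neural": {"title_embedding": {
                            "query_text": "waterproof hiking boots",
                            "model_id": "my-embedding-model-id",  # placeholder
                            "k": 50,
                        }}},
                    ]
                }
            }
        },
    )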

User behavior insights: Search users are turning to AI to improve search relevance and reduce manual effort. However, it is challenging to train and tune opaque models without a data feedback loop. To help users gain search insights and build a tuning feedback loop, we are launching User Behavior Insights (UBI). UBI consists of a standard data schema, server-side collection components, query-side collection components, and analytics dashboards. This will provide a standard way for users to record and analyze search behavior and train and fine-tune models.

Ingestion from other databases: OpenSearch can ingest data from Amazon DynamoDB and Amazon DocumentDB databases using Data Prepper, which enables using OpenSearch as a search engine for these sources. Data Prepper is continuing to add support for new database types, with the immediate goal of supporting SQL databases. With this new source type, the community can search even more databases, including Amazon Aurora and Amazon Relational Database Service (Amazon RDS)/MySQL databases.

Roadmap Theme 3: Ease of Use

OpenSearch Dashboards provides an intuitive interface and powerful visualization and analytics tools for OpenSearch users. Additionally, OpenSearch Dashboards contains a rich set of features and tools that enable advanced analytics use cases. These easy-to-use tools simplify data exploration, monitoring, and management for both OpenSearch administrators and end users.

Richer dashboard experience: We are planning dynamic and interactive features to make data visualization more intuitive and powerful. Additionally, we aim to enable multiple data sources, allowing seamless integration and operations, such as cross-source alerting, within a unified interface. As part of this effort, we plan to introduce a dataset concept, which extends the index pattern concept in OpenSearch Dashboards and enables working with different types of data sources, such as relational databases or Prometheus. This will allow users to seamlessly access and visualize data from a variety of sources within the OpenSearch Dashboards interface. We are also introducing a workspace concept in OpenSearch Dashboards. Workspaces will streamline user workflows by providing curated vertical experiences for search, observability, and security analytics. Additionally, workspaces will enhance collaboration on workspace assets and improve data connections.

Serverless dashboards and migration: Our strategy for OpenSearch Dashboards also includes decoupling release and distribution from the OpenSearch engine. We are aiming to allow OpenSearch Dashboards to run as a standalone application, independent from the OpenSearch installation. OpenSearch Dashboards will have its own authentication and access control based on workspaces, and we’ll provide options for using a dedicated database for OpenSearch Dashboards saved objects.

To simplify configuration and customization, we envision implementing one-click installation and setup, allowing users to get started quickly. We also plan to streamline plugin management to enable users to extend OpenSearch Dashboards without restarting the application. We aim to develop a migration toolkit to assist users in seamlessly transitioning data from older versions of OpenSearch Dashboards or other tools like Grafana. We’ll also implement an interactive onboarding experience to guide new users through key features and setup steps. Additionally, we plan to integrate live help powered by generative AI, which will offer real-time assistance within the platform, and to enhance the platform’s resilience with improved health and status monitoring.

We also plan to focus on improving the overall performance of OpenSearch Dashboards. This will include optimizing the loading times of the application and visualizations, ensuring a smooth and responsive user experience. We will analyze the current performance bottlenecks and implement targeted optimizations to reduce latency and improve the responsiveness of OpenSearch Dashboards, especially when working with large or complex datasets.

Roadmap Theme 4: Observability, Log Analytics, and Security Analytics

The OpenSearch Project continues to enhance its observability and security analytics capabilities. We are dedicated to creating a more cohesive and user-friendly experience while expanding functionality and improving performance. Our roadmap for 2024–2025 focuses on delivering a more unified, powerful, and intuitive experience while maintaining the cost effectiveness and scalability our users expect.

OpenTelemetry support: OpenSearch has enhanced its observability features by incorporating support for the OpenTelemetry Protocol (OTLP), a vendor-neutral standard for transmitting telemetry data. This integration allows developers and operations teams to send traces, metrics, and logs directly to OpenSearch and work with all three signal types in a unified workflow, promoting a more efficient and standardized approach to collecting and analyzing observability data across complex, distributed systems. With robust support for OpenTelemetry and OTLP, OpenSearch offers a powerful platform for storing, analyzing, and visualizing essential observability data, simplifying system performance monitoring and issue troubleshooting across your entire infrastructure.

To address the challenges of managing, monitoring, and analyzing traces, metrics, and logs, OpenSearch introduced a new schema compatible with OpenTelemetry. This schema supports predefined dashboards, available through the OpenSearch catalog, for common systems like NGINX, HAProxy, and Kubernetes. It also enables cross-index querying of data containing shared structures from different telemetry data producers. OpenSearch is dedicated to continuously enhancing its schema to support emerging observability use cases and to developing more advanced correlation and alerting solutions. To further explore OpenSearch capabilities, see this demo.

Cost-effective, scalable analytics using Apache Spark: Many community members are opting to store data on cost-optimized cloud storage outside of OpenSearch, either because it is cost prohibitive to store in OpenSearch or because the amount of data raises scalability concerns. To analyze data outside of OpenSearch, users were forced to switch between tools or create one-off ingestion pipelines. OpenSearch’s integration with Apache Spark allows you to analyze data outside of OpenSearch, potentially reducing storage costs by up to 90%. OpenSearch has added support for indexing data on cloud storage using Spark Streaming. Naturally, analysts want to join data across OpenSearch indexes and the cloud. Our upcoming Iceberg-compatible table format will enable complex joins between OpenSearch indexes and cloud storage, enhancing your ability to analyze data across platforms. This table format also extends Iceberg with index capabilities, enabling the creation of search indexes on text fields, vector indexes, and geographical indexes. During query execution, these indexes will be automatically used to optimize full-text, neural, and geographical searches. Initially, this feature may be delivered as a customized table format, named OpenSearch Table, that remains fully compatible with Iceberg. As this functionality is integrated into Iceberg itself, it will become available to all query engines.

Unified query experience—bridging PPL and SQL: By the end of 2024, we’ll consolidate SQL and PPL into a common interface within Discover. This unification will allow analysts to work more efficiently, using their preferred language without switching between tools. We’re also including autocomplete and auto-suggest functionality to make query building easier. Looking ahead to 2025, we’re planning to significantly enhance both OpenSearch’s PPL and SQL capabilities. For PPL, we’re introducing over 30 new PPL commands and functions, including JOINs, lookups, and JSON search capabilities. These additions will empower you to perform more sophisticated analyses, especially in observability and security contexts. Our SQL engine is also undergoing a major upgrade, with a focus on standardization and interoperability. You can look forward to support for vector search, geographical search, and advanced SQL queries, unlocking even more powerful analytics possibilities.
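
As a baseline for this unification, both languages are already served by dedicated plugin endpoints, and the same question can be asked in either one, as in this sketch; the index and field names are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # The same question in PPL...
    ppl = client.transport.perform_request(
        "POST", "/_plugins/_ppl",
        body={"query": "source=web-logs | where status >= 500 "
                       "| stats count() as errors by host"},
    )

    # ...and in SQL.
    sql = client.transport.perform_request(
        "POST", "/_plugins/_sql",
        body={"query": "SELECT host, COUNT(*) AS errors FROM `web-logs` "
                       "WHERE status >= 500 GROUP BY host"},
    )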

Discover—your central hub for analytics: We’re positioning Discover as the primary entry point to your analytics workflows. Soon, you’ll be able to seamlessly transition from refining queries to creating visualizations, performing trace analytics, generating reports, or setting up alerts—all without leaving the Discover interface. This interconnected approach will streamline your workflow, saving time and reducing context switching. While we know the community is interested in the workflows we highlighted, we will build the functionality generically so that the community can easily plug in custom workflows that meet their needs.

Enhanced observability tools: OpenSearch is working on several new observability features to enhance the existing capabilities and user experience. These include the development of a correlation zones framework, which aims to simplify and automate site reliability engineers’ (SREs) daily tasks by identifying critical issues more efficiently. The framework will categorize anomalies and incidents into correlation zones, reducing the need for constant monitoring and allowing SREs to focus on significant segments. Additionally, OpenSearch is optimizing its Trace Analytics plugin with improved storage capabilities, UI enhancements, better query performance, and seamless integration with other OpenSearch Dashboards plugins. This includes the ability to store configurations, support for custom indexes and cross-cluster queries, and better correlation between logs, traces, and metrics. OpenSearch is also working on adding support for PromQL in dashboards, enabling users to query Prometheus data sources directly and further expanding its observability capabilities and data integration options.

Data Prepper: Data Prepper allows the community to ingest traces, logs, and metrics into OpenSearch. Currently, the primary ways to ingest these signals are through OpenTelemetry over gRPC, HTTP, and Apache Kafka and by loading from Amazon S3. The community has looked for other ways to ingest data into OpenSearch, and Data Prepper is planning to support them. First, an Amazon Kinesis source will allow the community to pull data from Amazon Kinesis, which is popular for streaming data. Second, Data Prepper is planning to provide a new OpenSearch API source for ingesting data using existing OpenSearch APIs. This API will initially accept requests made using the OpenSearch Bulk API and will support other document update APIs in the future. Third, Data Prepper will support Apache Kafka as a sink. While users can currently read from Apache Kafka using Data Prepper, there is growing interest in using Data Prepper as an ingestion tool for Kafka clusters. One of Data Prepper’s major use cases is observability and analytics, and both the maintainers and the community continue to improve Data Prepper’s capabilities for these important use cases.

Security analytics: Our mission is to empower security and operations teams to quickly discover and isolate threats or operational issues, minimizing the impact on business operations and protecting confidential data. OpenSearch users ingest security and operations data into their clusters for real-time security threat detection and correlation, security event investigation, and operational trend visualization to generate meaningful insights. Security Analytics provides a prebuilt library of over 3,300 threat detection rules for common security event logs, a threat intelligence framework, a real-time detection rules engine, alerting capabilities for notifying incident response teams, and a correlation rules engine for identifying associations across events. In the coming year, we will create a unified experience so that users can move faster to find and address threats. We will support security insights without creating detectors, expand support for new security log types, add new threat intelligence feed integrations, and simplify the data mapping workflows. We will integrate generative AI features into existing workflows to enable users of all skill levels to easily configure threat detection, create security rules, and obtain security insights and remediation steps. In addition, we will improve investigation workflows that will enable users to query and analyze historical logs for compliance and investigation purposes. Native integrations with incident response and case management systems, such as ServiceNow and PagerDuty, will help users monitor updates from a centralized location.

Roadmap Theme 5: Cost, Performance, and Scalability

Search performance and a new query engine: As data volumes increase in size and workloads become more complex, price performance remains a top priority for OpenSearch users. OpenSearch recently implemented significant engine performance enhancements, as highlighted in a previous blog post. Compared to OpenSearch 1.0, recent OpenSearch versions demonstrate a 50% improvement for text queries, a 40% improvement for multi-term queries, a 100x boost for term queries, and a 50x boost for date histograms. These advancements stem from the engine performance optimizations outlined in our performance roadmap. The roadmap also includes future initiatives such as document reordering, query rewriting, dynamic pruning, and count-only caching. Additionally, the OpenSearch community is now taking the initiative to evolve the core engine in order to embrace new technologies like custom engines, parallelization, and composable architectures—all within an open-source framework. This includes rearchitecting the engine toward indexing and search separation, offering a more modular and adaptable system. Faster interconnects, using an efficient binary format such as gRPC for client-server communication and Protobuf for node-to-node messaging, have also yielded promising early results. While actively contributing to core Lucene, we’re also focused on building a cloud-native architecture to further enhance engine performance at scale.

Application-based context templates: Application-based context templates provide predefined templates that package the right configuration for each use case. For example, an index created based on the logs template is configured with the Zstd compression codec and the log_byte_size merge policy. This configuration helps reduce disk utilization and enhances overall performance. Multi-field indexes aim to provide constant query latency when a query searches across multiple fields. The first implementation of a multi-field index is available as a Star Tree index. The roadmap includes plans to introduce additional context-specific templates, such as those for metrics, traces, and events. It also aims to enhance existing templates with specialized optimizations, including the Star Tree index.
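
For illustration, the optimizations that the logs context packages can already be applied manually at index creation, as in this sketch; the index name is a placeholder, and context templates remove the need to remember these settings.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Settings the logs context is described as packaging: the Zstd codec
    # and the log_byte_size merge policy.
    client.indices.create(
        index="app-logs",  # placeholder
        body={
            "settings": {
                "index": {
                    "codec": "zstd",
                    "merge": {"policy": "log_byte_size"},
                }
            }
        },
    )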

Scaling shard management: Shard splitting aims to provide the capability to scale shards based on size or throughput with zero downtime for read and write traffic. In search use cases, it can be difficult to predict the number of primary shards in advance. As a result, the OpenSearch cluster can develop “hot” shards, impacting performance: the node hosting a hot shard can run short of resources, and a growing shard can eventually hit Lucene’s hard limit of approximately 2 billion documents per index. Today, there are two options available to solve this problem: document reindexing or index splitting. With document reindexing, the entire index is reindexed into a new index with a larger number of primary shards. This is a very slow process that requires additional compute and I/O. With index splitting, the index is first marked as read-only, and then all of its shards are split, causing write downtime for users. Additionally, the Split API does not provide the granularity of splitting at the shard level, so a single hot shard cannot be scaled independently. In-place shard splitting will address these limitations and provide a more holistic way to scale shards. One challenge of running a bigger cluster is optimally allocating a large number of shards while honoring a set of placement constraints. Because all placement decisions are executed sequentially, the cluster manager is unable to prioritize other critical operations, such as index creation and settings updates, which can eventually time out. To address this issue, all placement decisions will be optimized and bounded so that they finish early, preventing starvation of critical tasks.
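
To make the index splitting limitation described above concrete, the following minimal sketch walks through today’s split flow, including the write block it requires; the index names and shard counts are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # 1. Block writes on the source index (write downtime begins).
    client.indices.put_settings(index="orders",
                                body={"index.blocks.write": True})

    # 2. Split every shard; the target count must be a multiple of the
    #    source count, so a single hot shard cannot be split on its own.
    client.indices.split(
        index="orders", target="orders-split",
        body={"settings": {"index.number_of_shards": 8}},
    )

    # 3. Re-enable writes on the new index.
    client.indices.put_settings(index="orders-split",
                                body={"index.blocks.write": None})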

Remote-backed storage and automatic storage tiering: OpenSearch already offers remote store indexes, which improve durability and indexing performance. Building on this architecture, we plan to deliver an end-to-end multi-tier storage experience, which will provide users with an optimal balance of cost and performance. The warm tier will handle more storage per compute while maintaining the interactive search experience on the warm data without requiring all data to be locally available. The on-demand cold tier experience will provide compute and storage separation, allowing users to store large amounts of data that can be made searchable when needed. Additionally, we’ll introduce new use-case-specific index templates to simplify index configuration for users.

Pull-based ingestion: Native pull-based ingestion that pulls events from an external event stream provides further benefits compared to the current push-based model. These benefits include better handling of ingestion throughput spikes and removing the need for the translog in the indexing nodes. OpenSearch can be extended to support pull-based indexing, which can also present the possibility of priority-based ingestion. Time-sensitive and critical updates can be isolated from lower-priority events, and ingestion spikes can be handled by throttling low-priority events.

Next-generation snapshots for remote-backed clusters: Snapshots v2 aims to enhance the scalability of snapshots for remote-backed clusters and reduce dependence on per-shard state updates in the cluster manager. The new snapshots rely on a timestamp-based pinning strategy, where instead of resolving shard-level files at snapshot time, the timestamp for the snapshot is pinned and the resolution is deferred until restore time. This approach makes the snapshot process much faster, allowing snapshot operations to finish within a couple of minutes, even for larger clusters, while significantly reducing the computational load associated with data backup. Timestamp pinning serves as the fundamental building block for future features, such as Point-In-Time-Restore (PITR).

Scaling admin APIs: For large cluster configurations, cluster manager nodes become scaling bottlenecks because multiple admin APIs obtain the cluster state from the active cluster manager node, even if the latest state is present locally or in a remote store. With the ongoing optimizations, the coordinator node will be able to serve the admin APIs without relaying the request to the cluster manager node in most cases. Also, for APIs like CAT Shards and CAT Snapshots, the response size increases as the cluster expands to 100K shards or more. We plan to introduce pagination and cancellation for these APIs to ensure that they continue to operate efficiently regardless of the metadata size. We are also implementing multiple optimizations to the Stats and Cluster APIs that will eliminate redundant processing and perform pre-aggregation on the data node before responding to the coordinator node receiving the user request.

Roadmap Theme 6: Stability, Availability, and Resiliency

OpenSearch is designed to provide capabilities for search and analytics at scale by using the underlying Lucene search engine that also powers other distributed systems. The OpenSearch Project has dedicated time and effort to improving stability and resiliency and making the service highly available. The following are some of the planned key efforts.

Coordinator-level latency visibility: This initiative provides users visibility into the different phases of search request execution in OpenSearch. This is particularly useful for statistically identifying possible changes in a workload by monitoring latency metrics across different phases. Coordinator slow logs were recently introduced to give users the ability to capture “slow” requests along with a breakdown of time spent in different search phases, something that was otherwise only available for the query and fetch phases.
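
For comparison, the shard-level slow logs that previously provided this visibility are enabled per index with threshold settings like the following; the index name and thresholds are placeholder assumptions.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Shard-level slow log thresholds for the query and fetch phases.
    client.indices.put_settings(
        index="web-logs",  # placeholder
        body={
            "index.search.slowlog.threshold.query.warn": "2s",
            "index.search.slowlog.threshold.fetch.warn": "500ms",
        },
    )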

Query insights: We recently introduced the ability for users to access computationally expensive queries (top N queries). We plan to integrate OpenSearch with external metrics collectors, like OpenTelemetry, to deliver more comprehensive analytics. Currently, queries can be analyzed according to various metrics, such as latency, CPU, memory utilization, and even query structure. Support for visualizing the execution profile will help users easily identify bottlenecks in their workload execution. Given a sufficient level of insight data, we will use AI/ML to build recommendation systems, which will eventually be able to automatically manage cluster settings for users with minimal intervention on their part.
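
As an example of using this feature today, top N query monitoring is toggled through cluster settings and read back from an insights endpoint, roughly as sketched below; the window size and N value are placeholder assumptions, and the setting names follow the Query Insights documentation.

    from opensearchpy import OpenSearch

    # Placeholder connection details.
    client = OpenSearch(hosts=["https://localhost:9200"],
                        http_auth=("admin", "admin"), verify_certs=False)

    # Enable top N query monitoring by latency over a placeholder window.
    client.cluster.put_settings(body={
        "persistent": {
            "search.insights.top_queries.latency.enabled": True,
            "search.insights.top_queries.latency.window_size": "1m",
            "search.insights.top_queries.latency.top_n_size": 10,
        }
    })

    # Read back the most expensive recent queries.
    top_queries = client.transport.perform_request(
        "GET", "/_insights/top_queries")
    print(top_queries)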

Query resiliency: One significant risk to cluster stability is runaway queries that continuously consume memory, leading to out-of-memory states and potentially catastrophic outcomes. Search backpressure introduces a mechanism to automatically identify and terminate such problematic queries when an OpenSearch host is low on memory or CPU. Existing mechanisms like circuit breakers and thread pool size thresholds provide a generic solution, but they do not specifically target the problematic queries. New search backpressure and hard cancellation techniques are designed to address these limitations.

Workload management: An OpenSearch installation often contains a large number of tenants, all of which experience the same quality of service (QoS). However, this potentially means that an inexperienced tenant can consume more than the desired amount of cluster resources, which can lead to a degraded experience for other tenants. Admission control and search backpressure provide a best-effort assurance for cluster stability but do not guarantee a consistent QoS. With the introduction of query groups, system administrators of OpenSearch clusters will be able to provide tenant-based performance isolation for search workloads, manage tenant-based query groups, and enforce resource-based limits on tenant workloads. This enhancement will allow system administrators to prioritize execution of some workloads over others, thereby further improving QoS guarantee levels.

Cluster state management: The cluster manager node manages all admin operations in a cluster. These operations include creating and deleting indexes, updating fields in an existing index, taking snapshots, and adding and removing nodes. The metadata about indexes and other entities—data streams, templates, aliases, snapshots, and custom entities stored by plugins—is stored in a data structure called the cluster state. Any change to the cluster state is processed by the cluster manager node and persisted to that node’s local disk. Starting with version 2.12, OpenSearch added support for storing the cluster state remotely in order to provide durability guarantees. With the introduction of a remote cluster state, replacing all cluster manager nodes will not result in any data loss in remote store clusters.

The cluster manager node processes any cluster state updates and then sends the updated state to all the follower nodes in the cluster. As the state and the number of follower nodes grow, the overhead on the cluster manager node increases significantly because it is responsible for publishing the updated state to every node in the cluster. This impacts the cluster’s stability and availability. To reduce strain on the cluster manager node, we are proposing to use the remote store for cluster state publication. The cluster manager node will upload the entire cluster state, including ephemeral entities like the shard routing table (which stores the mapping of the shards assigned to each data node in the cluster), to the remote store. Instead of sending the full state to every node, the cluster manager node will then only communicate that a new state is available and provide its remote location, from which each follower node will download it.

Publishing the state remotely will reduce memory, CPU, and transport thread overhead on the cluster manager node during cluster state changes. This approach will also allow on-demand downloading of entities on the data or coordinator nodes instead of requiring all nodes to maintain the full cluster state, aligning with our vision of a more cloud-native architecture. Remote publication will be generally available in OpenSearch 2.17 and is planned to be further enhanced in future releases.

Data Prepper pipeline DLQ: Data Prepper provides resilience when OpenSearch is down by buffering data and eventually writing to a dead-letter queue (DLQ) if the cluster remains unavailable. Currently supported DLQ targets are local files and Amazon S3. One current limitation is that data is only sent to the DLQ if it fails to write to the sink. Other failures, such as those occurring during processing in the pipeline, do not cause data to be sent to the DLQ. With the proposed pipeline DLQ, Data Prepper will be able to send failed events to the DLQ or continue to send them downstream, allowing the pipeline author to decide. This will improve the resiliency of data throughout the pipeline. Additionally, the pipeline DLQ will be a pipeline just like any other and will be able to write to any supported Data Prepper sink, such as Apache Kafka.

Roadmap Theme 7: Security

Security is a Tier 0 prerequisite for modern workloads. In OpenSearch, security features are primarily implemented by the Security plugin, which offers a rich set of capabilities. These include various authentication backends (SAML, JWT, LDAP), authorization primitives, fine-grained access control (document-level and field-level security, or DLS/FLS), and encryption in transit. OpenSearch has rapidly developed new plugin capabilities, attracting increased interest from the community. This growth also raises critical security implications. Importantly, security should not come at the cost of performance. To address these challenges, OpenSearch is focusing on the following initiatives to strengthen its security posture.

Plugin resource permissions: We are developing a mechanism for sharing plugin resources that supports existing use cases while allowing more granular control over resource sharing. Examples include model groups in the ML Commons plugin, anomaly detectors in the Time Series Analytics plugin, and detectors in the Alerting plugin.

Plugin isolation: OpenSearch is moving toward a zero-trust model for plugins. Cluster administrators will have full visibility into all permissions requested by a plugin before installation.

Optimized privilege evaluation: Performance is a key focus for OpenSearch. We’ve identified areas within the Security plugin that can yield significant performance improvements, especially for clusters with numerous indexes or roles mapped to users.

API tokens: API tokens introduce a new way to interact with OpenSearch clusters by associating permissions directly with a token. Cluster administrators will have full visibility into and control over the issued tokens and their usage.

Ease of use: We aim to simplify security setup for cluster administrators. Many useful security features remain underused because they are not exposed through OpenSearch Dashboards. To address this, we will add security dashboard pages where administrators can configure rate limiters to protect clusters from unauthenticated actors.

Looking ahead, security primitives like authorization could be extracted and made pluggable, allowing integration with newer open standards for policy evaluation, such as Open Policy Agent (OPA) or Cedar.

Roadmap Theme 8: Modular Architecture

OpenSearch is working toward well-supported modularity in order to enable rapid development of properly encapsulated features and flexible deployment architectures for cloud-native use cases. Historically, OpenSearch has been deployed and operated using a cluster model, in which all functions (such as replication and durability) were implemented within the cluster. While the project has grown organically, offering many extension points through plugins, it still relies on a monolithic server module at its core, with tight coupling across the architecture. As the project grows within a globally distributed community, this monolithic architecture will become an unsustainable bottleneck. Innovations such as the next-generation query engine are not possible with tightly coupled components. Additionally, the Java Security Manager is pending deprecation and removal from the Java runtime, and the recommended replacement technique (shallow sandboxing) relies on newer language features that require properly modularized code. The overall goal of the modularity effort is to allow the same core OpenSearch code to run across all variants (for example, on-premises clusters and large managed serverless offerings) while providing strong encapsulation of cluster functions. This will facilitate more independent development and innovation across the project.

Roadmap Theme 9: Releases and Project Health

With contributions ranging from code enhancements to feature requests across all roadmap themes, the OpenSearch community is working together to maintain the stability of the codebase while ensuring that CI/CD pipelines remain green across all active branches. This provides a reliable foundation for both new and existing contributors, reduces bugs, and safeguards feature integrity. Key repository health metrics are publicly available on the Ops Dashboard.

The OpenSearch release process is fully automated, including a one-click release system for products such as OpenSearch Benchmark. Each product adheres to semantic versioning (semver), ensuring that breaking changes only occur in major versions. Releases follow a structured schedule, starting with a code freeze and release candidate generation, and are driven by automated workflows that eliminate the need for manual sign-offs. We’re also building a Central Release Dashboard to streamline and provide visibility into the release pipeline from beginning to end.

Get involved

We recognize that community engagement is crucial to the success of all the innovations mentioned in this post. We invite the open-source community to review our roadmap, provide feedback, and contribute to the OpenSearch Project. Your insights and contributions will be invaluable in helping us to achieve these goals and continue improving OpenSearch.

You can propose new ideas and features at any time by creating a GitHub issue and following our feature request template. Once proposed, the feature can be included in the public roadmap by adding corresponding labels (such as Meta, RFC, or Roadmap), which are automatically populated for all the repositories and are categorized by themes for clarity. If you have any questions or suggestions for improving our processes, please feel free to reach out or contribute directly through GitHub.

We encourage you to actively participate in our project because your involvement will help shape the future of OpenSearch. By engaging with our community, sharing your ideas, and contributing to development, you’ll play a crucial role in driving innovation and improving the project. Thank you for your continued support and commitment to open source!