An update on the OpenSearch Project’s continued performance progress through version 2.11

Wed, Jan 10, 2024 · Saurabh Singh, Pallavi Priyadarshini, Dagney Braun, Rishabh Singh

OpenSearch is a community-driven, open-source search and analytics suite used by developers to ingest, search, visualize, and analyze data. Introduced in January 2021, the OpenSearch Project originated as an open-source fork of Elasticsearch 7.10.2. OpenSearch 1.0 was released for production use in July 2021 and is licensed under the Apache License, Version 2.0 (ALv2), with the complete codebase published to GitHub. The project has consistently focused on improving the performance of its core open-source engine for high-volume indexing and low-latency search operations. OpenSearch aims to provide the best experience for every user by reducing latency and improving efficiency.

In this blog post, we’ll share a comprehensive view of the strategic enhancements and performance features that OpenSearch has delivered to date, along with a look ahead at the Performance Roadmap. We’ll compare the core engine performance of the latest OpenSearch version (OpenSearch 2.11) to the state just before the OpenSearch fork, with a specific focus on the advancements made since then. With this goal in mind, we have chosen Elasticsearch 7.10.2 as the baseline from which OpenSearch was forked, allowing us to measure all changes delivered after the fork (in OpenSearch 1.0–2.11). These improvements were made in collaboration with the community, and the OpenSearch Project is actively seeking to deepen community engagement in performance work.

Performance improvements to date

OpenSearch performance improvements can be categorized into three high-level buckets:

  1. Indexing performance
  2. Query performance
  3. Storage

The following image summarizes OpenSearch performance improvements since launch.

OpenSearch performance improvements since launch

Log analytics workloads are typically indexing heavy and tend to rely on a smaller set of resource-intensive queries. In contrast, search workloads have a more balanced distribution between indexing and query operations. Based on the analysis we’ll detail below comparing Elasticsearch 7.10.2 to OpenSearch 2.11, we have seen a 25% improvement in indexing throughput, a 15–98% decrease in query latencies among some of the most popular query types, and, now with Zstandard compression, a 15–30% reduction in on-disk data size.

Indexing performance investments

Some of the key OpenSearch features launched over the past year delivered efficiency improvements in indexing performance. OpenSearch rearchitected the way indexing operations are performed in order to deliver segment replication, a physical replication method that copies index segments rather than replicating source documents. Segment replication, a new replication strategy built on Lucene’s Near-Real-Time (NRT) Segment Index Replication API, became generally available in OpenSearch 2.7. It increased ingestion throughput by up to 25% compared to the default document replication. You can find a more detailed look at segment replication in this blog post.
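If you want to try segment replication yourself, the following minimal sketch (using the opensearch-py Python client against a local, security-disabled cluster) shows how an index can opt in through the index.replication.type setting; the index name and shard counts are illustrative.

```python
# Minimal sketch: create an index that uses segment replication instead of the
# default document replication. Assumes opensearch-py and a local OpenSearch 2.7+
# cluster without security enabled; index name and shard counts are illustrative.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="logs-segrep",
    body={
        "settings": {
            "index.number_of_shards": 3,
            "index.number_of_replicas": 1,
            # Copy Lucene segment files to replicas rather than re-indexing
            # every document on each replica.
            "index.replication.type": "SEGMENT",
        }
    },
)
```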

In version 2.10, OpenSearch introduced remote-backed storage, allowing users to directly write segments to object storage, such as Amazon Simple Storage Service (Amazon S3) or Oracle Cloud Infrastructure (OCI) Object Storage, to improve data durability. With remote-backed storage, in addition to storing data on a local disk, all the ingested data is stored in the configured remote store. At every refresh, the remote store also automatically becomes a point-in-time recovery point, helping users achieve a recovery point objective (RPO) of zero with the same durability properties of the configured remote store. To learn more, see this blog post.

Query performance investments

OpenSearch supports an extensive array of query types for different use cases, from comprehensive search capabilities to a broad spectrum of aggregations, filtering options, and sorting functionalities. One of the major query performance areas in which OpenSearch has improved is in helping vector queries perform at scale, given the rise in vector search popularity. OpenSearch’s vector engine offers fast, billion-scale vector searches with efficient latency and recall.

Recent additions like scalar and product quantization reduced the cluster memory footprint by up to 80%. The incorporation of native libraries (nmslib and faiss) and SIMD-accelerated HNSW has sped up vector indexing and search queries. At large scale, tested with billions of documents, OpenSearch delivered roughly 30% lower latency than Lucene-based ANN search. For more information, see this partner highlight.
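As a rough illustration of the vector engine described above, the sketch below creates a k-NN index that uses the HNSW method with the faiss engine and runs an approximate nearest-neighbor query. It assumes the k-NN plugin is installed on a local, security-disabled cluster; the index name, field name, dimension, and HNSW parameters are illustrative.

```python
# Hedged sketch: a knn_vector field backed by faiss HNSW, plus an ANN query.
# Assumes opensearch-py, a local cluster, and the k-NN plugin; names and
# parameters are illustrative, not tuned recommendations.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="vectors-demo",
    body={
        "settings": {"index.knn": True},
        "mappings": {
            "properties": {
                "embedding": {
                    "type": "knn_vector",
                    "dimension": 128,
                    "method": {
                        "name": "hnsw",
                        "engine": "faiss",
                        "space_type": "l2",
                        "parameters": {"ef_construction": 128, "m": 16},
                    },
                }
            }
        },
    },
)

# Approximate nearest-neighbor query against the vector field.
client.search(
    index="vectors-demo",
    body={
        "size": 10,
        "query": {"knn": {"embedding": {"vector": [0.1] * 128, "k": 10}}},
    },
)
```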

We’ve also continued to invest broadly in core query performance for popular query types used for log analytics, time-series data, and search. OpenSearch has demonstrated significant improvement since the fork from Elasticsearch 7.10.2 for many query types. The benchmarking we performed showed a 15%–98% increase in performance across popular query operations such as match all, range queries, aggregation queries, and full-text queries. You can review key benchmarking findings in the following sections.

Storage investments

Storage is another major factor that affects the overall efficiency of log analytics and search workloads. In OpenSearch 2.9 and later, users can enable Zstandard compression, resulting in a 30% reduction in on-disk data size while maintaining a near-identical CPU utilization pattern compared to the default compression. Ongoing work, such as the addition of a match_only_text field (see #11039), has shown a promising reduction of about 25% in on-disk data size, primarily by optimizing how text fields are stored, and should be available in the upcoming OpenSearch 2.12 release.
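For readers curious about the upcoming match_only_text field, the following hedged sketch shows what an index mapping could look like once the field type ships (expected in OpenSearch 2.12, per #11039); it will not work on 2.11, and the index and field names are illustrative.

```python
# Hedged sketch: mapping a log message field as match_only_text to reduce on-disk
# size. This field type is expected in OpenSearch 2.12 (see #11039) and is not
# available in 2.11; index and field names are illustrative.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="logs-match-only",
    body={
        "mappings": {
            "properties": {
                # Trades positional and scoring features for a smaller footprint,
                # which suits log messages queried mostly with match/term queries.
                "message": {"type": "match_only_text"}
            }
        }
    },
)
```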

Measured performance improvements

To compare performance between Elasticsearch 7.10.2 and OpenSearch 2.11, we ran query operations for widely used scenarios in log analytics, time series, and search. We ran the queries across clusters running each version and documented the resulting performance. The following subsections provide the key findings from this exercise.

Log analytics

For log analytics use cases, we used the http_logs workload from OpenSearch Benchmark—a macro-benchmark utility within the OpenSearch Project—to replicate some of the common query operations. Here are the key highlights:

  • match_all queries with sorting showed a more than 20x performance boost across the board because of multiple improvements made in the area (see #6321 and #7244) and other Lucene enhancements.

  • Queries for ascending and descending sort-after-timestamp saw a significant performance improvement of up to 70x overall. The optimizations introduced (such as #6424 and #8167) extend across various numeric types, including int, short, float, double, date, and others.

  • Other popular queries, such as search_after, saw about a 60x reduction in latency, attributed to improvements that optimally skip segments during search (see #7453). search_after queries are the recommended alternative to scroll queries for a better search experience; a minimal example follows this list.

  • Work on the match_only_text field, which optimizes storage as well as indexing and search latency for text queries, is in progress (see #11039).
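As a quick illustration of the search_after pattern recommended above, here is a minimal sketch that pages through results sorted by timestamp. It assumes a local, security-disabled cluster and an illustrative logs index with a @timestamp field.

```python
# Minimal sketch: deep pagination with search_after instead of scroll.
# Assumes opensearch-py, a local cluster, and an illustrative "logs" index with a
# @timestamp field; in practice, add a unique tiebreaker field to the sort.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

query = {
    "size": 100,
    "sort": [{"@timestamp": "asc"}],
    "query": {"match_all": {}},
}

page = client.search(index="logs", body=query)
while page["hits"]["hits"]:
    # Feed the sort values of the last hit back in as search_after to fetch the
    # next page without keeping a scroll context open.
    query["search_after"] = page["hits"]["hits"][-1]["sort"]
    page = client.search(index="logs", body=query)
```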

Time series

In the context of aggregations over range and time-series data, we used the nyc_taxis and http_logs workloads from OpenSearch Benchmark to benchmark various popular use cases. Here are the key highlights:

  • Range queries, popular for aggregation use cases, exhibited about a 50%–75% improvement, attributed to system upgrades such as Lucene (from v8.8 in Elasticsearch 7.10.2 to v9.7 in OpenSearch 2.11) and JDK (from JDK15 in Elasticsearch 7.10.2 to JDK17 in OpenSearch 2.11).

  • Hourly aggregations and multi-term aggregations also demonstrated improvements ranging from 5% to 35%, attributed to the time-series improvements discussed previously (a minimal hourly aggregation example follows this list).

  • date_histograms and date_histogram_agg queries exhibited either comparable or slightly decreased performance, ranging from 5% to around 20% in multi-node environments. These issues are actively being addressed as part of ongoing project efforts (see #11083).

  • For date histogram aggregations, there are upcoming changes aiming to improve performance by rounding down dates to the nearest interval (such as year, quarter, month, week, or day) using SIMD (see #11194).
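For reference, the sketch below shows the shape of the hourly aggregation mentioned in this list: a range filter over a timestamp field combined with a date_histogram bucketing step. It assumes a local, security-disabled cluster and an illustrative logs index with a @timestamp field.

```python
# Minimal sketch: an hourly date_histogram aggregation over a filtered time range.
# Assumes opensearch-py, a local cluster, and an illustrative "logs" index with a
# @timestamp field.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.search(
    index="logs",
    body={
        "size": 0,
        # Restrict to a time range, then bucket the matching documents by hour.
        "query": {"range": {"@timestamp": {"gte": "now-7d", "lte": "now"}}},
        "aggs": {
            "hourly": {
                "date_histogram": {"field": "@timestamp", "fixed_interval": "1h"}
            }
        },
    },
)
```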

In the realm of text queries, we used the pmc workload from OpenSearch Benchmark to emulate various common use cases. Here are the noteworthy highlights:

  • Phrase and term queries for text search showed improved latency, with a 25% to 65% reduction (minimal examples of both query shapes follow this list).

  • Popular queries related to scrolling exhibited about 15% lower latency, further improving the overall user experience.
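The following minimal sketch shows the phrase and term query shapes referenced in the list above, assuming a local, security-disabled cluster and an illustrative articles index with a full-text title field and a keyword status field.

```python
# Minimal sketch: a full-text phrase query and an exact term query.
# Assumes opensearch-py, a local cluster, and an illustrative "articles" index.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

# Phrase query: terms must appear adjacently, in order, in the analyzed text field.
client.search(
    index="articles",
    body={"query": {"match_phrase": {"title": "open source search"}}},
)

# Term query against a keyword field: no analysis is applied to the input value.
client.search(
    index="articles",
    body={"query": {"term": {"status": {"value": "published"}}}},
)
```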

Additional optional performance-enhancing features available in version 2.11

The core engine optimizations discussed in the previous sections are available by default. Additionally, OpenSearch 2.11 includes a few key performance-enhancing features that can be optionally enabled by users. These features were not available in prior versions, so we separately benchmarked performance with those features individually enabled, resulting in the following findings:

  • LogByteSize merge policy: Showed a 40–70% improvement in ascending and descending sort queries, which is advantageous for time-series data with timestamp sorting and minimal timestamp overlap between segments.

  • Zstandard compression: This addition empowers OpenSearch users with the new Zstandard compression codecs for their data, resulting in a 30% reduction in on-disk data size while maintaining a near-identical CPU utilization pattern compared to the default compression.

  • Concurrent segment search: Enabling every shard-level request to concurrently search across segments during the query phase resulted in latency reduction across multiple query types. Aggregate queries showed a 50%–70% improvement, range queries showed a 65% improvement, and aggregation queries with hourly data aggregations showed a 50% improvement.

Future roadmap

The OpenSearch Project remains steadfast in its commitment to continuously improving the core engine performance in search, ingestion, and storage operations. The OpenSearch Project roadmap on GitHub is constantly evolving, and we are excited to share it with you. This roadmap places a special emphasis on the core engine advancements while also encompassing critical areas like benchmarking and query visibility. As part of our ongoing commitment, we plan to consistently update this roadmap with both short- and long-term improvement plans. We’re keeping performance excellence at the forefront of our investments, and OpenSearch users can anticipate a series of impactful improvements in new releases in 2024, starting with OpenSearch 2.12.

In upcoming releases, we will continue to improve the core engine by targeting specific query types. We will also undertake broad strategic initiatives to further enhance the core engine through Protobuf integration, query rewrites, tiered caching, and SIMD and Rust implementations. In addition to improving the core engine, we are committed to improving OpenSearch tooling capabilities. One such improvement that we’re currently working on is the query insights functionality, which helps identify the top N queries that impact performance. Additionally, OpenSearch is working on making benchmarks easier for community members to use. For a comprehensive list of investments and additional improvements, or to provide feedback, please check out the OpenSearch Performance Roadmap on GitHub.

This concludes the main summary of OpenSearch performance improvements to date. The following appendix sections provide the benchmarking details for readers interested in replicating any run.


Appendix: Detailed execution and results

If you’re interested in the details of the performance benchmarks we used, exploring the methodologies behind their execution, or examining the comprehensive results, keep reading. For OpenSearch users interested in establishing benchmarks and replicating these runs, we’ve provided comprehensive setup details alongside each result. This section provides the core engine performance comparison between the latest OpenSearch version (OpenSearch 2.11) and the state just before the OpenSearch fork, Elasticsearch 7.10.2, with a mid-point performance measurement on OpenSearch 2.3.

OpenSearch Benchmark and workloads

OpenSearch Benchmark serves as a macro-benchmark utility within the OpenSearch Project. With the help of this tool, OpenSearch users and developers can generate and visualize performance metrics from an OpenSearch cluster for various purposes, including:

  • Monitoring the overall performance of an OpenSearch cluster.
  • Evaluating the benefits of and making decisions about upgrading the cluster to a new version.
  • Assessing the potential impact on the cluster resulting from workload changes, such as modifications to index mappings or changes in queries.

An OpenSearch Benchmark workload comprises one or more benchmarking scenarios. A workload typically includes the ingestion of one or more data corpora into indexes and a collection of queries and operations that are executed as part of the benchmark.

We used the http_logs, nyc_taxis, and pmc workloads for performance evaluation, encompassing aspects such as text/term queries, sorting, aggregations, histograms, and ranges.

The configurations specific to the setup for each evaluation are provided along with the results in the following sections.
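If you want to replicate these runs, OpenSearch Benchmark can be installed with pip and pointed at an existing cluster. The sketch below drives the http_logs workload against a local, security-disabled cluster; for secured clusters you would also pass client options, and the target host shown is illustrative.

```python
# Minimal sketch: run one of the workloads above against an already-running cluster.
# Assumes `pip install opensearch-benchmark` and a local, security-disabled cluster;
# the workload name and target host are illustrative.
import subprocess

subprocess.run(
    [
        "opensearch-benchmark", "execute-test",
        "--workload=http_logs",        # other workloads used here: nyc_taxis, pmc
        "--pipeline=benchmark-only",   # benchmark an existing cluster; do not provision one
        "--target-hosts=localhost:9200",
    ],
    check=True,
)
```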

Comparative baseline analysis: Elasticsearch 7.10.2 vs. OpenSearch 2.3 vs. OpenSearch 2.11

We compared the performance of Elasticsearch 7.10.2 (pre-fork), OpenSearch 2.3 (an interim release), and OpenSearch 2.11 (the latest version at the time of testing). This analysis covers three workloads (http_logs, nyc_taxis, and pmc) used to assess performance across different use cases. The goal is to provide comparable core engine performance metrics since the Elasticsearch 7.10.2 fork. The benchmarks in the following sections show averages from 7 days of data generated during nightly runs using OpenSearch Benchmark, intentionally excluding outliers. In the detailed results sections, each table contains a percentage improvement column. This column highlights the improvement of OpenSearch 2.11 over the earlier releases, with positive values indicating improvement and negative values indicating regression.

Detailed results: Elasticsearch 7.10.2 vs. OpenSearch 2.3 vs. OpenSearch 2.11—deep dive into single-shard performance

Objective: To eliminate variables introduced at the coordination level and concentrate on data node query performance.

Setup: 1 data node (r5.xlarge) with 32 GB RAM and 16 GB heap. Index settings: 1 shard and 0 replicas.

http_logs workload results: The following table provides a benchmark comparison for the http_logs workload between Elasticsearch 7.10.2, OpenSearch 2.3, and OpenSearch 2.11. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations ES 7.10.2 (p90_value) OS 2.3.0 (p90_value) OS 2.11.0 (p90_value) OS 2.11.0 improvement (vs. OS 2.3.0) OS 2.11.0 improvement (vs. ES 7.10.2)
200s-in-range 13 5 5.33 -7% 59%
400s-in-range 2.73 2 2.53 -26% 7%
asc_sort_size 4,262.2 4,471 4.6 100% 100%
asc_sort_timestamp 10.73 785 6.47 99% 40%
asc_sort_with_after_timestamp 4,576 5,368 34.47 99% 99%
default 2.91 3 3 0% -3%
desc_sort_size 3,800.4 3,994 9.53 100% 100%
desc_sort_timestamp 39.18 5,228 58.8 99% -50%
desc_sort_with_after_timestamp 5,824.27 6,925 87.8 99% 98%
hourly_agg 9,387.55 9,640 9,112.4 5% 3%
multi_term_agg N/A 14,703 9,669.8 34% N/A
range 28.18 12 13.4 -12% 52%
scroll 213.91 173 197.4 -14% 8%
term 3.45 3 3.4 -13% 1%

nyc_taxis workload results: The following table provides a benchmark comparison for the nyc_taxis workload between Elasticsearch 7.10.2, OpenSearch 2.3, and OpenSearch 2.11. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations ES 7.10.2 (p90_value) OS 2.3.0 (p90_value) OS 2.11.0 (p90_value) OS 2.11.0 improvement (vs. OS 2.3.0) OS 2.11.0 improvement (vs. ES 7.10.2)
autohisto_agg 559.13 596 554.6 7% 1%
date_histogram_agg 562.4 584 545.07 7% 3%
default 4.73 6 5.07 15% -7%
distance_amount_agg 13,181 12,796 15,285 -19% -16%
range 654.67 213 213.4 0% 67%

pmc workload results: The following table provides a benchmark comparison for the pmc workload between Elasticsearch 7.10.2, OpenSearch 2.3, and OpenSearch 2.11. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations ES 7.10.2 (p90_value) OS 2.3.0 (p90_value) OS 2.11.0 (p90_value) OS 2.11.0 improvement (vs. OS 2.3.0) OS 2.11.0 improvement (vs. ES 7.10.2)
articles_monthly_agg_cached 2 2 2.4 -20% -20%
articles_monthly_agg_uncached 26.5 27 27.8 -3% -5%
default 6.5 6 5.2 13% 20%
phrase 8.25 6 6.4 -7% 22%
scroll 894.5 857 753.8 12% 16%
term 9 5 5.6 -12% 38%

Detailed results: Elasticsearch 7.10.2 vs. OpenSearch 2.3 vs. OpenSearch 2.11—deep dive into multiple-shard performance

Objective: To introduce the coordination layer with parallel search operations extending across multiple nodes with primary shards.

Setup: 3 data nodes (r5.xlarge) with 32 GB RAM and 16 GB heap. 3 cluster manager nodes (c5.xlarge) with 8 GB RAM and 4 GB heap. Index settings: 3 shards and 0 replicas.

http_logs workload results: The following table provides a benchmark comparison for the http_logs workload between Elasticsearch 7.10.2, OpenSearch 2.3, and OpenSearch 2.11. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations ES 7.10.2 (p90_value) OS 2.3.0 (p90_value) OS 2.11.0 (p90_value) OS 2.11.0 improvement (vs. OS 2.3.0) OS 2.11.0 improvement (vs. ES 7.10.2)
200s-in-range 9.8 6 6.85 -14% 30%
400s-in-range 5.8 5 5.5 -10% 5%
asc_sort_size 1,451.13 1,602 8.35 99% 99%
asc_sort_timestamp 10.4 291.5 10.58 96% -2%
asc_sort_with_after_timestamp 1,488.25 1,910.5 25.38 99% 98%
default 6 6 6.3 -5% -5%
desc_sort_size 1,281.3 1,431 13.91 99% 99%
desc_sort_timestamp 34.4 1,878.5 91.9 95% -167%
desc_sort_with_after_timestamp 1,887.7 2,480 85.78 97% 95%
hourly_agg 2,566.9 3,115 2,937 6% -14%
multi_term_agg N/A 5,205 3,603 31% N/A
range 18.1 9 8.95 1% 51%
scroll 340.1 267 323.85 -21% 5%
term 5.8 6 6.45 -8% -11%

nyc_taxis workload results: The following table provides a benchmark comparison for the nyc_taxis workload between Elasticsearch 7.10.2, OpenSearch 2.3, and OpenSearch 2.11. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations ES 7.10.2 (p90_value) OS 2.3.0 (p90_value) OS 2.11.0 (p90_value) OS 2.11.0 improvement (vs. OS 2.3.0) OS 2.11.0 improvement (vs. ES 7.10.2)
autohisto_agg 208.92 217 212.93 2% -2%
date_histogram_agg 198.77 218 209.4 4% -5%
default 8.46 7 9.67 -38% -14%
distance_amount_agg 4,131 4,696 5,067.4 -8% -23%
range 281.62 73 79.53 -9% 72%

pmc workload results: The following table provides a benchmark comparison for the pmc workload between Elasticsearch 7.10.2, OpenSearch 2.3, and OpenSearch 2.11. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations ES 7.10.2 (p90_value) OS 2.3.0 (p90_value) OS 2.11.0 (p90_value) OS 2.11.0 improvement (vs. OS 2.3.0) OS 2.11.0 improvement (vs. ES 7.10.2)
articles_monthly_agg_cached 3.55 3 3.71 -24% -5%
articles_monthly_agg_uncached 12 12 12.43 -4% -4%
default 9 6 6.79 -13% 25%
phrase 8.18 7 7.14 -2% 13%
scroll 755.18 593 642.79 -8% 15%
term 9 7 7.14 -2% 21%

Elevating performance with new features available in OpenSearch 2.11

The following sections present OpenSearch 2.11 features that improve performance.

LogByteSize merge policy

In the realm of log analytics, the Tiered merge policy has long been the default approach to segment merges. In OpenSearch 2.11 we introduced the LogByteSize merge policy. This new policy consistently merges adjacent segments, which is especially advantageous for time-series data characterized by timestamp sorting and minimal timestamp overlap between segments.
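The sketch below shows how an index could opt into the LogByteSize merge policy at creation time. The index.merge.policy setting and the log_byte_size value reflect our reading of the 2.11 documentation and should be verified for your version; the index name is illustrative, and the Tiered policy remains the default if the setting is omitted.

```python
# Hedged sketch: create an index that uses the LogByteSize merge policy.
# Assumes opensearch-py and a local OpenSearch 2.11+ cluster; verify the setting
# name and value against the documentation for your version.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="logs-logbytesize",
    body={
        "settings": {
            # The default is "tiered"; "log_byte_size" merges adjacent segments,
            # which suits timestamp-sorted, append-only time-series data.
            "index.merge.policy": "log_byte_size",
        }
    },
)
```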

The following are the key findings from this exercise.

  • Timestamp queries with ascending sort improved by over 75%, an improvement attributable to enhancement #9241.
  • Descending sort timestamp queries improved by about 40%, surpassing the Tiered merge policy.
  • Use cases involving ascending and descending sort with an after timestamp saw regression, which is a known limitation of this merge policy for smaller workloads.
  • Other common log analytics operations, such as multi-term aggregation, hourly_agg, range, and scroll queries, exhibited comparable performance, with a subtle improvement of less than 5% attributed to the new segment merge policy.

Detailed results: OpenSearch 2.11 Tiered merge policy vs. OpenSearch 2.11 LogByteSize merge policy

Setup: 3 data nodes (r5.xlarge) with 32 GB RAM and 16 GB heap. 3 cluster manager nodes (c5.xlarge) with 8 GB RAM and 4 GB heap. Index settings: 3 shards and 0 replicas, max_segment_size: 500 MB, refresh_interval: 1 s.

http_logs workload results: The following table provides a benchmark comparison for the http_logs workload for OpenSearch 2.11 with the Tiered merge policy vs. the LogByteSize merge policy. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations OS 2.11—Tiered (p90_value) OS 2.11—LogByteSize (p90_value) % improvement (vs. Tiered merge policy)
200s-in-range 6 6 0%
400s-in-range 5 6 -20%
asc_sort_size 9 8 11%
asc_sort_timestamp 34 8 76%
asc_sort_with_after_timestamp 13 68 -423%
default 7 6 14%
desc_sort_size 11 10 9%
desc_sort_timestamp 29 17 41%
desc_sort_with_after_timestamp 35 130 -271%
hourly_agg 2,816 2,809 0%
multi_term_agg 2,739 2,800 -2%
range 7 8 -14%
scroll 349 344 1%
term 7 6 14%

Zstandard codec compression

This addition empowers OpenSearch users with the new Zstandard compression codecs for their data. Users can specify zstd or zstd_no_dict in the index.codec setting during index creation or modify the codecs for existing indexes. OpenSearch will continue to support the existing zlib and lz4 codecs, with the default as lz4.
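As a minimal illustration, the sketch below creates an index with the zstd codec; zstd_no_dict can be substituted to skip dictionary building. It assumes a local, security-disabled OpenSearch 2.9+ cluster, and the index name is illustrative.

```python
# Minimal sketch: select the Zstandard codec at index creation time.
# Assumes opensearch-py and a local OpenSearch 2.9+ cluster; the index name is
# illustrative.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.indices.create(
    index="logs-zstd",
    # Valid index.codec values include the default (LZ4-based), best_compression,
    # zstd, and zstd_no_dict.
    body={"settings": {"index.codec": "zstd"}},
)
```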

The following sections contain benchmarking results representing the average of 5 days of data from nightly runs using OpenSearch Benchmark.

Highlights

Here are the key highlights:

  • A notable increase in the ingestion throughput, ranging from 5% to 8%, attributed to the new zstd codec. This enhancement owes its success to codec-related pull requests (#7908, #7805, and #7555).

  • About a 30% reduction in on-disk data size, surpassing the default lz4 codec for unparalleled efficiency, all while maintaining a near-identical CPU utilization pattern compared to the default compression.

  • The search p90 latencies remained virtually unchanged, with negligible differences of less than 2% in a few areas.

Detailed results: OpenSearch 2.11 default (lz4) compression vs. OpenSearch 2.11 zstd compression using multiple shards

Setup: 3 data nodes (r5.xlarge) with 32 GB RAM and 16 GB heap. 3 cluster manager nodes (c5.xlarge) with 8 GB RAM and 4 GB heap. Index settings: 3 shards and 0 replicas.

Indexing throughput results (docs/sec): The following table provides an indexing throughput comparison of the http_logs and nyc_taxis workloads for OpenSearch 2.11 with the default codec vs. the zstd codec enabled and includes the percentage improvement observed when using zstd.

Workload OS 2.11—default codec (mean_value) OS 2.11—zstd codec (mean_value) % improvement (vs. default codec)
http_logs 209,959.75 220,948 5%
nyc_taxis 118,123.5 127,131 8%

nyc_taxis search workload results: The following table provides a benchmark comparison for the nyc_taxis workload for OpenSearch 2.11 with the default codec vs. the zstd codec enabled. It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations OS 2.11—default codec (p90_value) OS 2.11—zstd codec (p90_value) % improvement (vs. default codec)
autohisto_agg 216.75 208 4%
date_histogram_agg 211.25 205.5 3%
default 8 7.5 6%
distance_amount_agg 5,012 4,980 1%
range 74.5 77.5 -4%

Detailed results: Index data size on disk (bytes) with zstd compression using a single shard

Setup: OpenSearch 2.11.0, single node (r5.xlarge) with 32 GB RAM and 16 GB heap. Index settings: 1 shard and 0 replicas.

Data size on disk results (bytes): The following table illustrates a benchmark comparison of the on-disk data size for the http_logs and pmc workloads for OpenSearch 2.11 with the default vs. zstd codec enabled, including percentage improvement.

Workload Default compression (bytes) ZSTD (bytes) ZSTD_NO_DICT (bytes) ZSTD improvement (vs. default codec) ZSTD_NO_DICT improvement (vs. default codec)
http_logs 20,056,770,878.5 15,800,734,037 16,203,187,551 21% 19%
pmc 20,701,211,614.5 15,608,881,718.5 15,822,040,185 25% 24%

Concurrent search improvements (experimental in OpenSearch 2.11)

OpenSearch users can now achieve better execution speed with concurrent segment search, launched as experimental in OpenSearch 2.11. By default, OpenSearch processes a request sequentially across all the data segments on each shard during the query phase of a search request execution. With concurrent search, every shard-level request can concurrently search across segments during the query phase. Each shard divides its segments into multiple slices, where each slice serves as a unit of work executed in parallel on a separate thread. Therefore, the slice count governs the maximum degree of parallelism for a shard-level request. After all the slices finish their tasks, OpenSearch executes a reduce operation on the slices, merging them to generate the final result for the shard-level request.
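For readers who want to experiment with concurrent segment search on 2.11, the sketch below toggles it through the cluster settings API. Because the feature is experimental in 2.11, the cluster must also be started with the corresponding experimental feature flag enabled in opensearch.yml, and the slice-count setting shown (mirroring the 0-slice and 4-slice configurations benchmarked below) is our understanding of the documented name and should be verified for your version.

```python
# Hedged sketch: enable concurrent segment search dynamically. Experimental in
# OpenSearch 2.11, so the cluster must also be started with the experimental
# feature flag enabled in opensearch.yml; verify setting names against the
# documentation for your version.
from opensearchpy import OpenSearch

client = OpenSearch(hosts=[{"host": "localhost", "port": 9200}])

client.cluster.put_settings(
    body={
        "persistent": {
            "search.concurrent_segment_search.enabled": True,
            # 0 lets Lucene choose the slice count; a fixed value (e.g., 4)
            # caps per-shard parallelism during the query phase.
            "search.concurrent.max_slice_count": 0,
        }
    }
)
```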

The following benchmarking results show the benefits of concurrent search in action. These are the averages of data generated from over 4 days of nightly runs using OpenSearch Benchmark.

Highlights

Here are the key highlights:

  • An increase in performance with aggregate queries on workloads such as the nyc_taxis workload, showcasing an improvement ranging between 50% and 70% over the default configuration.
  • The log analytics use cases for range queries demonstrated an improvement of around 65%.
  • Aggregation queries with hourly data aggregations, such as those for the http_logs workload, demonstrated a boost of up to 50% in performance.
  • Comparable latencies for auto or date histogram queries, with no noteworthy improvement or regression in performance.
  • multi_term_agg, asc_sort_size, desc_sort_size, and scroll queries showed regression. The concurrent search contributors are actively investigating and addressing these regressions for the upcoming OpenSearch 2.12 GA release.

Detailed results: OpenSearch 2.11 with concurrent search enabled vs. disabled

Setup: OpenSearch 2.11.0 single node (r5.2xlarge) with 64 GB RAM and 32 GB heap. Index settings: 1 shard and 0 replicas.

nyc_taxis workload results: The following table provides a benchmark comparison of the nyc_taxis workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations CS disabled (p90_value) CS enabled—0-slice (p90_value) CS enabled—4-slice (p90_value) % improvement (with 0 slices) % improvement (with 4 slices)
autohisto_agg 575 295 287 49% 50%
date_histogram_agg 563 292 288 48% 49%
default 6 6 5 0% 17%
distance_amount_agg 15,043 4,691 4,744 69% 68%
range 201 73 77 64% 62%

http_logs workload results: The following table provides a benchmark comparison of the http_logs workload for OpenSearch 2.11 with concurrent search disabled and enabled (with 0 slices and with 4 slices). It includes the 90th percentile of took time latency measurements for each (p90, in milliseconds) and the observed percentage improvements.

Operations CS disabled (p90_value) CS enabled—0-slice (p90_value) CS enabled—4-slice (p90_value) % improvement (with 0 slices) % improvement (with 4 slices)
200s-in-range 6 4 4 33% 33%
400s-in-range 2 2 2 0% 0%
asc-sort-timestamp-after-force-merge-1-seg 20 20 22 0% -10%
asc-sort-with-after-timestamp-after-force-merge-1-seg 85 86 86 -1% -1%
asc_sort_size 3 5 5 -67% -67%
asc_sort_timestamp 4 4 4 0% 0%
asc_sort_with_after_timestamp 34 33 34 3% 0%
default 4 4 3 0% 25%
desc-sort-timestamp-after-force-merge-1-seg 64 62 67 3% -5%
desc-sort-with-after-timestamp-after-force-merge-1-seg 67 66 68 1% -1%
desc_sort_size 6 91 9 -1417% -50%
desc_sort_timestamp 26 34 28 -31% -8%
desc_sort_with_after_timestamp 63 61 63 3% 0%
hourly_agg 8,180 3,832 4,034 53% 51%
multi_term_agg 9,818 40,015 54,107 -308% -451%
range 15 12 13 20% 13%
scroll 179 375 212 -109% -18%
term 3 3 3 0% 0%

We would like to take this opportunity to thank the OpenSearch core developers for their contributions to the technical roadmap. We sincerely appreciate all the suggestions from Michael Froh, Andriy Redko, Jonah Kowall, Amitai Stern, Jon Handler, Prabhakar Sithanandam, Mike McCandless, Anandhi Bumstead, Eli Fisher, Carl Meadows, and Mukul Karnik in writing this blog post. Credits to Fanit Kolchina and Nathan Bower for editing and Carlos Canas for creating the graphics.