Hybrid queries combine the precision of traditional lexical search with the semantic power of vector search. In OpenSearch 3.1, we delivered significant latency reductions for hybrid queries by redesigning how scores are collected across subqueries. This improvement builds on the concurrent segment search capabilities already present in OpenSearch and introduces a low-level, efficient bulk scorer tailored for hybrid workloads.
Background: Bottleneck in hybrid scoring
Prior to OpenSearch 3.1, score collection worked as follows:
- Each query branch (for example, a `match` clause and a `neural` clause) scored documents separately.
- Score collection was performed in a serialized manner, even though the OpenSearch core already supported concurrent segment-level search.
- Subquery scorers introduced additional overhead because of redundant control logic and per-document operations.
The new score collection method enhances performance by using optimized parallel execution and efficient score aggregation across query branches.
Optimization: Smarter score collection and batch processing
In OpenSearch 3.1, we introduced a custom bulk scorer for hybrid queries. This scorer is designed to work with OpenSearch’s concurrent segment search, in which different segments of an index are searched in parallel by design. Our focus was on minimizing overhead within each segment’s execution.
To achieve this goal, we introduced the following key improvements:
- Single leaf collector per segment: Subquery scorers now share one leaf collector instance, eliminating redundant score collection infrastructure and improving cache locality.
- Small batch processing: The scorer processes documents in small, fixed-size windows (for example, 4,096 documents at a time), allowing strict loops with reduced memory overhead.
- Bit set merging: Document IDs across subquery scorers are tracked using efficient bit sets, which reduce memory allocations and simplify score merging.
- Low-level control flow: We replaced priority queues and dynamic collections with arrays and preallocated structures to minimize garbage collection pressure and avoid runtime branching.
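The bit set and preallocation ideas above can be sketched as follows. This is an illustrative Python sketch, not OpenSearch code: the function name, the dict-based representation of per-subquery hits, and the use of a Python integer as a bit set are all assumptions made for clarity.

```python
WINDOW = 4096  # fixed-size document window, matching the batch size described above

def collect_window(base_doc, subquery_hits):
    """Merge per-subquery matches for one window of documents.

    subquery_hits: list of {doc_id: score} dicts, one per subquery (hypothetical
    representation of what each subquery scorer produced for this window).
    Returns (matched, scores): a bit set of matched window offsets and
    preallocated per-subquery score arrays.
    """
    matched = 0  # bit set over the window; bit i set => doc (base_doc + i) matched
    # Preallocated, fixed-size arrays: no dynamic resizing inside the loop.
    scores = [[0.0] * WINDOW for _ in subquery_hits]
    for i, hits in enumerate(subquery_hits):
        for doc_id, score in hits.items():
            offset = doc_id - base_doc
            matched |= 1 << offset      # mark doc as matched by at least one subquery
            scores[i][offset] = score   # record this subquery's score at the offset
    return matched, scores
```

Downstream code can then iterate only over the set bits of `matched`, touching each candidate document once regardless of how many subqueries matched it.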
The following diagram shows the high-level architecture of the hybrid query execution path before and after introducing the custom bulk scorer.
This design allows hybrid queries to use the natural parallelism of OpenSearch’s indexing structure (segments) without introducing thread-level concurrency within scorers, which often adds more overhead than benefit.
Implementation details
Our custom bulk scorer operates on the principle of coordinated iteration across multiple subquery scorers. Here’s how it works:
- Document ID discovery: For each segment, we first identify candidate document IDs that match at least one subquery.
- Batch processing: Rather than processing documents one at a time, we collect them in fixed-size batches. We chose a batch size of 2^12 = 4096, which provides an optimal balance between frequency of score collection and memory usage.
- Coordinated scoring: For each batch, we score all matching documents across all subqueries in a tightly optimized loop, maintaining better cache locality.
- Score combination: Within the same iteration, scores from different subqueries are combined according to the hybrid query’s weight settings.
This approach minimizes the overhead of document iteration while maximizing the benefits of OpenSearch’s segment-level concurrency model.
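The four steps above can be condensed into a short sketch. This is hypothetical illustrative code, not the actual OpenSearch implementation: the function signature, the dict-based hit representation, and the simple weighted sum are assumptions standing in for the real scorer's data structures and score-combination logic.

```python
BATCH_SIZE = 4096  # 2**12, the fixed window size discussed above

def hybrid_scores(subquery_docs, weights):
    """Coordinated iteration across subquery scorers for one segment.

    subquery_docs: list of {doc_id: score} dicts, one per subquery.
    weights: per-subquery weights used when combining scores.
    Returns {doc_id: combined_score} for all candidate documents.
    """
    # Step 1: discover candidate doc IDs that match at least one subquery.
    candidates = sorted(set().union(*[d.keys() for d in subquery_docs]))
    combined = {}
    # Step 2: process candidates in fixed-size batches.
    for start in range(0, len(candidates), BATCH_SIZE):
        batch = candidates[start:start + BATCH_SIZE]
        # Steps 3 and 4: score all subqueries and combine weighted scores
        # for the whole batch in one tight loop.
        for doc in batch:
            total = 0.0
            for hits, weight in zip(subquery_docs, weights):
                if doc in hits:
                    total += weight * hits[doc]
            combined[doc] = total
    return combined
```

For example, a document matched by both a lexical and a neural subquery receives the weighted sum of both scores in the same pass that discovers it, rather than in a separate merge phase.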
Real-world results: Up to 3.5x throughput and 80% latency reductions
We evaluated the improved hybrid scorer against OpenSearch 3.0 using diverse workloads on multiple Amazon Elastic Compute Cloud (Amazon EC2) instance types. The following table shows improvements in response latency percentiles.
| Dataset | Query type | Metric | OpenSearch 3.1 (ms) | OpenSearch 3.0 (ms) | Latency reduction |
|---|---|---|---|---|---|
| NOAA (text only) | 1 keyword query, 1M matches | p50 | 117 | 178 | 34% |
| | | p90 | 128 | 193 | 34% |
| | | p99 | 137 | 208 | 34% |
| | 1 keyword query, 10M matches | p50 | 183 | 703 | 74% |
| | | p90 | 205 | 958 | 79% |
| | | p99 | 220 | 1260 | 83% |
| | 3 keyword queries, 15M matches | p50 | 336 | 1759 | 81% |
| | | p90 | 382 | 2518 | 85% |
| | | p99 | 523 | 3243 | 84% |
| Quora (hybrid) | Neural + lexical | p50 | 199 | 256 | 22% |
| | | p90 | 240 | 332 | 28% |
| | | p99 | 286 | 414 | 31% |
| | Neural + 2 lexical | p50 | 248 | 337 | 27% |
| | | p90 | 316 | 481 | 34% |
| | | p99 | 387 | 603 | 36% |
| | Neural + 2 lexical + aggregations | p50 | 262 | 426 | 39% |
| | | p90 | 339 | 631 | 46% |
| | | p99 | 407 | 799 | 49% |
These gains were consistent under both benchmark and steady-load conditions (25 queries per second), with even greater performance improvements when concurrent segment search was enabled.
Tail latencies (p99) improved by up to 49% even on large instances, making hybrid search more reliable under high load.
The following graph presents the improvements in response times for different datasets and query types.
The new bulk scorer also improves system throughput. The following table shows improvements in search throughput depending on the instance type used as a data node.
| Dataset | Instance type | Query complexity | Throughput gain |
|---|---|---|---|
| Combined vector and text data | R5.xlarge | Neural + 2 lexical + aggregations | 83% |
| | R5.xlarge | Neural + 2 lexical | 56% |
| | C5.2xlarge | Neural + 2 lexical + aggregations | 18% |
| | R5.4xlarge | Hybrid query overall | 19–27% |
| Only text data | C5.2xlarge | 1 keyword query, 10M matches | 115% |
| | C5.2xlarge | 3 keyword queries, 15M matches | 254% |
| | R5.xlarge | 1 keyword query, 10M matches | 272% |
| | R5.xlarge | 3 keyword queries, 15M matches | 384% |
| | R5.4xlarge | 1 keyword query, 10M matches | 63% |
| | R5.4xlarge | 3 keyword queries, 15M matches | 151% |
The following graph illustrates these throughput improvements.
Scaling behavior: Better utilization and more predictable performance
The improved scorer not only speeds up individual queries but also scales more gracefully with increased hardware. On larger instances, such as R5.4xlarge, we observed the following performance improvements:
- Throughput for hybrid queries with multiple keyword-based subqueries increased by 234% in OpenSearch 3.1 compared to 3.0 when concurrent segment search was enabled.
- Hybrid and complex hybrid queries showed up to a 27% improvement in throughput.
- p99 latency was reduced by over 60% for most complex cases.
These results suggest that the custom scorer improves core efficiency rather than simply consuming more hardware.
How we ran benchmarks
We used diverse datasets and related OpenSearch Benchmark workloads to evaluate the performance of the new solution with the custom bulk scorer:
- AI search workload: Based on the Quora dataset, includes semantic and hybrid queries with both text and vector data.
- Semantic search workload: Based on the NOAA dataset, includes hybrid queries over lexical and numeric fields.
We set up the OpenSearch cluster using the OpenSearch Cluster CDK with the following parameters:
- 3 data nodes using various EC2 instance types (C5.2xlarge, R5.xlarge, R5.4xlarge) running x64 architecture.
- 3 manager nodes (c5.xlarge EC2 instances).
- All cluster nodes were deployed in a single AWS Availability Zone to minimize network latency effects.
We took several steps to ensure consistent and reliable benchmark results:
- All index segments were force-merged before running the tests to eliminate background merge operations.
- We verified that no other concurrent workloads were running on the cluster during testing.
We ingested documents into an index configured as follows:
- `number_of_shards`: 6
- `number_of_segments`: 8 (per shard; total = 48)
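For reference, a setup along these lines could look as follows. This is a sketch, not the exact benchmark configuration: `number_of_shards` is an index setting, while the per-shard segment count is typically fixed afterward by force-merging (for example, `POST <index>/_forcemerge?max_num_segments=8`) rather than set at index creation.

```json
{
  "settings": {
    "index": {
      "number_of_shards": 6
    }
  }
}
```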
Looking ahead
These improvements lay the groundwork for further innovation in hybrid search, including:
- Adaptive batch sizing based on query complexity and segment size.
- Collecting scores in parallel within an OpenSearch segment for each subquery.
If you use hybrid queries in production or are building search features that blend lexical and vector semantics, upgrading to OpenSearch 3.1 is a great way to benefit from faster response times and more efficient resource usage.
References
- GitHub issue with a high-level proposal for improving response times in hybrid queries
- RFC with the design for the custom bulk scorer
- Pull request that introduces the custom bulk scorer
- OpenSearch Benchmark semantic search workload
- OpenSearch Benchmark AI search workload
- OpenSearch Cluster CDK
- Previous blog post: Boosting hybrid query performance in OpenSearch 2.15