Neural sparse search is a powerful and efficient method for semantic retrieval in OpenSearch. It encodes text into (token, weight) entries, allowing OpenSearch to index and search efficiently using Lucene’s inverted index. Since its introduction in OpenSearch 2.11 and the improvements brought by the v2 series, neural sparse search has delivered strong search relevance alongside the efficiency benefits of inference‑free retrieval. Today, we’re excited to share two major advancements:

  • The v3 series of neural sparse models – Our most accurate sparse retrieval models to date, delivering substantial gains in search relevance while maintaining lightweight, inference‑free efficiency.
  • A new multilingual retrieval model – The first multilingual neural sparse retrieval model in OpenSearch.

Neural sparse search v3 models: Advancing search relevance

We are excited to announce the release of our v3 neural sparse models:

  • v3-distill: Building on the success of the v2-distill model, v3-distill delivers higher search relevance (0.517 NDCG@10 vs. 0.504 for v2-distill) through improved training while retaining its lightweight architecture for fast ingestion and low memory usage.
  • v3-gte: Our most accurate v3 model, offering the best search relevance across all benchmarks (0.546 NDCG@10 vs. 0.517 for v3-distill) while maintaining the high efficiency and low-latency performance of doc-only sparse retrieval.

The v3 models are now available both in OpenSearch and on Hugging Face.
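If you want to try one in a cluster, the sketch below shows one way to register a pretrained sparse model through the ML Commons API from Python. It is a minimal example, not a definitive recipe: the host, security settings, model name, and version are assumptions based on the naming scheme of earlier releases, so check the OpenSearch pretrained model registry for the exact v3 identifiers.

```python
import requests

# Hypothetical local cluster; adjust host, port, and credentials for
# your deployment (add auth=("user", "password") if security is enabled).
OPENSEARCH = "https://localhost:9200"

# Register a pretrained sparse encoding model through the ML Commons
# plugin. The model name follows the naming scheme of earlier releases
# (e.g., ...-doc-v2-distill); verify the exact v3 name and version in
# the pretrained model registry before running this.
resp = requests.post(
    f"{OPENSEARCH}/_plugins/_ml/models/_register",
    json={
        "name": "amazon/neural-sparse/opensearch-neural-sparse-encoding-doc-v3-distill",
        "version": "1.0.0",  # assumed; confirm the published version
        "model_format": "TORCH_SCRIPT",
    },
    verify=False,  # demo only: skip TLS verification on a local cluster
)
print(resp.json())  # returns a task_id you can poll to obtain the model_id
```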

All v3 models achieve better search relevance than their v2 counterparts. The following table compares search relevance across model generations.

| Model | v1 | v2-distill | doc-v1 | doc-v2-distill | doc-v2-mini | doc-v3-distill | doc-v3-gte |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Inference-free | ❌ | ❌ | ✔️ | ✔️ | ✔️ | ✔️ | ✔️ |
| Model parameters | 133M | 67M | 133M | 67M | 23M | 67M | 133M |
| Avg. NDCG@10 | 0.524 | 0.528 | 0.49 | 0.504 | 0.497 | 0.517 | 0.546 |
| Avg. FLOPS | 11.4 | 8.3 | 2.3 | 1.8 | 1.7 | 1.8 | 1.7 |

From v2 to v3 series models

The transition from v2 to v3 models in OpenSearch represents a significant leap forward in neural sparse search relevance while maintaining the hallmark efficiency of the v2 series.

Limitations of v2 models

The v2 series models made neural sparse search widely accessible by significantly reducing the number of model parameters and improving ingestion throughput while maintaining nearly the same search relevance. However, as search workloads and datasets grew in complexity, certain challenges emerged:

  • Relevance bottleneck: While v2 models delivered strong efficiency and solid performance, their inference-free design still trailed behind well-trained Siamese dense or sparse retrievers in retrieval quality.
  • Limited teacher guidance: v2 models relied primarily on heterogeneous bi-encoder teachers for distillation, without using the richer ranking signals from stronger models, such as large language models (LLMs).

These limitations motivated us to rethink both training strategies and model architecture for the next-generation models.

Advancements in v3 models

For the v3 series, our primary goal was to push search relevance to a new level while retaining the lightweight and low-latency characteristics of v2 models. Key advancements include the following:

  • v3-distill: Builds on v2-distill by incorporating ℓ0-based sparsification techniques and training on a larger and more diverse dataset. This combination improves search relevance while maintaining the same lightweight architecture for fast ingestion and low memory usage.
  • v3-gte: Replaces the v3-distill backbone with a General Text Embedding (GTE) architecture, providing stronger semantic representation and support for 8192-token context windows. This model employs LLM teacher models, capturing richer semantic nuances and setting a new benchmark for sparse retrieval relevance in OpenSearch.

The technology behind v3 models

Two core techniques drive the technology improvements in v3 models: ℓ0-based sparsification for efficient document representation and GTE architecture with LLM teachers for enhanced training quality.

With these advancements, the v3 series delivers substantial improvements in search relevance while preserving the hallmark speed, efficiency, and inference-free advantages of previous generations. This ensures that you can achieve state-of-the-art retrieval performance without compromising scalability or latency.

ℓ0-based sparsification

The ℓ0-based approach selectively sparsifies document-side representations to balance efficiency and ranking quality:

  • ℓ0 mask loss: Regularizes only document vectors exceeding the desired sparsity threshold.
  • ℓ0 approximation activation: Provides a differentiable approximation for ℓ0, enabling precise sparsity control during training.

Combined with expanded training data, this enables v3-distill to achieve higher relevance without sacrificing efficiency.
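The exact training objective is not published in this post, but the following PyTorch sketch illustrates the general shape of these two pieces. The surrogate function, sharpness constant, and token budget below are illustrative assumptions, not the production recipe.

```python
import torch

def l0_activation(doc_weights, alpha=20.0):
    # Differentiable surrogate for the l0 "norm" of a non-negative
    # sparse vector: 1 - exp(-alpha * w) is 0 at w = 0 and saturates
    # toward 1, so summing it approximates the active-token count.
    # alpha (illustrative) controls how sharp the approximation is.
    return (1.0 - torch.exp(-alpha * doc_weights)).sum(dim=-1)

def l0_mask_loss(doc_weights, target_tokens=256.0):
    # Regularize only documents whose approximate active-token count
    # exceeds the sparsity budget; already-sparse documents get no penalty.
    approx_counts = l0_activation(doc_weights)           # shape: (batch,)
    excess = torch.relu(approx_counts - target_tokens)   # 0 if under budget
    return excess.mean()

# Toy usage: a batch of 2 documents over a 30,522-token vocabulary.
logits = torch.randn(2, 30522, requires_grad=True)
doc_weights = torch.relu(logits)  # sparse encoders emit non-negative weights
loss = l0_mask_loss(doc_weights)
loss.backward()
```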

GTE architecture with LLM teachers

The GTE architecture strengthens semantic representation and handles much longer inputs, while LLM-based teacher signals offer richer ranking guidance. This combination allows v3-gte to deliver the highest relevance scores among all OpenSearch sparse retrievers.
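The post does not detail how those teacher signals enter the loss, but a common way to consume listwise ranking scores is to match the student's score distribution over each query's candidate documents to the teacher's. The sketch below assumes that setup; the temperature is an illustrative knob, not a published hyperparameter.

```python
import torch
import torch.nn.functional as F

def listwise_distillation_loss(student_scores, teacher_scores, tau=1.0):
    # student_scores / teacher_scores: (num_queries, num_candidates)
    # relevance scores over each query's candidate documents. The KL
    # divergence pulls the student's ranking distribution toward the
    # (LLM) teacher's; tau is an illustrative softmax temperature.
    teacher_probs = F.softmax(teacher_scores / tau, dim=-1)
    student_log_probs = F.log_softmax(student_scores / tau, dim=-1)
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean")

# Toy usage: 4 queries, 8 candidate documents each.
student = torch.randn(4, 8, requires_grad=True)
teacher = torch.randn(4, 8)  # e.g., scores produced by an LLM ranker
loss = listwise_distillation_loss(student, teacher)
loss.backward()
```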

Search relevance benchmarks

As in the tests described in our previous blog post, we evaluated the search relevance of the models on the BEIR benchmark. The search relevance results are shown in the following table. All v3 series models outperform their v2 and v1 counterparts, with v3-gte achieving the highest relevance scores across all tests and setting a new record for OpenSearch neural sparse retrieval models.

| Model | Average | Trec-Covid | NFCorpus | NQ | HotpotQA | FiQA | ArguAna | Touche | DBPedia | SciDocs | FEVER | Climate-FEVER | SciFact | Quora |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| v1 | 0.524 | 0.771 | 0.36 | 0.553 | 0.697 | 0.376 | 0.508 | 0.278 | 0.447 | 0.164 | 0.821 | 0.263 | 0.723 | 0.856 |
| v2-distill | 0.528 | 0.775 | 0.347 | 0.561 | 0.685 | 0.374 | 0.551 | 0.278 | 0.435 | 0.173 | 0.849 | 0.249 | 0.722 | 0.863 |
| doc-v1 | 0.49 | 0.707 | 0.352 | 0.521 | 0.677 | 0.344 | 0.461 | 0.294 | 0.412 | 0.154 | 0.743 | 0.202 | 0.716 | 0.788 |
| doc-v2-distill | 0.504 | 0.69 | 0.343 | 0.528 | 0.675 | 0.357 | 0.496 | 0.287 | 0.418 | 0.166 | 0.818 | 0.224 | 0.715 | 0.841 |
| doc-v2-mini | 0.497 | 0.709 | 0.336 | 0.51 | 0.666 | 0.338 | 0.48 | 0.285 | 0.407 | 0.164 | 0.812 | 0.216 | 0.699 | 0.837 |
| doc-v3-distill | 0.517 | 0.724 | 0.345 | 0.544 | 0.694 | 0.356 | 0.52 | 0.294 | 0.424 | 0.163 | 0.845 | 0.239 | 0.708 | 0.863 |
| doc-v3-gte | 0.546 | 0.734 | 0.36 | 0.582 | 0.716 | 0.407 | 0.52 | 0.389 | 0.455 | 0.167 | 0.86 | 0.312 | 0.725 | 0.873 |

Multilingual sparse retrieval

We are also excited to announce multilingual-v1, the first multilingual neural sparse retrieval model in OpenSearch. Using the same proven training techniques as the English-language v2 series, multilingual-v1 brings high‑quality sparse retrieval to a wide range of languages, achieving strong relevance across multilingual benchmarks while maintaining the same efficiency as our English-language models.

The following table shows the detailed search relevance evaluation of multilingual-v1 across different languages, compared to BM25. Results are reported on the MIRACL benchmark. multilingual-v1 delivers substantial improvements over BM25 in every language, demonstrating that our neural sparse retrieval techniques are effective beyond English. The table also presents results for a pruned version of multilingual-v1 (using a prune ratio of 0.1), which maintains competitive relevance while reducing index size; a sample ingestion setup with this pruning configuration follows the table.

| Model | Average | bn | te | es | fr | id | hi | ru | ar | zh | fa | ja | fi | sw | ko | en |
| --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
| BM25 | 0.305 | 0.482 | 0.383 | 0.077 | 0.115 | 0.297 | 0.35 | 0.256 | 0.395 | 0.175 | 0.287 | 0.312 | 0.458 | 0.351 | 0.371 | 0.267 |
| multilingual-v1 | 0.629 | 0.67 | 0.74 | 0.542 | 0.558 | 0.582 | 0.486 | 0.658 | 0.74 | 0.562 | 0.514 | 0.669 | 0.767 | 0.768 | 0.607 | 0.575 |
| multilingual-v1 (prune_ratio 0.1) | 0.626 | 0.667 | 0.74 | 0.537 | 0.555 | 0.576 | 0.481 | 0.655 | 0.737 | 0.558 | 0.511 | 0.664 | 0.761 | 0.766 | 0.604 | 0.572 |
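As a concrete sketch of that pruned configuration: recent OpenSearch versions let the sparse_encoding ingest processor prune low-weight tokens at ingestion time. The example below is illustrative only; the pipeline name, field names, and model ID placeholder are assumptions, so confirm the processor options against the documentation for your OpenSearch version.

```python
import requests

OPENSEARCH = "https://localhost:9200"  # adjust host and auth for your cluster

# Create an ingest pipeline that encodes documents with multilingual-v1
# and prunes tokens whose weight falls below 0.1x the document's maximum
# token weight, mirroring the pruned configuration in the table above.
# The model_id is whatever ID your cluster assigned at registration;
# field names are illustrative.
resp = requests.put(
    f"{OPENSEARCH}/_ingest/pipeline/multilingual-sparse-pipeline",
    json={
        "processors": [
            {
                "sparse_encoding": {
                    "model_id": "<multilingual-v1 model ID>",
                    "prune_type": "max_ratio",  # drop tokens below ratio * max weight
                    "prune_ratio": 0.1,
                    "field_map": {"passage_text": "passage_embedding"},
                }
            }
        ]
    },
    verify=False,  # demo only
)
print(resp.json())
```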

Further reading

For more information about neural sparse search, see our previous blog posts.

Next steps

Try our newest v3 neural sparse models in your OpenSearch cluster and share your experience with us on the OpenSearch forum. Your feedback helps us improve future versions.
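To get started, a search against a sparse index looks like the sketch below. The index name, field name, and model ID are placeholders; with an inference-free (doc-only) model, the model_id typically points to the lightweight tokenizer registered alongside the model rather than a full encoder.

```python
import requests

OPENSEARCH = "https://localhost:9200"  # adjust host and auth for your cluster

# Run a neural sparse query against a field populated by the sparse
# encoding ingest pipeline. Index, field, and model ID are illustrative.
resp = requests.get(
    f"{OPENSEARCH}/my-sparse-index/_search",
    json={
        "query": {
            "neural_sparse": {
                "passage_embedding": {
                    "query_text": "what is neural sparse search?",
                    "model_id": "<tokenizer or model ID>",
                }
            }
        }
    },
    verify=False,  # demo only
)
print(resp.json()["hits"]["total"])
```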

Authors

  • Yiwen Wang is a machine learning engineer intern with the OpenSearch Project. She works on improving search relevance and efficiency.

  • Zhichao Geng is a machine learning engineer with the OpenSearch Project. His interests include improving search relevance using machine learning.

  • Charlie Yang is an AWS engineering manager with the OpenSearch Project. He focuses on machine learning, search relevance, and performance optimization.
