Watch as DataStax details its experience with OpenSearch, demonstrating how the team built billion-scale vector indexes while saving time.
Executive Summary
Customer: DataStax, an IBM Company
Industry: Data Platforms / Vector Search
Deployment Scale: Java‑based embedding pipelines, large vector collections, real‑world retrieval use cases
Platform: Integrated with OpenSearch via plugin, JVector, hybrid indices, graph and quantization techniques
DataStax wanted to build high-performance, production-ready vector search that could scale to billions of vectors without sacrificing recall or breaking budgets. The team created JVector, a Java-based vector search library.
They used OpenSearch as the foundation for running and testing JVector through a custom plugin that enabled dense vector retrieval. The combination allowed the team to innovate quickly, benchmark performance at scale, and optimize for real-world use, not just benchmark results.
Key outcomes:
- Achieved lower query latency and less disk I/O through inline vector storage.
- Improved index build speed using concurrent graph construction.
- Reduced infrastructure costs by avoiding high-RAM or GPU-only setups.
- Preserved search recall by tuning quantization and reranking carefully.
“We created JVector to bring dense vector search to Cassandra and other Java-based projects—but OpenSearch allows us to go beyond simple vector search into a holistic end-to-end search and RAG solution while keeping the advantages of having a pure Java plugin, JVM profiling, full control over indexing—it just fit.”
— Samuel Herman, Technical Lead at DataStax and member of the OpenSearch Technical Steering Committee
The Challenge
DataStax needed a vector search implementation that wasn’t just academically good, but production‑ready. Vector search has become critical for modern AI and retrieval applications, but production systems face major trade-offs.
DataStax needed a system that could:
- Deliver high recall and fast results for dense vector data.
- Scale to hundreds of millions or billions of vectors.
- Operate within normal compute and memory limits.
- Integrate smoothly with existing Java-based systems.
Off-the-shelf vector libraries often perform well on benchmarks but fail in production due to unpredictable vector distributions, slow indexing, or recall loss. Brute-force search was too slow, and existing ANN (approximate nearest neighbor) indices required heavy memory use or precision sacrifices.
The Solution
The DataStax team focused on solving core operational bottlenecks through engineering and measurement rather than configuration alone.
Inline vectors
They embedded full-precision vectors directly within the graph index instead of storing them separately. This reduced disk seeks and system calls, making queries faster even when the data already fit in memory.
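As a rough illustration of why that matters (a simplified sketch, not JVector's actual on-disk format or API), a fixed-size node record that holds both the adjacency list and the full-precision vector lets one seek and one sequential read serve both graph traversal and distance calculation:

```java
// Illustrative layout only: neighbors and the vector live in the same record,
// so visiting a node never requires a second seek into a separate vector file.
import java.io.IOException;
import java.io.RandomAccessFile;

final class InlineNodeReader {
    private final RandomAccessFile index;   // graph and vectors in one file
    private final int dimension;
    private final int maxNeighbors;
    private final long recordBytes;         // fixed-size node record

    InlineNodeReader(RandomAccessFile index, int dimension, int maxNeighbors) {
        this.index = index;
        this.dimension = dimension;
        this.maxNeighbors = maxNeighbors;
        // neighbor ids (int) plus vector components (float), 4 bytes each
        this.recordBytes = 4L * maxNeighbors + 4L * dimension;
    }

    /** One seek and one sequential read return both adjacency and vector. */
    NodeRecord read(int nodeId) throws IOException {
        index.seek(nodeId * recordBytes);
        int[] neighbors = new int[maxNeighbors];
        for (int i = 0; i < maxNeighbors; i++) neighbors[i] = index.readInt();
        float[] vector = new float[dimension];
        for (int i = 0; i < dimension; i++) vector[i] = index.readFloat();
        return new NodeRecord(neighbors, vector);
    }

    record NodeRecord(int[] neighbors, float[] vector) {}
}
```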
Selective quantization
Quantization compressed vectors for smaller storage but reduced recall; tests showed recall losses of up to 90% in extreme cases.
The team applied quantization only where it added value:
- Used it during index construction on low-memory hardware.
- Avoided it during query execution, where accuracy mattered most (a sketch of this split follows below).
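A minimal sketch of that split, assuming a simple 8-bit scalar quantizer purely for illustration (JVector itself relies on more sophisticated schemes such as product quantization, and these class and method names are not its API): compact codes serve the memory-hungry build phase, while query scoring stays at full precision.

```java
// Illustrative only: one store keeps both representations and chooses by phase.
// In a real low-memory build, the full-precision vectors would stay on disk
// while only the byte codes are held in RAM.
final class SelectivelyQuantizedStore {
    private final float[][] fullPrecision;   // consulted when answering queries
    private final byte[][] codes;            // compact codes for build-time work
    private final float min;
    private final float scale;

    SelectivelyQuantizedStore(float[][] vectors) {
        this.fullPrecision = vectors;
        // Simple global 8-bit scalar quantization, chosen here for brevity.
        float lo = Float.MAX_VALUE, hi = -Float.MAX_VALUE;
        for (float[] v : vectors)
            for (float x : v) { lo = Math.min(lo, x); hi = Math.max(hi, x); }
        this.min = lo;
        this.scale = (hi - lo) / 255f;
        this.codes = new byte[vectors.length][];
        for (int i = 0; i < vectors.length; i++) {
            codes[i] = new byte[vectors[i].length];
            for (int d = 0; d < vectors[i].length; d++)
                codes[i][d] = (byte) Math.round((vectors[i][d] - min) / scale);
        }
    }

    /** Approximate distance computed from the compact codes during index construction. */
    float buildTimeDistance(int a, int b) {
        float sum = 0;
        for (int d = 0; d < codes[a].length; d++) {
            float da = min + (codes[a][d] & 0xFF) * scale;
            float db = min + (codes[b][d] & 0xFF) * scale;
            float diff = da - db;
            sum += diff * diff;
        }
        return sum;
    }

    /** Exact distance used at query time, where accuracy matters most. */
    float queryTimeDistance(float[] query, int id) {
        float sum = 0;
        float[] v = fullPrecision[id];
        for (int d = 0; d < v.length; d++) {
            float diff = query[d] - v[d];
            sum += diff * diff;
        }
        return sum;
    }
}
```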
Concurrent graph construction
Traditional graph-based indices like HNSW insert one node at a time, which slows down indexing. The team created a snapshot-based coordination method that allowed parallel insertion without locking. This improved build times and used CPU cores efficiently, with no GPU required.
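The following is a hedged sketch of the coordination pattern, not JVector's actual algorithm: each worker selects neighbors against a snapshot of the adjacency lists already published by other workers, then publishes its own list atomically, so inserts run in parallel without a global lock. Neighbor selection is naive brute force here purely for brevity.

```java
// Illustrative parallel insertion driver; adjacency lists are published via an
// AtomicReferenceArray so workers never block each other on a shared lock.
import java.util.List;
import java.util.concurrent.*;
import java.util.concurrent.atomic.AtomicReferenceArray;
import java.util.stream.IntStream;

final class ParallelGraphBuilder {
    private final float[][] vectors;
    private final int maxNeighbors;
    private final AtomicReferenceArray<int[]> neighbors;  // per-node adjacency

    ParallelGraphBuilder(float[][] vectors, int maxNeighbors) {
        this.vectors = vectors;
        this.maxNeighbors = maxNeighbors;
        this.neighbors = new AtomicReferenceArray<>(vectors.length);
    }

    void build() throws InterruptedException {
        ExecutorService pool =
                Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
        for (int node = 0; node < vectors.length; node++) {
            final int id = node;
            pool.submit(() -> insert(id));
        }
        pool.shutdown();
        pool.awaitTermination(1, TimeUnit.HOURS);
    }

    private void insert(int id) {
        // Snapshot: only nodes whose adjacency is already published are visible.
        List<Integer> visible = IntStream.range(0, vectors.length)
                .filter(n -> n != id && neighbors.get(n) != null)
                .boxed().toList();
        int[] selected = visible.stream()
                .sorted((a, b) -> Double.compare(dist(id, a), dist(id, b)))
                .limit(maxNeighbors)
                .mapToInt(Integer::intValue).toArray();
        neighbors.set(id, selected);   // publish without holding a global lock
    }

    private double dist(int a, int b) {
        double s = 0;
        for (int i = 0; i < vectors[a].length; i++) {
            double d = vectors[a][i] - vectors[b][i];
            s += d * d;
        }
        return s;
    }
}
```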
Performance-aware tuning
They tuned threads to match CPU SIMD capabilities, avoided unnecessary context switching, and benchmarked everything: recall, latency, disk I/O, and build time. Each configuration was tested with real data distributions, not synthetic benchmarks.
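As one concrete example of SIMD-aware tuning (a generic kernel, not JVector's actual implementation), a squared-distance routine written with the JDK's incubating Vector API lets the JVM map the inner loop onto the CPU's vector registers; it requires the --add-modules jdk.incubator.vector flag at compile and run time.

```java
// Generic SIMD-friendly squared-distance kernel using jdk.incubator.vector.
import jdk.incubator.vector.FloatVector;
import jdk.incubator.vector.VectorOperators;
import jdk.incubator.vector.VectorSpecies;

final class SimdDistance {
    private static final VectorSpecies<Float> SPECIES = FloatVector.SPECIES_PREFERRED;

    static float squaredDistance(float[] a, float[] b) {
        FloatVector acc = FloatVector.zero(SPECIES);
        int i = 0;
        int bound = SPECIES.loopBound(a.length);
        for (; i < bound; i += SPECIES.length()) {
            FloatVector va = FloatVector.fromArray(SPECIES, a, i);
            FloatVector vb = FloatVector.fromArray(SPECIES, b, i);
            FloatVector diff = va.sub(vb);
            acc = diff.fma(diff, acc);           // acc += diff * diff, lane-wise
        }
        float sum = acc.reduceLanes(VectorOperators.ADD);
        for (; i < a.length; i++) {              // scalar tail for leftover lanes
            float d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```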
Operating at Scale
In real-world use, the team saw vector dimensionality increasing (from 128-dimensional embeddings to sometimes as many as 3,000) and the volume of embeddings growing as well across text, vision, sound, and multimodal data. They found that factors such as how vectors are distributed matter a great deal for recall, especially with ANN indices, and that when data sets are large, recall can slip significantly unless indexing techniques are tuned. They explored the trade-offs: full precision vs. quantization, more RAM vs. more disk access, and faster vs. longer index builds. They benchmarked JVector against Lucene to observe system call overheads, buffer cache effects, and latency, and they optimized thread reuse and reduced context switching to minimize overhead.
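The most basic of those measurements is recall@k against brute-force ground truth on the same queries; a small sketch of that computation (the array-based inputs here are assumptions for illustration, not a JVector or Lucene API):

```java
// recall@k = |approx top-k ∩ exact top-k| / k, averaged over all queries.
import java.util.Arrays;
import java.util.HashSet;
import java.util.Set;

final class RecallBenchmark {
    static double recallAtK(int[][] approxResults, int[][] exactResults, int k) {
        double total = 0;
        for (int q = 0; q < approxResults.length; q++) {
            Set<Integer> truth = new HashSet<>();
            for (int i = 0; i < k; i++) truth.add(exactResults[q][i]);
            long hits = Arrays.stream(approxResults[q], 0, k)
                              .filter(truth::contains)
                              .count();
            total += (double) hits / k;
        }
        return total / approxResults.length;
    }
}
```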
Results
DataStax delivered a vector search solution that was faster, leaner, and more reliable in production.
- The system achieved faster distance calculations when using quantization or inline vectors versus relying purely on full precision, especially when RAM was limited or disk I/O would otherwise dominate.
- Compression techniques such as product quantization improved I/O and latency in many cases, although at some cost to recall, which had to be recovered through reranking (see the sketch after this list).
- Hybrid graph layer modifications helped maintain recall on “rogue” data sets without sacrificing too much performance.
- Concurrency improvements in graph construction shortened index build times and made adding nodes less costly.
- Overall, the setup allowed them to reason clearly about trade‑offs and make informed decisions rather than trusting vendor claims or published benchmarks alone.
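The reranking mentioned above generally means over-fetching candidates from the compressed index and re-scoring them against full-precision vectors before returning the final top k. A minimal sketch of that pattern, with all interfaces and names assumed for illustration:

```java
// Illustrative rerank step: approximate search with quantized codes, exact
// re-scoring of the surviving candidates to restore recall in the final ranking.
import java.util.Comparator;
import java.util.List;

final class RerankingSearcher {
    interface ApproxIndex {           // assumed: quantized ANN index
        List<Integer> topCandidates(float[] query, int count);
    }
    interface VectorStore {           // assumed: full-precision vector lookup
        float[] vector(int id);
    }

    private final ApproxIndex approx;
    private final VectorStore exact;

    RerankingSearcher(ApproxIndex approx, VectorStore exact) {
        this.approx = approx;
        this.exact = exact;
    }

    /** Over-fetch from the quantized index, then rerank with exact distances. */
    List<Integer> search(float[] query, int k, int overFetchFactor) {
        List<Integer> candidates = approx.topCandidates(query, k * overFetchFactor);
        return candidates.stream()
                .sorted(Comparator.comparingDouble(
                        (Integer id) -> squaredDistance(query, exact.vector(id))))
                .limit(k)
                .toList();
    }

    private static double squaredDistance(float[] a, float[] b) {
        double sum = 0;
        for (int i = 0; i < a.length; i++) {
            double d = a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }
}
```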
“The beauty in OpenSearch is that it also allows us to easily integrate the JVector engine with user-facing metrics to monitor our engine operations. Those helped our team dramatically to improve our performance in a data-driven manner.” — Samuel Herman, Technical Lead at DataStax and member of the OpenSearch Technical Steering Committee
Why It Matters
By using OpenSearch and JVector together, DataStax created a practical, production-ready approach to vector search, one that balances speed, precision, and cost. Their work shows that with the right architecture and a focus on measurable performance, vector search can move from research into reliable, real-world deployment.
For organizations evaluating vector search solutions, being able to scrutinize metrics, benchmarks, and trade‑offs is essential.
Watch the session:
See Samuel Herman’s full talk on YouTube: Vector Search Beyond the Hype for performance insights, implementation details, and code‑level examples.
For more on the JVector plugin and OpenSearch vector search capabilities, explore the JVector project on GitHub.
