Introducing byte vector support for Faiss in the OpenSearch vector engine

The growing popularity of generative AI and large language models (LLMs) has led to an increased demand for efficient vector search and similarity operations. These models often rely on high-dimensional vector representations of text, images, or other data. Performing similarity searches or nearest neighbor queries on these vectors becomes computationally expensive, especially as vector databases grow in size. OpenSearch’s support for Faiss byte vectors offers a promising solution to these challenges.

Using byte vectors instead of float vectors for vector search provides significant improvements in memory efficiency and performance. This is especially beneficial for large-scale vector databases or environments with limited resources. Faiss byte vectors enable you to store quantized embeddings, significantly reducing memory consumption and lowering costs. This approach typically results in only minimal recall loss compared to using full-precision (float) vectors.

How to use a Faiss byte vector

A byte vector is a compact vector representation in which each dimension is a signed 8-bit integer ranging from -128 to 127. To use byte vectors, you must convert your input vectors, typically in float format, into the byte type before ingestion. This process requires quantization techniques, which compress float vectors while maintaining essential data characteristics. For more information, see Quantization techniques.
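
Quantization itself happens outside of OpenSearch, before ingestion. As a rough illustration only, the following Python sketch maps float embeddings into the signed 8-bit range using simple min-max scalar quantization; the scaling strategy and the NumPy-based helper shown here are assumptions for illustration, not part of the OpenSearch API.

import numpy as np

def quantize_to_int8(vectors: np.ndarray) -> np.ndarray:
    # Min-max scalar quantization sketch: map float values into the
    # signed 8-bit range [-128, 127] expected by byte vectors.
    v_min, v_max = vectors.min(), vectors.max()
    scaled = (vectors - v_min) / (v_max - v_min) * 255.0 - 128.0
    return np.clip(np.round(scaled), -128, 127).astype(np.int8)

# Example: quantize one 8-dimensional float embedding before ingestion.
float_vector = np.array([[-0.98, 0.22, 1.0, 0.0, 0.08, -0.35, 0.09, -0.86]])
print(quantize_to_int8(float_vector).tolist())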

To use a byte vector, set the data_type parameter to byte when creating a k-NN index (the default value is float):

PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 8,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 100,
            "m": 16
          }
        }
      }
    }
  }
} 

During ingestion, make sure that each dimension of the vector is within the supported [-128, 127] range:

PUT test-index/_doc/1
{
  "my_vector1": [-126, 28, 127, 0, 10, -45, 12, -110]
}

PUT test-index/_doc/2
{
  "my_vector1": [100, -25, 4, -67, -2, 127, 99, 0]
}

During querying, make sure that the query vector is also within the byte range:

GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [-1, 45, -100, 125, -128, -8, 5, 10],
        "k": 2
      }
    }
  }
}

Note: When using byte vectors, expect some loss of recall precision as compared to using float vectors. Byte vectors are useful for large-scale applications and use cases that prioritize reducing memory usage in exchange for a minimal loss in recall.

Benchmarking results

We used OpenSearch Benchmark to run benchmarking tests on popular datasets to compare recall, indexing, and search performance between float vectors and byte vectors using Faiss HNSW.

Note: Without single instruction, multiple data (SIMD) optimization (such as AVX2 or NEON), or when AVX2 is disabled on x86 architectures, the quantization process introduces additional latency. For more information about AVX2-compatible processors, see CPUs with AVX2. In an AWS environment, all community Amazon Machine Images (AMIs) with HVM support AVX2 optimization.

These tests were conducted on a single-node cluster, except for the cohere-10m dataset, which used two r5.2xlarge instances.

Configuration

The following table lists the cluster configuration for the benchmarking tests.

| m | ef_construction | ef_search | Replicas | Primary shards | Indexing clients |
|---|---|---|---|---|---|
| 16 | 100 | 100 | 0 | 8 | 16 |

The following table lists the dataset configuration for the benchmarking tests.

| Dataset ID | Dataset | Vector dimension | Data size | Number of queries | Training data range | Query data range | Space type |
|---|---|---|---|---|---|---|---|
| Dataset 1 | gist-960-euclidean | 960 | 1,000,000 | 1,000 | [0.0, 1.48] | [0.0, 0.729] | L2 |
| Dataset 2 | cohere-ip-10m | 768 | 10,000,000 | 10,000 | [-4.142334, 5.5211477] | [-4.109505, 5.4809895] | innerproduct |
| Dataset 3 | cohere-ip-1m | 768 | 1,000,000 | 10,000 | [-4.1073565, 5.504557] | [-4.109505, 5.4809895] | innerproduct |
| Dataset 4 | sift-128-euclidean | 128 | 1,000,000 | 10,000 | [0.0, 218.0] | [0.0, 184.0] | L2 |

Recall, memory, and indexing results

| Dataset ID | Faiss HNSW recall@100 | Faiss HNSW byte recall@100 | % Reduction in recall | Faiss HNSW memory usage (GB) | Faiss HNSW byte memory usage (GB) | % Reduction in memory | Faiss HNSW mean indexing throughput (docs/sec) | Faiss HNSW byte mean indexing throughput (docs/sec) | % Gain in indexing throughput |
|---|---|---|---|---|---|---|---|---|---|
| Dataset 1 | 0.91 | 0.89 | 2.20 | 3.72 | 1.04 | 72.00 | 4673 | 9686 | 107.28 |
| Dataset 2 | 0.91 | 0.83 | 8.79 | 30.03 | 8.57 | 71.46 | 4911 | 10207 | 107.84 |
| Dataset 3 | 0.94 | 0.86 | 8.51 | 3.00 | 0.86 | 71.33 | 6112 | 11673 | 90.98 |
| Dataset 4 | 0.99 | 0.98 | 1.01 | 0.62 | 0.26 | 58.06 | 38273 | 43267 | 13.05 |

Query results

| Dataset ID | Query clients | Faiss HNSW p90 (ms) | Faiss HNSW byte p90 (ms) | Faiss HNSW p99 (ms) | Faiss HNSW byte p99 (ms) |
|---|---|---|---|---|---|
| Dataset 1 | 1 | 5.35 | 5.34 | 5.95 | 5.59 |
| Dataset 1 | 8 | 6.68 | 6.64 | 10.23 | 9.14 |
| Dataset 1 | 16 | 10.59 | 7.38 | 12.94 | 11.47 |
| Dataset 2 | 1 | 7.39 | 7.14 | 8.35 | 7.59 |
| Dataset 2 | 8 | 15.47 | 14.83 | 21.38 | 16.20 |
| Dataset 2 | 16 | 25.01 | 25.32 | 31.98 | 29.42 |
| Dataset 3 | 1 | 4.97 | 4.72 | 5.62 | 5.02 |
| Dataset 3 | 8 | 6.75 | 5.98 | 7.69 | 7.70 |
| Dataset 3 | 16 | 10.51 | 6.94 | 13.87 | 12.40 |
| Dataset 4 | 1 | 2.91 | 3.03 | 3.16 | 3.15 |
| Dataset 4 | 8 | 3.38 | 3.30 | 6.30 | 4.75 |
| Dataset 4 | 16 | 4.35 | 3.80 | 8.76 | 8.83 |

Key findings

The following are the key findings derived from comparing the benchmarking results:

  • Memory savings: Byte vectors reduced memory usage by up to 72%, with higher-dimensional vectors achieving greater reductions (see the worked example after this list).
  • Indexing performance: The mean indexing throughput for byte vectors was 13% to 108% higher than for float vectors, with the largest gains for higher-dimensional vectors.
  • Search performance: Search latencies were similar, with byte vectors occasionally performing better.
  • Recall: For byte vectors, recall dropped slightly (by up to 8.8%) compared to float vectors, depending on the dataset and the quantization technique used.
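
To make the memory numbers concrete, the following minimal Python sketch shows the raw vector-storage arithmetic: a float dimension takes 4 bytes while a byte dimension takes 1, a 75% reduction in vector storage. The measured reductions in the tables above (58-72%) are lower because total index memory also includes HNSW graph overhead, which this calculation ignores.

# Raw vector-storage estimate only; excludes HNSW graph overhead.
def vector_storage_gib(num_vectors: int, dimension: int, bytes_per_dim: int) -> float:
    return num_vectors * dimension * bytes_per_dim / (1024 ** 3)

# Dataset 1: gist-960-euclidean, 1,000,000 vectors of 960 dimensions.
float_gib = vector_storage_gib(1_000_000, 960, 4)  # ~3.58 GiB as float
byte_gib = vector_storage_gib(1_000_000, 960, 1)   # ~0.89 GiB as byte
print(f"float: {float_gib:.2f} GiB, byte: {byte_gib:.2f} GiB, "
      f"vector-storage reduction: {1 - byte_gib / float_gib:.0%}")  # 75%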

How does Faiss work with byte vectors internally?

Faiss doesn’t directly support the byte data type for vector storage. To achieve this, OpenSearch uses a QT_8bit_direct_signed scalar quantizer. This quantizer accepts float vectors within the signed 8-bit value range and encodes them as unsigned 8-bit integer vectors. During indexing and search, these encoded unsigned 8-bit integer vectors are decoded back into the original signed 8-bit vectors for distance computation.

This quantization approach reduces the memory footprint by a factor of four. However, encoding and decoding during scalar quantization introduce additional latency. To mitigate this, you can use SIMD optimization with the QT_8bit_direct_signed quantizer to reduce search latencies and improve indexing throughput.

Example

The following example shows how an input vector is encoded and decoded using the QT_8bit_direct_signed scalar quantizer:

// Input vector:
[-126, 28, 127, 0, 10, -45, 12, -110]

// Encoded vector generated by adding 128 to each dimension of the input vector to convert signed int8 to unsigned int8:
[2, 156, 255, 128, 138, 83, 140, 18]

// Encoded vector is decoded back into the original signed int8 vector by subtracting 128 from each dimension for distance computation:
[-126, 28, 127, 0, 10, -45, 12, -110]
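
The same round trip can be reproduced in a few lines of Python. This is a sketch of the add-128/subtract-128 mapping described above, not Faiss's internal implementation:

import numpy as np

signed_vector = np.array([-126, 28, 127, 0, 10, -45, 12, -110], dtype=np.int8)

# Encode: shift signed int8 values into the unsigned int8 range by adding 128.
encoded = (signed_vector.astype(np.int16) + 128).astype(np.uint8)
print(encoded.tolist())  # [2, 156, 255, 128, 138, 83, 140, 18]

# Decode: subtract 128 to recover the original signed int8 values.
decoded = (encoded.astype(np.int16) - 128).astype(np.int8)
assert np.array_equal(decoded, signed_vector)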

Future enhancements

In future versions, we plan to enhance this feature by adding an on_disk mode with a 4x Faiss compression level. This mode will accept fp32 vectors as input, perform online training, and quantize the data into byte-sized vectors, eliminating the need to perform external quantization.

Conclusion

OpenSearch 2.17 introduced support for Faiss byte vectors, allowing you to efficiently store quantized byte vector embeddings. This cuts vector storage by a factor of four (up to a 72% reduction in total index memory in our benchmarks), lowers costs, and maintains high performance. These advantages make byte vectors an excellent choice for large-scale similarity search applications, especially those with limited memory resources or those that handle large volumes of data within the signed byte value range.

Authors

  • Naveen Tatikonda is a software engineer at AWS working on the OpenSearch Project and Amazon OpenSearch Service. His interests include distributed systems and vector search. He is an active contributor to various plugins, such as k-NN and GeoSpatial.

  • Navneet Verma is a senior software engineer at AWS working on geospatial and vector search in OpenSearch.

  • Vamshi Vijay Nakkirtha is a software engineering manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems. He is an active contributor to various plugins, like k-NN, GeoSpatial, and dashboard-maps.

  • Dylan Tong leads OpenSearch AI and machine learning product initiatives at AWS.

  • Fanit Kolchina is a senior programmer writer at AWS focusing on OpenSearch.
