Skip to main content
search
Technical

Byte-quantized vectors in OpenSearch

Until now, the OpenSearch k-NN plugin has supported the indexing and querying of vectors of type float, with each vector element occupying 4 bytes. This can be expensive in terms of memory and storage, especially for large-scale use cases. Using the new byte vector feature in OpenSearch 2.9, users can reduce memory requirements by a factor of 4 and significantly reduce search latency, with minimal loss in quality (recall).

In this post, we’ll show you how to use byte vectors and how to convert (quantize) 32-bit floats to 8-bit signed integers. We’ll also show you some benchmarking results related to both performance and quality.

Using byte vectors

With byte vectors, each vector element occupies 1 byte (8 bits) and is treated as a signed integer within the range of -128 to 127. This takes advantage of the support for byte vectors in the underlying Lucene engine.

To use a byte vector, set the optional data_type parameter to byte when creating a mapping for an index (the default value of the data_type parameter is float):

PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 3,
        "data_type": "byte",
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "lucene",
          "parameters": {
            "ef_construction": 128,
            "m": 24
          }
        }
      }
    }
  }
}

Ingestion remains unchanged, but each value in the vector must be within the supported byte range [-128, 127]; otherwise, OpenSearch will return an error:

PUT test-index/_doc/1
{
  "my_vector1": [-126, 28, 127]
}

There is no change in k-NN search. Again, the query vector elements must be within the byte range:

GET test-index/_search
{
  "size": 2,
  "query": {
    "knn": {
      "my_vector1": {
        "vector": [26, -120, 99],
        "k": 2
      }
    }
  }
}

Quantizing float values as bytes

For many applications, existing float vector data can be quantized with little loss in quality.

There are many quantization techniques, such as scalar quantization or product quantization (used in the Faiss engine). The choice of quantization technique depends on the type of data and the k-NN distance metric. It can affect the accuracy of recall, so it is a best practice to run experiments with your own data. Some useful quantization method resources include the following:

  • Babak Rokh et al., “A Comprehensive Survey on Model Quantization for Deep Neural Networks in Image Classification”, ACM Transactions on Intelligent Systems Technology, 2023 (in press)
  • B. Jacob et al., “Quantization and Training of Neural Networks for Efficient Integer-Arithmetic-Only Inference,” 2018 IEEE/CVF Conference on Computer Vision and Pattern Recognition, Salt Lake City, UT, USA, 2018, pp. 2704-2713, doi: 10.1109/CVPR.2018.00286.

The following section contain Python pseudocode demonstrating a scalar quantization technique suitable for data using Euclidean distance. Euclidean distance is shift invariant; that is, if we shift x and y, then the distance remains the same. Mathematically,

$$\lVert x-y\rVert =\lVert (x-z)-(y-z)\rVert$$

Scalar quantization technique for the L2 space type

import numpy as np
# Random dataset (Example to create a random dataset)
dataset = np.random.uniform(-300, 300, (100, 10))
# Random query set (Example to create a random queryset)
queryset = np.random.uniform(-350, 350, (100, 10))
# Number of values
B = 256

# INDEXING:
# Get min and max
dataset_min = np.min(dataset)
dataset_max = np.max(dataset)
# Shift coordinates to be non-negative
dataset -= dataset_min
# Normalize into [0,1]
dataset *= 1. / (dataset_max - dataset_min)
# Bucket into 256 values
dataset = np.floor(dataset * (B-1)) - int(B / 2)

# QUERYING:
# Clip (if queryset range is out of datset range)
queryset = queryset.clip(dataset_min, dataset_max)
# Shift coordinates to be non-negative
queryset -= dataset_min
# Normalize
queryset *= 1. / (dataset_max - dataset_min)
# Bucket into 256 values
queryset = np.floor(queryset * (B - 1)) - int(B / 2)

Scalar quantization technique for the cosinesimilarity space type

Angular datasets using cosine similarity need a different approach because cosine similarity is not shift invariant.

# For Positive Numbers

# INDEXING and QUERYING:

# Get Max of train dataset
max = np.max(dataset)
min = 0
B = 127

# Normalize into [0,1]
val = (val - min) / (max - min)
val = (val * B)

# Get int and fraction values
int_part = floor(val)
frac_part = val - int_part

if 0.5 < frac_part:
 bval = int_part + 1
else:
 bval = int_part

return Byte(bval)
# For Negative Numbers

# INDEXING and QUERYING:

# Get Min of train dataset
min = 0
max = -np.min(dataset)
B = 128

# Normalize into [0,1]
val = (val - min) / (max - min)
val = (val * B)

# Get int and fraction values
int_part = floor(var)
frac_part = val - int_part

if 0.5 < frac_part:
 bval = int_part + 1
else:
 bval = int_part

return Byte(bval)

These are just two simple examples. You will want to evaluate different techniques with your own data to evaluate recall.

Benchmarking results

We ran benchmarking tests on some popular datasets using our k-NN benchmarking perf-tool to compare the quality and performance of byte vectors against float vectors. All the tests were run on the r5.large instance type, which has 2 CPU cores and 16 GB of memory.

SIFT is a feature extraction method that reduces the image content to a set of points used to detect similar patterns in other images. This algorithm is usually related to computer vision applications, including image matching and object detection. A GIST descriptor is a compact representation of an image, which offers advantages such as dimensionality reduction and compatibility with machine learning techniques, making it a valuable choice for approximate nearest neighbor (ANN) tasks in computer vision.

The MNIST dataset is widely used in the field of machine learning and computer vision; it consists of a collection of handwritten digits from 0 to 9 represented as grayscale images of size 28×28 pixels and is a subset of a larger dataset collected by NIST. The GloVe dataset consists of pretrained word vectors generated using the GloVe algorithm. GloVe is an unsupervised learning algorithm used to learn dense vector representations or embeddings for words in a large corpus of text.

Recall and storage results

  • Recall@100 – This is the ratio of the top 100 results that were ground truth nearest neighbors. This metric helps you to understand whether you are using a good quantization technique for the dataset.
DatasetDimension of vectorData sizeNumber of queriesTraining data rangeQuery data rangeSpace typeRecall@100Index size (GB)% change in storage
floatbytefloatbyte
gist-960-euclidean9601,000,0001,000[ 0.0, 1.48 ][ 0.0, 0.729 ]L20.990.9616.54.275%
sift-128-euclidean1281,000,00010,000[ 0.0, 218.0 ][ 0.0, 184.0 ]L20.990.991.30.6352%
mnist-784-euclidean78460,00010,000[ 0.0, 255.0 ][ 0.0, 255.0 ]L20.990.990.380.1268%
glove-50-angular501,183,51410,000[ -6.97, 6.51 ][ -6.25, 5.85 ]cosine0.990.941.30.3573%
glove-100-angular1001,183,51410,000[ -6.53, 5.96 ][ -6.15, 4.22 ]cosine0.990.922.60.6276%
glove-200-angular2001,183,51410,000[ -6.80, 4.61 ][ -6.64, 2.72 ]cosine0.990.775.11.178%

Indexing and querying results

  • Indexing throughput (docs/s) – The number of documents per second that can be ingested.
Document_Cnt / (total_index_time_s + total_refresh_time_s)
Document_Cnt/ (( ingest_took_total)_s + (refresh_index_took_total)_s)
  • Query time (ms) – Time per query, summarized as mean, p50, p90, p99.
DatasetDimension of vectorSpace typeIndexing throughput (docs/sec)Query time p50 (ms)% change in p50 latencyQuery time p90 (ms)% change in p90 latencyQuery time p99 (ms)% change in p99 latency
floatbytefloatbytefloatbytefloatbyte
gist-960-euclidean960L226963971157120%148362558%204769766%
sift-128-euclidean128L2242221641126740%1257540%1509636%
mnist-784-euclidean784L2845935976434%1056934%1167832%
glove-50-angular50cosine139115661588944%1739644%20511942%
glove-100-angular100cosine1014107425214343%29115447%32417945%
glove-200-angular200cosine64068459726056%64427957%72631657%

RSS results

  • Peak RSS – The peak of the resident set size obtained by the aggregated sum of RSSAnon (size of resident anonymous memory) and RSSFile (size of resident file mappings).
DatasetDimension of vectorSpace typePeak RSS (GB)% change in memory
floatbyte
gist-960-euclidean960L26.714.4733%
sift-128-euclidean128L22.191.8615%
mnist-784-euclidean784L21.551.428%
glove-50-angular50cosine2.441.6533%
glove-100-angular100cosine3.471.8647%
glove-200-angular200cosine5.62.2759%

RSS comparison graphs (in KB)

Analysis

Comparing the benchmarking results, you can see that:

  • Using byte vectors rather than 32-bit floats results in a significant reduction in storage and memory usage while also improving indexing throughput and reducing query latency. Precision and recall were not greatly affected, although this will depend on the quantization technique and characteristics of your data.
  • Storage usage was reduced by up to 78%, and RAM usage was reduced by up to 59% (for the glove-200-angular dataset).
  • Recall values for angular datasets were lower than those of Euclidean datasets. We expect that better quantization techniques would improve recall.

References

  • Hervé Jégou, Matthijs Douze, Cordelia Schmid. Product Quantization for Nearest Neighbor Search. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2011, 33 (1), pp.117-128. 10.1109/TPAMI.2010.57 . inria-00514462v2
  • Y. LeCun, L. Bottou, Y. Bengio, and P. Haffner. “Gradient-based learning applied to document recognition.” Proceedings of the IEEE, 86(11):2278-2324, November 1998
  • Jeffrey Pennington, Richard Socher, and Christopher D. Manning. 2014. GloVe: Global Vectors for Word Representation

Authors

  • Naveen Tatikonda is a software engineer at AWS working on the OpenSearch Project and Amazon OpenSearch Service. His interests include distributed systems and vector search. He is an active contributor to various plugins like k-NN, GeoSpatial.

    View all posts
  • Vamshi Vijay Nakkirtha is a software engineering manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems. He is an active contributor to various plugins, like k-NN, GeoSpatial, and dashboard-maps.

    View all posts
  • Stavros Macrakis is the senior technical product manager for OpenSearch focusing on document and e-commerce search. He has worked on search for almost 20 years and is passionate about search relevance.

    View all posts
  • Tal Wagner is a senior applied scientist at AWS working on the OpenSearch Project and Amazon OpenSearch Service. He obtained his PhD in Computer Science at CSAIL, MIT in 2020. His interests include designing algorithms for massive datasets and large-scale machine learning.

    View all posts
Close Menu