Understanding vector radial search in OpenSearch

Mon, Jun 24, 2024 · Junqiu Lei, Vamshi Vijay Nakkirtha, Dylan Tong

Before OpenSearch version 2.14, the OpenSearch k-NN plugin only provided top K queries for approximate vector similarity search. OpenSearch 2.14 introduced a new type of vector search—radial search. Radial search is now supported in both k-NN search and neural search.

Radial search enhances the capabilities of the OpenSearch k-NN plugin beyond approximate top K searches. With radial search, you can search for all points in a vector space that are within a specified maximum distance or minimum score threshold from a query point. This provides increased flexibility and utility in search operations.

For instance, if you want to search for documents most similar to a given document, you can run a query to retrieve the top K most similar documents (where K = top N). However, if K is not chosen optimally, you might receive irrelevant results or miss relevant ones. With vector radial search, you can run a query to retrieve all highly similar documents by using a threshold (for example, similarity score > 0.95).

Defining the search radius

You can define the search radius in the following ways:

  • Maximum distance: Specify a physical distance threshold within the vector space, identifying all points that are within this distance from the query point. This approach is particularly useful for applications requiring spatial proximity or absolute distance measurements.

    The following image shows an example of radial search with the L2 space type and max_distance = 25.0. The number within each circle represents the distance of the point from the query target. All points within the max_distance of 25.0 are included in the results.

    Radial search with max distance

  • Minimum score: Define a similarity score threshold, retrieving all points that meet or exceed that score in relation to your query point. This is useful when relative similarity, based on a specific metric, is more critical than physical proximity.

    The following image shows an example of radial search with min_score= 0.90. The number within each circle represents the point’s OpenSearch score relative to the query target. All points whose score is 0.90 or above are included in the results.

    Radial search with min score

Use radial search instead of top K vector search in the following scenarios:

  • Proximity-based filtering: When you need to find all items within a specific distance from a query point. Unlike traditional k-NN searches that return a fixed number of top results, radial search allows you to find all items that fall within a specified distance threshold.
  • Threshold-specific queries: When you need to make sure that only items that meet your criteria are included in the results. Radial search lets you define specific similarity or distance thresholds, which is essential for tasks like anomaly detection and geospatial searches.
  • Dynamic range adjustments: When the acceptable similarity or distance range can vary. Radial search offers more flexible search results.

Supported configurations

Starting in OpenSearch 2.14, you can perform radial search with either the Lucene or Faiss engines using the OpenSearch k-NN plugin. The following table summarizes radial search use cases by engine.

Engine Filter supported Nested field supported Search type
Lucene Yes No Approximate
Faiss Yes Yes Approximate

Spaces

A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. For more information about how distance and score are calculated for various spaces, see Spaces.

The following examples can help you get started with radial search.

Prerequisites

To use a k-NN index with radial search, create a k-NN index by setting index.knn to true. Specify one or more fields of the knn_vector data type:

PUT knn-index-test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "faiss",
            "parameters": {
              "ef_construction": 100,
              "m": 16,
              "ef_search": 100
            }
          }
      }
    }
  }
}

After you create the index, add some data to it:

PUT _bulk?refresh=true
{"index": {"_index": "knn-index-test", "_id": "1"}}
{"my_vector": [7.0, 8.2], "price": 4.4}
{"index": {"_index": "knn-index-test", "_id": "2"}}
{"my_vector": [7.1, 7.4], "price": 14.2}
{"index": {"_index": "knn-index-test", "_id": "3"}}
{"my_vector": [7.3, 8.3], "price": 19.1}
{"index": {"_index": "knn-index-test", "_id": "4"}}
{"my_vector": [6.5, 8.8], "price": 1.2}
{"index": {"_index": "knn-index-test", "_id": "5"}}
{"my_vector": [5.7, 7.9], "price": 16.5}

Radial search with a max distance and a filter

The following example shows a radial search with a max_distance and a filter:

GET knn-index-test/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [7.1, 8.3],
        "max_distance": 2,
        "filter": {
          "range": {
            "price": {
              "gte": 1,
              "lte": 5
            }
          }
        }
      }
    }
  }
}

All documents that fall within the squared Euclidean distance (L22) of 2 and whose price is within the 1–5 range are returned in the results:

{
    ...
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.98039204,
        "hits": [
            {
                "_index": "knn-index-test",
                "_id": "1",
                "_score": 0.98039204,
                "_source": {
                    "my_vector": [7.0,8.2],
                    "price": 4.4
                }
            },
            {
                "_index": "knn-index-test",
                "_id": "4",
                "_score": 0.62111807,
                "_source": {
                    "my_vector": [6.5,8.8],
                    "price": 1.2
                }
            }
        ]
    }
}

Radial search with a min score and a filter

The following example shows a radial search with a min_score and a response filter:

GET knn-index-test/_search
{
    "query": {
        "knn": {
            "my_vector": {
                "vector": [7.1, 8.3],
                "min_score": 0.95,
                "filter": {
                    "range": {
                        "price": {
                            "gte": 1,
                            "lte": 5
                        }
                    }
                }
            }
        }
    }
}

All documents with a score of 0.9 or higher and whose price is within the 1–5 range are returned in the results:

{
    ...
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.98039204,
        "hits": [
            {
                "_index": "knn-index-test",
                "_id": "1",
                "_score": 0.98039204,
                "_source": {
                    "my_vector": [7.0, 8.2],
                    "price": 4.4
                }
            }
        ]
    }
}

Neural search also supports radial search. Here are some radial search neural query examples.

The following example shows a search with a k value of 100 and a filter that includes a range query and a term query:

GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "Hi world",
        "query_image": "iVBORw0KGgoAAAAN...",
        "k": 100,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 8,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "parking": "true"
                }
              }
            ]
          }
        }
      }
    }
  }
}

The following example shows a k-NN radial search with a min_score of 0.95 and the same filter as the preceding query:

GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "Hi world",
        "query_image": "iVBORw0KGgoAAAAN...",
        "min_score": 0.95,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 8,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "parking": "true"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Benchmarks

We ran radial search benchmarks using the following cluster configuration.

OpenSearch version Leader nodes Leader node type Leader node disk space Data nodes Data node type Data node disk space Primary shard count Replica count Availability Zone Test client type
2.14 3 c6g.xlarge 50 GB 3 r6g.2xlarge 100 GB 3 1 us-east-1 m5.2xlarge

Benchmark tool

For benchmarking, we used OpenSearch Benchmark.

Dataset

We used the following dataset for benchmarking.

Name Dimension Doc size Query size Space type
cohere-wikipedia-22-12-en-embeddings 768 1M 10k Inner product

The dataset was updated to include the top K threshold values and true neighbors for the radial threshold. In our benchmarks, top K refers to the number of nearest neighbors considered when setting the minimum score and maximum distance thresholds for radial search. For instance, a top K 100 setting means that we used the 100th closest document to a query point to determine these thresholds. The following table summarizes the threshold configuration.

Threshold name Min score threshold Median num of true neighbors Average num of true neighbors Num of query targets with 0 neighbors
threshold1 161 118 83.1888 4048
threshold2 156 1186 421.8486 1661
threshold3 154 2959 778.0673 941

Algorithm

The algorithm used was configured as follows.

Algorithm name ef construction ef search hnsw m
HNSW 256 256 16

Results

The following table presents the benchmarking results.

Query threshold Engine type Query type Search: 50th percentile service time (ms) Search: 90th percentile service time (ms) Search: 99th percentile service time (ms) Recall
min score threshold1 Faiss min_score 6.24 7.33 11.11 0.99
min score threshold2 Faiss min_score 7.76 13.58 26.83 0.98
min score threshold3 Faiss min_score 6.89 12.13 25.59 0.98
min score threshold1 Lucene min_score 4.42 17.01 50.14 0.85
min score threshold2 Lucene min_score 13.02 54.16 118.65 0.90
min score threshold3 Lucene min_score 22.56 84.5 161.62 0.92

Summary

OpenSearch 2.14 introduced vector radial search, significantly expanding the capabilities of the k-NN plugin. This powerful feature allows searches based on distance or score thresholds, offering greater flexibility for various applications. Supported by both the Lucene and Faiss engines, radial search caters to diverse use cases.

Try the examples in this blog post to improve your searches and boost performance using vector radial search. For more information about vector radial search, see the Radial search documentation.