Skip to main content
search
Technical

Understanding vector radial search in OpenSearch

By , , June 24, 2024March 26th, 2025No Comments

Before OpenSearch version 2.14, the OpenSearch k-NN plugin only provided top K queries for approximate vector similarity search. OpenSearch 2.14 introduced a new type of vector search—radial search. Radial search is now supported in both k-NN search and neural search.

Radial search enhances the capabilities of the OpenSearch k-NN plugin beyond approximate top K searches. With radial search, you can search for all points in a vector space that are within a specified maximum distance or minimum score threshold from a query point. This provides increased flexibility and utility in search operations.

For instance, if you want to search for documents most similar to a given document, you can run a query to retrieve the top K most similar documents (where K = top N). However, if K is not chosen optimally, you might receive irrelevant results or miss relevant ones. With vector radial search, you can run a query to retrieve all highly similar documents by using a threshold (for example, similarity score > 0.95).

Defining the search radius

You can define the search radius in the following ways:

  • Maximum distance: Specify a physical distance threshold within the vector space, identifying all points that are within this distance from the query point. This approach is particularly useful for applications requiring spatial proximity or absolute distance measurements.The following image shows an example of radial search with the L2 space type and max_distance = 25.0. The number within each circle represents the distance of the point from the query target. All points within the max_distance of 25.0 are included in the results.Radial search with max distance
  • Minimum score: Define a similarity score threshold, retrieving all points that meet or exceed that score in relation to your query point. This is useful when relative similarity, based on a specific metric, is more critical than physical proximity.The following image shows an example of radial search with min_score= 0.90. The number within each circle represents the point’s OpenSearch score relative to the query target. All points whose score is 0.90 or above are included in the results.Radial search with min score

Use radial search instead of top K vector search in the following scenarios:

  • Proximity-based filtering: When you need to find all items within a specific distance from a query point. Unlike traditional k-NN searches that return a fixed number of top results, radial search allows you to find all items that fall within a specified distance threshold.
  • Threshold-specific queries: When you need to make sure that only items that meet your criteria are included in the results. Radial search lets you define specific similarity or distance thresholds, which is essential for tasks like anomaly detection and geospatial searches.
  • Dynamic range adjustments: When the acceptable similarity or distance range can vary. Radial search offers more flexible search results.

Supported configurations

Starting in OpenSearch 2.14, you can perform radial search with either the Lucene or Faiss engines using the OpenSearch k-NN plugin. The following table summarizes radial search use cases by engine.

EngineFilter supportedNested field supportedSearch type
LuceneYesNoApproximate
FaissYesYesApproximate

Spaces

A space corresponds to the function used to measure the distance between two points in order to determine the k-nearest neighbors. For more information about how distance and score are calculated for various spaces, see Spaces.

The following examples can help you get started with radial search.

Prerequisites

To use a k-NN index with radial search, create a k-NN index by setting index.knn to true. Specify one or more fields of the knn_vector data type:

PUT knn-index-test
{
  "settings": {
    "number_of_shards": 1,
    "number_of_replicas": 1,
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 2,
        "method": {
            "name": "hnsw",
            "space_type": "l2",
            "engine": "faiss",
            "parameters": {
              "ef_construction": 100,
              "m": 16,
              "ef_search": 100
            }
          }
      }
    }
  }
}

After you create the index, add some data to it:

PUT _bulk?refresh=true
{"index": {"_index": "knn-index-test", "_id": "1"}}
{"my_vector": [7.0, 8.2], "price": 4.4}
{"index": {"_index": "knn-index-test", "_id": "2"}}
{"my_vector": [7.1, 7.4], "price": 14.2}
{"index": {"_index": "knn-index-test", "_id": "3"}}
{"my_vector": [7.3, 8.3], "price": 19.1}
{"index": {"_index": "knn-index-test", "_id": "4"}}
{"my_vector": [6.5, 8.8], "price": 1.2}
{"index": {"_index": "knn-index-test", "_id": "5"}}
{"my_vector": [5.7, 7.9], "price": 16.5}

Radial search with a max distance and a filter

The following example shows a radial search with a max_distance and a filter:

GET knn-index-test/_search
{
  "query": {
    "knn": {
      "my_vector": {
        "vector": [7.1, 8.3],
        "max_distance": 2,
        "filter": {
          "range": {
            "price": {
              "gte": 1,
              "lte": 5
            }
          }
        }
      }
    }
  }
}

All documents that fall within the squared Euclidean distance (L22) of 2 and whose price is within the 1–5 range are returned in the results:

{
    ...
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.98039204,
        "hits": [
            {
                "_index": "knn-index-test",
                "_id": "1",
                "_score": 0.98039204,
                "_source": {
                    "my_vector": [7.0,8.2],
                    "price": 4.4
                }
            },
            {
                "_index": "knn-index-test",
                "_id": "4",
                "_score": 0.62111807,
                "_source": {
                    "my_vector": [6.5,8.8],
                    "price": 1.2
                }
            }
        ]
    }
}

Radial search with a min score and a filter

The following example shows a radial search with a min_score and a response filter:

GET knn-index-test/_search
{
    "query": {
        "knn": {
            "my_vector": {
                "vector": [7.1, 8.3],
                "min_score": 0.95,
                "filter": {
                    "range": {
                        "price": {
                            "gte": 1,
                            "lte": 5
                        }
                    }
                }
            }
        }
    }
}

All documents with a score of 0.9 or higher and whose price is within the 1–5 range are returned in the results:

{
    ...
    "hits": {
        "total": {
            "value": 1,
            "relation": "eq"
        },
        "max_score": 0.98039204,
        "hits": [
            {
                "_index": "knn-index-test",
                "_id": "1",
                "_score": 0.98039204,
                "_source": {
                    "my_vector": [7.0, 8.2],
                    "price": 4.4
                }
            }
        ]
    }
}

Neural search also supports radial search. Here are some radial search neural query examples.

The following example shows a search with a k value of 100 and a filter that includes a range query and a term query:

GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "Hi world",
        "query_image": "iVBORw0KGgoAAAAN...",
        "k": 100,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 8,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "parking": "true"
                }
              }
            ]
          }
        }
      }
    }
  }
}

The following example shows a k-NN radial search with a min_score of 0.95 and the same filter as the preceding query:

GET /my-nlp-index/_search
{
  "query": {
    "neural": {
      "passage_embedding": {
        "query_text": "Hi world",
        "query_image": "iVBORw0KGgoAAAAN...",
        "min_score": 0.95,
        "filter": {
          "bool": {
            "must": [
              {
                "range": {
                  "rating": {
                    "gte": 8,
                    "lte": 10
                  }
                }
              },
              {
                "term": {
                  "parking": "true"
                }
              }
            ]
          }
        }
      }
    }
  }
}

Benchmarks

We ran radial search benchmarks using the following cluster configuration.

OpenSearch versionLeader nodesLeader node typeLeader node disk spaceData nodesData node typeData node disk spacePrimary shard countReplica countAvailability ZoneTest client type
2.143c6g.xlarge50 GB3r6g.2xlarge100 GB31us-east-1m5.2xlarge

Benchmark tool

For benchmarking, we used OpenSearch Benchmark.

Dataset

We used the following dataset for benchmarking.

NameDimensionDoc sizeQuery sizeSpace type
cohere-wikipedia-22-12-en-embeddings7681M10kInner product

The dataset was updated to include the top K threshold values and true neighbors for the radial threshold. In our benchmarks, top K refers to the number of nearest neighbors considered when setting the minimum score and maximum distance thresholds for radial search. For instance, a top K 100 setting means that we used the 100th closest document to a query point to determine these thresholds. The following table summarizes the threshold configuration.

Threshold nameMin score thresholdMedian num of true neighborsAverage num of true neighborsNum of query targets with 0 neighbors
threshold116111883.18884048
threshold21561186421.84861661
threshold31542959778.0673941

Algorithm

The algorithm used was configured as follows.

Algorithm nameef constructionef searchhnsw m
HNSW25625616

Results

The following table presents the benchmarking results.

Query thresholdEngine typeQuery typeSearch: 50th percentile service time (ms)Search: 90th percentile service time (ms)Search: 99th percentile service time (ms)Recall
min score threshold1Faissmin_score6.247.3311.110.99
min score threshold2Faissmin_score7.7613.5826.830.98
min score threshold3Faissmin_score6.8912.1325.590.98
min score threshold1Lucenemin_score4.4217.0150.140.85
min score threshold2Lucenemin_score13.0254.16118.650.90
min score threshold3Lucenemin_score22.5684.5161.620.92

Summary

OpenSearch 2.14 introduced vector radial search, significantly expanding the capabilities of the k-NN plugin. This powerful feature allows searches based on distance or score thresholds, offering greater flexibility for various applications. Supported by both the Lucene and Faiss engines, radial search caters to diverse use cases.

Try the examples in this blog post to improve your searches and boost performance using vector radial search. For more information about vector radial search, see the Radial search documentation.

Authors

  • Junqiu Lei is a software development engineer at Amazon Web Services working on vector search (k-NN) and map visualizations for the OpenSearch Project.

    View all posts
  • Vamshi Vijay Nakkirtha is a software engineering manager working on the OpenSearch Project and Amazon OpenSearch Service. His primary interests include distributed systems. He is an active contributor to various plugins, like k-NN, GeoSpatial, and dashboard-maps.

    View all posts
  • Dylan Tong leads OpenSearch AI and machine learning product initiatives at AWS.

    View all posts
Close Menu