Spaces
In vector search, a space defines how the distance (or similarity) between two vectors is calculated. The choice of space affects how nearest neighbors are determined during search operations.
Distance calculation
A space defines the function used to measure the distance between two points in order to determine the k-nearest neighbors. In k-NN search, a lower score equates to a closer and better result. This is the opposite of how OpenSearch scores results, where a higher score equates to a better result. OpenSearch supports the following spaces.
Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine in the method documentation.
Space type | Search type | Distance function (\(d\) ) | OpenSearch score |
---|---|---|---|
l1 | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n \lvert x_i - y_i \rvert\) | \(score = {1 \over {1 + d} }\) |
l2 | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \sum_{i=1}^n (x_i - y_i)^2\) | \(score = {1 \over 1 + d }\) |
linf | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = max(\lvert x_i - y_i \rvert)\) | \(score = {1 \over 1 + d }\) |
cosinesimil | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = 1 - cos { \theta } = 1 - {\mathbf{x} \cdot \mathbf{y} \over \lVert \mathbf{x}\rVert \cdot \lVert \mathbf{y}\rVert}\)\(= 1 - {\sum_{i=1}^n x_i y_i \over \sqrt{\sum_{i=1}^n x_i^2} \cdot \sqrt{\sum_{i=1}^n y_i^2}}\), where \(\lVert \mathbf{x}\rVert\) and \(\lVert \mathbf{y}\rVert\) represent the norms of vectors \(\mathbf{x}\) and \(\mathbf{y}\), respectively. | \(score = {2 - d \over 2}\) |
innerproduct (supported for Lucene in OpenSearch version 2.13 and later) | Approximate | NMSLIB and Faiss: \(d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i\) Lucene: \(d(\mathbf{x}, \mathbf{y}) = {\mathbf{x} \cdot \mathbf{y}} = \sum_{i=1}^n x_i y_i\) | NMSLIB and Faiss: \(\text{If} d \ge 0, score = {1 \over 1 + d }\) \(\text{If} d < 0, score = −d + 1\) Lucene: \(\text{If} d > 0, score = d + 1\) \(\text{If} d \le 0, score = {1 \over 1 + (-1 \cdot d) }\) |
innerproduct (supported for Lucene in OpenSearch version 2.13 and later) | Exact | \(d(\mathbf{x}, \mathbf{y}) = - {\mathbf{x} \cdot \mathbf{y}} = - \sum_{i=1}^n x_i y_i\) | \(\text{If} d \ge 0, score = {1 \over 1 + d }\) \(\text{If} d < 0, score = −d + 1\) |
hamming (supported for binary vectors in OpenSearch version 2.16 and later) | Approximate, exact | \(d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\) | \(score = {1 \over 1 + d }\) |
hammingbit (supported for binary and long vectors) | Exact | \(d(\mathbf{x}, \mathbf{y}) = \text{countSetBits}(\mathbf{x} \oplus \mathbf{y})\) | \(score = {1 \over 1 + d }\) |
The cosine similarity formula does not include the 1 -
prefix. However, because similarity search libraries equate lower scores with closer results, they return 1 - cosineSimilarity
for the cosine similarity space—this is why 1 -
is included in the distance function.
With cosine similarity, it is not valid to pass a zero vector ([0, 0, ...]
) as input. This is because the magnitude of such a vector is 0, which raises a divide by 0
exception in the corresponding formula. Requests containing the zero vector will be rejected, and a corresponding exception will be thrown.
The hamming
space type is supported for binary vectors in OpenSearch version 2.16 and later. For more information, see Binary k-NN vectors.
Specifying the space type
The space type is specified when creating an index.
You can specify the space type at the top level of the field mapping:
PUT /test-index
{
"settings": {
"index": {
"knn": true
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 3,
"space_type": "l2"
}
}
}
}
Alternatively, you can specify the space type within the method
object if defining a method:
PUT test-index
{
"settings": {
"index": {
"knn": true,
"knn.algo_param.ef_search": 100
}
},
"mappings": {
"properties": {
"my_vector1": {
"type": "knn_vector",
"dimension": 1024,
"method": {
"name": "hnsw",
"space_type": "l2",
"engine": "nmslib",
"parameters": {
"ef_construction": 128,
"m": 24
}
}
}
}
}
}