
k-NN vector

Introduced 1.0

The knn_vector data type allows you to ingest vectors into an OpenSearch index and perform different kinds of vector search. The knn_vector field is highly configurable and can serve many different vector workloads. In general, a knn_vector field can be built either by providing a method definition or specifying a model ID.

Example

To map my_vector as a knn_vector, use the following request:

PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2"
      }
    }
  }
}
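If you create indexes from application code, the same request body can be assembled programmatically. The following is a minimal Python sketch, not an official client example. The helper name knn_index_body is hypothetical, and the sketch only builds and prints the JSON body shown above. Sending it to the cluster is left to whichever HTTP client you use.

```python
import json


def knn_index_body(field_name: str, dimension: int, space_type: str = "l2") -> dict:
    """Build an index-creation body with a single knn_vector field.

    Hypothetical helper: it mirrors the PUT test-index request above.
    """
    return {
        "settings": {"index": {"knn": True}},
        "mappings": {
            "properties": {
                field_name: {
                    "type": "knn_vector",
                    "dimension": dimension,
                    "space_type": space_type,
                }
            }
        },
    }


body = knn_index_body("my_vector", 3)
print(json.dumps(body, indent=2))
```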

Optimizing vector storage

To optimize vector storage, set the mode parameter to in_memory (which optimizes for lowest latency) or on_disk (which optimizes for lowest cost). The on_disk mode reduces memory usage. Optionally, you can specify a compression_level to fine-tune vector memory consumption:

PUT test-index
{
  "settings": {
    "index": {
      "knn": true
    }
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3,
        "space_type": "l2",
        "mode": "on_disk",
        "compression_level": "16x"
      }
    }
  }
}
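To get a feel for what a compression level means in practice, the following Python sketch estimates the size of the raw vector data before and after compression. It is a back-of-the-envelope calculation under stated assumptions, not a sizing formula from OpenSearch: it assumes 4 bytes for each float element, treats the compression level as a plain divisor, and ignores the memory used by the index structures themselves. The function name estimated_vector_bytes is hypothetical.

```python
def estimated_vector_bytes(num_vectors: int, dimension: int,
                           compression_level: str = "1x") -> float:
    """Rough size of the vector data alone.

    Assumptions: 4-byte float elements, and the compression level
    ("16x" and so on) divides that size by the given factor.
    Index structure overhead is not included.
    """
    factor = int(compression_level.rstrip("x"))
    raw_bytes = num_vectors * dimension * 4
    return raw_bytes / factor


raw = estimated_vector_bytes(1_000_000, 1024)
compressed = estimated_vector_bytes(1_000_000, 1024, "16x")
print(f"{raw / 2**30:.2f} GiB uncompressed, {compressed / 2**30:.2f} GiB at 16x")
```

Under these assumptions, one million 1,024-dimensional vectors shrink from roughly 3.81 GiB to roughly 0.24 GiB at 16x.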

Method definitions

Method definitions are used when the underlying approximate k-NN (ANN) algorithm does not require training. For example, the following knn_vector field specifies that a Faiss implementation of HNSW should be used for ANN search. During indexing, Faiss builds the corresponding HNSW segment files:

PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 1024,
        "method": {
          "name": "hnsw",
          "space_type": "l2",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 100,
            "m": 16
          }
        }
      }
    }
  }
}

You can also specify the space_type at the top level:

PUT test-index
{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 100
    }
  },
  "mappings": {
    "properties": {
      "my_vector1": {
        "type": "knn_vector",
        "dimension": 1024,
        "space_type": "l2",
        "method": {
          "name": "hnsw",
          "engine": "faiss",
          "parameters": {
            "ef_construction": 100,
            "m": 16
          }
        }
      }
    }
  }
}
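The two mappings above differ only in where space_type appears. As an illustration, the following Python sketch rewrites a field mapping from the nested form to the top-level form and checks that the result matches the second example. The helper hoist_space_type is hypothetical, and raising an error when the two locations disagree is a choice made for this sketch rather than documented behavior.

```python
import copy


def hoist_space_type(field: dict) -> dict:
    """Return a copy of a knn_vector field mapping with space_type at the top level."""
    result = copy.deepcopy(field)
    nested = result.get("method", {}).pop("space_type", None)
    if nested is not None:
        top = result.get("space_type")
        # Sketch-level choice: refuse to guess when the two values conflict.
        if top is not None and top != nested:
            raise ValueError(f"conflicting space_type values: {top!r} and {nested!r}")
        result["space_type"] = nested
    return result


nested_form = {
    "type": "knn_vector",
    "dimension": 1024,
    "method": {
        "name": "hnsw",
        "space_type": "l2",
        "engine": "faiss",
        "parameters": {"ef_construction": 100, "m": 16},
    },
}
top_level_form = {
    "type": "knn_vector",
    "dimension": 1024,
    "space_type": "l2",
    "method": {
        "name": "hnsw",
        "engine": "faiss",
        "parameters": {"ef_construction": 100, "m": 16},
    },
}

print(hoist_space_type(nested_form) == top_level_form)  # prints True
```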

Model IDs

Model IDs are used when the underlying ANN algorithm requires a training step. As a prerequisite, the model must be created using the Train API. The model contains the information needed to initialize the native library segment files. To configure a model for a vector field, specify the model_id:

"my_vector": {
  "type": "knn_vector",
  "model_id": "my-model"
}

If you intend to use only Painless scripting or a k-NN score script, you need to specify only the dimension:

"my_vector": {
  "type": "knn_vector",
  "dimension": 128
}

For more information, see Building a vector index from a model.

Parameters

The following table lists the parameters accepted by k-NN vector field types.

| Parameter | Data type | Description |
| --- | --- | --- |
| type | String | The vector field type. Must be knn_vector. Required. |
| dimension | Integer | The size of the vectors used. Valid values are in the [1, 16,000] range. Required. |
| data_type | String | The data type of the vector elements. Valid values are binary, byte, and float. Optional. Default is float. |
| space_type | String | The vector space used to calculate the distance between vectors. Valid values are l1, l2, linf, cosinesimil, innerproduct, hamming, and hammingbit. Not every method/engine combination supports each of the spaces. For a list of supported spaces, see the section for a specific engine. Note: This value can also be specified within the method. Optional. For more information, see Spaces. |
| mode | String | Sets appropriate default values for k-NN parameters based on your priority: either low latency or low cost. Valid values are in_memory and on_disk. Optional. Default is in_memory. For more information, see Memory-optimized vectors. |
| compression_level | String | Selects a quantization encoder that reduces vector memory consumption by the given factor. Valid values are 1x, 2x, 4x, 8x, 16x, and 32x. Optional. For more information, see Memory-optimized vectors. |
| method | Object | The algorithm used for organizing vector data at indexing time and searching it at search time. Used when the ANN algorithm does not require training. Optional. For more information, see Methods and engines. |
| model_id | String | The model ID of a trained model. Used when the ANN algorithm requires training. See Model IDs. Optional. |
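
The constraints listed above can be checked on the client side before a mapping is sent to the cluster. The following Python sketch is one way to do that. The function validate_knn_field is hypothetical, and two of its rules are assumptions inferred from this page rather than stated guarantees: that method and model_id are alternatives, and that dimension may be omitted when a model_id is given (as in the Model IDs example). The cluster remains the authority on what is valid, including which spaces a given engine supports.

```python
ALLOWED_VALUES = {
    "data_type": ("binary", "byte", "float"),
    "space_type": ("l1", "l2", "linf", "cosinesimil", "innerproduct",
                   "hamming", "hammingbit"),
    "mode": ("in_memory", "on_disk"),
    "compression_level": ("1x", "2x", "4x", "8x", "16x", "32x"),
}


def validate_knn_field(field: dict) -> list:
    """Return a list of problems found in a knn_vector field mapping (empty if none)."""
    problems = []
    if field.get("type") != "knn_vector":
        problems.append("type must be knn_vector")
    # Assumption: a field is built from a method or a model, not both.
    if "method" in field and "model_id" in field:
        problems.append("specify either method or model_id, not both")
    # Assumption: dimension is only checked when no model_id is given.
    if "model_id" not in field:
        dim = field.get("dimension")
        if not isinstance(dim, int) or isinstance(dim, bool) or not 1 <= dim <= 16000:
            problems.append("dimension must be an integer in [1, 16000]")
    for name, allowed in ALLOWED_VALUES.items():
        if name in field and field[name] not in allowed:
            problems.append(f"{name} must be one of {', '.join(allowed)}")
    return problems


print(validate_knn_field({"type": "knn_vector", "dimension": 3, "space_type": "l2"}))
print(validate_knn_field({"type": "knn_vector", "dimension": 0, "mode": "fast"}))
```

The first call returns an empty list. The second reports the out-of-range dimension and the unknown mode.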
