# Ingesting data into a vector index
After creating a vector index, you need to either ingest raw vector data or convert data to embeddings while ingesting it.
## Comparison of ingestion methods
The following table compares the two ingestion methods.
| Ingestion method | Data format | Ingest pipeline | Vector generation | Additional fields |
|---|---|---|---|---|
| Raw vector ingestion | Pre-generated vectors | Not required | External | Optional metadata |
| Converting data to embeddings during ingestion | Text or image data | Required | Internal (during ingestion) | Original data + embeddings |
## Raw vector ingestion
When you work with raw vectors or embeddings generated outside of OpenSearch, you ingest the vector data directly into a `knn_vector` field. No ingest pipeline is required because the vectors are already generated:
```json
PUT /my-raw-vector-index/_doc/1
{
  "my_vector": [0.1, 0.2, 0.3],
  "metadata": "Optional additional information"
}
```
You can also use the Bulk API to ingest multiple vectors efficiently:
```json
PUT /_bulk
{"index": {"_index": "my-raw-vector-index", "_id": 1}}
{"my_vector": [0.1, 0.2, 0.3], "metadata": "First item"}
{"index": {"_index": "my-raw-vector-index", "_id": 2}}
{"my_vector": [0.2, 0.3, 0.4], "metadata": "Second item"}
```
## Converting data to embeddings during ingestion
After you have configured an ingest pipeline that automatically generates embeddings, you can ingest text data directly into your index:
```json
PUT /my-ai-search-index/_doc/1
{
  "input_text": "Example: AI search description"
}
```
The pipeline automatically generates and stores the embeddings in the `output_embedding` field.
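The exact pipeline definition depends on your embedding model. The following is only a minimal sketch, assuming a `text_embedding` processor, a placeholder ID for an already deployed model, and an illustrative pipeline name; it maps `input_text` to `output_embedding` to match the examples in this section:

```json
PUT /_ingest/pipeline/my-embedding-pipeline
{
  "description": "Generates embeddings for the input_text field",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your_model_id>",
        "field_map": {
          "input_text": "output_embedding"
        }
      }
    }
  ]
}
```

For the indexing requests in this section to trigger embedding generation, the pipeline must be applied to them, for example, by setting it as the index's `default_pipeline` setting or by passing it explicitly in the `pipeline` query parameter.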
You can also use the Bulk API to ingest multiple documents efficiently:
```json
PUT /_bulk
{"index": {"_index": "my-ai-search-index", "_id": 1}}
{"input_text": "Example AI search description"}
{"index": {"_index": "my-ai-search-index", "_id": 2}}
{"input_text": "Bulk API operation description"}
```
## Working with sparse vectors
OpenSearch also supports sparse vectors. For more information, see Neural sparse search.
## Text chunking
For information about splitting large documents into smaller passages before generating embeddings during dense or sparse AI search, see Text chunking.