
Ingesting data into a vector index

After creating a vector index, you can either ingest raw vector data directly or convert text or image data into embeddings during ingestion.
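For reference, the examples in this section assume an index similar to the following minimal sketch, which maps a three-dimensional knn_vector field (the index name, field name, and dimension are illustrative and should match your data):

PUT /my-raw-vector-index
{
  "settings": {
    "index.knn": true
  },
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "knn_vector",
        "dimension": 3
      }
    }
  }
}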

Comparison of ingestion methods

The following table compares the two ingestion methods.

| Feature | Raw vector ingestion | Converting data to embeddings during ingestion |
| :--- | :--- | :--- |
| Data format | Pre-generated vectors | Text or image data |
| Ingest pipeline | Not required | Required |
| Vector generation | External | Internal (during ingestion) |
| Additional fields | Optional metadata | Original data + embeddings |

Raw vector ingestion

When working with raw vectors or embeddings generated outside of OpenSearch, you ingest the vector data directly into a knn_vector field. No ingest pipeline is required because the vectors are already generated. Each ingested vector must have the same number of dimensions as the dimension defined in the index mapping:

PUT /my-raw-vector-index/_doc/1
{
  "my_vector": [0.1, 0.2, 0.3],
  "metadata": "Optional additional information"
}

You can also use the Bulk API to ingest multiple vectors efficiently:

PUT /_bulk
{"index": {"_index": "my-raw-vector-index", "_id": 1}}
{"my_vector": [0.1, 0.2, 0.3], "metadata": "First item"}
{"index": {"_index": "my-raw-vector-index", "_id": 2}}
{"my_vector": [0.2, 0.3, 0.4], "metadata": "Second item"}

Converting data to embeddings during ingestion
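This method assumes that you have created an ingest pipeline containing an embedding processor. As a minimal sketch, a pipeline using the text_embedding processor might look like the following; the pipeline name, field names, and model ID are placeholders that you would replace with your own:

PUT /_ingest/pipeline/my-embedding-pipeline
{
  "description": "Generates embeddings from input_text",
  "processors": [
    {
      "text_embedding": {
        "model_id": "<your_model_id>",
        "field_map": {
          "input_text": "output_embedding"
        }
      }
    }
  ]
}

To run the pipeline automatically, set it as the index's default_pipeline setting or pass it in the pipeline query parameter of each ingest request.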

After you have configured an ingest pipeline that automatically generates embeddings, you can ingest text data directly into your index:

PUT /my-ai-search-index/_doc/1
{
  "input_text": "Example: AI search description"
}

The pipeline automatically generates and stores the embeddings in the output_embedding field.
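To verify the result, you can retrieve the document; assuming the pipeline sketch shown earlier, the returned document source should contain both the original input_text and the generated output_embedding:

GET /my-ai-search-index/_doc/1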

You can also use the Bulk API to ingest multiple documents efficiently:

PUT /_bulk
{"index": {"_index": "my-ai-search-index", "_id": 1}}
{"input_text": "Example AI search description"}
{"index": {"_index": "my-ai-search-index", "_id": 2}}
{"input_text": "Bulk API operation description"}

Working with sparse vectors

OpenSearch also supports sparse vectors. For more information, see Neural sparse search.

Text chunking

For information about splitting large documents into smaller passages before generating embeddings during dense or sparse AI search, see Text chunking.

Next steps

For information about querying the data you ingested, see Searching vector data.