Using raw vectors for neural sparse search
If you’re using self-hosted sparse embedding models, you can ingest raw sparse vectors and use neural sparse search.
Tutorial
This tutorial consists of the following steps:

- Step 1: Ingest sparse vectors
- Step 2: Search the data using a sparse vector
Step 1: Ingest sparse vectors
Once you have generated sparse vector embeddings, you can directly ingest them into OpenSearch.
Step 1(a): Create an index
To ingest documents containing raw sparse vectors, create an index with a rank_features field:
PUT /my-nlp-index
{
  "mappings": {
    "properties": {
      "id": {
        "type": "text"
      },
      "passage_embedding": {
        "type": "rank_features"
      },
      "passage_text": {
        "type": "text"
      }
    }
  }
}
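To verify that the passage_embedding field was mapped as rank_features, you can retrieve the index mapping:

GET /my-nlp-index/_mapping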
Step 1(b): Ingest documents into the index
To ingest documents into the index created in the previous step, send the following request:
PUT /my-nlp-index/_doc/1
{
  "passage_text": "Hello world",
  "id": "s1",
  "passage_embedding": {
    "hi": 4.338913,
    "planets": 2.7755864,
    "planet": 5.0969057,
    "mars": 1.7405145,
    "earth": 2.6087382,
    "hello": 3.3210192
  }
}
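If you have many documents to index, you can reduce request overhead by sending them in a single Bulk API request instead of individual PUT requests. The following is a minimal sketch that ingests two additional documents with hypothetical token weights:

POST /_bulk
{ "index": { "_index": "my-nlp-index", "_id": "2" } }
{ "passage_text": "Hi planet", "id": "s2", "passage_embedding": { "hi": 4.2, "planet": 5.1 } }
{ "index": { "_index": "my-nlp-index", "_id": "3" } }
{ "passage_text": "Hello mars", "id": "s3", "passage_embedding": { "hello": 3.1, "mars": 2.9 } }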
Step 2: Search the data using a sparse vector
To search the documents using a sparse vector, provide the sparse vector embeddings in the query_tokens parameter of the neural_sparse query:
GET my-nlp-index/_search
{
  "query": {
    "neural_sparse": {
      "passage_embedding": {
        "query_tokens": {
          "hi": 4.338913,
          "planets": 2.7755864,
          "planet": 5.0969057,
          "mars": 1.7405145,
          "earth": 2.6087382,
          "hello": 3.3210192
        }
      }
    }
  }
}
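Because neural_sparse is a regular query clause, you can also combine it with lexical queries. The following sketch uses a bool query to match documents on either the raw text or the sparse embeddings; the token weights are the same hypothetical values used above:

GET my-nlp-index/_search
{
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "passage_text": "hello planet"
          }
        },
        {
          "neural_sparse": {
            "passage_embedding": {
              "query_tokens": {
                "hello": 3.3210192,
                "planet": 5.0969057
              }
            }
          }
        }
      ]
    }
  }
}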
Accelerating neural sparse search
To learn more about improving retrieval time for neural sparse search, see Accelerating neural sparse search.
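For example, OpenSearch provides a two-phase search pipeline processor that speeds up neural sparse queries by scoring documents against the high-weight query tokens first. A minimal sketch, assuming the default processor parameters, creates the pipeline and sets it as the index default:

PUT /_search/pipeline/two_phase_search_pipeline
{
  "request_processors": [
    {
      "neural_sparse_two_phase_processor": {
        "tag": "neural-sparse",
        "enabled": true
      }
    }
  ]
}

PUT /my-nlp-index/_settings
{
  "index.search.default_pipeline": "two_phase_search_pipeline"
}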