You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Reranking search results
Introduced 2.12
You can rerank search results using a cross-encoder reranker to improve search relevance. To implement reranking, configure a search pipeline that runs at search time. The search pipeline intercepts search results and applies the rerank processor to them. The rerank processor evaluates the search results and sorts them based on the new scores provided by the cross-encoder model.
PREREQUISITE
Before configuring a reranking pipeline, you must set up a cross-encoder model. For information about using an OpenSearch-provided model, see Cross-encoder models. For information about using a custom model, see Custom local models.
Running a search with reranking
To run a search with reranking, follow these steps:
- Configure a search pipeline.
- Create an index for ingestion.
- Ingest documents into the index.
- Search using reranking.
Step 1: Configure a search pipeline
First, configure a search pipeline with a rerank processor.
The following example request creates a search pipeline with an ml_opensearch rerank processor. In the request, provide a model ID for the cross-encoder model and the document fields to use as context:
PUT /_search/pipeline/my_pipeline
{
  "description": "Pipeline for reranking with a cross-encoder",
  "response_processors": [
    {
      "rerank": {
        "ml_opensearch": {
          "model_id": "gnDIbI0BfUsSoeNT_jAw"
        },
        "context": {
          "document_fields": [
            "passage_text"
          ]
        }
      }
    }
  ]
}
For more information about the request fields, see Request fields.
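If you generate the pipeline definition from a script, the request body above can be assembled programmatically. The following is a minimal sketch using only the Python standard library. The function name build_rerank_pipeline is hypothetical and not part of any OpenSearch client, and the sketch only builds the JSON body; sending it to PUT /_search/pipeline/my_pipeline is left to whichever HTTP client you use:

```python
import json


def build_rerank_pipeline(model_id, document_fields,
                          description="Pipeline for reranking with a cross-encoder"):
    """Build the request body for a search pipeline with one rerank processor.

    Hypothetical helper for illustration; it mirrors the structure of the
    example request and does not contact a cluster.
    """
    if not document_fields:
        raise ValueError("at least one document field is required as context")
    return {
        "description": description,
        "response_processors": [
            {
                "rerank": {
                    # ID of the deployed cross-encoder model
                    "ml_opensearch": {"model_id": model_id},
                    # Document fields passed to the model as context
                    "context": {"document_fields": list(document_fields)},
                }
            }
        ],
    }


body = build_rerank_pipeline("gnDIbI0BfUsSoeNT_jAw", ["passage_text"])
print(json.dumps(body, indent=2))
```

The model ID shown is the one from the example request; replace it with the ID of your own deployed model.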
Step 2: Create an index for ingestion
To use the rerank processor defined in your pipeline, create an OpenSearch index and set the pipeline created in the previous step as the index's default search pipeline:
PUT /my-index
{
  "settings": {
    "index.search.default_pipeline": "my_pipeline"
  },
  "mappings": {
    "properties": {
      "passage_text": {
        "type": "text"
      }
    }
  }
}
Step 3: Ingest documents into the index
To ingest documents into the index created in the previous step, send the following bulk request:
POST /_bulk
{ "index": { "_index": "my-index" } }
{ "passage_text" : "I said welcome to them and we entered the house" }
{ "index": { "_index": "my-index" } }
{ "passage_text" : "I feel welcomed in their family" }
{ "index": { "_index": "my-index" } }
{ "passage_text" : "Welcoming gifts are great" }
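A bulk request body is newline-delimited JSON: each document is preceded by an action line, and the body must end with a newline. The following sketch builds such a body from a list of passages using the Python standard library. The helper name build_bulk_body is hypothetical, and the sketch only produces the request text; it does not send it:

```python
import json


def build_bulk_body(index_name, passages, field="passage_text"):
    """Return a newline-delimited JSON body for POST /_bulk.

    Hypothetical helper for illustration. Each passage produces two lines:
    an "index" action line followed by the document source.
    """
    lines = []
    for text in passages:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps({field: text}))
    # Terminate every line, including the last one, with a newline.
    return "".join(line + "\n" for line in lines)


passages = [
    "I said welcome to them and we entered the house",
    "I feel welcomed in their family",
    "Welcoming gifts are great",
]
print(build_bulk_body("my-index", passages), end="")
```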
Step 4: Search using reranking
To perform a reranking search on your index, use any OpenSearch query and provide an additional ext.rerank field:
POST /my-index/_search
{
  "query": {
    "match": {
      "passage_text": "how to welcome in family"
    }
  },
  "ext": {
    "rerank": {
      "query_context": {
        "query_text": "how to welcome in family"
      }
    }
  }
}
Alternatively, you can provide the full path to the field containing the context. For more information, see Rerank processor example.
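In the search request above, the same text appears twice: once in the match query and once in ext.rerank.query_context.query_text. When requests are built in code, one way to keep the two in sync is to derive both from a single argument. The following is a minimal sketch under that assumption; the function name build_rerank_search is hypothetical, and it covers only the match query form shown in this example:

```python
import json


def build_rerank_search(field, query_text):
    """Build a match query whose text is also passed to the rerank processor.

    Hypothetical helper for illustration; it only constructs the request body.
    """
    return {
        # First-stage retrieval: an ordinary match query
        "query": {"match": {field: query_text}},
        # Second stage: the text the cross-encoder compares documents against
        "ext": {"rerank": {"query_context": {"query_text": query_text}}},
    }


request = build_rerank_search("passage_text", "how to welcome in family")
print(json.dumps(request, indent=2))
```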
Using rerank and normalization processors together
When you use a rerank processor in conjunction with a normalization processor and a hybrid query, the rerank processor alters the final document scores. This is because the rerank processor operates after the normalization processor in the search pipeline.
The processing order is as follows:
- Normalization processor: This processor normalizes the document scores based on the configured normalization method. For more information, see Normalization processor.
- Rerank processor: Following normalization, the rerank processor further adjusts the document scores. This adjustment can significantly impact the final ordering of search results.
This processing order has the following implications:
- Score modification: The rerank processor modifies the scores that were initially adjusted by the normalization processor, potentially leading to different ranking results than initially expected.
- Hybrid queries: A hybrid query combines multiple query types and scoring mechanisms. The combined scores from the initial query are normalized first and then reranked, so the scores are modified in two stages.
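The two-stage behavior described above can be illustrated with a toy simulation. This is a sketch, not OpenSearch's implementation: it assumes min-max normalization as the normalization method, treats the hybrid result as a single list of already combined scores, and uses made-up numbers in place of real cross-encoder output. All function names are hypothetical:

```python
def min_max_normalize(scores):
    """Scale scores into [0, 1]. Equal scores all map to 1.0 (a choice made for this sketch)."""
    lo, hi = min(scores), max(scores)
    if hi == lo:
        return [1.0 for _ in scores]
    return [(s - lo) / (hi - lo) for s in scores]


def rerank(hits, new_scores):
    """Replace each hit's score with the reranker's score, then sort descending."""
    rescored = [dict(hit, _score=score) for hit, score in zip(hits, new_scores)]
    return sorted(rescored, key=lambda h: h["_score"], reverse=True)


# Stage 0: combined scores from the initial query (made-up values)
hits = [
    {"_id": "a", "_score": 12.0},
    {"_id": "b", "_score": 8.0},
    {"_id": "c", "_score": 4.0},
]

# Stage 1: the normalization step rescales the scores but keeps the order
normalized = [
    dict(h, _score=s)
    for h, s in zip(hits, min_max_normalize([h["_score"] for h in hits]))
]

# Stage 2: the rerank step overwrites the normalized scores
cross_encoder_scores = [0.10, 0.95, 0.40]  # made-up stand-in for model output
final = rerank(normalized, cross_encoder_scores)

print([h["_id"] for h in normalized])  # ['a', 'b', 'c']
print([h["_id"] for h in final])       # ['b', 'c', 'a']
```

In this sketch, the normalized scores do not influence the final order at all, because the rerank step replaces them. This is why the final ranking can differ from what the normalized hybrid scores alone would suggest.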