Truncate hits processor
Introduced 2.12
The truncate_hits response processor discards returned search hits after a given hit count is reached. The truncate_hits processor is designed to work with the oversample request processor but may be used on its own.
The target_size parameter (which specifies where to truncate) is optional. If it is not specified, then OpenSearch uses the original_size variable set by the oversample processor (if available).
The following is a common usage pattern:
- Add the oversample processor to a request pipeline to fetch a larger set of results.
- In the response pipeline, apply a reranking processor (which may promote results from beyond the originally requested top N) or the collapse processor (which may discard results after deduplication).
- Apply the truncate_hits processor to return (at most) the originally requested number of hits.
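The pattern above can be sketched in plain Python as a rough model of the three steps with a reranking stage. This is not OpenSearch code: the function names, the `rerank_score` field, and the rounding up of the oversampled size are assumptions made for illustration.

```python
import math


def oversample_size(size, sample_factor):
    # Assumption: the enlarged size is rounded up to a whole number of hits.
    return math.ceil(size * sample_factor)


def run_pattern(candidates, size, sample_factor, rerank_key):
    """Model of oversample -> rerank -> truncate on an already ranked list."""
    fetch_size = oversample_size(size, sample_factor)
    fetched = candidates[:fetch_size]  # step 1: fetch a larger set of results
    reranked = sorted(fetched, key=rerank_key, reverse=True)  # step 2: rerank
    return reranked[:size]  # step 3: truncate to the originally requested size


# Hypothetical candidates in their original ranking order.
candidates = [
    {"_id": "1", "rerank_score": 0.2},
    {"_id": "2", "rerank_score": 0.4},
    {"_id": "3", "rerank_score": 0.9},
    {"_id": "4", "rerank_score": 0.1},
    {"_id": "5", "rerank_score": 0.7},
    {"_id": "6", "rerank_score": 0.95},
]


def by_rerank_score(hit):
    return hit["rerank_score"]


without_oversample = run_pattern(candidates, 2, 1, by_rerank_score)
with_oversample = run_pattern(candidates, 2, 2, by_rerank_score)
print([hit["_id"] for hit in without_oversample])  # ['2', '1']
print([hit["_id"] for hit in with_oversample])     # ['3', '2']
```

With a sample factor of 2, document 3 is fetched even though it sits outside the original top 2, so the reranking step can promote it before the list is cut back to two hits.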
Request fields
The following table lists all request fields.
Field | Data type | Description |
---|---|---|
target_size | Integer | The maximum number of search hits to return (>=0). If not specified, the processor will try to read the original_size variable and will fail if it is not available. Optional. |
context_prefix | String | May be used to read the original_size variable from a specific scope in order to avoid collisions. Optional. |
tag | String | The processor’s identifier. Optional. |
description | String | A description of the processor. Optional. |
ignore_failure | Boolean | If true, OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
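The interaction between target_size, original_size, and context_prefix described in the table can be modeled with a short Python sketch. It is illustrative only: the `resolve_target_size` function, the dictionary standing in for the pipeline context, and the dot used to join a prefix to the variable name are assumptions, not the processor's actual implementation.

```python
from typing import Optional


def resolve_target_size(
    target_size: Optional[int],
    context: dict,
    context_prefix: Optional[str] = None,
) -> int:
    """Choose the truncation point following the rules in the table above."""
    if target_size is not None:
        if target_size < 0:
            raise ValueError("target_size must be >= 0")
        return target_size
    # Assumption: a prefix is joined to the variable name with a dot.
    key = f"{context_prefix}.original_size" if context_prefix else "original_size"
    if key not in context:
        # Mirrors the documented failure when original_size is unavailable.
        raise KeyError(f"{key} is not set and target_size was not specified")
    return context[key]


print(resolve_target_size(5, {"original_size": 3}))      # 5
print(resolve_target_size(None, {"original_size": 3}))   # 3
print(resolve_target_size(None, {"my_scope.original_size": 7}, "my_scope"))  # 7
```

An explicit target_size takes precedence in this model, and the failure case corresponds to the situation in which ignore_failure would decide whether the rest of the pipeline keeps running.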
Example
The following example demonstrates using a search pipeline with a truncate_hits processor.
Setup
Create an index named my_index containing many documents:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "doc": { "title" : "document 1" }}
{ "create":{"_index":"my_index","_id":2}}
{ "doc": { "title" : "document 2" }}
{ "create":{"_index":"my_index","_id":3}}
{ "doc": { "title" : "document 3" }}
{ "create":{"_index":"my_index","_id":4}}
{ "doc": { "title" : "document 4" }}
{ "create":{"_index":"my_index","_id":5}}
{ "doc": { "title" : "document 5" }}
{ "create":{"_index":"my_index","_id":6}}
{ "doc": { "title" : "document 6" }}
{ "create":{"_index":"my_index","_id":7}}
{ "doc": { "title" : "document 7" }}
{ "create":{"_index":"my_index","_id":8}}
{ "doc": { "title" : "document 8" }}
{ "create":{"_index":"my_index","_id":9}}
{ "doc": { "title" : "document 9" }}
{ "create":{"_index":"my_index","_id":10}}
{ "doc": { "title" : "document 10" }}
Creating a search pipeline
The following request creates a search pipeline named my_pipeline with a truncate_hits response processor that discards hits after the first five:
PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "truncate_hits" : {
        "tag" : "truncate_1",
        "description" : "This processor will discard results after the first 5.",
        "target_size" : 5
      }
    }
  ]
}
Using a search pipeline
Search for documents in my_index without a search pipeline:
POST /my_index/_search
{
"size": 8
}
The response contains eight hits:
Response
{
"took" : 13,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
},
{
"_index" : "my_index",
"_id" : "6",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 6"
}
}
},
{
"_index" : "my_index",
"_id" : "7",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 7"
}
}
},
{
"_index" : "my_index",
"_id" : "8",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 8"
}
}
}
]
}
}
To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:
POST /my_index/_search?search_pipeline=my_pipeline
{
"size": 8
}
The response contains only five hits, even though eight were requested and ten were available:
Response
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 1"
}
}
},
{
"_index" : "my_index",
"_id" : "2",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 2"
}
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 3"
}
}
},
{
"_index" : "my_index",
"_id" : "4",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 4"
}
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"doc" : {
"title" : "document 5"
}
}
}
]
}
}
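Comparing the two responses above suggests that truncation shortens only the hits array, while hits.total still reports all 10 matching documents. The following Python sketch models that behavior on a response-shaped dictionary; the `truncate_response` helper is hypothetical and only approximates what the processor does.

```python
import copy


def truncate_response(response, target_size):
    """Return a copy of the response that keeps at most target_size hits."""
    result = copy.deepcopy(response)
    result["hits"]["hits"] = result["hits"]["hits"][:target_size]
    # hits.total is left untouched, as in the example response above.
    return result


# A trimmed-down stand-in for the eight-hit response shown earlier.
response = {
    "hits": {
        "total": {"value": 10, "relation": "eq"},
        "hits": [{"_index": "my_index", "_id": str(i)} for i in range(1, 9)],
    }
}

truncated = truncate_response(response, 5)
print(len(truncated["hits"]["hits"]))         # 5
print(truncated["hits"]["total"]["value"])    # 10
```

Because target_size is an upper bound, a response that already contains fewer hits than target_size passes through this model unchanged.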
Oversample, collapse, and truncate hits
The following is a more realistic example in which you use oversample to request many candidate documents, use collapse to remove documents that duplicate a particular field (to get more diverse results), and then use truncate_hits to return the originally requested document count (to avoid returning a large result payload from the cluster).
Setup
Create many documents containing a field that you’ll use for collapsing:
POST /_bulk
{ "create":{"_index":"my_index","_id":1}}
{ "title" : "document 1", "color":"blue" }
{ "create":{"_index":"my_index","_id":2}}
{ "title" : "document 2", "color":"blue" }
{ "create":{"_index":"my_index","_id":3}}
{ "title" : "document 3", "color":"red" }
{ "create":{"_index":"my_index","_id":4}}
{ "title" : "document 4", "color":"red" }
{ "create":{"_index":"my_index","_id":5}}
{ "title" : "document 5", "color":"yellow" }
{ "create":{"_index":"my_index","_id":6}}
{ "title" : "document 6", "color":"yellow" }
{ "create":{"_index":"my_index","_id":7}}
{ "title" : "document 7", "color":"orange" }
{ "create":{"_index":"my_index","_id":8}}
{ "title" : "document 8", "color":"orange" }
{ "create":{"_index":"my_index","_id":9}}
{ "title" : "document 9", "color":"green" }
{ "create":{"_index":"my_index","_id":10}}
{ "title" : "document 10", "color":"green" }
Create a pipeline that collapses only on the color field:
PUT /_search/pipeline/collapse_pipeline
{
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    }
  ]
}
Create another pipeline that oversamples, collapses, and then truncates results:
PUT /_search/pipeline/oversampling_collapse_pipeline
{
  "request_processors": [
    {
      "oversample": {
        "sample_factor": 3
      }
    }
  ],
  "response_processors": [
    {
      "collapse" : {
        "field": "color"
      }
    },
    {
      "truncate_hits": {
        "description": "Truncates back to the original size before oversample increased it."
      }
    }
  ]
}
Collapse without oversample
In this example, you request the top three documents before collapsing on the color field. Because the first two documents have the same color, the second one is discarded, and the request returns the first and third documents:
POST /my_index/_search?search_pipeline=collapse_pipeline
{
"size": 3
}
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
Oversample, collapse, and truncate
Now you will use the oversampling_collapse_pipeline, which requests the top 9 documents (multiplying the size by 3), deduplicates by color, and then returns the top 3 hits:
POST /my_index/_search?search_pipeline=oversampling_collapse_pipeline
{
"size": 3
}
Response
{
"took" : 2,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 10,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "my_index",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"title" : "document 1",
"color" : "blue"
}
},
{
"_index" : "my_index",
"_id" : "3",
"_score" : 1.0,
"_source" : {
"title" : "document 3",
"color" : "red"
}
},
{
"_index" : "my_index",
"_id" : "5",
"_score" : 1.0,
"_source" : {
"title" : "document 5",
"color" : "yellow"
}
}
]
},
"profile" : {
"shards" : [ ]
}
}
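To see why the two requests return different results, the following Python sketch replays both pipelines against the ten example documents. It is a simplified model under stated assumptions: the `collapse` and `search` helpers are hypothetical, the documents are assumed to come back in ID order (all scores are 1.0), and the oversampled size is assumed to be rounded up.

```python
import math

COLORS = ["blue", "blue", "red", "red", "yellow",
          "yellow", "orange", "orange", "green", "green"]
DOCS = [
    {"_id": str(i), "title": f"document {i}", "color": color}
    for i, color in enumerate(COLORS, start=1)
]


def collapse(hits, field):
    """Keep only the first hit for each distinct value of the field."""
    seen = set()
    kept = []
    for hit in hits:
        if hit[field] not in seen:
            seen.add(hit[field])
            kept.append(hit)
    return kept


def search(size, sample_factor=None):
    """Model collapse_pipeline (no factor) or oversampling_collapse_pipeline."""
    fetch_size = math.ceil(size * sample_factor) if sample_factor else size
    hits = collapse(DOCS[:fetch_size], "color")
    if sample_factor:
        hits = hits[:size]  # truncate back to the originally requested size
    return [hit["_id"] for hit in hits]


print(search(3))                   # ['1', '3']
print(search(3, sample_factor=3))  # ['1', '3', '5']
```

Without oversampling, deduplication leaves only two of the three fetched documents. With a sample factor of 3, nine documents are fetched, five distinct colors survive the collapse, and truncation returns the first three.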