Limit token filter
The `limit` token filter limits the number of tokens passed through the analysis chain.
Parameters
The `limit` token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`max_token_count` | Optional | Integer | The maximum number of tokens to be generated. Default is `1`.
`consume_all_tokens` | Optional | Boolean | (Expert-level setting) Uses all tokens from the tokenizer, even if the result exceeds `max_token_count`. When this parameter is set, the output still contains only the number of tokens specified by `max_token_count`. However, all tokens generated by the tokenizer are processed. Default is `false`.
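For example, to emit only the first five tokens while still consuming the remainder of the token stream, you can combine `max_token_count` with `consume_all_tokens`. The following settings fragment is a minimal sketch; the filter name `exhaustive_token_limit` is illustrative:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "exhaustive_token_limit": {
          "type": "limit",
          "max_token_count": 5,
          "consume_all_tokens": true
        }
      }
    }
  }
}
```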
Example
The following example request creates a new index named `my_index` and configures an analyzer with a `limit` filter:
```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "three_token_limit": {
          "tokenizer": "standard",
          "filter": [ "custom_token_limit" ]
        }
      },
      "filter": {
        "custom_token_limit": {
          "type": "limit",
          "max_token_count": 3
        }
      }
    }
  }
}
```
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
```json
GET /my_index/_analyze
{
  "analyzer": "three_token_limit",
  "text": "OpenSearch is a powerful and flexible search engine."
}
```
The response contains the generated tokens. Because `max_token_count` is set to `3`, only the first three tokens are returned:

```json
{
  "tokens": [
    {
      "token": "OpenSearch",
      "start_offset": 0,
      "end_offset": 10,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "is",
      "start_offset": 11,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "a",
      "start_offset": 14,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 2
    }
  ]
}
```
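To apply the analyzer at index time, reference it in a field mapping. The following fragment is a sketch; the field name `description` is illustrative:

```json
{
  "mappings": {
    "properties": {
      "description": {
        "type": "text",
        "analyzer": "three_token_limit"
      }
    }
  }
}
```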