Length token filter
The `length` token filter removes tokens that do not meet specified length criteria (minimum and maximum values) from the token stream.
Parameters
The `length` token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
`min` | Optional | Integer | The minimum token length. Default is `0`. |
`max` | Optional | Integer | The maximum token length. Default is `Integer.MAX_VALUE` (`2147483647`). |
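If you need only one bound, you can omit the other parameter and rely on its default. The following sketch illustrates this; the index name `length_demo` and filter name `at_least_two_chars` are illustrative and not part of the example that follows. The filter removes single-character tokens while leaving `max` at its default value:

```json
PUT length_demo
{
  "settings": {
    "analysis": {
      "filter": {
        "at_least_two_chars": {
          "type": "length",
          "min": 2
        }
      }
    }
  }
}
```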
Example
The following example request creates a new index named `my_index` and configures an analyzer with a `length` filter:
```json
PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "only_keep_4_to_10_characters": {
          "tokenizer": "whitespace",
          "filter": [ "length_4_to_10" ]
        }
      },
      "filter": {
        "length_4_to_10": {
          "type": "length",
          "min": 4,
          "max": 10
        }
      }
    }
  }
}
```
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
```json
GET /my_index/_analyze
{
  "analyzer": "only_keep_4_to_10_characters",
  "text": "OpenSearch is a great tool!"
}
```
The response contains the generated tokens. Note that `is` and `a` are removed because they are shorter than the configured minimum of 4 characters:
```json
{
  "tokens": [
    {
      "token": "OpenSearch",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    },
    {
      "token": "great",
      "start_offset": 16,
      "end_offset": 21,
      "type": "word",
      "position": 3
    },
    {
      "token": "tool!",
      "start_offset": 22,
      "end_offset": 27,
      "type": "word",
      "position": 4
    }
  ]
}
```
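To apply the analyzer when indexing documents, you can reference it in a field mapping. The following sketch assumes the `my_index` example above; the `description` field name is hypothetical:

```json
PUT /my_index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "analyzer": "only_keep_4_to_10_characters"
    }
  }
}
```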