
Length token filter

The length token filter removes tokens from the token stream that do not meet the specified length criteria (minimum and maximum values).

Parameters

The length token filter can be configured with the following parameters.

Parameter | Required/Optional | Data type | Description
:--- | :--- | :--- | :---
`min` | Optional | Integer | The minimum token length. Default is `0`.
`max` | Optional | Integer | The maximum token length. Default is `Integer.MAX_VALUE` (`2147483647`).
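
You can try out the `min` and `max` parameters without creating an index by passing an inline filter definition to the `_analyze` API. The following request is a minimal sketch; the sample text and the chosen bounds are illustrative:

GET /_analyze
{
  "tokenizer": "whitespace",
  "filter": [
    {
      "type": "length",
      "min": 2,
      "max": 10
    }
  ],
  "text": "a quick OpenSearch test"
}

With `min` set to 2, the single-character token `a` is removed, while `quick`, `OpenSearch`, and `test` are kept.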

Example

The following example request creates a new index named my_index and configures an analyzer with a length filter:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "only_keep_4_to_10_characters": {
          "tokenizer": "whitespace",
          "filter": [ "length_4_to_10" ]
        }
      },
      "filter": {
        "length_4_to_10": {
          "type": "length",
          "min": 4,
          "max": 10
        }
      }
    }
  }
}
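
The analyzer defined above is not yet applied to any field. As a sketch of how you might use it, the following request assigns it to a hypothetical text field named `description` by updating the index mapping (the field name is illustrative):

PUT /my_index/_mapping
{
  "properties": {
    "description": {
      "type": "text",
      "analyzer": "only_keep_4_to_10_characters"
    }
  }
}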

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my_index/_analyze
{
  "analyzer": "only_keep_4_to_10_characters",
  "text": "OpenSearch is a great tool!"
}

The response contains the generated tokens. The tokens `is` and `a` are removed because they are shorter than the configured minimum length of 4 characters:

{
  "tokens": [
    {
      "token": "OpenSearch",
      "start_offset": 0,
      "end_offset": 10,
      "type": "word",
      "position": 0
    },
    {
      "token": "great",
      "start_offset": 16,
      "end_offset": 21,
      "type": "word",
      "position": 3
    },
    {
      "token": "tool!",
      "start_offset": 22,
      "end_offset": 27,
      "type": "word",
      "position": 4
    }
  ]
}
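
As a follow-up sketch, assuming the analyzer has been assigned to the hypothetical `description` field as shown earlier, you can index a document and confirm that only tokens between 4 and 10 characters long are searchable:

PUT /my_index/_doc/1?refresh=true
{
  "description": "OpenSearch is a great tool!"
}

GET /my_index/_search
{
  "query": {
    "match": {
      "description": "great"
    }
  }
}

The match query returns the document because `great` survives the filter. A query for `is` on the same field returns no hits because the term is shorter than 4 characters and is removed at both index time and search time.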