Keep words token filter

The keep_words token filter (specified as type keep in a filter definition) keeps only the tokens that appear in a specified word list and removes all other tokens during analysis. This filter is useful if you have a large body of text but are only interested in certain keywords or terms.

Parameters

The keep_words token filter can be configured with the following parameters.

Parameter | Required/Optional | Data type | Description
keep_words | Required if keep_words_path is not configured | List of strings | The list of words to keep.
keep_words_path | Required if keep_words is not configured | String | The path to the file containing the list of words to keep.
keep_words_case | Optional | Boolean | Whether to lowercase all words during comparison. Default is false.
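
When the word list is long or shared across several indexes, it can be kept in a file instead of inline. The following is a minimal sketch of that configuration, not a tested setup. The index name my_index_from_file, the analyzer and filter names, and the file path analysis/keep_words.txt are all hypothetical. The sketch assumes that the file is UTF-8 encoded with one word per line and that a relative path is resolved against the OpenSearch config directory:

```json
PUT /my_index_from_file
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_keep_word_from_file": {
          "tokenizer": "standard",
          "filter": [ "keep_words_file_filter" ]
        }
      },
      "filter": {
        "keep_words_file_filter": {
          "type": "keep",
          "keep_words_path": "analysis/keep_words.txt",
          "keep_words_case": true
        }
      }
    }
  }
}
```

As the parameter descriptions indicate, a filter definition needs only one of keep_words or keep_words_path.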

Example

The following example request creates a new index named my_index and configures an analyzer with a keep_words filter:

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_keep_word": {
          "tokenizer": "standard",
          "filter": [ "keep_words_filter" ]
        }
      },
      "filter": {
        "keep_words_filter": {
          "type": "keep",
          "keep_words": ["example", "world", "opensearch"],
          "keep_words_case": true
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my_index/_analyze
{
  "analyzer": "custom_keep_word",
  "text": "Hello, world! This is an OpenSearch example."
}

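The response contains only the tokens that match the keep list. Because keep_words_case is set to true, OpenSearch matches the list entry opensearch, and the token is returned with its original casing: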

{
  "tokens": [
    {
      "token": "world",
      "start_offset": 7,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "OpenSearch",
      "start_offset": 25,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "example",
      "start_offset": 36,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
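
To see the effect of keep_words_case, the same text can be analyzed with an ad hoc filter that leaves the parameter at its default of false. This is a sketch that assumes the _analyze API accepts an inline filter definition, so no index needs to be created:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    {
      "type": "keep",
      "keep_words": ["example", "world", "opensearch"]
    }
  ],
  "text": "Hello, world! This is an OpenSearch example."
}
```

With case-sensitive comparison, OpenSearch is not expected to match the list entry opensearch, so the response should contain only the world and example tokens. Placing a lowercase filter before the keep filter is another way to get case-insensitive matching, at the cost of lowercasing the emitted tokens.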