Keep words token filter

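The `keep_words` token filter retains only the tokens that appear in a specified list of words and removes all other tokens during analysis. Note that the filter is configured with the type `keep`. This filter is useful if you have a large body of text but are only interested in certain keywords or terms.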

Parameters

The `keep_words` token filter can be configured with the following parameters.

Parameter	Required/Optional	Data type	Description
`keep_words`	Required if `keep_words_path` is not configured	List of strings	The list of words to keep.
`keep_words_path`	Required if `keep_words` is not configured	String	The path to the file containing the list of words to keep.
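`keep_words_case`	Optional	Boolean	Whether to ignore case when comparing tokens to the words to keep. When `true`, matching is case insensitive, but retained tokens are emitted in their original case. Default is `false`.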
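
To make the interaction between `keep_words` and `keep_words_case` concrete, the following Python sketch models the comparison outside of OpenSearch. It is an illustration only, not OpenSearch's implementation: the function name `keep_tokens` is hypothetical, and the sketch assumes that enabling `keep_words_case` makes matching case insensitive while leaving the emitted token unchanged, which is consistent with the example response on this page.

```python
def keep_tokens(tokens, keep_words, keep_words_case=False):
    """Return only the tokens that appear in keep_words.

    Hypothetical model of the keep filter: when keep_words_case is True,
    both sides are lowercased for the comparison, but the token itself
    is returned unchanged.
    """
    if keep_words_case:
        allowed = {word.lower() for word in keep_words}
        return [t for t in tokens if t.lower() in allowed]
    allowed = set(keep_words)
    return [t for t in tokens if t in allowed]


tokens = ["Hello", "world", "This", "is", "an", "OpenSearch", "example"]
keep = ["example", "world", "opensearch"]

case_insensitive = keep_tokens(tokens, keep, keep_words_case=True)
case_sensitive = keep_tokens(tokens, keep, keep_words_case=False)
print(case_insensitive)  # ['world', 'OpenSearch', 'example']
print(case_sensitive)    # ['world', 'example']
```

In this model, leaving `keep_words_case` at its default drops `OpenSearch` because it does not exactly match the lowercase entry `opensearch`.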

Example

The following example request creates a new index named `my_index` and configures an analyzer with a `keep_words` filter:

PUT my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_keep_word": {
          "tokenizer": "standard",
          "filter": [ "keep_words_filter" ]
        }
      },
      "filter": {
        "keep_words_filter": {
          "type": "keep",
          "keep_words": ["example", "world", "opensearch"],
          "keep_words_case": true
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my_index/_analyze
{
  "analyzer": "custom_keep_word",
  "text": "Hello, world! This is an OpenSearch example."
}

The response contains the generated tokens:

{
  "tokens": [
    {
      "token": "world",
      "start_offset": 7,
      "end_offset": 12,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "OpenSearch",
      "start_offset": 25,
      "end_offset": 35,
      "type": "<ALPHANUM>",
      "position": 5
    },
    {
      "token": "example",
      "start_offset": 36,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
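
The `position` values in the response (1, 5, and 6) are not consecutive because removed tokens still occupy their original positions. The following Python sketch reproduces the tokens, offsets, and positions shown above. It is an approximation under stated assumptions: the regular expression `\w+` stands in for the `standard` tokenizer, which is adequate for this sentence but not in general, and the function name `analyze_keep` is hypothetical.

```python
import re


def analyze_keep(text, keep_words, keep_words_case=False):
    """Approximate the analyzer above: split text into word tokens, then
    keep only listed words while preserving positions and offsets."""
    if keep_words_case:
        allowed = {w.lower() for w in keep_words}
    else:
        allowed = set(keep_words)
    kept = []
    # Assumption: \w+ is a rough stand-in for the standard tokenizer.
    for position, match in enumerate(re.finditer(r"\w+", text)):
        token = match.group()
        candidate = token.lower() if keep_words_case else token
        if candidate in allowed:
            kept.append({
                "token": token,
                "start_offset": match.start(),
                "end_offset": match.end(),
                "position": position,
            })
    return kept


result = analyze_keep(
    "Hello, world! This is an OpenSearch example.",
    ["example", "world", "opensearch"],
    keep_words_case=True,
)
for entry in result:
    print(entry)
```

Each printed entry corresponds to one object in the `tokens` array of the response, without the `type` field.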
