
Keyword marker token filter

The keyword_marker token filter prevents specified tokens from being altered by stemmers. It does this by marking those tokens as keywords; stemmers that come later in the filter chain skip marked tokens, so these words remain in their original form.

Parameters

The keyword_marker token filter can be configured with the following parameters.

Parameter | Required/Optional | Data type | Description
ignore_case | Optional | Boolean | Whether to ignore letter case when matching keywords. Default is false.
keywords | Required if neither keywords_path nor keywords_pattern is set | List of strings | The list of tokens to mark as keywords.
keywords_path | Required if neither keywords nor keywords_pattern is set | String | The path (relative to the config directory or absolute) to a file containing the list of keywords.
keywords_pattern | Required if neither keywords nor keywords_path is set | String | A regular expression used for matching tokens to be marked as keywords.
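
As an illustration of the pattern-based option, the following sketch defines a filter that uses keywords_pattern instead of keywords. The index name my_pattern_index, the analyzer name pattern_analyzer, the filter name pattern_marker_filter, and the regular expression itself are hypothetical and chosen only for this example. The expression is assumed to use Java regular expression syntax and to be matched against the whole token, so open.* would be expected to mark tokens such as opensearch:

```json
PUT /my_pattern_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "pattern_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "pattern_marker_filter", "stemmer"]
        }
      },
      "filter": {
        "pattern_marker_filter": {
          "type": "keyword_marker",
          "keywords_pattern": "open.*"
        }
      }
    }
  }
}
```

Because the lowercase filter runs first in this sketch, the pattern is written in lowercase.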

Example

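The following example request creates a new index named my_index and configures an analyzer with a keyword_marker filter. The filter marks the word example as a keyword. Because keyword_marker_filter is listed before stemmer in the filter chain, the token is marked before the stemmer processes it: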

PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_marker_filter", "stemmer"]
        }
      },
      "filter": {
        "keyword_marker_filter": {
          "type": "keyword_marker",
          "keywords": ["example"]
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Favorite example"
}

The response contains the generated tokens. Note that while the word favorite was stemmed, the word example was not stemmed because it was marked as a keyword:

{
  "tokens": [
    {
      "token": "favorit",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "example",
      "start_offset": 9,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
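
For comparison, it can be useful to run the same text through a chain that omits the keyword marker. The following request is a sketch that passes the tokenizer and filters inline to the _analyze API rather than relying on an index-level analyzer; it assumes the default stemmer configuration, as in the analyzer above:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": ["lowercase", "stemmer"],
  "text": "Favorite example"
}
```

With no keyword_marker filter in the chain, the second token should come back stemmed rather than as example.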

You can further examine the effect of the keyword_marker token filter by setting explain to true and requesting the keyword attribute in the _analyze request:

GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "This is an OpenSearch example demonstrating keyword marker.",
  "explain": true,
  "attributes": "keyword"
}

The response then includes additional details, including a keyword attribute for each token, similar to the following excerpt:

{
    "name": "porter_stem",
    "tokens": [
      ...
      {
        "token": "example",
        "start_offset": 22,
        "end_offset": 29,
        "type": "<ALPHANUM>",
        "position": 4,
        "keyword": true
      },
      {
        "token": "demonstr",
        "start_offset": 30,
        "end_offset": 43,
        "type": "<ALPHANUM>",
        "position": 5,
        "keyword": false
      },
      ...
    ]
}