Keyword repeat token filter
The keyword_repeat token filter duplicates each token in the token stream and marks one copy as a keyword. This filter is typically used when you want to retain both the original token and its modified version after further token transformations, such as stemming or synonym expansion. Because stemmers do not modify keyword tokens, the original, unchanged version of each token remains in the final analysis alongside the modified version.
Place the keyword_repeat token filter before any stemming filters. Because stemming does not change every token, the stream may contain identical duplicate tokens in the same position after stemming. To remove these duplicates, add the remove_duplicates token filter after the stemmer.
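For example, in the following analyzer sketch the remove_duplicates filter is placed after the stemmer (the index name my_deduplicated_index and the analyzer name my_deduplicated_analyzer are illustrative placeholders and are not used in the examples that follow):

PUT /my_deduplicated_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_deduplicated_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "keyword_repeat",
            "kstem",
            "remove_duplicates"
          ]
        }
      }
    }
  }
}

With this ordering, a token that the stemmer leaves unchanged is indexed once rather than twice.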
Example
The following example request creates a new index named my_index and configures an analyzer with a keyword_repeat filter:
PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_kstem": {
          "type": "kstem"
        },
        "my_lowercase": {
          "type": "lowercase"
        }
      },
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_lowercase",
            "keyword_repeat",
            "my_kstem"
          ]
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Stopped quickly"
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "stopped",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "stop",
      "start_offset": 0,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "quickly",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "quick",
      "start_offset": 8,
      "end_offset": 15,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
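In this example, the stemmer changes both tokens, so no duplicates remain. If you analyze a word that kstem leaves unchanged, for example, the illustrative text "fox" (not part of the preceding example), the keyword-marked copy and the stemmed copy are identical, and two fox tokens appear at the same position:

POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "fox"
}

This is the case in which the remove_duplicates token filter described earlier removes the redundant copy.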
You can further examine the impact of the keyword_repeat token filter by adding the following parameters to the _analyze query:
POST /my_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "Stopped quickly",
  "explain": true,
  "attributes": "keyword"
}
The response includes detailed information about the tokenizer output and the tokens produced by each filter in the chain, including the keyword attribute:
{
  "detail": {
    "custom_analyzer": true,
    "charfilters": [],
    "tokenizer": {
      "name": "standard",
      "tokens": [
        {"token": "Stopped","start_offset": 0,"end_offset": 7,"type": "<ALPHANUM>","position": 0},
        {"token": "quickly","start_offset": 8,"end_offset": 15,"type": "<ALPHANUM>","position": 1}
      ]
    },
    "tokenfilters": [
      {
        "name": "my_lowercase",
        "tokens": [
          {"token": "stopped","start_offset": 0,"end_offset": 7,"type": "<ALPHANUM>","position": 0},
          {"token": "quickly","start_offset": 8,"end_offset": 15,"type": "<ALPHANUM>","position": 1}
        ]
      },
      {
        "name": "keyword_repeat",
        "tokens": [
          {"token": "stopped","start_offset": 0,"end_offset": 7,"type": "<ALPHANUM>","position": 0,"keyword": true},
          {"token": "stopped","start_offset": 0,"end_offset": 7,"type": "<ALPHANUM>","position": 0,"keyword": false},
          {"token": "quickly","start_offset": 8,"end_offset": 15,"type": "<ALPHANUM>","position": 1,"keyword": true},
          {"token": "quickly","start_offset": 8,"end_offset": 15,"type": "<ALPHANUM>","position": 1,"keyword": false}
        ]
      },
      {
        "name": "my_kstem",
        "tokens": [
          {"token": "stopped","start_offset": 0,"end_offset": 7,"type": "<ALPHANUM>","position": 0,"keyword": true},
          {"token": "stop","start_offset": 0,"end_offset": 7,"type": "<ALPHANUM>","position": 0,"keyword": false},
          {"token": "quickly","start_offset": 8,"end_offset": 15,"type": "<ALPHANUM>","position": 1,"keyword": true},
          {"token": "quick","start_offset": 8,"end_offset": 15,"type": "<ALPHANUM>","position": 1,"keyword": false}
        ]
      }
    ]
  }
}
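To apply the analyzer to documents, map it to a field. The following sketch adds a field to the example index (the field name text is an illustrative placeholder and is not used elsewhere on this page):

PUT /my_index/_mapping
{
  "properties": {
    "text": {
      "type": "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}

Documents indexed into this field store both the original and the stemmed token forms, so a search for either stopped or stop matches the same document.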