You're viewing version 2.18 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

KStem token filter

The kstem token filter is a stemming filter used to reduce words to their root forms. The filter is a lightweight algorithmic stemmer designed for the English language that performs the following stemming operations:

  • Reduces plurals to their singular form.
  • Converts different verb tenses to their base form.
  • Removes common derivational endings, such as “-ing” or “-ed”.
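The operations listed above can be tried without creating an index by passing the built-in kstem filter directly to the _analyze API. The following request is a minimal sketch: the sample text is an arbitrary assumption chosen to include a plural, a past-tense verb, and an "-ing" form, and no output is shown because the exact tokens depend on the stemmer's rules and dictionary:

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "kstem"
  ],
  "text": "dogs walked running"
}
```

Replacing "kstem" with "porter_stem" in the same request is one way to compare how the two stemmers treat identical input.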

The kstem token filter is equivalent to the stemmer token filter configured with the light_english language. It stems more conservatively than other stemming filters, such as porter_stem.
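As an illustration of that equivalence, the following sketch defines a stemmer filter with its language parameter set to light_english. The index name my_light_english_index, the filter name light_english_stemmer, and the analyzer name my_light_english_analyzer are hypothetical and chosen only for this example:

```json
PUT /my_light_english_index
{
  "settings": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        }
      },
      "analyzer": {
        "my_light_english_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "light_english_stemmer"
          ]
        }
      }
    }
  }
}
```

An analyzer configured this way should be expected to produce the same tokens as one that uses the kstem filter directly.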

The kstem token filter is based on the Lucene KStemFilter. For more information, see the Lucene documentation.

Example

The following example request creates a new index named my_kstem_index and configures an analyzer with a kstem filter:

PUT /my_kstem_index
{
  "settings": {
    "analysis": {
      "filter": {
        "kstem_filter": {
          "type": "kstem"
        }
      },
      "analyzer": {
        "my_kstem_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "kstem_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_kstem_analyzer"
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

POST /my_kstem_index/_analyze
{
  "analyzer": "my_kstem_analyzer",
  "text": "stops stopped"
}

The response contains the generated tokens:

{
  "tokens": [
    {
      "token": "stop",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "stop",
      "start_offset": 6,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
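Because both "stops" and "stopped" are reduced to "stop", a match query on the content field should find documents that contain either form. The following requests are a sketch of that behavior: the document ID, the sample sentence, and the refresh=true parameter (used so that the document is searchable immediately) are assumptions made for this example:

```json
PUT /my_kstem_index/_doc/1?refresh=true
{
  "content": "The bus stopped at the corner"
}

GET /my_kstem_index/_search
{
  "query": {
    "match": {
      "content": "stops"
    }
  }
}
```

The query text is analyzed with the same my_kstem_analyzer as the indexed field, so the query term and the indexed term are compared in their stemmed form.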