You're viewing version 2.18 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
KStem token filter
The kstem token filter is a stemming filter used to reduce words to their root forms. The filter is a lightweight algorithmic stemmer designed for the English language that performs the following stemming operations:
- Reduces plurals to their singular form.
- Converts different verb tenses to their base form.
- Removes common derivational endings, such as “-ing” or “-ed”.
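To try these operations on your own text without creating an index, one option is to pass the built-in kstem filter directly to the Analyze API. The following is a minimal sketch: the sample text is an assumption chosen for illustration, and the exact tokens returned depend on the KStem rules and dictionary, so no output is shown here.

```json
POST /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "kstem"
  ],
  "text": "stops stopped stopping"
}
```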
The kstem token filter is equivalent to the stemmer filter configured with the light_english language. It provides more conservative stemming than other stemming filters, such as porter_stem.
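The equivalence described above can be sketched as an index configuration that uses a stemmer filter in place of kstem. The index name my_light_english_index and the filter and analyzer names are hypothetical, chosen only for illustration. The analyzer is assumed to use the standard tokenizer followed by lowercasing.

```json
PUT /my_light_english_index
{
  "settings": {
    "analysis": {
      "filter": {
        "light_english_stemmer": {
          "type": "stemmer",
          "language": "light_english"
        }
      },
      "analyzer": {
        "my_light_english_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "light_english_stemmer"
          ]
        }
      }
    }
  }
}
```

Running the same text through this analyzer and through a kstem-based analyzer is one way to check that both produce the same tokens.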
The kstem token filter is based on the Lucene KStemFilter. For more information, see the Lucene documentation.
Example
The following example request creates a new index named my_kstem_index and configures an analyzer with a kstem filter:
PUT /my_kstem_index
{
  "settings": {
    "analysis": {
      "filter": {
        "kstem_filter": {
          "type": "kstem"
        }
      },
      "analyzer": {
        "my_kstem_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "kstem_filter"
          ]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "content": {
        "type": "text",
        "analyzer": "my_kstem_analyzer"
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST /my_kstem_index/_analyze
{
  "analyzer": "my_kstem_analyzer",
  "text": "stops stopped"
}
The response contains the generated tokens:
{
  "tokens": [
    {
      "token": "stop",
      "start_offset": 0,
      "end_offset": 5,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "stop",
      "start_offset": 6,
      "end_offset": 13,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
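Because both stops and stopped are reduced to stop, a query for one form can match a document containing the other. The following sketch assumes that the my_kstem_index index from the preceding example exists. The document ID and the sample sentence are hypothetical.

```json
PUT /my_kstem_index/_doc/1?refresh=true
{
  "content": "The train stopped at the station"
}

GET /my_kstem_index/_search
{
  "query": {
    "match": {
      "content": "stops"
    }
  }
}
```

By default, a match query analyzes the query text with the analyzer configured for the field, so stops is stemmed to stop before it is compared with the indexed tokens.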