You're viewing version 2.18 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Keyword marker token filter
The keyword_marker token filter prevents certain tokens from being altered by stemmers or other filters. It does this by marking the specified tokens as keywords, which exempts them from stemming and similar processing. This ensures that specific words remain in their original form.
Parameters
The keyword_marker token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
ignore_case | Optional | Boolean | Whether to ignore letter case when matching keywords. Default is false. |
keywords | Required if neither keywords_path nor keywords_pattern is set | List of strings | The list of tokens to mark as keywords. |
keywords_path | Required if neither keywords nor keywords_pattern is set | String | The path (relative to the config directory or absolute) to a file containing the list of keywords. |
keywords_pattern | Required if neither keywords nor keywords_path is set | String | A regular expression used for matching tokens to be marked as keywords. |
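As an illustration of the parameters above, the following settings fragment sketches two alternative filter definitions. The filter names case_insensitive_marker and pattern_marker, the keyword list, and the pattern open.* are hypothetical, chosen only for this example. The first definition supplies an explicit keyword list and sets ignore_case to true; the second relies on keywords_pattern instead of a list. The pattern is assumed here to be tested against each whole token:

```json
{
  "settings": {
    "analysis": {
      "filter": {
        "case_insensitive_marker": {
          "type": "keyword_marker",
          "keywords": ["OpenSearch", "example"],
          "ignore_case": true
        },
        "pattern_marker": {
          "type": "keyword_marker",
          "keywords_pattern": "open.*"
        }
      }
    }
  }
}
```

When a lowercase filter precedes the keyword_marker filter in the analyzer chain, tokens arrive already lowercased, so a lowercase keyword list matches without ignore_case.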
Example
The following example request creates a new index named my_index and configures an analyzer with a keyword_marker filter. The filter marks the word example as a keyword:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_marker_filter", "stemmer"]
        }
      },
      "filter": {
        "keyword_marker_filter": {
          "type": "keyword_marker",
          "keywords": ["example"]
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Favorite example"
}
The response contains the generated tokens. Note that while the word favorite was stemmed, the word example was not stemmed because it was marked as a keyword:
{
  "tokens": [
    {
      "token": "favorit",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "example",
      "start_offset": 9,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
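To experiment with the filter without creating an index first, the same chain can be sketched as an ad hoc _analyze request that defines the keyword_marker filter inline. This variant is an addition for illustration and is not part of the original example; it assumes the default configuration of the built-in lowercase and stemmer filters and should produce the same tokens as the response above:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    {
      "type": "keyword_marker",
      "keywords": ["example"]
    },
    "stemmer"
  ],
  "text": "Favorite example"
}
```

Defining the filter inline keeps the experiment self-contained, which is convenient when trying out different keyword lists before committing them to index settings.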
You can further examine the impact of the keyword_marker token filter by adding the explain and attributes parameters to the _analyze request:
GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "This is an OpenSearch example demonstrating keyword marker.",
  "explain": true,
  "attributes": "keyword"
}
The response includes additional details similar to the following. The keyword attribute indicates whether each token was marked as a keyword:
{
  "name": "porter_stem",
  "tokens": [
    ...
    {
      "token": "example",
      "start_offset": 22,
      "end_offset": 29,
      "type": "<ALPHANUM>",
      "position": 4,
      "keyword": true
    },
    {
      "token": "demonstr",
      "start_offset": 30,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 5,
      "keyword": false
    },
    ...
  ]
}
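Token filters are applied in the order in which they are listed, so the keyword_marker filter can only protect tokens from filters that come after it. As a contrast to the analyzer configured earlier, the following sketch deliberately places the stemmer first; in that order the stemmer is expected to process the word example before any keyword marking takes place. The inline filter definition is an assumption made for this illustration:

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    "stemmer",
    {
      "type": "keyword_marker",
      "keywords": ["example"]
    }
  ],
  "text": "Favorite example"
}
```

Comparing this output with the earlier response is a quick way to confirm that the filter order in an analyzer matches the intent.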