Keyword marker token filter
The keyword_marker token filter prevents certain tokens from being altered by stemmers. It does this by marking the specified tokens as keywords, which stemmers and other filters that honor the keyword attribute then leave unchanged. This ensures that specific words remain in their original form.
Parameters
The keyword_marker token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
ignore_case | Optional | Boolean | Whether to ignore letter case when matching keywords. Default is false. |
keywords | Required if neither keywords_path nor keywords_pattern is set | List of strings | The list of tokens to mark as keywords. |
keywords_path | Required if neither keywords nor keywords_pattern is set | String | The path (relative to the config directory or absolute) to the list of keywords. |
keywords_pattern | Required if neither keywords nor keywords_path is set | String | A regular expression used for matching tokens to be marked as keywords. |
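As a rough illustration of the alternatives to a plain keywords list, the following fragment of an analysis section sketches two filter definitions. The filter names case_insensitive_marker and pattern_marker and the pattern exam.* are hypothetical, chosen only for this sketch. It assumes that the pattern uses Java regular expression syntax and must match the entire token, so verify both against your OpenSearch version.

```json
{
  "filter": {
    "case_insensitive_marker": {
      "type": "keyword_marker",
      "keywords": ["OpenSearch", "example"],
      "ignore_case": true
    },
    "pattern_marker": {
      "type": "keyword_marker",
      "keywords_pattern": "exam.*"
    }
  }
}
```

With ignore_case set to true, the first filter is expected to mark Example and example alike, so it does not depend on a preceding lowercase filter. Under the stated assumption, the second filter would mark tokens such as example and examine.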
Example
The following example request creates a new index named my_index and configures an analyzer with a keyword_marker filter. The filter marks the word example as a keyword:
PUT /my_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": ["lowercase", "keyword_marker_filter", "stemmer"]
        }
      },
      "filter": {
        "keyword_marker_filter": {
          "type": "keyword_marker",
          "keywords": ["example"]
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "Favorite example"
}
The response contains the generated tokens. Note that while the word favorite was stemmed, the word example was not stemmed because it was marked as a keyword:
{
  "tokens": [
    {
      "token": "favorit",
      "start_offset": 0,
      "end_offset": 8,
      "type": "<ALPHANUM>",
      "position": 0
    },
    {
      "token": "example",
      "start_offset": 9,
      "end_offset": 16,
      "type": "<ALPHANUM>",
      "position": 1
    }
  ]
}
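To experiment with the filter without creating an index, a request along the following lines should work. This is a sketch that assumes the _analyze API accepts inline filter definitions in the filter array; the text and keyword list are taken from the example above.

```json
GET /_analyze
{
  "tokenizer": "standard",
  "filter": [
    "lowercase",
    { "type": "keyword_marker", "keywords": ["example"] },
    "stemmer"
  ],
  "text": "Favorite example"
}
```

The keyword_marker entry is listed before stemmer because a token must already be marked when the stemmer processes it. If the keyword_marker entry is removed from the list, the stemmer is expected to reduce example to a stem such as exampl.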
You can further examine the impact of the keyword_marker token filter by adding the explain and attributes parameters to the _analyze request:
GET /my_index/_analyze
{
  "analyzer": "custom_analyzer",
  "text": "This is an OpenSearch example demonstrating keyword marker.",
  "explain": true,
  "attributes": "keyword"
}
The response then reports the keyword attribute for each token, similar to the following excerpt:
{
  "name": "porter_stem",
  "tokens": [
    ...
    {
      "token": "example",
      "start_offset": 22,
      "end_offset": 29,
      "type": "<ALPHANUM>",
      "position": 4,
      "keyword": true
    },
    {
      "token": "demonstr",
      "start_offset": 30,
      "end_offset": 43,
      "type": "<ALPHANUM>",
      "position": 5,
      "keyword": false
    },
    ...
  ]
}