Phonetic token filter
The phonetic token filter transforms tokens into their phonetic representations, enabling more flexible matching of words that sound similar but are spelled differently. This is particularly useful for searching names, brands, or other entities that users might spell differently but pronounce similarly.
The phonetic token filter is not included in OpenSearch distributions by default. To use this token filter, you must first install the analysis-phonetic plugin as follows and then restart OpenSearch:
./bin/opensearch-plugin install analysis-phonetic
For more information about installing plugins, see Installing plugins.
Parameters
The phonetic token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description |
---|---|---|---|
encoder | Optional | String | Specifies the phonetic algorithm to use. Valid values are metaphone (default), double_metaphone, soundex, refined_soundex, caverphone1, caverphone2, cologne, nysiis, koelnerphonetik, haasephonetik, beider_morse, and daitch_mokotoff. |
replace | Optional | Boolean | Whether to replace the original token. If false, the original token is included in the output along with the phonetic encoding. Default is true. |
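As a minimal sketch of the replace parameter, the following request configures a filter that keeps the original token alongside its phonetic encoding. The index name names_index_both, the filter name keep_original_phonetic, and the analyzer name phonetic_keep_original_analyzer are hypothetical and chosen only for illustration:

```json
PUT /names_index_both
{
  "settings": {
    "analysis": {
      "filter": {
        "keep_original_phonetic": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "phonetic_keep_original_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "keep_original_phonetic"
          ]
        }
      }
    }
  }
}
```

Keeping the original token can be useful when exact spellings should still be matchable through the same field, at the cost of a larger index.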
Example
The following example request creates a new index named names_index and configures an analyzer with a phonetic filter:
PUT /names_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_phonetic_filter": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": true
        }
      },
      "analyzer": {
        "phonetic_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_phonetic_filter"
          ]
        }
      }
    }
  }
}
Generated tokens
Use the following requests to examine the tokens generated for the names Stephen and Steven using the analyzer:
POST /names_index/_analyze
{
  "text": "Stephen",
  "analyzer": "phonetic_analyzer"
}

POST /names_index/_analyze
{
  "text": "Steven",
  "analyzer": "phonetic_analyzer"
}
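In both cases, the analyzer generates the same token, STFN. The following response is for Steven; the response for Stephen differs only in its end_offset, which is 7 because the offsets refer to the original text: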
{
  "tokens": [
    {
      "token": "STFN",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
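Because both spellings produce the same token, a search for one spelling should be able to match a document containing the other. The following requests sketch one way to try this against names_index. The field name (name), the document ID, and the sample document are assumptions made for illustration and are not part of the example above:

```json
PUT /names_index/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "phonetic_analyzer"
    }
  }
}

PUT /names_index/_doc/1?refresh=true
{
  "name": "Stephen"
}

GET /names_index/_search
{
  "query": {
    "match": {
      "name": "Steven"
    }
  }
}
```

The match query analyzes the search text with the field's analyzer, so Steven is encoded the same way as the indexed Stephen before the two are compared.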