Phonetic token filter

The phonetic token filter transforms tokens into their phonetic representations, enabling more flexible matching of words that sound similar but are spelled differently. This is particularly useful for searching names, brands, or other entities that users might spell differently but pronounce similarly.

The phonetic token filter is not included in OpenSearch distributions by default. To use this token filter, you must first install the analysis-phonetic plugin as follows and then restart OpenSearch:

./bin/opensearch-plugin install analysis-phonetic

For more information about installing plugins, see Installing plugins.

Parameters

The phonetic token filter can be configured with the following parameters.

Parameter	Required/Optional	Data type	Description
`encoder`	Optional	String	Specifies the phonetic algorithm to use. Valid values are `metaphone` (default), `double_metaphone`, `soundex`, `refined_soundex`, `caverphone1`, `caverphone2`, `cologne`, `nysiis`, `koelnerphonetik`, `haasephonetik`, `beider_morse`, and `daitch_mokotoff`.
`replace`	Optional	Boolean	Whether to replace the original token with its phonetic encoding. If `false`, the original token is included in the output along with the phonetic encoding. Default is `true`.

Example

The following example request creates a new index named names_index and configures an analyzer with a phonetic filter:

PUT /names_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_phonetic_filter": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": true
        }
      },
      "analyzer": {
        "phonetic_analyzer": {
          "tokenizer": "standard",
          "filter": [
            "my_phonetic_filter"
          ]
        }
      }
    }
  }
}
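
As a variation on the preceding request, the following sketch sets `replace` to `false` so that the original token is kept alongside its phonetic encoding. The index, filter, and analyzer names here are hypothetical and chosen only for illustration:

```json
PUT /names_index_keep_original
{
  "settings": {
    "analysis": {
      "filter": {
        "my_phonetic_filter_keep_original": {
          "type": "phonetic",
          "encoder": "double_metaphone",
          "replace": false
        }
      },
      "analyzer": {
        "phonetic_analyzer_keep_original": {
          "tokenizer": "standard",
          "filter": [
            "my_phonetic_filter_keep_original"
          ]
        }
      }
    }
  }
}
```

Keeping the original token lets the same field serve both exact-spelling and sound-alike matches, at the cost of indexing additional tokens.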

Generated tokens

Use the following requests to examine the tokens that the analyzer generates for the names Stephen and Steven:

POST /names_index/_analyze
{
  "text": "Stephen",
  "analyzer": "phonetic_analyzer"
}

POST /names_index/_analyze
{
  "text": "Steven",
  "analyzer": "phonetic_analyzer"
}

In both cases, the analyzer generates the same token, STFN. The following response is for Steven; for Stephen, the only difference is an `end_offset` of 7:

{
  "tokens": [
    {
      "token": "STFN",
      "start_offset": 0,
      "end_offset": 6,
      "type": "<ALPHANUM>",
      "position": 0
    }
  ]
}
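
Because both spellings produce the same token, a query for one spelling can match a document that contains the other. The following sketch assumes a hypothetical text field named `name` that uses phonetic_analyzer. The field name and document ID are illustrative and not part of the original example:

```json
PUT /names_index/_mapping
{
  "properties": {
    "name": {
      "type": "text",
      "analyzer": "phonetic_analyzer"
    }
  }
}

PUT /names_index/_doc/1?refresh=true
{
  "name": "Stephen"
}

GET /names_index/_search
{
  "query": {
    "match": {
      "name": "Steven"
    }
  }
}
```

By default, a match query analyzes its input with the analyzer configured for the field, so the search for Steven is expected to return the document containing Stephen.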
