

Stop token filter

The stop token filter removes common words (also known as stopwords) from the token stream during analysis. Stopwords are typically articles and prepositions, such as a or for. Because these words add little meaning to a search query, they are often excluded to improve search efficiency and relevance.

The default list of English stopwords includes the following words: a, an, and, are, as, at, be, but, by, for, if, in, into, is, it, no, not, of, on, or, such, that, the, their, then, there, these, they, this, to, was, will, and with.

Parameters

The stop token filter can be configured with the following parameters. A brief configuration sketch follows the table.

Parameter Required/Optional Data type Description
stopwords Optional String Specifies either a custom array of stopwords or a language for which to fetch the predefined Lucene stopword list:

- _arabic_
- _armenian_
- _basque_
- _bengali_
- _brazilian_ (Brazilian Portuguese)
- _bulgarian_
- _catalan_
- _cjk_ (Chinese, Japanese, and Korean)
- _czech_
- _danish_
- _dutch_
- _english_ (Default)
- _estonian_
- _finnish_
- _french_
- _galician_
- _german_
- _greek_
- _hindi_
- _hungarian_
- _indonesian_
- _irish_
- _italian_
- _latvian_
- _lithuanian_
- _norwegian_
- _persian_
- _portuguese_
- _romanian_
- _russian_
- _sorani_
- _spanish_
- _swedish_
- _thai_
- _turkish_
stopwords_path Optional String Specifies the file path (absolute or relative to the config directory) of the file containing custom stopwords.
ignore_case Optional Boolean If true, stopwords will be matched regardless of their case. Default is false.
remove_trailing Optional Boolean If true, trailing stopwords will be removed during analysis. Default is true.
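
The following sketch combines some of these parameters, using a custom stopword array with case-insensitive matching. It is a minimal sketch only: the index, filter, and analyzer names and the example stopwords are illustrative and are not part of the predefined lists described above:

PUT /my-custom-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["and", "or", "the"],
          "ignore_case": true
        }
      },
      "analyzer": {
        "my_custom_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_custom_stop_filter"
          ]
        }
      }
    }
  }
}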

Example

The following example request creates a new index named my-stopword-index and configures an analyzer with a stop filter that uses the predefined stopword list for the English language:

PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}

The response contains the generated tokens:

{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
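
Note that the remaining tokens keep their original position values: positions 0 and 5, previously occupied by the stopwords a and the, do not appear in the response.

If you maintain a long custom stopword list, you can store it in a file and reference it using stopwords_path instead of listing the words inline. The following is a minimal sketch; the path analysis/stopwords.txt is a hypothetical file (relative to the config directory) containing one stopword per line, and the index, filter, and analyzer names are illustrative only:

PUT /my-file-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_file_stop_filter": {
          "type": "stop",
          "stopwords_path": "analysis/stopwords.txt",
          "ignore_case": true
        }
      },
      "analyzer": {
        "my_file_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_file_stop_filter"
          ]
        }
      }
    }
  }
}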