# Stop token filter
The `stop` token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as `a` or `for`. These words are not significantly meaningful in search queries and are often excluded to improve search efficiency and relevance.
The default list of English stopwords includes the following words: `a`, `an`, `and`, `are`, `as`, `at`, `be`, `but`, `by`, `for`, `if`, `in`, `into`, `is`, `it`, `no`, `not`, `of`, `on`, `or`, `such`, `that`, `the`, `their`, `then`, `there`, `these`, `they`, `this`, `to`, `was`, `will`, and `with`.
## Parameters
The `stop` token filter can be configured with the following parameters.
| Parameter | Required/Optional | Data type | Description |
| --- | --- | --- | --- |
| `stopwords` | Optional | String or list of strings | Specifies either a custom array of stopwords or a language for which to fetch the predefined Lucene stopword list: `_arabic_`, `_armenian_`, `_basque_`, `_bengali_`, `_brazilian_` (Brazilian Portuguese), `_bulgarian_`, `_catalan_`, `_cjk_` (Chinese, Japanese, and Korean), `_czech_`, `_danish_`, `_dutch_`, `_english_` (default), `_estonian_`, `_finnish_`, `_french_`, `_galician_`, `_german_`, `_greek_`, `_hindi_`, `_hungarian_`, `_indonesian_`, `_irish_`, `_italian_`, `_latvian_`, `_lithuanian_`, `_norwegian_`, `_persian_`, `_portuguese_`, `_romanian_`, `_russian_`, `_sorani_`, `_spanish_`, `_swedish_`, `_thai_`, or `_turkish_`. |
| `stopwords_path` | Optional | String | Specifies the file path (absolute or relative to the config directory) of the file containing custom stopwords. |
| `ignore_case` | Optional | Boolean | If `true`, stopwords are matched regardless of their case. Default is `false`. |
| `remove_trailing` | Optional | Boolean | If `true`, trailing stopwords are removed during analysis. Default is `true`. |
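These options can be combined. The following request is a minimal sketch, not part of the original example: the index name `my-custom-stopword-index`, the filter and analyzer names, and the word list are illustrative placeholders. It supplies a custom stopword array and matches it case-insensitively using `ignore_case`. If your stopword list lives in a file instead, `stopwords_path` can point to it (typically one word per line), with the actual path depending on your deployment:

```json
PUT /my-custom-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["a", "the", "of", "for"],
          "ignore_case": true
        }
      },
      "analyzer": {
        "my_custom_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "my_custom_stop_filter"
          ]
        }
      }
    }
  }
}
```

Because `ignore_case` is set to `true`, variants such as `The` or `THE` are removed along with `the`, even without a preceding `lowercase` filter.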
## Example
The following example request creates a new index named `my-stopword-index` and configures an analyzer with a `stop` filter that uses the predefined stopword list for the English language:
```json
PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}
```
## Generated tokens
Use the following request to examine the tokens generated using the analyzer:
```json
GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}
```
The response contains the generated tokens:
```json
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
```
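The removed stopwords still account for positions in the token stream: in the response above, `quick` is at position 1 because position 0 belonged to `A`, and `turtle` is at position 6 because position 5 belonged to `the`.

If you want the filter to keep a trailing stopword (as noted in the parameter table, trailing stopwords are removed by default), the following is a minimal sketch; the index, filter, and analyzer names are placeholders, and only `remove_trailing` differs from the earlier example:

```json
PUT /my-keep-trailing-index
{
  "settings": {
    "analysis": {
      "filter": {
        "keep_trailing_stop_filter": {
          "type": "stop",
          "stopwords": "_english_",
          "remove_trailing": false
        }
      },
      "analyzer": {
        "keep_trailing_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "keep_trailing_stop_filter"
          ]
        }
      }
    }
  }
}
```

Analyzing text that ends in a stopword, such as `over the`, with this analyzer should then retain the final `the` token, which can be useful when analyzing text a user is still typing.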