Stop token filter
The `stop` token filter removes common words (also known as stopwords) from a token stream during analysis. Stopwords are typically articles and prepositions, such as `a` or `for`. These words carry little meaning in search queries and are often excluded to improve search efficiency and relevance.
The default list of English stopwords includes the following words: `a`, `an`, `and`, `are`, `as`, `at`, `be`, `but`, `by`, `for`, `if`, `in`, `into`, `is`, `it`, `no`, `not`, `of`, `on`, `or`, `such`, `that`, `the`, `their`, `then`, `there`, `these`, `they`, `this`, `to`, `was`, `will`, and `with`.
Parameters
The `stop` token filter can be configured with the following parameters.
Parameter | Required/Optional | Data type | Description
---|---|---|---
`stopwords` | Optional | String or list of strings | Specifies either a custom array of stopwords or a language for which to fetch the predefined Lucene stopword list: `_arabic_`, `_armenian_`, `_basque_`, `_bengali_`, `_brazilian_` (Brazilian Portuguese), `_bulgarian_`, `_catalan_`, `_cjk_` (Chinese, Japanese, and Korean), `_czech_`, `_danish_`, `_dutch_`, `_english_` (default), `_estonian_`, `_finnish_`, `_french_`, `_galician_`, `_german_`, `_greek_`, `_hindi_`, `_hungarian_`, `_indonesian_`, `_irish_`, `_italian_`, `_latvian_`, `_lithuanian_`, `_norwegian_`, `_persian_`, `_portuguese_`, `_romanian_`, `_russian_`, `_sorani_`, `_spanish_`, `_swedish_`, `_thai_`, or `_turkish_`.
`stopwords_path` | Optional | String | Specifies the file path (absolute or relative to the config directory) of the file containing custom stopwords.
`ignore_case` | Optional | Boolean | If `true`, stopwords are matched regardless of their case. Default is `false`.
`remove_trailing` | Optional | Boolean | If `true`, trailing stopwords are removed during analysis. Default is `true`.
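For instance, a `stop` filter that uses a custom stopword array together with case-insensitive matching could be configured as follows (a sketch; the index and filter names are illustrative, not part of any predefined setup):

```json
PUT /my-custom-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_custom_stop_filter": {
          "type": "stop",
          "stopwords": ["and", "or", "the"],
          "ignore_case": true
        }
      }
    }
  }
}
```

With `ignore_case` set to `true`, this filter removes `The` and `THE` as well as `the`.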
Example
The following example request creates a new index named `my-stopword-index` and configures an analyzer with a `stop` filter that uses the predefined stopword list for the English language:
```json
PUT /my-stopword-index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_stop_filter": {
          "type": "stop",
          "stopwords": "_english_"
        }
      },
      "analyzer": {
        "my_stop_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "my_stop_filter"
          ]
        }
      }
    }
  }
}
```
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
```json
GET /my-stopword-index/_analyze
{
  "analyzer": "my_stop_analyzer",
  "text": "A quick dog jumps over the turtle"
}
```
The response contains the generated tokens:
```json
{
  "tokens": [
    {
      "token": "quick",
      "start_offset": 2,
      "end_offset": 7,
      "type": "<ALPHANUM>",
      "position": 1
    },
    {
      "token": "dog",
      "start_offset": 8,
      "end_offset": 11,
      "type": "<ALPHANUM>",
      "position": 2
    },
    {
      "token": "jumps",
      "start_offset": 12,
      "end_offset": 17,
      "type": "<ALPHANUM>",
      "position": 3
    },
    {
      "token": "over",
      "start_offset": 18,
      "end_offset": 22,
      "type": "<ALPHANUM>",
      "position": 4
    },
    {
      "token": "turtle",
      "start_offset": 27,
      "end_offset": 33,
      "type": "<ALPHANUM>",
      "position": 6
    }
  ]
}
```
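Note that the removed stopwords leave gaps in the token positions: `turtle` is at position 6, not 5, because `the` was removed. This behavior can be illustrated with a minimal Python sketch of stopword filtering (an illustration of the concept, not the actual Lucene implementation):

```python
# Minimal sketch: drop stopwords from a token stream while preserving
# each surviving token's original position, so position gaps remain
# visible (as in the _analyze response above).
ENGLISH_STOPWORDS = {
    "a", "an", "and", "are", "as", "at", "be", "but", "by", "for",
    "if", "in", "into", "is", "it", "no", "not", "of", "on", "or",
    "such", "that", "the", "their", "then", "there", "these", "they",
    "this", "to", "was", "will", "with",
}

def stop_filter(tokens, stopwords=ENGLISH_STOPWORDS, ignore_case=True):
    """Return (token, position) pairs with stopwords removed."""
    result = []
    for position, token in enumerate(tokens):
        key = token.lower() if ignore_case else token
        if key not in stopwords:
            result.append((token, position))
    return result

tokens = "A quick dog jumps over the turtle".split()
print(stop_filter(tokens))
# [('quick', 1), ('dog', 2), ('jumps', 3), ('over', 4), ('turtle', 6)]
```

Because positions are preserved, phrase queries remain aware of the gap left by a removed stopword.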