Standard analyzer
The standard analyzer is the default analyzer used when no other analyzer is specified. It is designed to provide a basic and efficient approach to generic text processing.

This analyzer consists of the following tokenizer and token filters:

- standard tokenizer: Removes most punctuation and splits text on spaces and other common delimiters.
- lowercase token filter: Converts all tokens to lowercase, ensuring case-insensitive matching.
- stop token filter: Removes common stopwords, such as "the", "is", and "and", from the tokenized output. This filter is effectively disabled by default because the stopwords parameter defaults to _none_.
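To see the tokenizer's behavior in isolation, you can call the _analyze API with only the tokenizer specified (the sample text is illustrative):

POST /_analyze
{
  "tokenizer": "standard",
  "text": "Hello, World! 123"
}

Because no token filters run here, punctuation is removed but the original case is preserved, yielding the tokens Hello, World, and 123.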
Example
Use the following command to create an index named my_standard_index with a standard analyzer:
PUT /my_standard_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "standard"
      }
    }
  }
}
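You can verify how the field is analyzed by sending sample text to the _analyze API (the sample sentence is illustrative):

POST /my_standard_index/_analyze
{
  "field": "my_field",
  "text": "The slow turtle swims away"
}

Because the standard analyzer's stopwords parameter defaults to _none_, the response contains all five tokens, lowercased: the, slow, turtle, swims, and away.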
Parameters
You can configure a standard analyzer with the following parameters.
Parameter | Required/Optional | Data type | Description
---|---|---|---
max_token_length | Optional | Integer | Sets the maximum length of the produced token. If this length is exceeded, the token is split into multiple tokens at the length configured in max_token_length. Default is 255.
stopwords | Optional | String or list of strings | A string specifying a predefined list of stopwords (such as _english_) or an array specifying a custom list of stopwords. Default is _none_.
stopwords_path | Optional | String | The path (absolute or relative to the config directory) to the file containing a list of stopwords.
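For example, the following request configures a standard analyzer that splits tokens longer than 10 characters and removes English stopwords (the index and analyzer names are illustrative):

PUT /my_parameterized_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_configured_standard": {
          "type": "standard",
          "max_token_length": 10,
          "stopwords": "_english_"
        }
      }
    }
  }
}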
Configuring a custom analyzer
Use the following command to configure an index with a custom analyzer that is equivalent to the standard analyzer. Note that the standalone stop token filter defaults to the _english_ stopword list, so, unlike the default standard analyzer (whose stopwords parameter defaults to _none_), this custom analyzer removes English stopwords:
PUT /my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase",
            "stop"
          ]
        }
      }
    }
  }
}
Generated tokens
Use the following request to examine the tokens generated using the analyzer:
POST /my_custom_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The slow turtle swims away"
}
The response contains the generated tokens. The stop filter has removed "The", so the first remaining token, slow, appears at position 1:
{
  "tokens": [
    {"token": "slow", "start_offset": 4, "end_offset": 8, "type": "<ALPHANUM>", "position": 1},
    {"token": "turtle", "start_offset": 9, "end_offset": 15, "type": "<ALPHANUM>", "position": 2},
    {"token": "swims", "start_offset": 16, "end_offset": 21, "type": "<ALPHANUM>", "position": 3},
    {"token": "away", "start_offset": 22, "end_offset": 26, "type": "<ALPHANUM>", "position": 4}
  ]
}
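To load stopwords from a file instead of listing them inline, use the stopwords_path parameter described in the table above. The following sketch assumes a file named analysis/custom_stopwords.txt exists in the OpenSearch config directory and contains one stopword per line (both the file name and the index name are illustrative):

PUT /my_file_stopwords_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_file_analyzer": {
          "type": "standard",
          "stopwords_path": "analysis/custom_stopwords.txt"
        }
      }
    }
  }
}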