You're viewing version 2.18 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Standard analyzer

The standard analyzer is the default analyzer used when no other analyzer is specified. It is designed to provide a basic and efficient approach to generic text processing.

This analyzer consists of the following tokenizer and token filters:

standard tokenizer: Splits text on word boundaries, such as spaces and other common delimiters, and removes most punctuation.
lowercase token filter: Converts all tokens to lowercase, ensuring case-insensitive matching.
stop token filter: Removes stopwords, such as “the”, “is”, and “and”, from the tokenized output. Because the `stopwords` parameter defaults to `_none_`, this filter removes nothing unless you configure a stopword list.

Example

Use the following command to create an index named my_standard_index in which the my_field text field uses the standard analyzer:

PUT /my_standard_index
{
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "standard"  
      }
    }
  }
}
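
To check how the field analyzes text, a request such as the following could be sent to the `_analyze` API. This is a minimal sketch that assumes the my_standard_index index above already exists; the sample text is only illustrative:

```json
POST /my_standard_index/_analyze
{
  "field": "my_field",
  "text": "The slow turtle swims away"
}
```

Because the standard analyzer removes no stopwords unless it is configured to, the response would be expected to include a lowercased `the` token alongside `slow`, `turtle`, `swims`, and `away`.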

Parameters

You can configure a standard analyzer with the following parameters.

Parameter	Required/Optional	Data type	Description
`max_token_length`	Optional	Integer	Sets the maximum token length, in characters. A token longer than this value is split into multiple tokens at `max_token_length` intervals. Default is `255`.
`stopwords`	Optional	String or list of strings	A string specifying a predefined list of stopwords (such as `_english_`) or an array specifying a custom list of stopwords. Default is `_none_`.
`stopwords_path`	Optional	String	The path (absolute or relative to the config directory) to the file containing a list of stopwords.
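
As an illustration of these parameters, the following sketch defines a configured standard analyzer and applies it to a field. The index name my_configured_index, the analyzer name my_configured_standard, and the parameter values are assumptions chosen for the example, not recommendations:

```json
PUT /my_configured_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_configured_standard": {
          "type": "standard",
          "max_token_length": 5,
          "stopwords": "_english_"
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "my_field": {
        "type": "text",
        "analyzer": "my_configured_standard"
      }
    }
  }
}
```

With these settings, a word longer than five characters, such as turtle, would be expected to be split into turtl and e, and English stopwords such as “the” would be dropped.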

Configuring a custom analyzer

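Use the following command to configure an index with a custom analyzer that is equivalent to the standard analyzer with English stopwords enabled (the stop token filter uses the `_english_` stopword list by default):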

PUT /my_custom_index
{
  "settings": {
    "analysis": {
      "analyzer": {
        "my_custom_analyzer": {
          "type": "custom",
          "tokenizer": "standard",
          "filter": [
            "lowercase", 
            "stop"
          ]
        }
      }
    }
  }
}
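
The request above only defines the analyzer; it does not attach it to a field. One way to do that, sketched here under the assumption that my_custom_index has just been created and has no my_field mapping yet, is to add a text field that references the analyzer (the field name my_field is illustrative):

```json
PUT /my_custom_index/_mapping
{
  "properties": {
    "my_field": {
      "type": "text",
      "analyzer": "my_custom_analyzer"
    }
  }
}
```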

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

POST /my_custom_index/_analyze
{
  "analyzer": "my_custom_analyzer",
  "text": "The slow turtle swims away"
}

The response contains the generated tokens. The stopword “The” is removed, so the remaining tokens start at position 1:

{
  "tokens": [
    {"token": "slow","start_offset": 4,"end_offset": 8,"type": "<ALPHANUM>","position": 1},
    {"token": "turtle","start_offset": 9,"end_offset": 15,"type": "<ALPHANUM>","position": 2},
    {"token": "swims","start_offset": 16,"end_offset": 21,"type": "<ALPHANUM>","position": 3},
    {"token": "away","start_offset": 22,"end_offset": 26,"type": "<ALPHANUM>","position": 4}
  ]
}
