You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Delimited payload token filter

The delimited_payload token filter parses tokens that carry payloads during analysis. For example, the string red|1.5 fast|2.0 car|1.0 is parsed into the tokens red (with a payload of 1.5), fast (with a payload of 2.0), and car (with a payload of 1.0). This is useful when your tokens include associated data (such as weights, scores, or other numeric values) that you can use for scoring or custom query logic. The filter supports integer, float, and string payloads.

When analyzing text, the delimited_payload token filter parses each token, extracts the payload, and attaches it to the token. This payload can later be used in queries to influence scoring, boosting, or other custom behaviors.
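The parsing step described above can be sketched in a few lines of Python. This is only an illustration of the behavior, not the filter's implementation: the function names split_payload and analyze are hypothetical, whitespace tokenization is assumed, and the sketch assumes the token is split at the first occurrence of the delimiter.

```python
def split_payload(raw_token, delimiter="|"):
    """Split one token into (term, payload_text).

    Assumption: the split happens at the first delimiter. A token
    without a delimiter gets no payload (None).
    """
    term, found, payload_text = raw_token.partition(delimiter)
    return term, (payload_text if found else None)


def analyze(text, delimiter="|"):
    """Mimic a whitespace tokenizer followed by the payload split."""
    return [split_payload(token, delimiter) for token in text.split()]


print(analyze("red|1.5 fast|2.0 car|1.0"))
# [('red', '1.5'), ('fast', '2.0'), ('car', '1.0')]
```

At this stage the payload is still text. How it is converted to bytes depends on the encoding parameter.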

Payloads are stored as bytes and appear as Base64-encoded strings in responses. By default, payloads are not returned in the query response along with the tokens. To return the payloads, you must configure additional parameters. For more information, see Example with a stored payload.

Parameters

The delimited_payload token filter has two parameters.

Parameter	Required/Optional	Data type	Description
`encoding`	Optional	String	Specifies the data type of the payload attached to the tokens. This determines how the payload data is interpreted during analysis and querying. Valid values are `float` (the payload is interpreted as a 32-bit floating-point number in IEEE 754 format, for example, `2.5` in `car|2.5`), `identity` (the payload is interpreted as a sequence of characters, for example, `admin` in `user|admin`), and `int` (the payload is interpreted as a 32-bit integer, for example, `1` in `priority|1`). Default is `float`.
`delimiter`	Optional	String	Specifies the character that separates the token from its payload in the input text. Default is the pipe character (`|`).
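To make the three encoding values concrete, the following Python sketch shows how a payload string might be turned into bytes and then rendered as Base64. The function encode_payload is hypothetical. The byte layout (4-byte big-endian for float and int, UTF-8 for identity) is an assumption; for float it agrees with the Base64 payloads shown in the term vectors response later on this page.

```python
import base64
import struct


def encode_payload(payload_text, encoding="float"):
    """Return the Base64 form of a payload under an assumed byte layout."""
    if encoding == "float":
        # Assumed: 32-bit IEEE 754 float, big-endian
        raw = struct.pack(">f", float(payload_text))
    elif encoding == "int":
        # Assumed: 32-bit signed integer, big-endian
        raw = struct.pack(">i", int(payload_text))
    elif encoding == "identity":
        # Assumed: the characters are kept as-is, encoded as UTF-8
        raw = payload_text.encode("utf-8")
    else:
        raise ValueError(f"unsupported encoding: {encoding}")
    return base64.b64encode(raw).decode("ascii")


print(encode_payload("2.5", "float"))       # QCAAAA==
print(encode_payload("1", "int"))           # AAAAAQ==
print(encode_payload("admin", "identity"))  # YWRtaW4=
```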

Example without a stored payload

The following example request creates a new index named my_index and configures an analyzer with a delimited_payload filter:

PUT /my_index
{
  "settings": {
    "analysis": {
      "filter": {
        "my_payload_filter": {
          "type": "delimited_payload",
          "delimiter": "|",
          "encoding": "float"
        }
      },
      "analyzer": {
        "my_analyzer": {
          "type": "custom",
          "tokenizer": "whitespace",
          "filter": ["my_payload_filter"]
        }
      }
    }
  }
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

POST /my_index/_analyze
{
  "analyzer": "my_analyzer",
  "text": "red|1.5 fast|2.0 car|1.0"
}

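The response contains the generated tokens. The payloads are not included, and each token's offsets span the full input term, including the delimiter and payload (for example, red|1.5 covers offsets 0 to 7):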

{
  "tokens": [
    {
      "token": "red",
      "start_offset": 0,
      "end_offset": 7,
      "type": "word",
      "position": 0
    },
    {
      "token": "fast",
      "start_offset": 8,
      "end_offset": 16,
      "type": "word",
      "position": 1
    },
    {
      "token": "car",
      "start_offset": 17,
      "end_offset": 24,
      "type": "word",
      "position": 2
    }
  ]
}

Example with a stored payload

To configure the payload to be returned in the response, create an index that stores term vectors and set term_vector to with_positions_payloads or with_positions_offsets_payloads in the index mappings. For example, the following index is configured to store term vectors:

PUT /visible_payloads
{
  "mappings": {
    "properties": {
      "text": {
        "type": "text",
        "term_vector": "with_positions_payloads",
        "analyzer": "custom_analyzer"
      }
    }
  },
  "settings": {
    "analysis": {
      "filter": {
        "my_payload_filter": {
          "type": "delimited_payload",
          "delimiter": "|",
          "encoding": "float"
        }
      },
      "analyzer": {
        "custom_analyzer": {
          "tokenizer": "whitespace",
          "filter": [ "my_payload_filter" ]
        }
      }
    }
  }
}

You can index a document into this index using the following request:

PUT /visible_payloads/_doc/1
{
  "text": "quick|2.5 brown|3.0 fox|1.5"
}

Generated tokens

Use the following request to examine the tokens generated using the analyzer:

GET /visible_payloads/_termvectors/1
{
  "fields": ["text"]
}

The response contains the generated tokens, which include payloads:

{
  "_index": "visible_payloads",
  "_id": "1",
  "_version": 1,
  "found": true,
  "took": 3,
  "term_vectors": {
    "text": {
      "field_statistics": {
        "sum_doc_freq": 3,
        "doc_count": 1,
        "sum_ttf": 3
      },
      "terms": {
        "brown": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 1,
              "start_offset": 10,
              "end_offset": 19,
              "payload": "QEAAAA=="
            }
          ]
        },
        "fox": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 2,
              "start_offset": 20,
              "end_offset": 27,
              "payload": "P8AAAA=="
            }
          ]
        },
        "quick": {
          "term_freq": 1,
          "tokens": [
            {
              "position": 0,
              "start_offset": 0,
              "end_offset": 9,
              "payload": "QCAAAA=="
            }
          ]
        }
      }
    }
  }
}
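The payload values in the response are Base64 strings. As a minimal sketch, assuming the float encoding produces a 4-byte big-endian IEEE 754 value, they can be decoded on the client side as follows. The helper name decode_float_payload is hypothetical.

```python
import base64
import struct


def decode_float_payload(b64_payload):
    """Decode a Base64 payload into a Python float.

    Assumption: the payload is a 32-bit big-endian IEEE 754 float.
    """
    raw = base64.b64decode(b64_payload)
    (value,) = struct.unpack(">f", raw)
    return value


# Payload strings copied from the term vectors response above
payloads = {"brown": "QEAAAA==", "fox": "P8AAAA==", "quick": "QCAAAA=="}
for term, encoded in payloads.items():
    print(term, decode_float_payload(encoded))
# brown 3.0
# fox 1.5
# quick 2.5
```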
