Link Search Menu Expand Document Documentation Menu

You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Split processor

Introduced 2.17

The split processor splits a string field into an array of substrings based on a specified delimiter.

Request body fields

The following table lists all available request fields.

Field Data type Description
field String The field containing the string to be split. Required.
separator String The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required.
preserve_trailing Boolean If set to true, preserves empty trailing fields (for example, '') in the resulting array. If set to false, then empty trailing fields are removed from the resulting array. Default is false.
target_field String The field in which the array of substrings is stored. If not specified, then the field is updated in place.
tag String The processor’s identifier.
description String A description of the processor.
ignore_failure Boolean If true, then OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false.

Example

The following example demonstrates using a search pipeline with a split processor.

Setup

Create an index named my_index and index a document containing the field message:

POST /my_index/_doc/1
{
  "message": "ingest, search, visualize, and analyze data",
  "visibility": "public"
}

Creating a search pipeline

The following request creates a search pipeline with a split response processor that splits the message field and stores the results in the split_message field:

PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "split": {
        "field": "message",
        "separator": ", ",
        "target_field": "split_message"
      }
    }
  ]
}

Using a search pipeline

Search for documents in my_index without a search pipeline:

GET /my_index/_search

The response contains the field message:

Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "message": "ingest, search, visualize, and analyze data",
          "visibility": "public"
        }
      }
    ]
  }
}

To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:

GET /my_index/_search?search_pipeline=my_pipeline

The message field is split and the results are stored in the split_message field:

Response
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}

You can also use the fields option to search for specific fields in a document:

POST /my_index/_search?pretty&search_pipeline=my_pipeline
{
    "fields": ["visibility", "message"]
}

In the response, the message field is split and the results are stored in the split_message field:

Response
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        },
        "fields": {
          "visibility": [
            "public"
          ],
          "message": [
            "ingest, search, visualize, and analyze data"
          ],
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
350 characters left

Have a question? .

Want to contribute? or .