You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.
Split processor
Introduced 2.17
The split processor splits a string field into an array of substrings based on a specified delimiter.
Request body fields
The following table lists all available request fields.
Field | Data type | Description |
---|---|---|
field | String | The field containing the string to be split. Required. |
separator | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required. |
preserve_trailing | Boolean | If set to true, preserves empty trailing fields (for example, '') in the resulting array. If set to false, empty trailing fields are removed from the resulting array. Default is false. |
target_field | String | The field in which to store the array of substrings. If not specified, the field is updated in place. |
tag | String | The processor’s identifier. |
description | String | A description of the processor. |
ignore_failure | Boolean | If true, OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
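The following Python sketch makes the interaction of separator, preserve_trailing, and target_field concrete by modeling the documented behavior on a plain dictionary. It is an approximation, not the OpenSearch implementation. It assumes that the separator is treated as a regular expression and that trailing empty strings are dropped unless preserve_trailing is true, similar to Java's String.split with a limit of 0 versus -1. The helper name split_field is hypothetical.

```python
import re
from typing import Any, Dict, List, Optional


def split_field(
    doc: Dict[str, Any],
    field: str,
    separator: str,
    preserve_trailing: bool = False,
    target_field: Optional[str] = None,
) -> Dict[str, Any]:
    """Approximate the split processor on one _source dictionary (hypothetical helper)."""
    value = doc.get(field)
    if not isinstance(value, str):
        # Assumption: a missing or non-string field is treated as a processor failure.
        raise ValueError(f"field [{field}] is missing or is not a string")

    # Treat the separator as a regular expression (a single character is also a valid pattern).
    # Assumes the pattern has no capturing groups, because re.split would return them as well.
    parts: List[str] = re.split(separator, value)

    # When at least one split happened, drop trailing empty strings unless
    # preserve_trailing is set (like Java's String.split(regex, 0) vs. split(regex, -1)).
    if not preserve_trailing and len(parts) > 1:
        while parts and parts[-1] == "":
            parts.pop()

    result = dict(doc)  # shallow copy so the input document is not mutated
    # Without target_field, the source field is overwritten in place.
    result[target_field or field] = parts
    return result


doc = {"message": "ingest, search, visualize, and analyze data", "visibility": "public"}
print(split_field(doc, "message", ", ", target_field="split_message")["split_message"])
# ['ingest', 'search', 'visualize', 'and analyze data']
```

For example, under these assumptions, splitting "a,b,," on "," yields ["a", "b"] by default and ["a", "b", "", ""] with preserve_trailing set to true. A pattern such as \s*,\s* also absorbs the whitespace around each comma.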
Example
The following example demonstrates using a search pipeline with a split processor.
Setup
Create an index named my_index and index a document containing the field message:
POST /my_index/_doc/1
{
"message": "ingest, search, visualize, and analyze data",
"visibility": "public"
}
Creating a search pipeline
The following request creates a search pipeline with a split response processor that splits the message field and stores the results in the split_message field:
PUT /_search/pipeline/my_pipeline
{
"response_processors": [
{
"split": {
"field": "message",
"separator": ", ",
"target_field": "split_message"
}
}
]
}
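If you generate pipeline definitions programmatically, you can build the preceding request body as a plain dictionary and serialize it to JSON. The following Python sketch only constructs the body. The helper build_split_pipeline is hypothetical, and sending the request (for example, with an HTTP client) is out of scope here.

```python
import json
from typing import Any, Dict, Optional


def build_split_pipeline(
    field: str,
    separator: str,
    target_field: Optional[str] = None,
    preserve_trailing: Optional[bool] = None,
) -> Dict[str, Any]:
    """Build a search pipeline body with one split response processor (hypothetical helper)."""
    split: Dict[str, Any] = {"field": field, "separator": separator}
    # Include optional settings only when they are explicitly provided.
    if target_field is not None:
        split["target_field"] = target_field
    if preserve_trailing is not None:
        split["preserve_trailing"] = preserve_trailing
    return {"response_processors": [{"split": split}]}


body = build_split_pipeline("message", ", ", target_field="split_message")
# JSON body for PUT /_search/pipeline/my_pipeline
print(json.dumps(body, indent=2))
```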
Using a search pipeline
Search for documents in my_index without a search pipeline:
GET /my_index/_search
The response contains the field message:
Response
{
"took": 3,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1,
"_source": {
"message": "ingest, search, visualize, and analyze data",
"visibility": "public"
}
}
]
}
}
To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:
GET /my_index/_search?search_pipeline=my_pipeline
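The following Python sketch builds the same search request from code using only the standard library, with the pipeline name passed in the search_pipeline query parameter. The helper search_request is hypothetical, and the sketch assumes a local, unsecured cluster at http://localhost:9200. It builds the request without sending it; passing it to urllib.request.urlopen would send it.

```python
from urllib.parse import urlencode
from urllib.request import Request


def search_request(base_url: str, index: str, pipeline: str) -> Request:
    """Build (but do not send) a GET _search request that applies a search pipeline."""
    # urlencode escapes the pipeline name if it contains reserved characters.
    query = urlencode({"search_pipeline": pipeline})
    return Request(f"{base_url}/{index}/_search?{query}", method="GET")


# Assumption: an unsecured local cluster; adjust the URL and add authentication as needed.
req = search_request("http://localhost:9200", "my_index", "my_pipeline")
print(req.get_method(), req.full_url)
# GET http://localhost:9200/my_index/_search?search_pipeline=my_pipeline
```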
The message field is split, and the results are stored in the split_message field:
Response
{
"took": 6,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1,
"_source": {
"visibility": "public",
"message": "ingest, search, visualize, and analyze data",
"split_message": [
"ingest",
"search",
"visualize",
"and analyze data"
]
}
}
]
}
}
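Because the separator is ", " (a comma followed by a space), the comma before "and" is consumed as a delimiter, so "and analyze data" remains a single element. The following Python sketch checks this against an abbreviated copy of the preceding response. It assumes that a literal str.split matches the processor's result here, which holds for this separator because it contains no regular expression metacharacters.

```python
import json

# Abbreviated copy of the response above (only the fields this check needs).
response = json.loads("""
{
  "hits": {
    "hits": [
      {
        "_id": "1",
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": ["ingest", "search", "visualize", "and analyze data"]
        }
      }
    ]
  }
}
""")

for hit in response["hits"]["hits"]:
    source = hit["_source"]
    # The processor leaves "message" intact and writes the split result to "split_message".
    expected = source["message"].split(", ")
    assert source["split_message"] == expected, hit["_id"]
    print(hit["_id"], source["split_message"])
```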
You can also use the fields option to retrieve specific fields of a document:
POST /my_index/_search?pretty&search_pipeline=my_pipeline
{
"fields": ["visibility", "message"]
}
In the response, the message field is split, and the results are stored in the split_message field:
Response
{
"took": 7,
"timed_out": false,
"_shards": {
"total": 1,
"successful": 1,
"skipped": 0,
"failed": 0
},
"hits": {
"total": {
"value": 1,
"relation": "eq"
},
"max_score": 1,
"hits": [
{
"_index": "my_index",
"_id": "1",
"_score": 1,
"_source": {
"visibility": "public",
"message": "ingest, search, visualize, and analyze data",
"split_message": [
"ingest",
"search",
"visualize",
"and analyze data"
]
},
"fields": {
"visibility": [
"public"
],
"message": [
"ingest, search, visualize, and analyze data"
],
"split_message": [
"ingest",
"search",
"visualize",
"and analyze data"
]
}
}
]
}
}