Split processor
Introduced 2.17
The split processor splits a string field into an array of substrings based on a specified delimiter.
Request body fields
The following table lists all available request fields.
Field | Data type | Description |
---|---|---|
field | String | The field containing the string to be split. Required. |
separator | String | The delimiter used to split the string. Specify either a single separator character or a regular expression pattern. Required. |
preserve_trailing | Boolean | If set to true, preserves empty trailing fields (for example, '') in the resulting array. If set to false, then empty trailing fields are removed from the resulting array. Default is false. |
target_field | String | The field in which the array of substrings is stored. If not specified, then the field is updated in place. |
tag | String | The processor’s identifier. |
description | String | A description of the processor. |
ignore_failure | Boolean | If true, then OpenSearch ignores any failure of this processor and continues to run the remaining processors in the search pipeline. Optional. Default is false. |
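The interaction between separator and preserve_trailing can be illustrated with a minimal Python sketch. This is not OpenSearch's implementation: the helper name split_field is hypothetical, and Python's re.split is only a stand-in whose regular expression dialect may differ from the one OpenSearch uses. The sketch shows the behavior described in the table: the separator is treated as a pattern, and empty trailing elements are dropped unless preserve_trailing is true.

```python
import re


def split_field(value, separator, preserve_trailing=False):
    """Split value on a regex separator (hypothetical helper, illustration only)."""
    parts = re.split(separator, value)
    if not preserve_trailing:
        # Drop empty strings at the end of the array, keeping interior ones.
        while parts and parts[-1] == "":
            parts.pop()
    return parts


print(split_field("a,b,,", ","))                          # ['a', 'b']
print(split_field("a,b,,", ",", preserve_trailing=True))  # ['a', 'b', '', '']
print(split_field("a,,b", ","))                           # ['a', '', 'b']
```

Only trailing empty elements are affected by the option; an empty element in the middle of the array is kept in both cases.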
Example
The following example demonstrates using a search pipeline with a split processor.
Setup
Create an index named my_index and index a document containing the field message:
POST /my_index/_doc/1
{
  "message": "ingest, search, visualize, and analyze data",
  "visibility": "public"
}
Creating a search pipeline
The following request creates a search pipeline with a split response processor that splits the message field and stores the results in the split_message field:
PUT /_search/pipeline/my_pipeline
{
  "response_processors": [
    {
      "split": {
        "field": "message",
        "separator": ", ",
        "target_field": "split_message"
      }
    }
  ]
}
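The effect of this configuration on a single hit can be sketched in Python. The function apply_split is a hypothetical name used for illustration, and re.split stands in for the processor's actual splitting logic, so treat this as an approximation of the documented behavior rather than what OpenSearch runs. It shows the role of target_field: when it is set, the original field is left untouched and the array is written to the new field; when it is omitted, the original field is replaced by the array.

```python
import re


def apply_split(source, field, separator, target_field=None):
    """Return a copy of a hit's _source with the split applied (illustration only)."""
    result = dict(source)
    parts = re.split(separator, source[field])
    # Drop empty trailing elements, mirroring the default preserve_trailing of false.
    while parts and parts[-1] == "":
        parts.pop()
    # Without target_field, the original field is overwritten in place.
    result[target_field or field] = parts
    return result


doc = {
    "message": "ingest, search, visualize, and analyze data",
    "visibility": "public",
}
with_target = apply_split(doc, "message", ", ", target_field="split_message")
in_place = apply_split(doc, "message", ", ")

print(with_target["split_message"])
print(in_place["message"])
```

Because the separator is ", " (a comma followed by a space), the last element keeps the word "and": the text "and analyze data" contains no further comma to split on.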
Using a search pipeline
Search for documents in my_index without a search pipeline:
GET /my_index/_search
The response contains the original, unsplit message field:
Response
{
  "took": 3,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "message": "ingest, search, visualize, and analyze data",
          "visibility": "public"
        }
      }
    ]
  }
}
To search with a pipeline, specify the pipeline name in the search_pipeline query parameter:
GET /my_index/_search?search_pipeline=my_pipeline
The message field is split and the results are stored in the split_message field:
Response
{
  "took": 6,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}
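A client consuming this response would typically read the array from each hit's _source. The following Python sketch assumes the response has already been received as JSON text; it uses a trimmed copy of the response shown above, and the helper name split_values is hypothetical. No OpenSearch client library is involved, only the standard json module.

```python
import json

# Trimmed copy of the response shown above; a real client would receive
# this text from OpenSearch instead of hardcoding it.
response_text = """
{
  "hits": {
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": ["ingest", "search", "visualize", "and analyze data"]
        }
      }
    ]
  }
}
"""


def split_values(response, target_field="split_message"):
    """Map each hit's _id to its split array, skipping hits without the field."""
    return {
        hit["_id"]: hit["_source"][target_field]
        for hit in response["hits"]["hits"]
        if target_field in hit.get("_source", {})
    }


values = split_values(json.loads(response_text))
print(values)
```

Skipping hits that lack the target field keeps the helper usable on responses returned without the pipeline, where split_message is absent.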
You can also use the fields option to retrieve specific fields from a document:
POST /my_index/_search?pretty&search_pipeline=my_pipeline
{
  "fields": ["visibility", "message"]
}
In the response, the message field is split and the results are stored in the split_message field, which appears in both the _source and fields objects:
Response
{
  "took": 7,
  "timed_out": false,
  "_shards": {
    "total": 1,
    "successful": 1,
    "skipped": 0,
    "failed": 0
  },
  "hits": {
    "total": {
      "value": 1,
      "relation": "eq"
    },
    "max_score": 1,
    "hits": [
      {
        "_index": "my_index",
        "_id": "1",
        "_score": 1,
        "_source": {
          "visibility": "public",
          "message": "ingest, search, visualize, and analyze data",
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        },
        "fields": {
          "visibility": [
            "public"
          ],
          "message": [
            "ingest, search, visualize, and analyze data"
          ],
          "split_message": [
            "ingest",
            "search",
            "visualize",
            "and analyze data"
          ]
        }
      }
    ]
  }
}