# Announcing Data Prepper 2.3.0
Data Prepper 2.3.0 is now available for download! This release introduces a number of changes that improve Data Prepper’s ability to process complex expressions with functions, arithmetic operations, and string operations.
## Enhancements to Data Prepper expressions
Data Prepper 2.3 supports using functions in expressions. A list of supported functions is available in the Expression syntax documentation.
Data Prepper 2.3 supports three types of expressions: conditional, arithmetic, and string.
### Conditional expressions
Conditional expressions evaluate to a result of Boolean type, and they can now include functions. For example, `length(/message) > 20` evaluates to `true` if the length of the `message` field in the event is greater than 20; otherwise, it evaluates to `false`.
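One place a conditional expression like this can appear is the `drop_events` processor's `drop_when` option. The following is a minimal sketch, not part of the release notes:

```yaml
processor:
  - drop_events:
      # Drop any event whose "message" field is longer than 20 characters;
      # events for which the expression evaluates to false pass through.
      drop_when: "length(/message) > 20"
```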
### Arithmetic expressions
Arithmetic expressions evaluate to a result of integer or float type. They can use the simple arithmetic operators `+`, `-`, `*`, and `/`, with functions, JSON pointers, or literals as operands. For example, the following expression adds the length of the `message` field in the event to the value of `integerField` in the event metadata and then subtracts 1:

```
length(/message) + getMetadata("integerField") - 1
```
### String expressions
String expressions evaluate to a result of string type. The string concatenation operator is supported, with functions, JSON pointers, or literals as operands. For example, the following expression concatenates the string-typed `message1` field in the event with the `message2` field in the event and appends the literal `"suffix"`:

```
/message1 + /message2 + "suffix"
```
## Event tagging
Data Prepper 2.3 supports tagging events while using the `grok` processor. Events can optionally be tagged using the following configuration:
```yaml
processor:
  - grok:
      match:
        message: <pattern>
      tags_on_match_failure: ["grok_match_fail", "another_tag"]
```
The presence of tags can be checked in conditional expressions in other processors, or in routing, using the `hasTags()` function to perform conditional ingestion. For example, a conditional expression checking for the presence of the tag `grok_match_fail` would be `hasTags("grok_match_fail")`.
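For instance, conditional routing could use this function to send events that failed grok parsing to their own index. The following is a minimal sketch rather than a tested configuration; the `http` source, pattern, hosts, and index names are placeholders:

```yaml
log-pipeline:
  source:
    http:
  processor:
    - grok:
        match:
          message: ['%{COMMONAPACHELOG}']
        tags_on_match_failure: ["grok_match_fail"]
  route:
    # Name a route whose condition checks for the failure tag.
    - grok-failures: 'hasTags("grok_match_fail")'
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: grok-failures
        # Only events matching the named route reach this sink.
        routes:
          - grok-failures
```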
## Enhancements to the `add_entries` processor
The `add_entries` processor has been enhanced to support values based on expressions and to set event metadata keys.
### Expression-based value
The `add_entries` processor has been enhanced to support adding values based on an expression whose return type can be Boolean, integer or float, or string. The following example shows an `add_entries` processor configuration with a `value_expression` option:
```yaml
processor:
  - add_entries:
      entries:
        - key: "request_len"
          value_expression: "length(/request)"
```
This configuration adds an entry with the key `request_len` whose value is equal to the length of the `request` field in the event. The value expression can be any of the supported expressions. See Supported operators for more details.
### Setting event metadata keys
The `add_entries` processor can add entries to the event's metadata instead of to the event itself. The following example shows an `add_entries` processor configuration for adding an entry to metadata:
```yaml
processor:
  - add_entries:
      entries:
        - metadata_key: "request_len"
          value_expression: "length(/request)"
```
This configuration adds an attribute with the key `request_len` to the event metadata, with a value equal to the length of the `request` field in the event. The value can be set using the `value`, `format`, or `value_expression` options of the `entries` field.
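Metadata written this way can be read back later with the `getMetadata()` function shown earlier. As a hypothetical sketch (the threshold and the choice of processor are illustrative, not part of the release notes), a downstream processor could drop events with short requests:

```yaml
processor:
  - drop_events:
      # Drop events whose request length, stored in metadata by the
      # add_entries example above, is less than 10.
      drop_when: 'getMetadata("request_len") < 10'
```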
## Amazon S3 sink
Data Prepper now supports saving data to Amazon Simple Storage Service (Amazon S3) sinks as newline-delimited JSON (NDJSON). Amazon S3 is a popular choice for storing large volumes of data reliably and cost-effectively.

Ingesting data into Amazon S3 offers many possibilities for your data pipelines, including the following (a configuration sketch appears after this list):

- Noisy or uninteresting data can be routed to Amazon S3 instead of OpenSearch in order to reduce the load on your OpenSearch cluster. This can help you save on compute and storage costs.
- Data can be ingested into Amazon S3 to keep normalized data on hand for future processing.
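The sketch below shows roughly what an S3 sink entry looks like; treat it as an assumption-laden outline rather than a reference (the region, role, bucket, and threshold values are placeholders), and consult the S3 sink documentation for the authoritative option names:

```yaml
sink:
  - s3:
      aws:
        region: us-east-1                                     # placeholder region
        sts_role_arn: arn:aws:iam::123456789012:role/s3-sink  # placeholder role
      bucket: my-log-archive                                  # placeholder bucket
      object_key:
        path_prefix: logs/
      threshold:
        event_count: 2000            # flush after this many events...
        maximum_size: 50mb           # ...or this much accumulated data...
        event_collect_timeout: 15s   # ...or this much elapsed time
      codec:
        ndjson:                      # write objects as newline-delimited JSON
      buffer_type: in_memory
```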
## Tail sampling
Data Prepper 2.3.0 supports tail sampling to limit the number of events that are sent to a sink, similar to the tail sampling support provided by OpenTelemetry. For information about tail sampling in OpenTelemetry, see the blog post “Tail Sampling with OpenTelemetry: Why it’s useful, how to do it, and what to consider”.
Tail sampling in Data Prepper is supported as an action of the `aggregate` processor. Events are held in the `aggregate` processor beyond the `group_duration` time until no events have been received during the last `wait_period`.
For example, the following configuration sends all traces with errors to the sink, while non-error events are sampled using the user-specified `percent` probabilistic sampler:
```yaml
trace-normal-pipeline:
  source:
    otel_trace_source:
      ssl: false
  processor:
    - trace_peer_forwarder:
    - aggregate:
        identification_keys: ["traceId"]
        action:
          tail_sampler:
            percent: 20
            wait_period: "15s"
            error_condition: "/traceGroupFields/statusCode == 2"
        group_duration: "30s"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: sampled-traces
```
## Obfuscate processor
Data Prepper 2.3.0 supports obfuscating sensitive data by replacing a specified field's value with mask characters. For example, the following configuration obfuscates values matching the given patterns for the key specified in the `source` field. If no pattern is specified, the entire value of the key specified in the `source` field is obfuscated.
```yaml
processor:
  - obfuscate:
      source: "log"
      target: "new_log"
      patterns:
        - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}"
      action:
        mask:
          mask_character: "#"
          mask_character_length: 6
  - obfuscate:
      source: "phone"
```
## Getting started
- To download Data Prepper, see the OpenSearch downloads page.
- For instructions on how to get started with Data Prepper, see Getting started with Data Prepper.
- To learn more about the work in progress for Data Prepper 2.4, see the Data Prepper roadmap.
## Thanks to our contributors!
The following community members contributed to this release. Thank you!
- ajeeshakd - Ajeesh Gopalakrishnakurup
- ashoktelukuntla - Ashok Telukuntla
- asifsmohammed - Asif Sohail Mohammed
- chenqi0805 - Qi Chen
- cmanning09 - Christopher Manning
- daixba - Aiden Dai
- deepaksahu562 - Deepak Sahu
- dlvenable - David Venable
- engechas - Chase Engelbrecht
- graytaylor0 - Taylor Gray
- kkondaka - Krishna Kondaka
- oeyh - Hai Yan
- rajeshLovesToCode - Rajesh
- tmonty12 - Thomas Montfort
- udaych20 - Uday Chintala
- umayr-codes - Umair Husain
- wanghd89