Announcing Data Prepper 2.3.0

Tue, Jun 06, 2023 · Krishna Kondaka, David Venable

Data Prepper 2.3.0 is now available for download! This release introduces a number of changes that improve Data Prepper’s ability to process complex expressions with functions, arithmetic operations, and string operations.

Enhancements to Data Prepper expressions

Data Prepper 2.3 supports using functions in expressions. A list of supported functions is available in the Expression syntax documentation.

Data Prepper 2.3 supports three types of expressions: conditional, arithmetic, and string.

Conditional expressions

Conditional expressions evaluate to a Boolean result and can now include functions. For example, length(/message) > 20 evaluates to true if the length of the message field in the event is greater than 20; otherwise, it evaluates to false.
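As a sketch of how such a condition might be used, the following hypothetical pipeline fragment routes long messages to a dedicated sink. The pipeline name, route name, and sink settings are illustrative placeholders, not taken from this release.

```yaml
# Illustrative fragment: route events whose "message" field is longer
# than 20 characters to a dedicated OpenSearch index. All names and
# connection settings here are examples.
log-pipeline:
  source:
    http:
  route:
    - long_messages: 'length(/message) > 20'
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        index: long-messages
        routes: ["long_messages"]
```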

Arithmetic expressions

Arithmetic expressions evaluate to a result of integer or float type. The expressions can use simple arithmetic operators like +, -, *, and /, with functions, JSON pointers, or literals as operands. For example, the following expression adds the length of the message field in the event to the value of integerField in the event metadata and subtracts 1 from the result.

length(/message) + getMetadata("integerField") - 1

String expressions

String expressions evaluate to a result of string type. The string concatenation operator is supported, and it uses functions, JSON pointers, or literals as operands. For example, the following expression concatenates the message1 field of type string in the event with the message2 field in the event and appends “suffix” to it:

/message1 + /message2 + "suffix"

Event tagging

Data Prepper 2.3 supports tagging events while using the grok processor. Events can optionally be tagged using the following configuration:

processor:
   - grok:
        match:
          message: <pattern>
        tags_on_match_failure: ["grok_match_fail", "another_tag"]

The presence of tags can be checked in conditional expressions, in other processors, or in routing by using the hasTags() function to perform conditional ingestion. For example, a conditional expression that checks for the presence of the tag grok_match_fail would be hasTags("grok_match_fail").
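One way to apply this, sketched below with illustrative names, is to route events that failed grok parsing to a separate index; the pipeline fragment assumes the grok configuration above and makes up the route name, index, and host settings.

```yaml
# Illustrative fragment: send events tagged with "grok_match_fail"
# (by the grok processor configuration above) to a separate index.
# Route name, index, and hosts are placeholders.
route:
  - parse_failures: 'hasTags("grok_match_fail")'
sink:
  - opensearch:
      hosts: ["https://opensearch:9200"]
      index: failed-logs
      routes: ["parse_failures"]
```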

Enhancements to add_entries processor

The add_entries processor has been enhanced to support values with expressions and to set event metadata keys.

Expression-based value

The add_entries processor has been enhanced to support adding values based on an expression where the return type can be Boolean, integer or float, or string. The following example shows an add_entries processor configuration with a value_expression option:

processor:
   - add_entries:
       entries:
         - key: "request_len"
           value_expression: "length(/request)"

This configuration adds an entry with the key request_len whose value equals the length of the request field in the event. The value expression can be any supported expression. See Supported operators for more details.
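To illustrate with a hypothetical event, the configuration above would turn an incoming event of {"request": "GET /index.html"} into the following, assuming length() returns the string length:

```json
{
  "request": "GET /index.html",
  "request_len": 15
}
```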

Setting event metadata keys

The add_entries processor can add entries to the event's metadata instead of to the event itself. The following example shows an add_entries processor configuration for adding an entry to metadata:

processor:
   - add_entries:
       entries:
         - metadata_key: "request_len"
           value_expression: "length(/request)"

This configuration adds an attribute to event metadata with the key request_len and a value equal to the length of the request field in the event. The value can be set using the value, format, or value_expression options of the entries field.
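Because the entry lands in metadata rather than in the event body, a downstream processor can read it back with the getMetadata() function described earlier. The following sketch assumes the metadata entry added above; the threshold value is an example.

```yaml
# Illustrative fragment: drop events whose request was short, using
# the "request_len" metadata entry added by the configuration above.
# The threshold of 10 is a made-up example.
processor:
  - drop_events:
      drop_when: 'getMetadata("request_len") < 10'
```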

Amazon S3 sink

Data Prepper now supports saving data to Amazon Simple Storage Service (Amazon S3) sinks as Newline delimited JSON (ndjson). Amazon S3 is a popular choice for storing large volumes of data reliably and cost effectively.

Ingesting data into Amazon S3 offers many possibilities for your data pipelines, including some of the following:

  • Noisy or uninteresting data can be routed into Amazon S3 and not to OpenSearch in order to reduce load on your OpenSearch cluster. This can help you save on compute and storage costs.
  • Ingest data into Amazon S3 to keep normalized data on hand for future processing.
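A minimal s3 sink configuration might look like the following sketch. The bucket name, IAM role, prefix, and threshold values are placeholders; option names should be checked against the S3 sink documentation.

```yaml
# Illustrative s3 sink: buffer events and write them to S3 as
# newline-delimited JSON. All values shown are placeholders.
sink:
  - s3:
      aws:
        region: us-east-1
        sts_role_arn: "arn:aws:iam::123456789012:role/data-prepper-s3"
      bucket: my-raw-logs
      object_key:
        path_prefix: logs/
      threshold:
        event_count: 2000
        maximum_size: 50mb
        event_collect_timeout: 15s
      codec:
        ndjson:
```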

Tail sampling

Data Prepper 2.3.0 supports tail sampling to limit the number of events that are sent to a sink, similar to the tail sampling support provided by OpenTelemetry. For information about tail sampling in OpenTelemetry, see the blog post “Tail Sampling with OpenTelemetry: Why it’s useful, how to do it, and what to consider”.

Tail sampling in Data Prepper is supported as an action of the aggregate processor. Events are held in the aggregate processor beyond the group_duration time until no new events have been received within the wait_period time.

For example, the following configuration sends all traces that contain errors to the sink, while non-error traces are sampled probabilistically at the user-specified percentage:

trace-normal-pipeline:
  source:
    otel_trace_source:
      ssl: false
  processor:
    - trace_peer_forwarder:
    - aggregate:
        identification_keys: ["traceId"]
        action:
          tail_sampler:
            percent: 20
            wait_period: "15s"
            error_condition: "/traceGroupFields/statusCode == 2"
        group_duration: "30s"
  sink:
    - opensearch:
        hosts: ["https://opensearch:9200"]
        insecure: true
        username: "admin"
        password: "admin"
        index: sampled-traces

Obfuscate processor

Data Prepper 2.3.0 supports obfuscating sensitive data by replacing a specified field's value with mask characters. For example, the following configuration obfuscates values that match the given patterns for the key specified in the source field. If no pattern is specified, the entire value of the key specified in the source field is obfuscated.

processor:
  - obfuscate:
      source: "log"
      target: "new_log"
      patterns:
        - "[A-Za-z0-9+_.-]+@([\\w-]+\\.)+[\\w-]{2,4}"
      action:
        mask:
          mask_character: "#"
          mask_character_length: 6
  - obfuscate:
      source: "phone"
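As an illustration with a hypothetical field value, the first obfuscate entry above would replace the email address matched by the pattern with six mask characters and write the result to new_log:

```json
{
  "log": "contact me at jdoe@example.com",
  "new_log": "contact me at ######"
}
```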

Getting started

Thanks to our contributors!

The following community members contributed to this release. Thank you!