
The OpenSearch Data Prepper maintainers are happy to announce the release of Data Prepper 2.13. This release includes a number of improvements and new capabilities that make Data Prepper easier to use.

Prometheus sink

Data Prepper now supports Prometheus as a sink—initially, only Amazon Managed Service for Prometheus is supported as the external Prometheus sink. This enables you to export metric data processed within Data Prepper pipelines to the Prometheus ecosystem and allows Data Prepper to serve as a bridge between various metric sources (like OpenTelemetry, Logstash, or Amazon Simple Storage Service [Amazon S3]) and Prometheus-compatible monitoring systems.

A core aspect of the Prometheus sink is its handling of different metric types. The implementation ensures that Data Prepper’s internal metric representations are correctly mapped to Prometheus time series families:

  • Counters: For Sum metrics with cumulative aggregation temporality and monotonically increasing values, the sink generates a single time series using the metric name. The value represents the cumulative count.
  • Gauges: Gauge metrics are mapped to a single time series carrying the current value. Sum metrics that do not meet the counter criteria are also mapped this way.
  • Summaries: Summary metrics are converted into a time series with quantile labels, along with corresponding _sum and _count series.
  • Histograms: Support for histograms is more involved. The sink generates several distinct time series for each histogram metric to fully represent the distribution, including buckets, sum, count, min, and max.
  • Exponential histograms: Support for exponential histograms is similarly involved. The sink generates several distinct time series for each metric, including scale, zero threshold, zero count, sum, count, min, and max.

In addition to mapping metrics, the sink handles attribute labeling and name sanitization, creating labels for all metric, resource, and scope attributes.

It can be easily configured for Amazon Managed Service for Prometheus as follows:

sink:
  - prometheus:
      url: <amp workspace remote-write api url>
      aws:
        region: <region>
        sts_role_arn: <role-arn>
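
To show how the sink fits into a full pipeline, here is a minimal sketch that pairs it with Data Prepper's otel_metrics_source. The pipeline name, the ssl setting, and the otel_metrics processor stage are illustrative assumptions rather than configuration from this announcement:

metrics-pipeline:
  source:
    otel_metrics_source:
      ssl: false            # TLS disabled only to keep this sketch short
  processor:
    - otel_metrics:         # assumed stage: unpacks OTel metric records into Data Prepper events
  sink:
    - prometheus:
        url: <amp workspace remote-write api url>
        aws:
          region: <region>
          sts_role_arn: <role-arn>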

OpenSearch data stream support

Data Prepper now supports OpenSearch data streams natively in the opensearch sink. With this change, Data Prepper will look up the index to determine whether it is a data stream. If so, it will configure the bulk writes to the sink so that they work directly with data streams.

Prior to this feature, Data Prepper pipeline authors needed to make manual adjustments to the sink configuration to write to data stream indexes. Now a minimal configuration is enough, and the sink will be set up correctly. Additionally, Data Prepper will automatically set the @timestamp field to the time the event was received by Data Prepper if the pipeline does not already set this value.

For example, the configuration could be as simple as the following:

sink:
  - opensearch:
      hosts: [ "https://localhost:9200" ]
      index: my-log-index

Cross-Region s3 source

The s3 source is a popular Data Prepper feature for ingesting data from S3 buckets. This source can read from S3 buckets using Amazon Simple Queue Service (Amazon SQS) notifications or scan multiple S3 buckets. It is common for users to have S3 buckets in multiple AWS Regions that they want to read in a single pipeline. For example, some teams may want to get VPC flow logs from multiple Regions and consolidate them into a single OpenSearch cluster. Now Data Prepper users can read from multiple buckets in different Regions. And there is no need to create a custom configuration for this feature—Data Prepper will handle this for customers.
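
As an illustrative sketch, a scan-based s3 source reading VPC flow log buckets from two different Regions might look like the following. The bucket names, Region, role ARN, and codec choice are placeholder assumptions, not configuration taken from the release:

source:
  s3:
    codec:
      newline:
    aws:
      region: <region>                       # the source's home Region; buckets may live elsewhere
      sts_role_arn: <role-arn>
    scan:
      buckets:
        - bucket:
            name: vpc-flow-logs-us-east-1    # placeholder bucket in one Region
        - bucket:
            name: vpc-flow-logs-eu-west-1    # placeholder bucket in another Region

As described above, Data Prepper determines each bucket's Region automatically, so no per-bucket Region override is needed.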

Other great changes

  • The maintainers have invested in performance improvements for expressions and core processors. Our benchmarking indicates that this has improved throughput by over 20% when using expressions.
  • The dynamodb source now fully checkpoints within shards. This change reduces duplicate processing from Amazon DynamoDB tables when failures occur. Before this change, when restarting reading from a DynamoDB shard, Data Prepper would start from the beginning of the shard. With this change, a Data Prepper node will start from the last successfully processed event in the shard.
  • The delete_entries and select_entries processors now support regex patterns for choosing which fields to delete or select, helping pipeline authors clean up their events.
  • The rename_keys processor can now normalize keys, allowing pipeline authors to write simple pipelines to get data into OpenSearch.

Getting started

Thanks to our contributors!

Thanks to the following community members who contributed to this release!

Authors

  • Krishna is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He is also a contributor to the Data Prepper project. Prior to joining AWS, Krishna worked on development of AI infrastructure and caching services at Facebook. In addition, he has significant experience in developing networking products from his time at Cisco Systems and VMware.

  • David is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He is a maintainer on the Data Prepper project. Prior to working at Amazon, he was the CTO at Allogy Interactive - a start-up creating mobile-learning solutions for healthcare.
