The OpenSearch Data Prepper maintainers are happy to announce the release of Data Prepper 2.14. This version expands support for observability use cases with a new application performance monitoring (APM) service map and improved Prometheus support.

APM service map

The otel_apm_service_map processor analyzes OpenTelemetry trace spans to automatically generate APM service map relationships and metrics. It creates structured events that can be visualized as service topology graphs, showing how services communicate with each other and their performance characteristics.

Key features include:

  • Automatic service relationship discovery: Identifies service-to-service interactions from OpenTelemetry spans.
  • APM metrics generation: Creates latency, throughput, and error rate metrics for service interactions using three-window processing with sliding time windows to ensure complete trace context.
  • Environment awareness: Derives new attributes from existing span attributes to support service environment grouping and custom attributes. It includes environment detection capabilities for Amazon Elastic Compute Cloud (Amazon EC2), Amazon Elastic Container Service (Amazon ECS), Amazon Elastic Kubernetes Service (Amazon EKS), AWS Lambda, and Amazon API Gateway and can be extended to support other cloud providers.
  • Service map snapshots: Enables users to view service connections for specific time periods with customizable resource attribute filtering.
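
To make the relationship discovery idea concrete, here is a minimal Python sketch of how caller-to-callee edges can be derived from parent/child span links. It is an illustration only, not the `otel_apm_service_map` processor's implementation: the span dictionaries and the `span_id`, `parent_span_id`, and `service` keys are hypothetical simplifications of OpenTelemetry span data, and the sketch ignores windowing, metrics, and environment attributes.

```python
from collections import Counter

def service_edges(spans):
    """Count caller -> callee service pairs from parent/child span links."""
    # Index spans by ID so each child can find its parent.
    by_id = {span["span_id"]: span for span in spans}
    edges = Counter()
    for span in spans:
        parent = by_id.get(span.get("parent_span_id"))
        if parent is None:
            continue  # Root span, or the parent is not in this batch.
        # Only cross-service calls become edges in the map.
        if parent["service"] != span["service"]:
            edges[(parent["service"], span["service"])] += 1
    return edges

# Hypothetical spans from a single trace.
spans = [
    {"span_id": "a", "parent_span_id": None, "service": "frontend"},
    {"span_id": "b", "parent_span_id": "a", "service": "checkout"},
    {"span_id": "c", "parent_span_id": "b", "service": "payments"},
    {"span_id": "d", "parent_span_id": "b", "service": "checkout"},
]
edges = service_edges(spans)
print(dict(edges))
```

In this sketch, a child whose parent has not yet arrived is skipped, which hints at why a processor would want to hold spans across time windows until the trace context is complete.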

Improved Prometheus sink support

The Prometheus sink now ensures compliance with remote write requirements through integrated sorting and deduplication logic. It orders incoming events chronologically and strips duplicate samples that share the same series and timestamp before transmission, preventing server-side rejections.
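
The sorting and deduplication step can be sketched in a few lines of Python. This is a simplified illustration rather than the sink's actual code: samples are modeled as hypothetical `(series, timestamp, value)` tuples, and keeping the first sample seen for a duplicate pair is an assumption, since the announcement does not say which duplicate is retained.

```python
def sort_and_dedupe(samples):
    """Drop repeated (series, timestamp) pairs, then order samples by series and time."""
    seen = set()
    unique = []
    for series, timestamp, value in samples:
        key = (series, timestamp)
        if key in seen:
            continue  # Assumption: the first sample seen for a pair is kept.
        seen.add(key)
        unique.append((series, timestamp, value))
    # Within each series, timestamps end up in ascending order.
    return sorted(unique, key=lambda sample: (sample[0], sample[1]))

# Hypothetical batch with one out-of-order sample and one duplicate.
batch = [
    ("cpu{host=a}", 30, 0.7),
    ("cpu{host=a}", 10, 0.5),
    ("cpu{host=a}", 30, 0.9),
    ("cpu{host=b}", 20, 0.1),
]
cleaned = sort_and_dedupe(batch)
print(cleaned)
```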

To further handle data ingestion challenges, the new out_of_order_time_window option allows a configurable grace period for late-arriving data. This window enables the sink to accept and re-sort samples that arrive out of sequence, significantly improving pipeline resilience in distributed environments where perfectly ordered delivery is difficult to maintain.
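
One plausible reading of such a grace period is sketched below in Python. The exact semantics of `out_of_order_time_window` are not described in this announcement, so the rule used here (accept a sample if it is no older than the newest timestamp seen minus the window) is an assumption, as are the function name and the use of plain integer timestamps.

```python
def accept_within_window(timestamps, window):
    """Accept late samples that fall within `window` of the newest timestamp seen so far."""
    newest = None
    accepted = []
    for ts in timestamps:
        # Assumed rule: a sample is too late once it trails the newest one by more than `window`.
        if newest is None or ts >= newest - window:
            accepted.append(ts)
            newest = ts if newest is None else max(newest, ts)
    # Re-sort so accepted samples are emitted in chronological order.
    return sorted(accepted)

# Hypothetical arrival order: 98 is late but inside the window, 90 is too old.
arrivals = [100, 105, 98, 90, 110]
ordered = accept_within_window(arrivals, window=10)
print(ordered)
```

A larger window tolerates more reordering at the cost of holding samples longer before they can be safely sent.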

AWS Lambda streaming

One of AWS Lambda’s features is response streaming, which allows functions to stream data back to clients. This reduces the time to the first response bytes and supports larger payloads, up to 200 MB.

In Data Prepper 2.14, you can now configure the aws_lambda processor to use streaming invocations. This allows you to receive responses larger than 6 MB, making it especially useful when the output exceeds the size of the input data.

Cross-Region s3 sink

Data Prepper’s s3 sink now supports writing to Amazon Simple Storage Service (Amazon S3) buckets across multiple AWS Regions.

Previously, a single s3 sink could only write to buckets in one Region, which limited the use of one of its key features—dynamic bucket names.

With this enhancement, you can specify dynamic bucket names that adapt to different Regions. For example, you can define a bucket like myorganization-${/aws/region}. Data Prepper will then write to buckets such as myorganization-us-east-2 and myorganization-eu-central-1.
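
The following Python sketch shows how a template such as `myorganization-${/aws/region}` could resolve to a concrete bucket name for each event. It is an illustration, not Data Prepper's expression engine: the `resolve_bucket` helper is hypothetical, and it assumes that `${/aws/region}` reads the nested `aws.region` field of the event.

```python
import re

def resolve_bucket(template, event):
    """Replace each ${/path/to/key} placeholder with the matching nested event value."""
    def lookup(match):
        value = event
        # Walk the slash-separated path through the nested event dictionaries.
        for part in match.group(1).strip("/").split("/"):
            value = value[part]
        return str(value)
    return re.sub(r"\$\{(/[^}]+)\}", lookup, template)

# Hypothetical events carrying their Region in an `aws.region` field.
template = "myorganization-${/aws/region}"
events = [
    {"aws": {"region": "us-east-2"}, "message": "a"},
    {"aws": {"region": "eu-central-1"}, "message": "b"},
]
buckets = [resolve_bucket(template, event) for event in events]
print(buckets)  # ['myorganization-us-east-2', 'myorganization-eu-central-1']
```

Because each resolved name can live in a different Region, a sink that handles one Region at a time could not serve both of these events, which is the limitation this release removes.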

forward_to pipelines

In certain workflows, you may need to send data to sinks in a specific order or use the output from one sink as input for another.

The opensearch sink now supports the forward_to configuration. This allows you to define a target pipeline that receives events after they are written to OpenSearch. The forwarded events include the document ID field.

ARM architecture support

Data Prepper now provides a multi-architecture Docker image with support for both ARM and x86.

As many organizations adopt ARM to reduce compute costs, this change allows you to pull Data Prepper images directly on ARM systems without relying on emulation.

Additionally, Data Prepper offers ARM archive files, making it easier to run on ARM systems that do not use Docker.

Other notable changes

  • The Data Prepper Docker image is now 46% smaller and has fewer layers, improving Docker pull times.
  • The AWS Lambda processor now supports improved timeout configuration.
  • The aggregate processor now has enhanced support for end-to-end acknowledgments and configurations for disabling acknowledgments.
  • Data Prepper provides several new metrics for observing pipeline health.

Getting started

Thanks to our contributors!

Thanks to the following community members who contributed to this release!

Authors

  • Krishna is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He is also a contributor to the Data Prepper project. Prior to joining AWS, Krishna worked on the development of AI infrastructure and caching services at Facebook. In addition, he has significant experience in developing networking products from his time at Cisco Systems and VMware.
