The OpenSearch Data Prepper maintainers are happy to announce the release of Data Prepper 2.15. With this version, you can ingest data from Apache Iceberg. This release also extends Prometheus support with a remote-write source and the ability to send data to open-source Prometheus.
Apache Iceberg source
Apache Iceberg is an open table format widely used for lakehouse architectures. Iceberg tables often serve as the single source of truth for curated, transformed data. A common need is to keep OpenSearch synchronized with these tables for performing search and powering real-time dashboards. Until now, doing so required you to build custom ingestion jobs on a distributed compute engine to read Iceberg changelogs and write to OpenSearch, adding operational complexity for what is essentially a data-movement task.
Data Prepper 2.15 introduces an experimental iceberg source plugin that captures row-level changes from Iceberg tables and ingests them into sink targets such as OpenSearch. The plugin first exports the full table state and then continuously polls for new snapshots and processes incremental INSERT, UPDATE, and DELETE operations.
The following example pipeline reads the changes from an Iceberg table using a REST catalog and writes them to OpenSearch:
iceberg-cdc-pipeline:
  source:
    iceberg:
      tables:
        - table_name: "my_database.my_table"
          catalog:
            type: rest
            uri: "http://iceberg-rest-catalog:8181"
            io-impl: "org.apache.iceberg.aws.s3.S3FileIO"
            client.region: "us-east-1"
          identifier_columns: ["id"]
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: "my-index"
        action: "${getMetadata(\"bulk_action\")}"
        document_id: "${getMetadata(\"document_id\")}"
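To make the role of identifier_columns and the bulk_action and document_id metadata more concrete, the following Python sketch models how a row-level change could be turned into the values the sink reads. It is an illustration only, not the plugin's implementation: the mapping from operations to bulk actions, the "|" separator, and the function name to_bulk_metadata are all assumptions.

```python
from typing import Any, Dict, List

# Assumed mapping from row-level operations to OpenSearch bulk actions.
BULK_ACTIONS = {"INSERT": "index", "UPDATE": "index", "DELETE": "delete"}


def to_bulk_metadata(
    operation: str, row: Dict[str, Any], identifier_columns: List[str]
) -> Dict[str, str]:
    """Derive hypothetical bulk_action and document_id metadata for one change."""
    if operation not in BULK_ACTIONS:
        raise ValueError(f"unsupported operation: {operation}")
    # Join the identifier column values so that every change to the same row
    # targets the same OpenSearch document (the "|" separator is an assumption).
    document_id = "|".join(str(row[column]) for column in identifier_columns)
    return {"bulk_action": BULK_ACTIONS[operation], "document_id": document_id}


changes = [
    ("INSERT", {"id": 1, "name": "a"}),
    ("UPDATE", {"id": 1, "name": "b"}),
    ("DELETE", {"id": 1, "name": "b"}),
]
metadata = [to_bulk_metadata(op, row, ["id"]) for op, row in changes]
```

Because all three changes share the same identifier value, they resolve to the same document ID, which is what lets an update overwrite the earlier document and a delete remove it.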
Open-source Prometheus as a sink
Data Prepper 2.14 introduced the ability to write metrics to Amazon Managed Service for Prometheus. With Data Prepper 2.15, the Prometheus sink additionally supports open-source Prometheus. You can send metrics to any Prometheus-compatible endpoint without AWS authentication by using either no authentication or HTTP basic authentication.
The following example pipeline writes metrics to an open-source Prometheus instance using basic authentication:
prometheus-pipeline:
  source:
    otel_metrics_source:
  sink:
    - prometheus:
        url: "http://my-prometheus-server:9090/api/v1/write"
        authentication:
          http_basic:
            username: "myuser"
            password: "mypassword"
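For readers less familiar with HTTP basic authentication, the following Python sketch shows the Authorization header value that the scheme produces from a username and password. It illustrates the standard scheme rather than Data Prepper's internal code, and the function name basic_auth_header is hypothetical.

```python
import base64


def basic_auth_header(username: str, password: str) -> str:
    """Build the Authorization header value used by HTTP basic authentication."""
    credentials = f"{username}:{password}".encode("utf-8")
    # The credentials are Base64-encoded, not encrypted.
    return "Basic " + base64.b64encode(credentials).decode("ascii")


# Placeholder credentials taken from the example pipeline above.
header = basic_auth_header("myuser", "mypassword")
```

Because Base64 is reversible, anyone who can read the request can recover the credentials, so an https:// endpoint is the safer choice outside of local testing.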
Prometheus source
Data Prepper 2.15 introduces a new Prometheus source that ingests metrics through the Prometheus Remote-Write protocol. This allows Prometheus servers to forward metrics directly to Data Prepper, which then converts them into OpenTelemetry-compatible metric events for downstream processing.
The source accepts Snappy-compressed, Protobuf-encoded Remote-Write requests over HTTP and supports all standard Prometheus metric types, including counters, gauges, histograms, and summaries. You can use it alongside the Prometheus sink to build end-to-end Prometheus metric pipelines in Data Prepper.
The following example pipeline receives metrics from a Prometheus server and writes them to OpenSearch:
prometheus-pipeline:
  source:
    prometheus:
      port: 9090
      path: "/api/v1/write"
  sink:
    - opensearch:
        hosts: ["https://localhost:9200"]
        index: prometheus-metrics
Composable functions
You can use Data Prepper expressions to make your pipelines dynamic and tailored to your needs. Expressions let you route data, mutate events, and apply conditional logic within your pipeline configuration. You may already be using functions within expressions to build rich conditions. Data Prepper 2.15 takes this further by letting you compose functions for even more advanced expressions.
For example, you can calculate the approximate size of an event by converting the event to a JSON string and obtaining the string length:
- add_entries:
    entries:
      - key: "approximateSize"
        value_expression: 'length(toJsonString())'
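Composition here means that the result of the inner function becomes the argument of the outer one. As a rough analogy, the following Python sketch performs the same two steps on a dictionary that stands in for an event. The function name approximate_size is hypothetical, and the exact number Data Prepper reports may differ because its JSON serialization is not necessarily byte-for-byte the same as Python's.

```python
import json
from typing import Any, Dict


def approximate_size(event: Dict[str, Any]) -> int:
    """Serialize the event to a JSON string, then measure the string length."""
    as_json = json.dumps(event, separators=(",", ":"))  # inner step: to JSON
    return len(as_json)                                 # outer step: length


size = approximate_size({"message": "hello"})
```

The value is a character count of the serialized event, which is why the entry is named approximateSize rather than treated as an exact byte size.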
Improved application performance monitoring
Data Prepper 2.15 fixes an issue in the APM service map processor in which latency metrics exported to Prometheus had a duplicate _seconds suffix in the metric name, resulting in latency_seconds_seconds. The metric name is now correctly exported as latency_seconds.
Other notable changes
This release includes the following additional improvements:
- The s3 sink now supports custom KMS keys for server-side encryption.
- A new s3_enrich processor lets you enrich events with data stored in S3 buckets.
- Data Prepper expressions now support new substring functions: substringAfter, substringBefore, substringAfterLast, and substringBeforeLast.
- The sqs sink is no longer experimental and is ready for production use.
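To give a feel for what the four substring functions are for, the following Python sketch models the behavior their names suggest. It is not Data Prepper expression syntax, and the argument order and the behavior when the delimiter is missing are assumptions; check the expression documentation for the actual semantics.

```python
def substring_after(text: str, delimiter: str) -> str:
    """Text after the first delimiter, or "" if it is absent (assumed)."""
    _, found, tail = text.partition(delimiter)
    return tail if found else ""


def substring_before(text: str, delimiter: str) -> str:
    """Text before the first delimiter, or the whole text if it is absent (assumed)."""
    head, found, _ = text.partition(delimiter)
    return head if found else text


def substring_after_last(text: str, delimiter: str) -> str:
    """Text after the last delimiter, or "" if it is absent (assumed)."""
    _, found, tail = text.rpartition(delimiter)
    return tail if found else ""


def substring_before_last(text: str, delimiter: str) -> str:
    """Text before the last delimiter, or the whole text if it is absent (assumed)."""
    head, found, _ = text.rpartition(delimiter)
    return head if found else text


path = "logs/app/2024/server.log"
first_segment = substring_before(path, "/")
file_name = substring_after_last(path, "/")
```

Functions like these are useful for pulling a prefix, suffix, or file name out of a delimited field without writing a regular expression.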
Getting started
Use the following resources to get up and running with Data Prepper 2.15:
- To learn about all the changes, see the 2.15.0 release notes.
- To download Data Prepper, visit the Download & Get Started page.
- For information about getting started with Data Prepper, see Getting started with OpenSearch Data Prepper.
- To learn more about upcoming work for Data Prepper, see the Data Prepper Project Roadmap.
Thanks to our contributors!
Thanks to the following community members who contributed to this release:
- bagmarnikhil — Nikhil Bagmar
- BhattacharyaSumit — Sumit Bhattacharya
- Davidding4718 — Siqi Ding
- dinujoh — Dinu John
- divbok — Divyansh Bokadia
- dlvenable — David Venable
- enuraju — Raju Enugula
- graytaylor0 — Taylor Gray
- JongminChung
- kaimst — Kai Sternad
- Keyur-S-Patel — Keyur Patel
- kkondaka — Krishna Kondaka
- kylehounslow — Kyle Hounslow
- lawofcycles — Sotaro Hikita
- oeyh — Hai Yan
- ps48 — Shenoy Pratik
- srikanthpadakanti — Srikanth Padakanti
- TomasLongo — Tomas
- vamsimanohar — Vamsi Manohar
- Zhangxunmt — Xun Zhang