Announcing Data Prepper 2.6.0

Tue, Nov 28, 2023 · David Venable

Data Prepper 2.6.0 is now available for download. With this release, you can ingest data from DynamoDB, improve data durability by using the new Kafka buffer, and automatically connect to Amazon OpenSearch Serverless collections.

DynamoDB source

Amazon DynamoDB is a high-scale, high-performance key-value database. Generally, developers will query DynamoDB on the primary index and secondary indexes. However, many teams would also like to search and analyze data in DynamoDB. Now you can use Data Prepper and OpenSearch to search and analyze DynamoDB data.

Data Prepper’s new dynamodb source ingests items from a DynamoDB table so that you can index those items in OpenSearch. You can import existing data that was backed up by DynamoDB’s point-in-time recovery. For new data, Data Prepper can read from DynamoDB Streams to keep your OpenSearch cluster’s data up to date with DynamoDB.

The feature supports change data capture (CDC). When you add, update, or delete items in DynamoDB, Data Prepper applies the corresponding change to the OpenSearch index, so you don't have to build and maintain that data movement yourself.
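
To make the CDC idea concrete, here is a minimal sketch of how change events can be applied to an index. It is not Data Prepper's implementation. INSERT, MODIFY, and REMOVE are the event names that DynamoDB Streams uses, but the record shape, the apply_change function, and the dictionary standing in for an OpenSearch index are simplified assumptions made for illustration.

```python
from typing import Any, Dict

# Stand-in for an OpenSearch index: document ID -> document body (assumption).
Index = Dict[str, Dict[str, Any]]


def apply_change(index: Index, record: Dict[str, Any]) -> None:
    """Apply one simplified change record to the index.

    The record shape is hypothetical: event_name is a DynamoDB Streams
    event name, key is the item's primary key value, and item is the
    new item image (absent for REMOVE).
    """
    event_name = record["event_name"]
    doc_id = record["key"]
    if event_name in ("INSERT", "MODIFY"):
        # Index or overwrite the document with the latest item image.
        index[doc_id] = record["item"]
    elif event_name == "REMOVE":
        # Removing a missing document is a no-op, so replays are harmless.
        index.pop(doc_id, None)
    else:
        raise ValueError(f"unknown event name: {event_name}")


index: Index = {}
changes = [
    {"event_name": "INSERT", "key": "user#1", "item": {"name": "Ana", "plan": "free"}},
    {"event_name": "MODIFY", "key": "user#1", "item": {"name": "Ana", "plan": "pro"}},
    {"event_name": "INSERT", "key": "user#2", "item": {"name": "Bo", "plan": "free"}},
    {"event_name": "REMOVE", "key": "user#2"},
]
for change in changes:
    apply_change(index, change)

print(index)
```

After the four changes, only the updated user#1 document remains, which mirrors the state of the source table.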

Kafka buffer

Data Prepper provides end-to-end acknowledgements to ensure that data from pull-based sources reaches OpenSearch. For push-based sources, Data Prepper currently has an in-memory buffer, so data that is still in the buffer can be lost if the node crashes. For these sources, we can improve durability by storing data in an external system instead of locally on the Data Prepper node.

Apache Kafka is an open-source event streaming platform. It is highly durable and can retain events for as long as you configure it to. This makes it a great choice for durable storage of events in Data Prepper.

Data Prepper now has a new kafka buffer type that uses Kafka to store data in flight. You can use this feature to send data directly to Data Prepper and hold the data in Kafka before Data Prepper saves it to OpenSearch.

Existing clients such as the OpenTelemetry Collector and Fluent Bit can keep sending data to Data Prepper exactly as they do today, but with better durability. How Data Prepper stores data in flight stays an internal detail, so you won’t need to change those client configurations.

Additionally, Data Prepper’s kafka buffer supports per-event encryption so that you can perform client-side encryption if needed.
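
The durability argument above can be illustrated with a small simulation. This is only a sketch of the reasoning, not how Data Prepper or Kafka is implemented: the class names are hypothetical, and a plain Python list stands in for a Kafka topic.

```python
from typing import List


class InMemoryNode:
    """A node whose buffer lives in its own process memory."""

    def __init__(self) -> None:
        self.buffer: List[str] = []

    def receive(self, event: str) -> None:
        self.buffer.append(event)


class ExternalLog:
    """Stand-in for a Kafka topic: it outlives any single node."""

    def __init__(self) -> None:
        self.events: List[str] = []
        self.committed = 0  # count of events already written to the sink

    def undelivered(self) -> List[str]:
        return self.events[self.committed:]


class LogBackedNode:
    """A node that writes incoming events to the external log."""

    def __init__(self, log: ExternalLog) -> None:
        self.log = log

    def receive(self, event: str) -> None:
        self.log.events.append(event)

    def flush(self, sink: List[str]) -> None:
        # Deliver everything not yet committed, then advance the position.
        pending = self.log.undelivered()
        sink.extend(pending)
        self.log.committed += len(pending)


# In-memory case: the crash replaces the node, and its buffer goes with it.
node = InMemoryNode()
node.receive("event-1")
node.receive("event-2")
node = InMemoryNode()  # simulated crash and restart
lost_sink: List[str] = list(node.buffer)

# Log-backed case: the restarted node picks up from the external log.
log = ExternalLog()
backed = LogBackedNode(log)
backed.receive("event-1")
backed.receive("event-2")
backed = LogBackedNode(log)  # simulated crash and restart
durable_sink: List[str] = []
backed.flush(durable_sink)

print(lost_sink)
print(durable_sink)
```

In the first case nothing reaches the sink after the restart. In the second, both events are still in the external log and are delivered by the new node.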

Amazon OpenSearch Serverless improvements

Data Prepper improves integration with Amazon OpenSearch Serverless with new options to update the network policy. With this feature, you can configure Data Prepper to create an OpenSearch Serverless network policy for your VPC-based collections. This simplifies some of the setup for developers who have the necessary permissions to create this policy. This new configuration is available for both the OpenSearch sink and the OpenSearch source.
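
For readers who have not seen one, the sketch below builds a network policy document of the kind OpenSearch Serverless accepts for VPC access. It shows the general shape of such a policy, not necessarily what Data Prepper generates. The build_network_policy helper, the collection name, and the VPC endpoint ID are hypothetical placeholders.

```python
import json
from typing import List


def build_network_policy(collection_name: str, vpce_ids: List[str]) -> str:
    """Return a JSON network policy that limits a collection to VPC endpoints."""
    policy = [
        {
            "Description": f"VPC access for {collection_name}",
            "Rules": [
                {
                    "ResourceType": "collection",
                    "Resource": [f"collection/{collection_name}"],
                }
            ],
            # Public access is off; only the listed VPC endpoints are allowed.
            "AllowFromPublic": False,
            "SourceVPCEs": vpce_ids,
        }
    ]
    return json.dumps(policy, indent=2)


# Placeholder values for illustration only.
policy_json = build_network_policy("my-collection", ["vpce-0123456789abcdef0"])
print(policy_json)
```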

Other features

Data Prepper’s s3 source provides duplication protection by extending the visibility timeout for Amazon Simple Queue Service (SQS) messages. We encourage users to add the necessary permissions and use this feature to avoid data duplication.
The opensearch source now accepts a distribution_version setting so that it can connect to Elasticsearch 7 clusters.
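
The visibility timeout technique used by the s3 source can be sketched as follows. This is a simplified illustration, not Data Prepper's code. The change_message_visibility and delete_message calls match the boto3 SQS client, and the caller needs permission for the sqs:ChangeMessageVisibility action. The process_with_extension function, the RecordingClient test double, and the queue URL are hypothetical.

```python
from typing import Any, Callable, Dict, Iterable, List, Tuple


def process_with_extension(
    sqs: Any,
    queue_url: str,
    receipt_handle: str,
    steps: Iterable[Callable[[], None]],
    extension_seconds: int = 60,
) -> None:
    """Run each work step, extending the message's visibility timeout first.

    While the message stays invisible, no other consumer receives it,
    so the same S3 object is not processed twice.
    """
    for step in steps:
        sqs.change_message_visibility(
            QueueUrl=queue_url,
            ReceiptHandle=receipt_handle,
            VisibilityTimeout=extension_seconds,
        )
        step()
    # Delete the message only after all of the work has succeeded.
    sqs.delete_message(QueueUrl=queue_url, ReceiptHandle=receipt_handle)


class RecordingClient:
    """Hypothetical test double that records calls instead of contacting AWS."""

    def __init__(self) -> None:
        self.calls: List[Tuple[str, Dict[str, Any]]] = []

    def change_message_visibility(self, **kwargs: Any) -> None:
        self.calls.append(("change_message_visibility", kwargs))

    def delete_message(self, **kwargs: Any) -> None:
        self.calls.append(("delete_message", kwargs))


client = RecordingClient()
processed: List[str] = []
process_with_extension(
    client,
    queue_url="https://sqs.example/queue",  # placeholder, not a real queue
    receipt_handle="handle-1",
    steps=[lambda: processed.append("part-1"), lambda: processed.append("part-2")],
)
print([name for name, _ in client.calls])
```

With a real queue, you would pass boto3.client("sqs") in place of the recording client.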

Getting started

To download Data Prepper, see the OpenSearch downloads page.
For instructions on how to get started with Data Prepper, see Getting started with Data Prepper.
To learn more about the work in progress for Data Prepper 2.7, see the Data Prepper roadmap.

Thanks to our contributors!

The following people contributed to this release. Thank you!

asuresh8 - Adi Suresh
asifsmohammed - Asif Sohail Mohammed
chenqi0805 - Qi Chen
daixba - Aiden Dai
dinujoh - Dinu John
dlvenable - David Venable
engechas - Chase Engelbrecht
graytaylor0 - Taylor Gray
hshardeesi - Hardeep Singh
KarstenSchnitter - Karsten Schnitter
kkondaka - Krishna Kondaka
mallikagogoi7
oeyh - Hai Yan
Periecle - Roman Kvasnytskyi
reta - Andriy Redko
wanghd89
