Announcing Data Prepper 2.7.0

Data Prepper 2.7.0 is now available for download. This release supports extracting geographic locations from IP addresses, supports injectable secrets, and adds many new processors.

GeoIP processor

Data Prepper can now enrich events with geographical location data from an IP address using the new geoip processor. The geoip processor uses the MaxMind GeoLite2 databases to provide geographical location data from IP addresses.

Many OpenSearch and Data Prepper users want to enrich their data by adding geographical locations to events. There are a number of reasons this data can be valuable. Some examples include customer analytics, looking for anomalies in network access, understanding load across geographies, and more. An industry solution for determining a geographical location is through the use of IP addresses.

One example scenario is locating users of a web server. Data Prepper already supports parsing Apache Common Log Format for Apache HTTP servers in the grok processor. The following example shows how you can now locate the client making requests using the clientip property extracted from the grok processor:

processor:
  - grok:
      match:
        log: [ "%{COMMONAPACHELOG_DATATYPED}" ]

  - geoip:
      entries:
        - source: clientip
          target: clientlocation
          include_fields: [latitude, longitude, location, postal_code, country_name, city_name]

When ingesting data using this pipeline, the OpenSearch index will now contain the geolocation fields expressed above, such as the latitude and city_name. Additionally, you can configure template mappings in OpenSearch so that you can display these events in OpenSearch Dashboards using the Maps feature.

AWS Secrets Manager support

Data Prepper now supports AWS Secrets Manager as an extension plugin applicable to pipeline plugins (source, buffer, processor, sink). Users are allowed to configure the AWS Secrets Manager extension through extensions in data-prepper-config.yaml.

The following example shows how you can configure your secrets:

extensions:
  aws:
    secrets:
      host-secret-config:
        secret_id: <YOUR_SECRET_ID_1>
        region: <YOUR_REGION_1>
        sts_role_arn: <YOUR_STS_ROLE_ARN_1>
        refresh_interval: <YOUR_REFRESH_INTERVAL_1>
      credential-secret-config:
        secret_id: <YOUR_SECRET_ID_2>
        region: <YOUR_REGION_2>
        sts_role_arn: <YOUR_STS_ROLE_ARN_2>
        refresh_interval: <YOUR_REFRESH_INTERVAL_2>

Users can also configure secrets in the pipeline_configurations section of a pipeline YAML file.

The credential-secret-config term in the example above is a user-supplied secret configuration ID. Pipeline authors can reference secrets within pipeline plugin settings using the pattern $aws_secrets:<<my-defined-secret>>`. The following example shows how to configure an OpenSearch sink with secret values:


source:
  - opensearch:
      hosts: [ "${{aws_secrets:host-secret-config}}" ]
      username: "${{aws_secrets:credential-secret-config:username}}"
      password: "${{aws_secrets:credential-secret-config:password}}"

In this example, secrets under credential-secret-config are assumed to be stored as the following JSON key-value pairs:

{
  "username": <YOUR_USERNAME>
  "password": <YOUR_PASSWORD>
}

The secret under host-secret-config is assumed to be stored as plaintext. To support secret rotation for OpenSearch, the opensearch source automatically refreshes its basic credentials, (username/password) according to the refresh_interval by polling the latest secret values.

For more information, please see the aws extension documentation.

Note that this feature is currently experimental, and we are working to add support for refreshing and dynamically updating certain fields. In particular, the opensearch sink and the kafka plugins do not automatically refresh secrets.

Other features

Data Prepper can now parse XML data in fields using the parse_xml processor.
The new parse_ion processor can parse fields in the Amazon Ion format.
Some users have fields that are gzip-compressed at the field level. These users can decompress those fields using the decompress processor.
Data Prepper can now join strings from multiple strings, including with a delimiter.
The new select_entries processor allows users to select only the necessary fields from events. This can simplify how users filter unnecessary data.
Users who wish to reduce the size of fields in OpenSearch can use the truncate processor, which truncates strings to a configurable maximum length.
The file source now supports codecs. This can help you test a pipeline locally before using the s3 source.

Getting started

To download Data Prepper, see the OpenSearch downloads page.
For instructions on how to get started with Data Prepper, see Getting started with Data Prepper.
To learn more about the work in progress for Data Prepper 2.8 and other releases, see the Data Prepper roadmap.

Thanks to our contributors!

The following people contributed to this release. Thank you!

asifsmohammed – Asif Sohail Mohammed
asuresh8 – Adi Suresh
chenqi0805 – Qi Chen
derek-ho – Derek Ho
dinujoh – Dinu John
dlvenable – David Venable
emmachase – Emma
graytaylor0 – Taylor Gray
GumpacG – Guian Gumpac
kkondaka – Krishna Kondaka
mallikagogoi7 – None
oeyh – Hai Yan
rajeshLovesToCode – None
shaavanga – Prathyusha Vangala
srikanthjg – Srikanth Govindarajan
travisbenedict – Travis Benedict
Utkarsh-Aga – None
venkataraopasyavula – venkataraopasyavula
wanghd89 – None

Authors

David Venable

David is a senior software engineer working on observability in OpenSearch at Amazon Web Services. He is a maintainer on the Data Prepper project. Prior to working at Amazon, he was the CTO at Allogy Interactive - a start-up creating mobile-learning solutions for healthcare.
View all posts
George Chen

George is a Software Development Engineer at Amazon Web Services working on Observability. He is a maintainer of the Data Prepper project.
View all posts

Announcing Data Prepper 2.7.0

GeoIP processor

AWS Secrets Manager support

Other features

Getting started

Thanks to our contributors!

Authors

OpenSearch is a community-driven, Apache 2.0-licensed open source search and analytics suite that makes it easy to ingest, search, visualize, and analyze data.

Participate

Providers

Resources