Data Prepper 1.2.1 Performance Testing

February 1, 2022 (updated March 26, 2025)

Following the launch of the log pipeline in Data Prepper 1.2, the Data Prepper team is excited to share the results of the Data Prepper performance testing suite. The goal is a tool that can simulate a set of real-world scenarios in any environment while remaining compatible with popular log ingestion applications. In all performance test results discussed below, the test environments and configurations are identical, except where an option is not available in every application. See the Limitation of Testing section for additional details.

TL;DR Data Prepper 1.2.1 vs Logstash 7.13.2 Results

  • Data Prepper 1.2.1 has 88% higher sustained throughput than Logstash 7.13.2.
  • Data Prepper 1.2.1 has 46% lower mean response latency than Logstash 7.13.2. Response latency here is the time from when the client sends a request to when Data Prepper’s or Logstash’s response arrives back at that client.

Throughout the test, Data Prepper consistently maintained higher throughput.

Logstash hit a peak latency of 7,382 ms.
Data Prepper’s peak latency was 5,276 ms.

| | Data Prepper 1.2.1 | Logstash 7.13.2 |
|---|---|---|
| Throughput | 3.73 MB/s | 1.98 MB/s |
| Mean response latency | 53 ms | 99 ms |
| Total logs processed within 30-minute test | 68,166,000 | 36,206,800 |

Data Prepper 1.2.1 processed 88% more logs than Logstash 7.13.2 within the 30-minute test limit.
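The percentages above are plain relative differences between the two columns of the results table. The Python sketch that follows recomputes them and adds one cross-check that is not part of the original analysis: assuming each of the 10 clients sends its next request as soon as the previous response arrives (zero think time), Little’s law predicts requests per second ≈ clients ÷ mean latency, which can be compared with the request rate implied by the log counts (200 logs per request over 1,800 s). The helper names are illustrative.

```python
# Figures copied from the results table (30-minute run, 10 clients, 200 logs/request).
DURATION_S = 30 * 60
CLIENTS = 10
LOGS_PER_REQUEST = 200

RESULTS = {
    "Data Prepper 1.2.1": {"mb_per_s": 3.73, "mean_latency_ms": 53, "logs": 68_166_000},
    "Logstash 7.13.2": {"mb_per_s": 1.98, "mean_latency_ms": 99, "logs": 36_206_800},
}


def pct_higher(a: float, b: float) -> float:
    """How much larger a is than b, as a percentage of b."""
    return (a / b - 1) * 100


def pct_lower(a: float, b: float) -> float:
    """How much smaller a is than b, as a percentage of b."""
    return (1 - a / b) * 100


def closed_loop_rates(r: dict) -> tuple[float, float]:
    """(predicted, observed) requests/s; prediction assumes zero think time (Little's law)."""
    predicted = CLIENTS / (r["mean_latency_ms"] / 1000)
    observed = r["logs"] / LOGS_PER_REQUEST / DURATION_S
    return predicted, observed


dp, ls = RESULTS["Data Prepper 1.2.1"], RESULTS["Logstash 7.13.2"]
print(f"throughput:     {pct_higher(dp['mb_per_s'], ls['mb_per_s']):.1f}% higher")
print(f"logs processed: {pct_higher(dp['logs'], ls['logs']):.1f}% more")
print(f"mean latency:   {pct_lower(dp['mean_latency_ms'], ls['mean_latency_ms']):.1f}% lower")
for name, r in RESULTS.items():
    predicted, observed = closed_loop_rates(r)
    print(f"{name}: ~{predicted:.0f} req/s predicted, ~{observed:.0f} req/s observed")
```

Running it reproduces the 88% and 46% figures, and both request-rate estimates land near 189 req/s for Data Prepper and 101 req/s for Logstash, which suggests the mean-latency and log-count columns are consistent with each other under the zero-think-time assumption.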

Performance testing setup

[Figure: Data Prepper environment]

[Figure: Logstash environment]

This test compares the latest release, Data Prepper 1.2.1, against Logstash 7.13.2. It simulates 10 clients sending requests as frequently as possible. Each request was 19.2 KB and contained a batch of 200 logs. The test ran for 30 minutes, measuring latency and throughput.
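Each simulated request is an HTTP POST carrying a batch of 200 logs. As a rough illustration, the sketch below prepares one such request for Data Prepper’s http source, assuming the source’s default port (2021), its /log/ingest path, and a JSON-array body. The `log` field name, the sample Apache-style log line, and the helper names are hypothetical and are not taken from the actual test suite; with this particular line the body comes out at roughly 20 KB, the same order as the 19.2 KB requests described above.

```python
import json
import urllib.request

LOGS_PER_REQUEST = 200  # batch size used in the test setup

# Hypothetical Apache common-log line; the real suite's payload may differ.
SAMPLE_LINE = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'


def build_batch(n: int = LOGS_PER_REQUEST) -> bytes:
    """Serialize n log events as a JSON array of {"log": ...} objects."""
    return json.dumps([{"log": SAMPLE_LINE} for _ in range(n)]).encode("utf-8")


def build_request(host: str, body: bytes, port: int = 2021) -> urllib.request.Request:
    """Prepare (but do not send) a POST to the http source's /log/ingest endpoint."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/log/ingest",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


body = build_batch()
req = build_request("localhost", body)
print(req.get_method(), req.full_url, f"{len(body):,} bytes")
# To send it against a running Data Prepper: urllib.request.urlopen(req, timeout=10)
```

A closed-loop load generator in this style would have each of the 10 clients send a batch, wait for the response, and immediately send the next one.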

Application Configurations

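| Property | Data Prepper | Logstash |
|---|---|---|
| HTTP Source Threads | 4 | 4 |
| Max Connection Count | 2,000 | n/a * |
| Request Timeout (ms) | 10,000 | n/a * |
| Buffer Size | 2,000,000 | n/a * |
| Batch Size | 5,000 | 5,000 |
| Worker Threads | 12 | 12 |
| SSL | Disabled | Disabled |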

* Max Connection Count, Request Timeout, and Buffer Size are not configurable in Logstash 7.13.2.
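For readers who want to reproduce the Data Prepper column of the Application Configurations table, the sketch below lays it out as a pipeline definition. It is expressed as a Python dict and printed as JSON for illustration; Data Prepper pipeline files are normally written in YAML. The key names reflect our understanding of the Data Prepper 1.x http source, bounded_blocking buffer, grok processor, and opensearch sink options and should be checked against the documentation for your version; the pipeline name, grok pattern, host, and index are hypothetical placeholders.

```python
import json

# Assumed Data Prepper 1.x option names; verify against your version's docs.
pipeline = {
    "log-pipeline": {  # hypothetical pipeline name
        "workers": 12,  # Worker Threads
        "source": {
            "http": {
                "thread_count": 4,  # HTTP Source Threads
                "max_connection_count": 2000,  # Max Connection Count
                "request_timeout": 10000,  # Request Timeout (ms)
                "ssl": False,  # SSL disabled
            }
        },
        "buffer": {
            "bounded_blocking": {
                "buffer_size": 2_000_000,  # Buffer Size
                "batch_size": 5000,  # Batch Size
            }
        },
        "processor": [
            {"grok": {"match": {"log": ["%{COMMONAPACHELOG}"]}}}  # hypothetical pattern
        ],
        "sink": [
            {"opensearch": {"hosts": ["https://localhost:9200"], "index": "logs"}}  # placeholders
        ],
    }
}

print(json.dumps(pipeline, indent=2))
```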

Data Prepper Scaling

In this simulation, clients are ramped up from 1 to 60 over one hour to measure the impact of concurrent clients on performance.

As the number of clients increases, Data Prepper’s throughput remains constant, processing an average of 10,473 MB/m.
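The ramp can be described as a simple linear schedule. The sketch below is a hypothetical helper, not code from the suite, that gives the number of concurrent clients at a point in the one-hour ramp, plus a Little’s law estimate: assuming clients have zero think time, mean latency ≈ clients ÷ requests per second.

```python
import math


def clients_at(t_s: float, start: int = 1, end: int = 60, duration_s: float = 3600) -> int:
    """Concurrent clients t_s seconds into a linear ramp (hypothetical helper)."""
    t = min(max(t_s, 0.0), duration_s)  # clamp to the ramp window
    return math.floor(start + (end - start) * t / duration_s)


def implied_mean_latency_ms(clients: int, requests_per_s: float) -> float:
    """Little's law for a closed loop with zero think time: latency = clients / throughput."""
    return clients / requests_per_s * 1000


for minute in (0, 15, 30, 45, 60):
    print(f"t={minute:>2} min: {clients_at(minute * 60)} clients")
```

Under that model, a throughput that stays flat while the ramp goes from 1 to 60 clients would be expected to show mean latency rising roughly in proportion to the client count.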

Limitation of Testing

It’s important to note that Data Prepper 1.2.1 and Logstash 7.13.2 support different feature sets, and the performance test suite targets their common functionality. As Data Prepper adds more processors and sources, test cases will be added. The Data Prepper team will continue to update the community with performance benchmarks.

Candidates for future performance testing scenarios and improvements

In a real-world deployment, if Data Prepper cannot keep pace with the logs generated by the source application, Data Prepper becomes a bottleneck, causing backpressure on the source application. In the future, performance tests will be enhanced to simulate backpressure and measure its impact.
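Backpressure arises whenever a stage with a bounded buffer cannot drain it as fast as producers fill it. The sketch below models that generic mechanism with Python’s queue.Queue; it is an illustration only, not Data Prepper’s buffer implementation, and the capacity and timeout values are arbitrary.

```python
import queue


def try_enqueue(buf: queue.Queue, batch: str, timeout_s: float) -> bool:
    """Offer a batch to a bounded buffer; False means the producer must back off."""
    try:
        buf.put(batch, timeout=timeout_s)  # blocks while the buffer is full
        return True
    except queue.Full:
        return False  # buffer stayed full for timeout_s: backpressure


buffer = queue.Queue(maxsize=2)  # arbitrary, tiny capacity so the effect is visible
accepted = [try_enqueue(buffer, f"batch-{i}", timeout_s=0.05) for i in range(3)]
print(accepted)  # the third batch is rejected because nothing is draining the buffer

buffer.get()  # a consumer drains one batch...
print(try_enqueue(buffer, "batch-3", timeout_s=0.05))  # ...so there is room again
```

A test that simulates backpressure would, in these terms, measure how often and for how long producers see the rejected case.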

The scope of these initial performance tests was a common scenario: an HTTP source, a grok processor, and an OpenSearch sink. As new features are added to Data Prepper in upcoming releases, performance testing simulations will be added to cover core functionality.
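Grok patterns such as %{COMMONAPACHELOG} are named wrappers around regular expressions. To illustrate what the grok step in this scenario does to each event, the sketch below uses a simplified Python regex with named groups as a stand-in. It is not the actual grok pattern definition, the log format used by the test suite is an assumption, and the field names simply follow the classic COMMONAPACHELOG output (clientip, verb, response, and so on).

```python
import re

# Simplified stand-in for grok's %{COMMONAPACHELOG}; not the real pattern definition.
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\S+) (?P<request>\S+) HTTP/(?P<httpversion>[\d.]+)" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def grok_like(event: dict, field: str = "log") -> dict:
    """Copy named captures from event[field] into the event; leave it unchanged on no match."""
    match = COMMON_LOG.match(event.get(field, ""))
    if match:
        event.update(match.groupdict())
    return event


event = grok_like({"log": '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'})
print(event["clientip"], event["verb"], event["request"], event["response"])
```

The enriched event, now carrying structured fields alongside the raw line, is what the sink would index into OpenSearch.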

Running Data Prepper Performance tests

The performance suite source code is available on GitHub. To run the full test suite, execute ./gradlew --rerun-tasks gatlingRun -Dhost=<target url>. After all tests have completed, HTML reports are created in ./build/reports/gatling/<simulation-name>-<unix-timestamp>/index.html. Further instructions on running performance tests and Gatling are available in the repository README.
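Because each run writes a new timestamped directory, it can be handy to find the most recent report for each simulation. The sketch below assumes only the directory layout described above (<simulation-name>-<unix-timestamp>/index.html); the function name is hypothetical.

```python
import re
from pathlib import Path

# <simulation-name>-<unix-timestamp>; the greedy name group splits on the last hyphen.
REPORT_DIR = re.compile(r"^(?P<simulation>.+)-(?P<timestamp>\d+)$")


def latest_reports(root: str = "build/reports/gatling") -> dict[str, Path]:
    """Map each simulation name to the index.html of its most recent report."""
    root_path = Path(root)
    if not root_path.is_dir():
        return {}  # no reports generated yet
    newest: dict[str, tuple[int, Path]] = {}
    for entry in root_path.iterdir():
        match = REPORT_DIR.match(entry.name)
        index = entry / "index.html"
        if not (entry.is_dir() and match and index.is_file()):
            continue  # skip stray files and incomplete runs
        name, ts = match["simulation"], int(match["timestamp"])
        if name not in newest or ts > newest[name][0]:
            newest[name] = (ts, index)
    return {name: index for name, (_, index) in newest.items()}


for name, index in sorted(latest_reports().items()):
    print(f"{name}: {index}")
```

Splitting on the last hyphen keeps simulation names that themselves contain hyphens intact.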

Summary

On identical hardware, Data Prepper 1.2.1 maintained 88% higher throughput in the simulated scenarios and offered lower latencies. If there is a performance-critical scenario you would like to see results for, consider opening a feature request. If you are interested in trying Data Prepper yourself, check out the getting started guide.

Table [1] – AWS Environment Details

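| Name | EC2 Instance Type | Instance Count | vCPU | Memory (GiB) | JVM Memory Limit (GiB) |
|---|---|---|---|---|---|
| Data Prepper | m5.xlarge | 1 | 4 | 16 | 4 |
| Data Prepper Prometheus + Grafana | m5.xlarge | 1 | 4 | 16 | |
| Data Prepper OpenSearch Cluster | i3.xlarge | 3 | 4 | 30.5 | |
| Logstash | m5.xlarge | 1 | 4 | 16 | 4 |
| Logstash Prometheus + Grafana | m5.xlarge | 1 | 4 | 16 | |
| Logstash OpenSearch Cluster | i3.xlarge | 3 | 4 | 30.5 | |
| Gatling | m5.2xlarge | 1 | 8 | 32 | |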

Latest Performance Test Results

Follow this link to see the Latest Performance Test Results

Author

  • Steven Bayer is a Software Development Engineer at AWS working in search services. He is a maintainer on the Data Prepper project.
