Data Prepper 1.2.1 Performance Testing
Following the launch of log pipelines in Data Prepper 1.2, the Data Prepper team is excited to share the results of the Data Prepper performance testing suite. The goal of the suite is to simulate a set of real-world scenarios in any environment while maintaining compatibility with popular log ingestion applications. In all performance test results discussed below, the test environments and configurations are identical, except where an option is not available in every application. See the testing limitations section below for additional details.
TL;DR Data Prepper 1.2.1 vs Logstash 7.13.2 Results
- Data Prepper 1.2.1 has 88% higher sustained throughput than Logstash 7.13.2.
- Data Prepper 1.2.1 has 46% lower mean response latency than Logstash 7.13.2. Response latency here is the time from when a client sends a request to when the response from Data Prepper or Logstash arrives back at that client.
Throughout the test, Data Prepper consistently maintained higher throughput.
Logstash hit a peak latency of 7,382 ms.
Data Prepper's peak latency was 5,276 ms.
Metric | Data Prepper 1.2.1 | Logstash 7.13.2 |
---|---|---|
Throughput | 3.73 MB/s | 1.98 MB/s |
Mean response latency | 53 ms | 99 ms |
Total logs processed in 30-minute test | 68,166,000 | 36,206,800 |
Data Prepper 1.2.1 processed 88% more logs than Logstash 7.13.2 within the 30-minute test limit.
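The percentages in the summary follow directly from the table. The short Python sketch below reproduces that arithmetic using only the table's figures; "higher" and "lower" are measured relative to the Logstash value.

```python
# Figures copied from the results table.
data_prepper = {"throughput_mb_s": 3.73, "mean_latency_ms": 53, "logs": 68_166_000}
logstash = {"throughput_mb_s": 1.98, "mean_latency_ms": 99, "logs": 36_206_800}


def pct_higher(a: float, b: float) -> float:
    """Percentage by which a exceeds b, relative to b."""
    return (a - b) / b * 100


def pct_lower(a: float, b: float) -> float:
    """Percentage by which a falls below b, relative to b."""
    return (b - a) / b * 100


throughput_gain = pct_higher(data_prepper["throughput_mb_s"], logstash["throughput_mb_s"])
logs_gain = pct_higher(data_prepper["logs"], logstash["logs"])
latency_drop = pct_lower(data_prepper["mean_latency_ms"], logstash["mean_latency_ms"])

print(f"throughput: {throughput_gain:.1f}% higher")  # throughput: 88.4% higher
print(f"logs processed: {logs_gain:.1f}% more")      # logs processed: 88.3% more
print(f"mean latency: {latency_drop:.1f}% lower")    # mean latency: 46.5% lower
```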
Performance testing setup
Data Prepper Environment
Logstash Environment
We compared the performance of Data Prepper 1.2.1, the latest release, against Logstash 7.13.2. The tests simulated 10 clients, each sending requests as frequently as possible. Each request was 19.2 KB and contained a batch of 200 logs. The test ran for 30 minutes, measuring latency and throughput.
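The load model described above is a closed loop: each client waits for a response before sending its next request, so "as frequently as possible" is bounded by response latency. The published suite uses Gatling; the Python sketch below is not that suite, only an illustration of the same model. `send_batch` and `make_batch` are hypothetical placeholders for an HTTP call and a log-batch generator, and latency is measured on the client side in the same sense as the response latency defined in the summary.

```python
import threading
import time
from typing import Callable, List


def run_closed_loop(
    send_batch: Callable[[List[str]], None],
    make_batch: Callable[[], List[str]],
    clients: int = 10,
    duration_s: float = 1800.0,
) -> List[float]:
    """Run `clients` concurrent clients until `duration_s` elapses.

    Each client sends its next batch as soon as the previous call to
    `send_batch` returns. Returns the client-side latency of every request in ms.
    """
    latencies_ms: List[float] = []
    lock = threading.Lock()
    deadline = time.monotonic() + duration_s

    def client() -> None:
        while time.monotonic() < deadline:
            batch = make_batch()
            start = time.monotonic()
            send_batch(batch)  # hypothetical: blocks until the response arrives
            elapsed_ms = (time.monotonic() - start) * 1000.0
            with lock:
                latencies_ms.append(elapsed_ms)

    threads = [threading.Thread(target=client) for _ in range(clients)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return latencies_ms


if __name__ == "__main__":
    # Stub standing in for an HTTP request: each "request" takes about 5 ms.
    latencies = run_closed_loop(
        send_batch=lambda batch: time.sleep(0.005),
        make_batch=lambda: ["log line"] * 200,  # 200 logs per request, as in the tests
        clients=10,
        duration_s=0.5,
    )
    print(f"requests: {len(latencies)}")
    print(f"mean latency: {sum(latencies) / len(latencies):.1f} ms")
    print(f"peak latency: {max(latencies):.1f} ms")
```

In a real run, `send_batch` would send the batch to the pipeline's HTTP endpoint and return once the response arrives; the mean and maximum of the returned list correspond to the kind of mean and peak latency figures reported above.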
Application Configurations
Property | Data Prepper | Logstash |
---|---|---|
HTTP Source Threads | 4 | 4 |
Max Connection Count | 2,000 | n/a * |
Request Timeout (ms) | 10,000 | n/a * |
Buffer Size | 2,000,000 | n/a * |
Batch Size | 5,000 | 5,000 |
Worker Threads | 12 | 12 |
SSL | Disabled | Disabled |
[*] Max Connection Count, Request Timeout, and Buffer Size are not configurable in Logstash 7.13.2.
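For reference, the Data Prepper column of the table might map onto a pipeline definition roughly like the one rendered below. This is a hedged sketch: the option names (`thread_count`, `max_connection_count`, `request_timeout`, `bounded_blocking`, `buffer_size`, `batch_size`, `workers`) follow the Data Prepper 1.x documentation as we understand it and should be checked against the release in use, while the grok pattern, OpenSearch host, and index name are hypothetical placeholders, not the configuration the test suite actually used. Python string formatting is used only to keep the example self-contained.

```python
# Data Prepper settings from the Application Configurations table.
SETTINGS = {
    "thread_count": 4,             # HTTP source threads
    "max_connection_count": 2000,
    "request_timeout": 10000,      # milliseconds
    "buffer_size": 2_000_000,
    "batch_size": 5000,
    "workers": 12,                 # pipeline worker threads
}

# Assumed Data Prepper 1.x pipeline layout; host, index, and grok pattern are placeholders.
PIPELINE_TEMPLATE = """\
log-pipeline:
  workers: {workers}
  source:
    http:
      ssl: false
      thread_count: {thread_count}
      max_connection_count: {max_connection_count}
      request_timeout: {request_timeout}
  buffer:
    bounded_blocking:
      buffer_size: {buffer_size}
      batch_size: {batch_size}
  processor:
    - grok:
        match:
          log: [ "%{{COMMONAPACHELOG}}" ]
  sink:
    - opensearch:
        hosts: [ "https://localhost:9200" ]
        index: apache_logs
"""


def render_pipeline(settings: dict) -> str:
    """Fill the template with the table's values (doubled braces survive as literal braces)."""
    return PIPELINE_TEMPLATE.format(**settings)


if __name__ == "__main__":
    print(render_pipeline(SETTINGS))
```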
Data Prepper Scaling
In this simulation, clients are ramped up from 1 to 60 over one hour to measure the impact of concurrent clients on performance.
As the number of clients increases, Data Prepper's throughput remains constant, processing an average of 10,473 MB/min.
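A linear ramp from 1 to 60 clients over one hour can be written as a simple schedule. The sketch below assumes clients are added at evenly spaced intervals and rounds the count down; Gatling's injection profiles may step differently, so treat this as an approximation of the scenario rather than its definition.

```python
import math


def clients_at(t_s: float, start: int = 1, end: int = 60, ramp_s: float = 3600.0) -> int:
    """Concurrent clients at t_s seconds for a linear ramp from `start` to `end` over `ramp_s`.

    Assumes clients are added at evenly spaced points and the count is rounded down.
    """
    if t_s <= 0:
        return start
    if t_s >= ramp_s:
        return end
    return start + math.floor((end - start) * t_s / ramp_s)


if __name__ == "__main__":
    # Client count sampled every 10 minutes over the one-hour ramp.
    print([clients_at(t) for t in range(0, 3601, 600)])  # [1, 10, 20, 30, 40, 50, 60]
```

Under this model, a flat throughput curve means the same volume is processed per minute whether 1 or 60 clients are connected.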
Limitation of Testing
It's important to note that Data Prepper 1.2.1 and Logstash 7.13.2 support different feature sets, and the performance test suite targets their common functionality. As Data Prepper adds more processors and sources, test cases will be added. The Data Prepper team will continue to update the community with performance benchmarks.
Candidates for future performance testing scenarios and improvements
In a real-world deployment, if Data Prepper cannot keep pace with the logs generated by the source application, Data Prepper becomes a bottleneck and causes backpressure on the source application. Future performance tests will be enhanced to simulate backpressure and measure its impact.
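The backpressure effect can be illustrated with a toy discrete-time model: a source produces a fixed number of logs per tick, the pipeline drains a fixed number per tick, and anything that does not fit in a bounded buffer is rejected and must be slowed down or retried upstream. This is a simplified model under those assumptions, not a description of Data Prepper's buffer internals or of how its HTTP source responds when the buffer is full.

```python
from typing import Dict


def simulate_backpressure(
    ticks: int, produced_per_tick: int, drained_per_tick: int, capacity: int
) -> Dict[str, int]:
    """Toy model of a bounded buffer between a source application and a pipeline.

    Each tick the source offers `produced_per_tick` items; whatever does not fit
    in the remaining buffer capacity is rejected (the source must slow down or
    retry). The pipeline then drains up to `drained_per_tick` items.
    """
    buffered = accepted = rejected = processed = 0
    for _ in range(ticks):
        room = capacity - buffered
        taken = min(produced_per_tick, room)
        buffered += taken
        accepted += taken
        rejected += produced_per_tick - taken
        drained = min(drained_per_tick, buffered)
        buffered -= drained
        processed += drained
    return {"accepted": accepted, "rejected": rejected,
            "processed": processed, "buffered": buffered}


if __name__ == "__main__":
    # Pipeline slower than the source: the buffer fills and items are rejected.
    print(simulate_backpressure(ticks=10, produced_per_tick=5, drained_per_tick=3, capacity=10))
    # Pipeline keeps pace: nothing is rejected.
    print(simulate_backpressure(ticks=10, produced_per_tick=3, drained_per_tick=5, capacity=10))
```

With the first set of parameters the buffer fills after a few ticks and 13 of the 50 produced items are rejected; when production does not exceed the drain rate, nothing is rejected.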
These initial performance tests focused on a common scenario: an HTTP source, a grok processor, and an OpenSearch sink. As new features are added to Data Prepper in upcoming releases, performance testing simulations will be added to cover core functionality.
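In this scenario, the grok processor turns each raw log line into structured fields before the OpenSearch sink indexes it. As an illustration only, the Python regular expression below approximates the Apache common log format that grok's `%{COMMONAPACHELOG}` pattern targets; it is not the grok engine, and whether the test suite used exactly this pattern is an assumption.

```python
import re
from typing import Dict, Optional

# Simplified approximation of the Apache common log format targeted by
# grok's %{COMMONAPACHELOG} pattern (not the grok engine itself).
COMMON_LOG = re.compile(
    r'(?P<clientip>\S+) (?P<ident>\S+) (?P<auth>\S+) '
    r'\[(?P<timestamp>[^\]]+)\] '
    r'"(?P<verb>\w+) (?P<request>\S+)(?: HTTP/(?P<httpversion>[\d.]+))?" '
    r'(?P<response>\d{3}) (?P<bytes>\d+|-)'
)


def parse_common_log(line: str) -> Optional[Dict[str, Optional[str]]]:
    """Return the named fields of a common-log-format line, or None if it does not match."""
    match = COMMON_LOG.match(line)
    return match.groupdict() if match else None


if __name__ == "__main__":
    sample = '127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] "GET /apache_pb.gif HTTP/1.0" 200 2326'
    print(parse_common_log(sample))
```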
Running Data Prepper Performance tests
The performance suite source code is available on GitHub. To run the full test suite, execute `./gradlew --rerun-tasks gatlingRun -Dhost=<target url>`. After all tests have completed, HTML reports are created in `./build/reports/gatling/<simulation-name>-<unix-timestamp>/index.html`. Further instructions on running performance tests and Gatling are available in the repository README.
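Running the suite and finding its reports can also be scripted. The Python helper below is a hypothetical convenience wrapper, not part of the repository: it builds the documented Gradle command as an argument list and, assuming the `<simulation-name>-<unix-timestamp>` directory naming described above, picks the newest `index.html` for each simulation.

```python
import re
from pathlib import Path
from typing import Dict, List, Tuple

# Report directories are named <simulation-name>-<unix-timestamp>.
REPORT_DIR = re.compile(r"^(?P<simulation>.+)-(?P<timestamp>\d+)$")


def gatling_command(target_url: str) -> List[str]:
    """The documented Gradle invocation, as an argument list for subprocess.run."""
    return ["./gradlew", "--rerun-tasks", "gatlingRun", f"-Dhost={target_url}"]


def latest_reports(reports_root: Path) -> Dict[str, Path]:
    """Map each simulation name to the index.html of its most recent run."""
    newest: Dict[str, Tuple[int, Path]] = {}
    for directory in reports_root.iterdir():
        match = REPORT_DIR.match(directory.name)
        index = directory / "index.html"
        if not (directory.is_dir() and match and index.is_file()):
            continue
        name, stamp = match["simulation"], int(match["timestamp"])
        if name not in newest or stamp > newest[name][0]:
            newest[name] = (stamp, index)
    return {name: path for name, (_, path) in newest.items()}
```

From the repository root, `subprocess.run(gatling_command(target), check=True)` followed by `latest_reports(Path("build/reports/gatling"))` would run the suite and list the newest report per simulation, where `target` is your own endpoint.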
Summary
On identical hardware, Data Prepper 1.2.1 maintained 88% higher throughput than Logstash 7.13.2 in the simulated scenarios and offered lower latency. If there is a performance-critical scenario you would like to see results for, consider opening a feature request. If you are interested in trying Data Prepper yourself, check out the getting started guide.
Table [1] - AWS Environment Details
Name | EC2 Instance Type | Instance Count | vCPU | Memory (GiB) | JVM Memory Limit (GiB) |
---|---|---|---|---|---|
Data Prepper | m5.xlarge | 1 | 4 | 16 | 4 |
Data Prepper Prometheus + Grafana | m5.xlarge | 1 | 4 | 16 | |
Data Prepper OpenSearch Cluster | i3.xlarge | 3 | 4 | 30.5 | |
Logstash | m5.xlarge | 1 | 4 | 16 | 4 |
Logstash Prometheus + Grafana | m5.xlarge | 1 | 4 | 16 | |
Logstash OpenSearch Cluster | i3.xlarge | 3 | 4 | 30.5 | |
Gatling | m5.2xlarge | 1 | 8 | 32 | |
Latest Performance Test Results
Follow this link to see the Latest Performance Test Results