
Choosing a workload

The opensearch-benchmark-workloads repository contains a list of workloads that you can use to run your benchmarks. Using a workload similar to your cluster’s use cases can save you time and effort when assessing your cluster’s performance.

For example, say you’re a system architect at a rideshare company. You collect and store data about trip times, locations, and other details of each ride. Instead of building a custom workload from your own data, which requires additional time, effort, and cost, you can use the nyc_taxis workload to benchmark your cluster because its data is similar to the data that you collect.

Criteria for choosing a workload

Consider the following criteria when deciding which workload would work best for benchmarking your cluster:

  • The cluster’s use case.
  • How the data types that your cluster uses compare with the structure of the documents contained in the workload. Each workload contains an example document so that you can compare data types, or you can view the index mappings and data types in the index.json file.
  • The query types most commonly used inside your cluster. The operations/default.json file contains information about the query types and workload operations.
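
One way to apply the data type criterion is to compare the field types that a workload maps against the field types that your own cluster uses. The following Python sketch is illustrative only: the `workload_mapping` excerpt and the `cluster_field_types` set are hypothetical values, not the actual contents of any workload's index.json file, and the helper names `mapped_types` and `type_coverage` are made up for this example. It also assumes that you have already copied the mappings into a plain Python dictionary, because the workload files may contain template placeholders that a JSON parser cannot read directly.

```python
# Hypothetical excerpt of a workload's index mappings
# (illustrative only, not the real nyc_taxis index.json).
workload_mapping = {
    "properties": {
        "pickup_datetime": {"type": "date"},
        "trip_distance": {"type": "float"},
        "pickup_location": {"type": "geo_point"},
        "payment_type": {"type": "keyword"},
    }
}

# Hypothetical set of field types used in your own cluster.
cluster_field_types = {"date", "float", "geo_point", "text"}


def mapped_types(mapping):
    """Collect the set of field types in a mapping, including nested properties."""
    types = set()
    for field in mapping.get("properties", {}).values():
        if "type" in field:
            types.add(field["type"])
        # Object fields carry their own "properties" block, so recurse into it.
        types |= mapped_types(field)
    return types


def type_coverage(workload, cluster_types):
    """Return (types shared with the workload, cluster types the workload lacks)."""
    workload_types = mapped_types(workload)
    return cluster_types & workload_types, cluster_types - workload_types


shared, missing = type_coverage(workload_mapping, cluster_field_types)
print("Covered by workload:", sorted(shared))
print("Not covered:", sorted(missing))
```

A workload that covers most of your field types is likely a closer match. Types that appear only in your cluster, such as `text` in this made-up example, indicate behavior that the workload would not exercise.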

General search clusters

For benchmarking clusters built for general search use cases, start with the [nyc_taxis](https://github.com/opensearch-project/opensearch-benchmark-workloads/tree/main/nyc_taxis) workload. This workload contains data about the rides taken in yellow taxis in New York City in 2015.

Log data

For benchmarking clusters built for indexing and search with log data, use the http_logs workload. This workload contains HTTP server log data from the 1998 World Cup website.
