
You're viewing version 2.17 of the OpenSearch documentation. This version is no longer maintained. For the latest version, see the current documentation. For information about OpenSearch version maintenance, see Release Schedule and Maintenance Policy.

Hot shard identification

Hot shard identification root cause analysis (RCA) lets you identify a hot shard within an index. A hot shard is an outlier that consumes more resources than other shards and may lead to poor indexing and search performance. The hot shard identification RCA monitors the following metrics:

  • CPU utilization
  • Heap allocation rate

Shards may become hot because of the nature of your workload. For example, when you use a _routing parameter or custom document IDs, one or several shards within the cluster may receive a disproportionate share of updates and consume more CPU and heap resources than other shards.

The hot shard identification RCA compares the CPU utilization and heap allocation rates against their threshold values. If the usage for either metric is greater than the threshold, the shard is considered to be hot.
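The comparison described above can be sketched in a few lines of Python. This is an illustration only, not the RCA implementation: the `ShardUsage` class, the `is_hot` function, and the sample numbers are hypothetical, and the real RCA derives its thresholds internally.

```python
from dataclasses import dataclass


@dataclass
class ShardUsage:
    """Resource usage observed for one shard (hypothetical structure)."""
    cpu_utilization: float  # CPU usage, in number of cores
    heap_alloc_rate: float  # heap allocation rate, in bytes per second


def is_hot(usage: ShardUsage, cpu_threshold: float, heap_threshold: float) -> bool:
    # A shard is considered hot if EITHER metric exceeds its threshold.
    return (
        usage.cpu_utilization > cpu_threshold
        or usage.heap_alloc_rate > heap_threshold
    )


# Illustrative values only: CPU is below its threshold, heap is above it.
usage = ShardUsage(cpu_utilization=0.02, heap_alloc_rate=8_000_000.0)
print(is_hot(usage, cpu_threshold=0.027, heap_threshold=7_600_000.0))
```

Because the check uses a logical OR, a shard that exceeds only one threshold is still reported as hot.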

For more information about the hot shard identification RCA implementation, see Hot Shard RCA.

Example request

The following query requests hot shard identification:

GET _plugins/_performanceanalyzer/rca?name=HotShardClusterRca
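If you prefer to send the request from a script, the following Python sketch builds the same URL and fetches the result using only the standard library. The host `localhost` and port `9600` are assumptions about where Performance Analyzer is listening in your deployment, and `build_rca_url` and `fetch_rca` are hypothetical helper names. Adjust them, and add TLS or authentication, as your cluster requires.

```python
import json
import urllib.parse
import urllib.request


def build_rca_url(host: str = "localhost", port: int = 9600,
                  rca_name: str = "HotShardClusterRca") -> str:
    # Host and port are assumptions; change them to match your deployment.
    query = urllib.parse.urlencode({"name": rca_name})
    return f"http://{host}:{port}/_plugins/_performanceanalyzer/rca?{query}"


def fetch_rca(url: str, timeout: float = 10.0) -> dict:
    # Sends a GET request and decodes the JSON body into a dictionary.
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    # Only prints the URL; call fetch_rca(url) against a running cluster.
    print(build_rca_url())
```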

Example response

The response contains a list of unhealthy shards:

"HotShardClusterRca": [{
  "rca_name": "HotShardClusterRca",
  "timestamp": 1680721367563,
  "state": "unhealthy",
  "HotClusterSummary": [
    {
      "number_of_nodes": 3,
      "number_of_unhealthy_nodes": 1,
      "HotNodeSummary": [
        {
          "node_id": "7kosAbpASsqBoHmHkVXxmw",
          "host_address": "192.168.80.4",
          "HotResourceSummary": [
            {
              "resource_type": "cpu usage",
              "resource_metric": "cpu usage(num of cores)",
              "threshold": 0.027397981341796683,
              "value": 0.034449630200405396,
              "time_period_seconds": 60,
              "meta_data": "ssZw1WRUSHS5DZCW73BOJQ index9 4"
            },
            {
              "resource_type": "heap",
              "resource_metric": "heap alloc rate(heap alloc rate in bytes per second)",
              "threshold": 7605441.367010161,
              "value": 10872119.748328414,
              "time_period_seconds": 60,
              "meta_data": "ssZw1WRUSHS5DZCW73BOJQ index9 4"
            },
            {
              "resource_type": "heap",
              "resource_metric": "heap alloc rate(heap alloc rate in bytes per second)",
              "threshold": 7605441.367010161,
              "value": 8019622.354388569,
              "time_period_seconds": 60,
              "meta_data": "QRF4rBM7SNCDr1g3KU6HyA index9 0"
            }
          ]
        }
      ]
    }
  ]
}]

Response body fields

The following table lists the response fields.

Field | Type | Description
rca_name | String | The name of the RCA. In this case, “HotShardClusterRca”.
timestamp | Integer | The timestamp of the RCA.
state | String | The state of the cluster determined by the RCA. The state can be healthy, unhealthy, or unknown.
HotClusterSummary.number_of_nodes | Integer | The number of nodes in the cluster.
HotClusterSummary.number_of_unhealthy_nodes | Integer | The number of nodes found to be in an unhealthy state.
HotClusterSummary.HotNodeSummary.node_id | String | The ID of the node reported as unhealthy.
HotClusterSummary.HotNodeSummary.host_address | String | The host address of the node reported as unhealthy.
HotClusterSummary.HotNodeSummary.HotResourceSummary.resource_type | String | The type of resource causing the unhealthy state, either “cpu usage” or “heap”.
HotClusterSummary.HotNodeSummary.HotResourceSummary.resource_metric | String | The definition of the resource_type. Either “cpu usage(num of cores)” or “heap alloc rate(heap alloc rate in bytes per second)”.
HotClusterSummary.HotNodeSummary.HotResourceSummary.threshold | Float | The threshold above which the resource is considered contended.
HotClusterSummary.HotNodeSummary.HotResourceSummary.value | Float | The current value of the resource.
HotClusterSummary.HotNodeSummary.HotResourceSummary.time_period_seconds | Integer | The amount of time, in seconds, that a shard was monitored before its state was declared to be healthy or unhealthy.
HotClusterSummary.HotNodeSummary.HotResourceSummary.meta_data | String | The metadata associated with the resource_type.

In the preceding example response, the meta_data value of the last HotResourceSummary entry is QRF4rBM7SNCDr1g3KU6HyA index9 0. The meta_data string consists of three space-separated fields:

  • Node name: QRF4rBM7SNCDr1g3KU6HyA
  • Index name: index9
  • Shard ID: 0

This means that shard 0 of index index9 on node QRF4rBM7SNCDr1g3KU6HyA is hot.
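To extract every hot shard from a response programmatically, you can walk the nested summaries and split each meta_data string on spaces. The following Python sketch assumes that the response has been parsed into a dictionary whose top-level key is `HotShardClusterRca`. The `HotShard` tuple and the `parse_meta_data` and `list_hot_shards` helpers are hypothetical names, and the sample is a trimmed copy of the example response.

```python
from typing import List, NamedTuple, Tuple


class HotShard(NamedTuple):
    node: str
    index: str
    shard_id: int
    resource_type: str
    value: float
    threshold: float


def parse_meta_data(meta_data: str) -> Tuple[str, str, int]:
    # meta_data has three space-separated fields: node name, index name, shard ID.
    node, index, shard_id = meta_data.split()
    return node, index, int(shard_id)


def list_hot_shards(response: dict) -> List[HotShard]:
    hot = []
    for rca in response.get("HotShardClusterRca", []):
        for cluster in rca.get("HotClusterSummary", []):
            for node in cluster.get("HotNodeSummary", []):
                for res in node.get("HotResourceSummary", []):
                    name, index, shard_id = parse_meta_data(res["meta_data"])
                    hot.append(HotShard(name, index, shard_id,
                                        res["resource_type"],
                                        res["value"], res["threshold"]))
    return hot


# Trimmed copy of the example response, keeping only the last resource entry.
sample = {
    "HotShardClusterRca": [{
        "rca_name": "HotShardClusterRca",
        "state": "unhealthy",
        "HotClusterSummary": [{
            "HotNodeSummary": [{
                "node_id": "7kosAbpASsqBoHmHkVXxmw",
                "HotResourceSummary": [{
                    "resource_type": "heap",
                    "threshold": 7605441.367010161,
                    "value": 8019622.354388569,
                    "meta_data": "QRF4rBM7SNCDr1g3KU6HyA index9 0",
                }],
            }],
        }],
    }]
}

for shard in list_hot_shards(sample):
    print(f"shard {shard.shard_id} of {shard.index} on {shard.node} "
          f"is hot ({shard.resource_type})")
```

A shard that exceeds both thresholds appears once per resource type, so you may want to group the results by index and shard ID.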
