
Error log: “node is closing” – The shutdown race condition

November 21, 2025

Error log: You’ll see this error in your client application’s logs (like Logstash) or as a failure in a bulk response. It’s often wrapped in other exceptions.

[WARN ][o.o.a.b.TransportBulkAction] [your-node-name]
  [index-name][0] failed to execute bulk item (index) ...
org.opensearch.transport.NodeClosingException: node is closing

Or as a root cause in a JSON response:

JSON

"root_cause": [
  {
    "type": "illegal_state_exception",
    "reason": "node is closing"
  }
]
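In a bulk response, only some items may fail this way while the rest succeed, so it helps to pick out just the retryable items. Below is a minimal Python sketch that scans a `_bulk` response for item failures whose reason contains "node is closing"; the `sample` response is illustrative and the helper name is our own, not an OpenSearch API.

```python
def retryable_items(bulk_response):
    """Return indices of bulk items that failed with a shutdown-race error."""
    retryable = []
    for i, item in enumerate(bulk_response.get("items", [])):
        # Each item is keyed by its action type: "index", "create", "update", or "delete".
        action = next(iter(item.values()))
        error = action.get("error")
        if error and "node is closing" in error.get("reason", ""):
            retryable.append(i)
    return retryable

# Illustrative response: item 0 succeeded, item 1 hit the closing node.
sample = {
    "errors": True,
    "items": [
        {"index": {"_id": "1", "status": 201}},
        {"index": {"_id": "2", "status": 503,
                   "error": {"type": "illegal_state_exception",
                             "reason": "node is closing"}}},
    ],
}
print(retryable_items(sample))  # -> [1]
```

Only the flagged items need to be resent; re-indexing the whole batch would duplicate work for the documents that already succeeded.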

Why is this happening? This is a race condition and usually not a critical bug; rather, it is a sign that your clients need to be more resilient.

It happens when:

  1. An OpenSearch node begins its graceful shutdown process (e.g., you are performing a rolling restart, or a cloud auto-scaling event is terminating the instance).
  2. The node’s internal state is set to “closing.”
  3. A client (or a load balancer) sends a new request (like a search or bulk item) to that specific node after it has started shutting down.
  4. The node, being in a “closing” state, immediately rejects the new request with NodeClosingException or IllegalStateException: node is closing.

This is normal behavior during a rolling restart or any node shutdown.

Best practice:

  1. Implement client-side retries: This is the most important fix. Your client application (Logstash, Filebeat, your custom code) must be configured to catch this error and retry the request. A good retry policy with exponential backoff will simply send the request to another, healthy node in the cluster.
  2. Use a smart load balancer: If you are connecting through a load balancer (e.g., AWS ALB/NLB, Nginx), ensure its health checks are configured correctly. A node that is “closing” should fail its health check, and the load balancer should stop sending it new traffic, which minimizes these errors.
  3. Drain nodes before shutdown: For planned maintenance, you can use cluster allocation exclusion to “drain” a node. This tells OpenSearch to move all shards off that node before you shut it down, which gives it time to stop receiving traffic.

Bash

PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.123"
  }
}
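The first best practice, client-side retries with exponential backoff, can be sketched in a few lines. This is a generic pattern, not the API of any particular client library: `send_request` stands in for your client's send call, and `NodeClosingError` stands in for the transport exception that wraps "node is closing". In a real client, the retry would go back through the connection pool and typically land on a different, healthy node.

```python
import random
import time

class NodeClosingError(Exception):
    """Stand-in for the transport-level 'node is closing' rejection."""

def send_with_retries(send_request, max_retries=5, base_delay=0.5):
    """Retry a request with exponential backoff plus a little jitter."""
    for attempt in range(max_retries + 1):
        try:
            return send_request()
        except NodeClosingError:
            if attempt == max_retries:
                raise
            # Backoff doubles each attempt: 0.5s, 1s, 2s, ... plus jitter
            # so many clients don't all retry at the same instant.
            time.sleep(base_delay * (2 ** attempt) + random.uniform(0, 0.1))

# Demo: the first two attempts hit a closing node, the third succeeds.
attempts = {"n": 0}
def flaky_send():
    attempts["n"] += 1
    if attempts["n"] < 3:
        raise NodeClosingError("node is closing")
    return {"acknowledged": True}

print(send_with_retries(flaky_send, base_delay=0.01))  # -> {'acknowledged': True}
```

Note that many off-the-shelf clients (Logstash's OpenSearch output, the official client libraries) already have retry settings built in; this sketch is mainly for custom code.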

What else can I do? If you’re seeing these errors when you are not performing a restart, it could mean your nodes are unstable and shutting down unexpectedly. In that case, check the opensearch.log for OutOfMemoryError or other fatal errors. For help building a resilient client, ask the community or contact us in the OpenSearch Slack channel, in #general.
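If you need to check many nodes for unexplained shutdowns, a quick scan of the log for fatal markers can save time. This is a simple sketch; the log path shown in the comment is an assumption for a typical package install, and the marker list is ours, not exhaustive.

```python
# Markers that commonly accompany an unplanned node death (assumed list).
FATAL_MARKERS = ("OutOfMemoryError", "fatal error")

def find_fatal_lines(log_lines):
    """Return log lines that contain any of the fatal markers."""
    return [line.rstrip() for line in log_lines
            if any(marker in line for marker in FATAL_MARKERS)]

# Usage (path is an assumption; adjust for your install):
# with open("/var/log/opensearch/opensearch.log") as f:
#     for line in find_fatal_lines(f):
#         print(line)

print(find_fatal_lines([
    "[2025-11-21T10:00:00][ERROR] java.lang.OutOfMemoryError: Java heap space",
    "[2025-11-21T10:00:01][INFO ] node started",
]))
```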
