Error log: You’ll see this error in your client application’s logs (like Logstash) or as a failure in a bulk response. It’s often wrapped in other exceptions.
[WARN ][o.o.a.b.TransportBulkAction] [your-node-name]
[index-name][0] failed to execute bulk item (index) ...
org.opensearch.transport.NodeClosingException: node is closing
Or as a root cause in a JSON response:
"root_cause": [
  {
    "type": "illegal_state_exception",
    "reason": "node is closing"
  }
]
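A client can check for this root cause before deciding whether a failure is worth retrying. The following is a minimal sketch, not an official client API: the function name `is_node_closing` is hypothetical, and it assumes the `root_cause` array sits either at the top level of the response body or under an `error` object.

```python
import json


def is_node_closing(response_body: str) -> bool:
    """Return True if any root_cause entry reports 'node is closing'."""
    try:
        doc = json.loads(response_body)
    except json.JSONDecodeError:
        return False
    if not isinstance(doc, dict):
        return False
    # Assumption: root_cause is either top-level or nested under "error".
    error = doc.get("error", doc)
    if not isinstance(error, dict):
        return False
    causes = error.get("root_cause", [])
    if not isinstance(causes, list):
        return False
    return any(
        isinstance(cause, dict) and "node is closing" in str(cause.get("reason", ""))
        for cause in causes
    )
```

Matching on the `reason` text rather than the `type` keeps the check narrow, since `illegal_state_exception` can be raised for unrelated reasons.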
Why is this happening? This is a race condition. It is usually not a critical bug, but rather a sign that your clients need to be more resilient.
It happens when:
- An OpenSearch node begins its graceful shutdown process (e.g., you are performing a rolling restart, or a cloud auto-scaling event is terminating the instance).
- The node’s internal state is set to “closing.”
- A client (or a load balancer) sends a new request (like a search or bulk item) to that specific node after it has started shutting down.
- The node, being in a “closing” state, immediately rejects the new request with NodeClosingException or IllegalStateException: node is closing.
This is normal behavior during a rolling restart or any node shutdown.
Best practice:
- Implement client-side retries: This is the most important fix. Your client application (Logstash, Filebeat, your custom code) must be configured to catch this error and retry the request. With a retry policy that uses exponential backoff, the retried request is typically sent to another, healthy node in the cluster and succeeds.
- Use a smart load balancer: If you are connecting through a load balancer (e.g., AWS ALB/NLB, Nginx), ensure its health checks are configured correctly. A node that is “closing” should fail its health check, and the load balancer should stop sending it new traffic, which minimizes these errors.
- Drain nodes before shutdown: For planned maintenance, you can use cluster allocation exclusion to “drain” a node. This tells OpenSearch to move all shards off that node before you shut it down, so shard-level requests stop being routed to it.
PUT _cluster/settings
{
  "transient": {
    "cluster.routing.allocation.exclude._ip": "10.0.0.123"
  }
}
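The client-side retry recommended in the first bullet can be sketched roughly as follows. This is an illustration under stated assumptions, not the behavior of any particular OpenSearch client: `NodeClosingError`, `send_with_retry`, and the `send` callback are hypothetical names, and your client library may already provide its own retry settings that you should prefer.

```python
import random
import time
from typing import Callable, Optional, Sequence


class NodeClosingError(Exception):
    """Hypothetical error a client wrapper raises on a 'node is closing' rejection."""


def send_with_retry(
    send: Callable[[str], dict],
    nodes: Sequence[str],
    max_attempts: int = 5,
    base_delay: float = 0.5,
    max_delay: float = 10.0,
    sleep: Callable[[float], None] = time.sleep,
) -> dict:
    """Call send(node), rotating through nodes and backing off after each rejection."""
    if not nodes:
        raise ValueError("nodes must not be empty")
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    last_error: Optional[NodeClosingError] = None
    for attempt in range(max_attempts):
        # Pick a different node on each attempt so a closing node is skipped.
        node = nodes[attempt % len(nodes)]
        try:
            return send(node)
        except NodeClosingError as exc:
            last_error = exc
            if attempt == max_attempts - 1:
                break
            # Exponential backoff with jitter, capped at max_delay seconds.
            delay = min(max_delay, base_delay * (2 ** attempt))
            sleep(random.uniform(0, delay))
    assert last_error is not None
    raise last_error
```

Only the "node is closing" rejection is retried here; other failures propagate immediately, so genuine request errors are not masked by the retry loop.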
What else can I do? If you’re seeing these errors when you are not performing a restart, it could mean your nodes are unstable and randomly shutting down. In that case, check opensearch.log for OutOfMemoryError or other fatal errors. For help building a resilient client, ask the community or contact us in the #General channel of the OpenSearch Slack.
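If you want to automate the log check described above, a small scan along these lines may help. It is only a sketch: the function name `find_fatal_lines` and the marker strings are illustrative assumptions, and the location of opensearch.log depends on your installation.

```python
from typing import Iterable, List

# Illustrative markers only; extend this tuple with whatever your logs actually contain.
FATAL_MARKERS = ("outofmemoryerror", "fatal")


def find_fatal_lines(lines: Iterable[str]) -> List[str]:
    """Return log lines that mention an out-of-memory or fatal error (case-insensitive)."""
    matches = []
    for line in lines:
        lowered = line.lower()
        if any(marker in lowered for marker in FATAL_MARKERS):
            matches.append(line.rstrip("\n"))
    return matches
```

Because the function accepts any iterable of strings, you can pass it an open file handle for your opensearch.log and print the returned lines.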