Error Log: You’ll frequently see this in node logs when a node is attempting to join a cluster or reconnect:
[WARN ][o.o.c.c.ClusterFormationFailureHelper] [your-node-name] master not discovered yet, this node has not previously joined a bootstrapped cluster, and this node must discover master-eligible nodes {your-master-node-names-or-IPs} to form a new cluster or join a running cluster; discovery will continue using [...]
Or simply:
[ERROR][o.o.c.c.ClusterFormationFailureHelper] [your-node-name] no master-eligible nodes found in the list of seeds
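To spot these messages across a large log file, a small scanner can help. The following is a minimal sketch, assuming log lines follow the shape shown above (a level in brackets, the abbreviated logger name o.o.c.c.ClusterFormationFailureHelper, then the node name in brackets). The function name find_discovery_failures is hypothetical, and the pattern may need adjusting if your logging layout differs.

```python
import re

# Matches "[LEVEL][o.o.c.c.ClusterFormationFailureHelper] [node-name] message".
# An optional timestamp prefix before the level is tolerated because we use search().
PATTERN = re.compile(
    r"\[(?P<level>[A-Z]+)\s*\]"
    r"\[o\.o\.c\.c\.ClusterFormationFailureHelper\]\s*"
    r"\[(?P<node>[^\]]+)\]\s*(?P<message>.*)"
)


def find_discovery_failures(lines):
    """Return (level, node, message) tuples for cluster formation failure lines."""
    hits = []
    for line in lines:
        match = PATTERN.search(line)
        if match:
            hits.append((match["level"], match["node"], match["message"].strip()))
    return hits


# Example usage (the log path is an assumption, adjust it to your install):
# with open("/var/log/opensearch/my-cluster.log") as handle:
#     for level, node, message in find_discovery_failures(handle):
#         print(level, node, message[:80])
```

Grouping the results by node name quickly shows whether a single node is affected or the whole cluster is failing to form.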
Why… is this happening? The MasterNotDiscoveredException occurs when a node in your cluster cannot find or connect to the elected cluster manager node within the configured timeout. This is a critical issue because the cluster manager node is responsible for managing the cluster state, including shard allocation, index creation, and node management. Without an elected cluster manager, the cluster cannot function correctly, and individual nodes may fail to join.
Common causes include:
Incorrect discovery.seed_hosts: The list of IP addresses or hostnames in opensearch.yml that nodes use to find the cluster manager-eligible nodes is incorrect or incomplete.
Network Connectivity Issues: Firewalls, security groups, or network routing problems are preventing nodes from communicating on the transport port used for discovery (9300 by default).
Cluster Manager Node Failure: The actual cluster manager node or all cluster manager-eligible nodes have crashed, are unreachable, or are overloaded.
Split Brain Scenario: While less common with modern OpenSearch, improper configuration can lead to a “split brain” in which more than one node believes it is the cluster manager, leading to instability. Correct cluster.initial_master_nodes and discovery.seed_hosts settings usually prevent this.
DNS Resolution Problems: If you’re using hostnames in discovery.seed_hosts, a DNS issue could prevent resolution.
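Several of these causes (a wrong seed list, blocked ports, failing DNS) can be checked from the affected node with one small script. The following is a minimal sketch using only the Python standard library. The function name check_seed_host is hypothetical, and port 9300 is assumed to be your transport port. A successful TCP connection shows only that the port is reachable, not that the remote process is a healthy OpenSearch node.

```python
import socket


def check_seed_host(host, port=9300, timeout=3.0):
    """Resolve one seed host and try a TCP connection to its transport port.

    Returns (addresses, reachable): the resolved IP addresses and whether
    at least one of them accepted a TCP connection on the given port.
    """
    try:
        infos = socket.getaddrinfo(host, port, type=socket.SOCK_STREAM)
    except socket.gaierror:
        return [], False  # DNS resolution failed

    addresses = sorted({info[4][0] for info in infos})
    for address in addresses:
        try:
            with socket.create_connection((address, port), timeout=timeout):
                return addresses, True
        except OSError:
            continue  # refused, timed out, or unroutable; try the next address
    return addresses, False


# Example usage with placeholder hostnames (replace with your discovery.seed_hosts):
# for seed in ["node-1.example.internal", "node-2.example.internal"]:
#     print(seed, check_seed_host(seed))
```

An empty address list points at DNS, while resolved addresses with reachable set to False point at firewalls, routing, or a node that is down.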
Best Practice:
Verify discovery.seed_hosts: Ensure that every cluster manager-eligible node lists the IP addresses (or resolvable hostnames) of all other cluster manager-eligible nodes in its opensearch.yml file.
Check Network Connectivity: Use tools like ping, telnet <host> 9300, or netcat to verify that nodes can reach each other on the transport port used for discovery. Check firewall rules (e.g., iptables, security groups) on both the sender and receiver.
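Consistent cluster.initial_master_nodes: For the initial cluster bootstrap, ensure all cluster manager-eligible nodes have exactly the same list of initial cluster manager nodes. Once the cluster has formed, this setting is ignored. (In OpenSearch 2.x the same setting is also available under the name cluster.initial_cluster_manager_nodes.)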
Monitor Cluster Manager Node Health: Keep an eye on the health and resource utilization of your master-eligible nodes.
Review Logs: Check the logs of all master-eligible nodes and the node reporting the error for related messages that might provide more context.
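The consistency check on cluster.initial_master_nodes is easy to get wrong by eye when there are many nodes. The following is a minimal sketch that groups nodes by the bootstrap list each one is configured with. It assumes you have already collected the values from each node's opensearch.yml into a dictionary. The function name group_by_bootstrap_list and the node names are hypothetical, and the comparison ignores ordering and surrounding whitespace, which is an assumption about what counts as "the same list".

```python
from collections import defaultdict


def group_by_bootstrap_list(settings_by_node):
    """Group node names by their normalized cluster.initial_master_nodes value.

    settings_by_node maps a node name to the list of names configured on it.
    A consistent cluster yields exactly one group.
    """
    groups = defaultdict(list)
    for node, names in settings_by_node.items():
        # Normalize: strip whitespace and sort so ordering differences are ignored.
        key = tuple(sorted(name.strip() for name in names))
        groups[key].append(node)
    return dict(groups)


# Hypothetical values gathered from three nodes; node-c has a typo in its list.
collected = {
    "node-a": ["node-a", "node-b", "node-c"],
    "node-b": ["node-b", "node-a", "node-c"],
    "node-c": ["node-a", "node-b", "node-d"],
}
groups = group_by_bootstrap_list(collected)
if len(groups) > 1:
    for bootstrap_list, nodes in groups.items():
        print(nodes, "use", list(bootstrap_list))
```

More than one group means at least one node disagrees with the others and should be corrected before bootstrapping.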
What else can I do? Is your OpenSearch cluster struggling to find its leader? Don’t let MasterNotDiscoveredException bring your operations to a halt! Join the OpenSearch community forums or contact us for direct assistance at help@opensearchsoftwarefoundation.org. We’re here to help you get your cluster back in sync!