This recipe covers the procedure to follow in the event of a cluster shutdown. We discuss the split-brain problem and also explain how to start a cluster without all of its storage nodes.
The procedure differs for a partial failure (some nodes fail, but the cluster remains operational) and for a complete failure (all nodes fail).
Partial cluster failure
In the event of a single node failing, you will notice the following:
If the node that fails is a management node: no immediate problem occurs, but other nodes cannot restart, and activities that require a management node (online backups, centralized logging) will not take place
If the node that fails is a storage node: assuming at least one node remains in each nodegroup, there is no immediate effect (though a small number of transactions may be rolled back)
If the node that fails is a SQL node: any clients connected to that SQL node must either use another SQL node or fail, but there is no effect on the cluster itself
To recover from a partial shutdown, carry out the restart procedure in the previous recipe, Restarting a MySQL Cluster without downtime; however, it may not be necessary to kill the existing process.
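For example, to recover a failed storage node, you could confirm from the management client which node is down and then start a new ndbd process on the affected host. This is only a sketch; the host names are illustrative:
[root@mgm ~]# ndb_mgm -e SHOW    # the failed node is listed as "not connected"
[root@node3 ~]# ndbd             # on the failed host, simply start a new ndbd process
The restarted node rejoins its nodegroup and resynchronizes its data from the surviving node automatically.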
Complete cluster failure
The following failures can cause a total cluster shutdown:
Catastrophic software bug that causes multiple cluster nodes to fail
Every node in the cluster losing power (an entire facility failing, for example)
A split-brain condition (which will be discussed shortly)
Malicious or mistaken users gaining access to the management node or any storage node
A split-brain problem arises when communication failures cut a cluster into isolated groups of nodes. If we have four nodes that are split into two groups of two (perhaps through the failure of a switch), there is absolutely no way for either pair to tell whether the other pair is still working. In this case, the only safe thing to do is to shut down both halves, even though each pair could have all the data required to carry on working.
Imagine what would happen if a cluster of four storage nodes in two nodegroups continued working as two separate clusters, and you then had to attempt to reconcile two completely different databases!
MySQL Cluster gets around this with the concept of an arbitrator—put simply, the cluster nodes elect a single node to act as the arbitrator while all nodes can still communicate.
In the event of nodes losing contact with each other, each group of nodes that can still communicate asks the following questions:
Do we (the nodes I can now talk to) have enough nodes to remain viable (at least one storage node per nodegroup)?
Can I see the previously agreed arbitrator?
Unless the answer to both questions is yes, the nodes will shut down, with an error similar to the following appearing in the log:
Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and cannot form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
Note
The arbitrator is typically the management node, but it can also be a SQL node; you can set ArbitrationRank=1 for a node in the config.ini file to give it a high priority for becoming the cluster arbitrator.
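For instance, a config.ini fragment along these lines (the node IDs and host names here are illustrative) would make the management node the preferred arbitrator, with a SQL node as a fallback:
[ndb_mgmd]
Id=1
HostName=10.0.0.5
# 1 = high priority; this node is preferred as arbitrator
ArbitrationRank=1

[mysqld]
Id=11
HostName=10.0.0.6
# 2 = low priority; used only if no rank-1 node is reachable
ArbitrationRank=2
A value of 0 means that a node will never be used as the arbitrator.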
Recovery from a full cluster shutdown is conceptually simple: we need to start all storage nodes. It is likely that the storage nodes will have killed themselves or been killed by whatever caused the outage, so the procedure is identical to the rolling cluster restart, without killing the existing processes. In other words, start the management node (ndb_mgmd), start all storage nodes (ndbd), and start all SQL nodes (start or restart mysqld).
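As a sketch, and assuming the layout of the example cluster shown later in this recipe (management node on 10.0.0.5, storage nodes on 10.0.0.1 to 10.0.0.4; host names are illustrative), the sequence is:
[root@mgm ~]# ndb_mgmd              # 1. start the management node first
[root@node1 ~]# ndbd                # 2. start ndbd on every storage node
[root@sql1 ~]# service mysql start  # 3. start or restart each SQL node (init script name varies by distribution)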
During a full-cluster startup, the storage nodes will have to use the local copies of data that they stored to disk (it is likely that there will be some data loss after a total cluster shutdown). By default, a running MySQL Cluster commits a local checkpoint (LCP), a copy of all the local data held on disk, every time 4 MB of changes have been made to the cluster since the previous LCP.
A MySQL Cluster will also write a global checkpoint (all transactions that have occurred since the last LCP) to disk every two seconds. A storage node starting after a full shutdown will first restore its data up to the last LCP, and then apply transactions up to the latest global checkpoint, so that its data is as up to date as possible (at most the final two seconds of transactions are lost). Because global checkpoints are made consistently across the cluster, this allows all nodes to recover to a consistent state.
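Both intervals are configurable in the [ndbd default] section of config.ini. The following fragment simply spells out the default values described above:
[ndbd default]
# Base-2 logarithm of the number of 4-byte words of changes between
# local checkpoints; the default of 20 means 4 x 2^20 bytes = 4 MB.
TimeBetweenLocalCheckpoints=20
# Milliseconds between global checkpoints; the default is two seconds.
TimeBetweenGlobalCheckpoints=2000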
In the case of a total cluster shutdown, it may happen that a storage node is damaged and cannot be repaired quickly. It is possible to start your cluster with only one storage node per nodegroup. To do this, pass the --nowait-nodes=<NODES> option to ndbd, where <NODES> is a comma-separated list of the nodes not to wait for. For example, take the following cluster:
[ndbd(NDB)]     4 node(s)
id=3    @10.0.0.1  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0)
id=4    @10.0.0.2  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0, Master)
id=5    @10.0.0.3  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)
id=6    @10.0.0.4  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)
We could potentially start with nodes [(3 or 4) and (5 or 6)]. In this example, we will start without node 3:
This example assumes that your cluster is already shut down.
Run the following command on the nodes that you want to start:
[root@node4 ~]# ndbd --nowait-nodes=3
2009-07-09 23:32:02 [ndbd] INFO -- Configuration fetched from '10.0.0.5:1186', generation: 1
The cluster should start without node 3:
ndb_mgm> ALL STATUS
Node 3: not connected
Node 4: started (mysql-5.1.34 ndb-7.0.6)
Node 5: started (mysql-5.1.34 ndb-7.0.6)
Node 6: started (mysql-5.1.34 ndb-7.0.6)
Clearly, at this point, the cluster has a single point of failure (nodegroup 0 is running only on node 4), so node 3 should be repaired and started as quickly as possible.
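Once node 3 is repaired, rejoining it is simply a matter of starting ndbd on it again. As a sketch (which option applies depends on the state of node 3's local disk):
[root@node3 ~]# ndbd            # local data intact: the node rejoins and resynchronizes
[root@node3 ~]# ndbd --initial  # local data lost: wipe the data files and rebuild a full copy from node 4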