In this recipe, we will discuss the initial configuration required to start a MySQL Cluster. A MySQL Cluster has a global configuration file—config.ini
, which resides on all management nodes. This file defines the nodes (processes) that make up the cluster and the parameters that the nodes will use.
Each management node, when it starts, reads the config.ini
file to get information on the structure of the cluster and when other nodes (storage and SQL / API) start, they contact the already-running management node to obtain the details of the cluster architecture.
The creation of this global configuration file—config.ini
, is the first step in building the cluster and this recipe looks at the initial configuration for this file. Later recipes will cover more advanced parameters which you can define (typically to tune a cluster for specific goals, such as performance).
The first step in building a cluster is to create a global cluster configuration file. This file, called config.ini
, by convention, is stored on each management node and is used by the management node process to show the cluster makeup and define variables for each node. In our example, we will store this in the file /usr/local/mysql-cluster/config.ini
, but it can be stored anywhere else.
The file consists of multiple sections. Each section contains parameters that apply to a particular node, for example, the node's IP address or the amount of memory to reserve for data. Each type of node (management, SQL, and data node) has an optional default
section to save duplicating the same parameter in each node. Each individual node that will make up the cluster has its own sections, which inherits the defaults defined for its type and specifies the additional parameters, or overrides the defaults.
This global configuration file is not complex, but is best analyzed with an example, and in this recipe, we will create a simple cluster configuration file for this node. The first line to add in the config.ini
file is a block for this new management node:
[ndb_mgmd]
Now, we specify an ID for the node. This is absolutely not required, but can be useful—particularly if you have multiple management nodes.
Id=1
Now, we specify the IP address or hostname of the management node. It is recommended to use IP addresses in order to avoid a dependency on the DNS:
HostName=10.0.0.5
Note
It is possible to define a node without an IP address, in this case, a starting node can either be told which nodeID it should take when it starts, or the management node will allocate the node to the most suitable free slot.
Finally, we define a directory to store local files (for example, cluster log
files):
DataDir=/var/lib/mysql-cluster
This is all that is required to define a single management node.
Now, we define the storage nodes in our simple cluster. To add storage nodes, it is recommended that we use the default
section to define a data directory (a place for the node to store the files, which the node stores on the disk). It is also mandatory to define the NoOfReplicas
parameter, which was discussed in the There's more… section of the previous recipe.
[<type>_default]
works for all three types of nodes (mgmd
, ndbd
, and mysqld
) and defines a default value to save duplicating a parameter for every node in that section. For example, the DataDir
of the storage nodes can (and should) be defined in the default
section:
[ndbd_default] DataDir=/var/lib/mysql-cluster NoOfReplicas=2
Once we have defined the default
section, then defining the node ID and IP / hostname for the other storage nodes that make up our cluster is a simple matter as follows:
[ndbd] id=3 HostName=10.0.0.1 [ndbd] id=4 HostName=10.0.0.2
Note
You can either use hostnames or IP addresses in config.ini
file. I recommend that you use IP addresses for absolute clarity, but if hostnames are used, it is a good idea to ensure that they are hardcoded in /etc/hosts
on each node in order to ensure that a DNS problem does not cause major issues with your cluster.
Finally for SQL nodes, it is both possible and common to simply define a large number of [mysqld]
sections with no HostName
parameter. This keeps the precise future structure of the cluster flexible (this is not generally recommended for storage and management nodes).
It is a good practice to define the hostnames for essential nodes, and if desired also leave some spare sections for future use (the recipe Taking an online backup of a MySQL Cluster later in Chapter 2, MySQL Cluster Backup and Recovery will explain one of several most common reasons why this will be useful). For example, to define two-cluster SQL nodes (with their servers running mysqld
) with IP addresses 10.0.0.2
and 10.0.0.3
, with two more slots available for any SQL or API nodes to connect to on a first come, first served basis, use the following:
[mysqld] HostName=10.0.0.2 [mysqld] HostName=10.0.0.3 [mysqld] [mysqld]
Now that we have prepared a simple config.ini
file for a cluster, it is potentially possible to move on to installing and starting the cluster's first management node. Recollect where we saved the file (in /usr/local/mysql-cluster/config.ini
) as you will need this information when you start the management node for the first time.
At this stage, we have not yet defined any advanced parameters. It is possible to use the config.ini
file that we have written so far to start a cluster and import a relatively small testing data set (such as the world
database provided by MySQL for testing, which we will use later in this book). However, it is likely that you will need to set a couple of other parameters in the ndbd_default
section of the config.ini
file before you get a cluster in which you can actually import anything more than a tiny amount of data.
Firstly, there is a maximum limit of 32,000 concurrent operations in a MySQL Cluster, by default. The variable MaxNoOfConcurrentOperations
sets the number of records that can be in update phase or locked simultaneously. While this sounds like a lot, it is likely that any significant import of data will exceed this value, so this can be safely increased. The limit is set deliberately low to protect small systems from large transactions. Each operation consumes at least one record, which has an overhead of 1 KB of memory.
Note
The MySQL documentation states the following:
Unfortunately, it is difficult to calculate an exact value for this parameter so set it to a sensible value depending on the expected load on the cluster and monitor for errors when running large transactions (often when importing data):
MaxNoOfConcurrentOperations = 150000
A second extremely common limit to increase is the maximum number of attributes (fields, indexes, and so on) in a cluster which defaults to 1000
. This is also quite low, and in the same way it can normally be increased:
MaxNoOfAttributes = 10000
The maximum number of ordered indexes is low and if you reach it, it will return a slightly cryptic error, Can't create table xxx (errno: 136). Therefore, it is often worth increasing it at the start, if you plan on having a total of more than 128 ordered indexes in your cluster:
MaxNoOfOrderedIndexes=512
Finally, it is almost certain that you will need to define some space for data and indexes on your storage nodes. Note that you should not allocate more storage space than you have to spare on the physical machines running the storage nodes, as a cluster swapping is likely to happen and the cluster will crash!
DataMemory=2G IndexMemory=500M
With these parameters set, you are ready to start a cluster and import a certain amount of data in it.