This recipe covers the procedure to follow in the event of a cluster shutdown. We discuss the split-brain problem and also explain how to start a cluster without all of its storage nodes.
The procedure differs for a partial failure (some nodes fail, but the cluster remains operational) and for a complete failure (all nodes fail).
Partial cluster failure
In the event of a single node failing, you will notice the following:
If the node that fails is a management node: no immediate problem occurs, but other nodes cannot restart, and activities that require a management node (online backups, centralized logging) will not take place
If the node that fails is a storage node: assuming at least one node remains in each nodegroup, there is no immediate effect (though a small number of transactions may be rolled back)
If the node that fails is a SQL node: any clients connected to that SQL node must either use another SQL node or fail, but there is no effect on the cluster itself
To recover from a partial shutdown, carry out the restart procedure in the previous recipe, Restarting a MySQL Cluster without downtime; however, it may not be necessary to kill the existing process.
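For example, to recover a failed storage node, you could confirm from the management client which node is down and then start a new ndbd process on the affected host. This is only a sketch; the host names are illustrative:
[root@mgm ~]# ndb_mgm -e SHOW    # the failed node is listed as "not connected"
[root@node3 ~]# ndbd             # on the failed host, simply start a new ndbd process
The restarted node rejoins its nodegroup and resynchronizes its data from the surviving node automatically.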
Complete cluster failure
The following failures can cause a total cluster shutdown:
Catastrophic software bug that causes multiple cluster nodes to fail
Every node in the cluster losing power (an entire facility failing, for example)
A split-brain condition (which will be discussed shortly)
Malicious or mistaken users gaining access to the management node or any storage node
A split-brain problem arises when communication failures cut a cluster into isolated groups of nodes. If we have four nodes that are split into two groups of two (perhaps through the failure of a switch), there is absolutely no way for either pair to tell whether the other pair is still working. In this case, the only safe thing to do is to shut down both halves, even though each pair could have all the data required to carry on working.
Imagine what would happen if a cluster of four storage nodes in two nodegroups continued working as two separate clusters, and you then had to attempt to reconcile two completely different databases!
MySQL Cluster gets around this with the concept of an arbitrator—put simply, the cluster nodes elect a single node to act as the arbitrator while all nodes can still communicate.
In the event of nodes losing contact with each other, each group of nodes that can still communicate asks the following questions:
Do we (the nodes I can now talk to) have enough nodes to remain viable (at least one storage node per nodegroup)?
Can I see the previously agreed arbitrator?
Unless the answer to both questions is yes, the nodes will shut down, with an error similar to the following appearing in the log:
Forced node shutdown completed. Caused by error 2305: 'Node lost connection to other nodes and cannot form a unpartitioned cluster, please investigate if there are error(s) on other node(s)(Arbitration error). Temporary error, restart node'.
Note
The arbitrator is typically the management node, but it can also be a SQL node; you can set ArbitrationRank=1 for a node in the config.ini file to give it a high priority for becoming the cluster arbitrator.
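For instance, a config.ini fragment along these lines (the node IDs and host names here are illustrative) would make the management node the preferred arbitrator, with a SQL node as a fallback:
[ndb_mgmd]
Id=1
HostName=10.0.0.5
# 1 = high priority; this node is preferred as arbitrator
ArbitrationRank=1

[mysqld]
Id=11
HostName=10.0.0.6
# 2 = low priority; used only if no rank-1 node is reachable
ArbitrationRank=2
A value of 0 means that a node will never be used as the arbitrator.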
Recovery from a full cluster shutdown is conceptually simple: we need to start all storage nodes. It is likely that the storage nodes will have killed themselves or been killed by whatever caused the outage, so the procedure is identical to the rolling cluster restart, without killing the existing processes. In other words, start the management node (ndb_mgmd), start all storage nodes (ndbd), and start all SQL nodes (start or restart mysqld).
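As a sketch, and assuming the layout of the example cluster shown later in this recipe (management node on 10.0.0.5, storage nodes on 10.0.0.1 to 10.0.0.4; host names are illustrative), the sequence is:
[root@mgm ~]# ndb_mgmd              # 1. start the management node first
[root@node1 ~]# ndbd                # 2. start ndbd on every storage node
[root@sql1 ~]# service mysql start  # 3. start or restart each SQL node (init script name varies by distribution)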
During a full-cluster startup, the storage nodes will have to use the local copies of data that they stored to disk (it is likely that there will be some data loss after a total cluster shutdown). By default, a running MySQL Cluster commits a local checkpoint (LCP), a copy of all the local data held on disk, every time 4 MB of changes have been made to the cluster since the previous LCP.
A MySQL Cluster will also write a global checkpoint (all transactions that have occurred since the last LCP) to disk every two seconds. A storage node starting after a full shutdown will first restore its data up to the last LCP, and then apply transactions up to the latest global checkpoint, so that its data is as up to date as possible (at most the final two seconds of transactions are lost). Because global checkpoints are made consistently across the cluster, this allows all nodes to recover to a consistent state.
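Both intervals are configurable in the [ndbd default] section of config.ini. The following fragment simply spells out the default values described above:
[ndbd default]
# Base-2 logarithm of the number of 4-byte words of changes between
# local checkpoints; the default of 20 means 4 x 2^20 bytes = 4 MB.
TimeBetweenLocalCheckpoints=20
# Milliseconds between global checkpoints; the default is two seconds.
TimeBetweenGlobalCheckpoints=2000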
In the case of a total cluster shutdown, it may happen that a storage node is damaged and cannot be repaired quickly. It is possible to start your cluster with only one storage node per nodegroup. To do this, pass the --nowait-nodes=<NODES> option to ndbd, where <NODES> is a comma-separated list of the nodes not to wait for. For example, take the following cluster:
[ndbd(NDB)]     4 node(s)
id=3    @10.0.0.1  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0)
id=4    @10.0.0.2  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 0, Master)
id=5    @10.0.0.3  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)
id=6    @10.0.0.4  (mysql-5.1.34 ndb-7.0.6, Nodegroup: 1)
We could potentially start with nodes [(3 or 4) and (5 or 6)]. In this example, we will start without node 3:
This example assumes that your cluster is already shut down.
Run the following command on the nodes that you want to start:
[root@node4 ~]# ndbd --nowait-nodes=3
2009-07-09 23:32:02 [ndbd] INFO -- Configuration fetched from '10.0.0.5:1186', generation: 1
The cluster should start without node 3:
ndb_mgm> ALL STATUS
Node 3: not connected
Node 4: started (mysql-5.1.34 ndb-7.0.6)
Node 5: started (mysql-5.1.34 ndb-7.0.6)
Node 6: started (mysql-5.1.34 ndb-7.0.6)
Clearly, at this point, the cluster has a single point of failure (nodegroup 0 is running only on node 4), so node 3 should be repaired and started as quickly as possible.
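Once node 3 is repaired, rejoining it is simply a matter of starting ndbd on it again. As a sketch (which option applies depends on the state of node 3's local disk):
[root@node3 ~]# ndbd            # local data intact: the node rejoins and resynchronizes
[root@node3 ~]# ndbd --initial  # local data lost: wipe the data files and rebuild a full copy from node 4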