Professional SQL Server High Availability and Disaster Recovery

By: Ahmad Osama

Overview of this book

Professional SQL Server High Availability and Disaster Recovery explains the high availability and disaster recovery technologies available in SQL Server: Replication, AlwaysOn, and Log Shipping. You’ll learn what they are, how to monitor them, and how to troubleshoot any related problems. You will be introduced to the availability groups of AlwaysOn and learn how to configure them to extend your database mirroring. Through this book, you will explore the technical implementations of high availability and disaster recovery technologies that you can use when you create a highly available infrastructure, including hybrid topologies. Note that this book does not cover SQL Server Failover Cluster Installation with shared storage. By the end of the book, you’ll be equipped with all that you need to know to develop robust and high-performance infrastructure.

HA and DR Terminologies


The following terms are central to HA and DR. Understanding them helps you grasp HA and DR concepts and choose the most suitable HA and DR solution.

Availability

Availability, or uptime, is the percentage of time that a system or an application should be available in a given year. Availability is expressed as a Number of Nines.

For example, a 90% (one nine) availability means that a system can tolerate a downtime of 36.5 days in a year, and a 99.999% (five nines) availability means that a system can tolerate a downtime of 5.26 minutes per year.
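The arithmetic behind these figures can be sketched in a few lines of Python. This is only an illustration, not something taken from the book: the function name allowed_downtime_hours is hypothetical, and the calculation assumes a 365-day year.

```python
# Illustrative helper (not from the book): yearly downtime permitted
# by a given availability percentage.
HOURS_PER_YEAR = 365 * 24  # assumption: a 365-day year


def allowed_downtime_hours(availability_percent: float) -> float:
    """Return the maximum downtime, in hours per year, for an availability percentage."""
    unavailable_fraction = 1 - availability_percent / 100
    return unavailable_fraction * HOURS_PER_YEAR


levels = [
    ("one nine", 90.0),
    ("two nines", 99.0),
    ("three nines", 99.9),
    ("four nines", 99.99),
    ("five nines", 99.999),
]

for label, percent in levels:
    hours = allowed_downtime_hours(percent)
    print(f"{label:12} {percent}%: {hours:.4f} hours/year ({hours * 60:.2f} minutes)")
```

One nine works out to 876 hours, or 36.5 days, per year, while five nines leaves roughly 5.26 minutes per year.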

The following table, taken from https://en.wikipedia.org/wiki/High_availability#"Nines", describes the availability percentages and the downtime for each percentage:

Note

This link also talks about how this is calculated. You can look at it, but a discussion on calculation is out of the scope of this book.

Figure 1.4: Availability table

In the preceding table, you can see that as the Number of Nines increases, the downtime decreases. The business decides the availability, the Number of Nines, required for the system. This plays a vital role in selecting the type of HA and DR solution required for any given system. The higher the Number of Nines, the more rigorous or robust the required solution.

Recovery Time Objective

Recovery time objective, or RTO, is essentially the downtime a business can tolerate without any substantial loss. For example, an RTO of one hour means that an application shouldn't be down for more than one hour. A downtime of more than an hour would result in critical financial, reputational, or data loss.

The choice of HA and DR solution depends on the RTO. If an application has a four-hour RTO, you can recover the database using backups (if backups are being done every two hours or so), and you may not need any HA and DR solution. However, if the RTO is 15 minutes, then backups won't work, and an HA and DR solution will be needed.

Recovery Point Objective

Recovery point objective, or RPO, defines how much data loss a business can tolerate during an outage. For example, an RPO of two hours means that losing up to two hours of data is acceptable to the business; losing more than that would have a significant financial or reputational impact.

Essentially, this is the maximum acceptable gap between the last transaction that can be recovered and the moment the outage occurred.

The choice of HA and DR solution also depends on the RPO. If an application has an RPO of 24 hours, daily full backups are good enough; however, for a business with an RPO of four hours, daily full backups are not enough.
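The reasoning here is that, when backups are the only protection, an outage just before the next backup can lose almost a full backup interval of changes. A minimal sketch of that check follows; the function names are hypothetical, and it assumes backups are the only recovery mechanism.

```python
from datetime import timedelta


def worst_case_data_loss(backup_interval: timedelta) -> timedelta:
    """Assumption: with backups only, up to one full interval of changes can be lost."""
    return backup_interval


def backups_meet_rpo(backup_interval: timedelta, rpo: timedelta) -> bool:
    """Return True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(backup_interval) <= rpo


daily_full_backup = timedelta(hours=24)

# Daily full backups against the two RPOs discussed in the text.
print(backups_meet_rpo(daily_full_backup, timedelta(hours=24)))  # True
print(backups_meet_rpo(daily_full_backup, timedelta(hours=4)))   # False
```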

To differentiate between RTO and RPO, let's consider a scenario. A company has an RTO of one hour and an RPO of four hours. There's no HA and DR solution, and backups are being done every 12 hours.

In the case of an outage, the company was able to restore the database from the last full backup in one hour, which is within the RTO of one hour. However, the company suffered a data loss beyond the four-hour RPO, because backups taken every 12 hours can leave up to 12 hours of changes unrecoverable.
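The scenario can be worked through with concrete timestamps. The sketch below is illustrative only: the Outage class and all dates are hypothetical, and it assumes the failure happened 11 hours after the last backup (the text says only that backups run every 12 hours).

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Outage:
    """Hypothetical record of a single outage, for illustration only."""
    last_backup: datetime       # most recent backup available for restore
    failure: datetime           # moment the database went down
    service_restored: datetime  # moment the restored database was back online

    @property
    def downtime(self) -> timedelta:
        # Compared against the RTO.
        return self.service_restored - self.failure

    @property
    def data_loss(self) -> timedelta:
        # Compared against the RPO: changes made after the last backup are gone.
        return self.failure - self.last_backup


rto = timedelta(hours=1)
rpo = timedelta(hours=4)

# Assumed timeline: backup at midnight, failure at 11:00, restore finished at 12:00.
outage = Outage(
    last_backup=datetime(2024, 1, 1, 0, 0),
    failure=datetime(2024, 1, 1, 11, 0),
    service_restored=datetime(2024, 1, 1, 12, 0),
)

rto_met = outage.downtime <= rto
rpo_met = outage.data_loss <= rpo

print(f"Downtime:  {outage.downtime}, RTO met: {rto_met}")
print(f"Data loss: {outage.data_loss}, RPO met: {rpo_met}")
```

With this timeline the one-hour restore satisfies the RTO, while the 11 hours of lost changes exceed the four-hour RPO.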