Microsoft SharePoint 2013 Disaster Recovery Guide

By: Peter Ward
Overview of this book

Where does it all go wrong with disaster recovery? Why does a disaster recovery plan fail the business and cost IT staff their jobs or a promotion? This book is an easy-to-understand guide that explains how to get it right and why it often goes wrong. Given that Microsoft's SharePoint platform has become a mission-critical application where business operations just cannot run without complete uptime of this technology, disaster recovery is one of the most important topics when it comes to SharePoint. Yet support and an appropriate approach for this technology are still difficult to come by, and are often vulnerable to technical oversight and assumptions. Microsoft SharePoint 2013 Disaster Recovery Guide looks at SharePoint disaster recovery and breaks down the mystery and confusion that surround what is a vital activity for any technical deployment. This book provides a holistic approach with practical recipes that will help you to take advantage of the new 2013 functionality and cloud technologies. You will also learn how to plan, test, and deploy a disaster recovery environment using SharePoint, Windows Server, and SQL tools. We will also take a look at datasets and custom development. If you want an approach to disaster recovery that gives you peace of mind, then this is the book for you.

Disaster Recovery – cost versus speed


When choosing a DR approach, organizations rely on the level of service required, as measured by two recovery objectives:

  • Recovery time objective (RTO): This is the amount of time between an outage and the restoration of operations

  • Recovery point objective (RPO): This is the point in time to which data is restored; it reflects the amount of data that will ultimately be lost during the recovery process
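As a minimal sketch of how these two objectives can be checked after an incident, the following Python snippet computes the achieved recovery time and the data-loss window from three timestamps. The function name and the sample timestamps are hypothetical and chosen only for illustration.

```python
from datetime import datetime, timedelta


def recovery_metrics(last_backup: datetime, outage: datetime, restored: datetime):
    """Return (recovery_time, data_loss_window) for one incident.

    recovery_time    -> compare against the RTO
    data_loss_window -> compare against the RPO
    """
    if not (last_backup <= outage <= restored):
        raise ValueError("expected last_backup <= outage <= restored")
    recovery_time = restored - outage        # how long operations were down
    data_loss_window = outage - last_backup  # changes made in this window are lost
    return recovery_time, data_loss_window


# Hypothetical incident: backup at 01:00, outage at 14:30, restored at 18:30.
downtime, loss = recovery_metrics(
    datetime(2013, 6, 3, 1, 0),
    datetime(2013, 6, 3, 14, 30),
    datetime(2013, 6, 3, 18, 30),
)
print(downtime)  # 4:00:00
print(loss)      # 13:30:00
```

Comparing the first value with the agreed RTO and the second with the agreed RPO shows whether a DR test, or a real recovery, met the service level the business asked for.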

The preceding objectives apply to SharePoint as well, and they have to be weighed against the amount of money the business is willing to spend. This is covered in depth in Chapter 2, Creating, Testing, and Maintaining the DR Plan, and Chapter 6, Working with Data Sizing and Data Structure.

With dedicated and shared DR models, organizations are often forced to make trade-offs between cost and speed. As the pressure to achieve high availability and reduce costs continues to increase, organizations can no longer accept such trade-offs. A bank, for example, cannot use a cold standby model just because it is cheaper; the C-level executives, such as your CIO, are going to want to know why it took 4 or 5 days to recover and why there was a loss of data costing your organization possibly thousands of dollars. There is no set rule for this. The formula is how much your organization is willing to pay and how much data loss is acceptable.

Most organizations where SharePoint is mission critical use a hot standby; this is a duplicate farm in a DR datacentre. Depending on how much downtime is acceptable to your organization and how much time you want to spend on keeping both farms synchronized, you would choose one of the following options:

  • Keep only three servers running and the rest turned off; in the case of a disaster, you would turn on the rest of the servers and add whatever solutions and patches need to be added

  • Have all your servers live all the time; this is much faster but obviously more expensive

  • Have all your servers live all the time and use a third-party tool, such as Metalogix Replicator, for real-time synchronization

I was the lead architect for recovery.gov. They have 45 servers on the AWS cloud in one region and 45 servers in their DR region. Although all the servers are live, it is not an active-active environment; it is an active-passive environment.

In case of a disaster, they would need to fail over to their DR farm manually; this takes about 1 hour, a window that is acceptable to them. So you see, the decision is yours: what is an acceptable loss of data and what is an acceptable amount of downtime?

While DR was originally intended for critical back-office processes, many organizations are now dependent on real-time enterprise applications like SharePoint that run their internet, intranet, and extranet sites, which are the primary interfaces for their clients and employees. A minute of downtime may cost them thousands of dollars.

Standby datacentres are required for scenarios where local redundant systems and backups cannot recover from the outage at the primary datacentre. A standby datacentre is often described as hot, warm, or cold, according to the time it takes to get a farm up and running in a different location. Our definitions for these farm recovery datacentres are as follows:

  • Cold standby: A redundancy method that involves having one system as a backup for another identical primary system that can provide availability within hours or days.

  • Warm standby: A redundancy method that involves having one system running in the background of an identical primary system that can provide availability within minutes or hours.

  • Hot standby: A redundancy method that involves having one system running simultaneously with another identical primary system that can provide availability within seconds or minutes.
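The definitions above can be turned into a simple selection rule: pick the least expensive standby type whose worst-case availability still meets the RTO. The sketch below is only an illustration. The worst-case figures in `TIERS` are assumptions taken loosely from the upper end of each definition, and the cheapest-first ordering (cold, warm, hot) is also an assumption; replace both with values from your own DR tests and budgets.

```python
from datetime import timedelta

# Hypothetical worst-case availability per standby type, ordered cheapest first.
# These numbers are assumptions for illustration, not measured values.
TIERS = [
    ("cold", timedelta(days=5)),
    ("warm", timedelta(hours=8)),
    ("hot", timedelta(minutes=15)),
]


def cheapest_tier(rto: timedelta) -> str:
    """Return the first (cheapest) standby type whose worst case meets the RTO."""
    for name, worst_case in TIERS:
        if worst_case <= rto:
            return name
    raise ValueError("no standby type in this table meets the requested RTO")


print(cheapest_tier(timedelta(days=7)))    # cold
print(cheapest_tier(timedelta(hours=12)))  # warm
print(cheapest_tier(timedelta(hours=1)))   # hot
```

An RTO tighter than the hot standby figure raises an error instead of returning a misleading answer, which signals that the requirement needs a different architecture or a renegotiated objective.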

Each of these standby datacentres has an associated cost to operate and maintain.

  • Cold standby DR strategy: A business ships backups to an offsite storage site regularly, and has contracts in place for emergency server rentals.

    Pros:

    • The cheapest option to maintain, operationally.

    Cons:

    • The slowest option to recover.

    • Often an expensive option to recover, because it requires that physical servers be configured correctly after a disaster has occurred.

    • Some datacentres do not have the SharePoint expertise in house to deploy and configure your farm, so you will need to implement a solution to facilitate this, such as Microsoft's System Center Data Protection Manager or a PowerShell script. You may still run into problems, such as the hardware not being the same, which can cause all sorts of problems and delays.

  • Warm standby DR strategy: A business ships/uploads backups or virtual machine images to local and regional disaster recovery farms.

    Pros:

    • Often fairly inexpensive to recover, because a virtual server farm can require little configuration upon recovery.

    Cons:

    • Can be very expensive and time consuming to maintain.

    • You pay a lot of money in storage fees. For example, if you take a backup of one of your servers and it is 90 GB in size, the virtual machine will be 90 GB in size; multiply that by 6 or 10 servers, and add the cost of uploading that data every time you send the datacentre a new backup, not to mention the cost of having them upload those images and, of course, test them at least once a month. (Remember: if you haven't tested it and had a successful restore, it is not a good DR plan; it's a shot in the dark.)

  • Hot standby DR strategy: A business runs multiple datacentres, but serves content and services through only one datacentre.

    Pros:

    • It is often fairly fast to recover. If you are using third-party tools, such as Metalogix Replicator, that can synchronize two or more distant SharePoint farms in real time, you can ensure that SharePoint content is always available and up to date. Bi-directional replication syncs all your SharePoint content: documents, sites, applications, permissions, and workflows, with full metadata and versioning. Replicator can sync immediately after changes happen or on a regular schedule.

    Cons:

    • Can be very expensive to configure and maintain; for example, you have to add the cost of all the Microsoft licensing and third-party tools like Replicator.

    Note

    Whichever of the preceding DR solutions you decide to implement, there will probably be some data loss, as seen in the following examples, unless you are using third-party tools such as Metalogix Replicator.
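The storage arithmetic in the warm standby example (90 GB per server image, multiplied by 6 or 10 servers) can be sketched as follows. The function names and the figure of four shipments per month are hypothetical; only the 90 GB image size and the server counts come from the example above.

```python
def shipment_size_gb(image_size_gb: int, server_count: int) -> int:
    """Total data sent each time a full set of server images is shipped or uploaded."""
    return image_size_gb * server_count


def monthly_upload_gb(image_size_gb: int, server_count: int, shipments_per_month: int) -> int:
    """Total data sent per month for a given shipping frequency."""
    return shipment_size_gb(image_size_gb, server_count) * shipments_per_month


print(shipment_size_gb(90, 6))       # 540
print(shipment_size_gb(90, 10))      # 900
print(monthly_upload_gb(90, 10, 4))  # 3600 (assumes four shipments per month)
```

Multiplying these volumes by your provider's storage and transfer prices gives a first estimate of the recurring cost of keeping a warm standby current.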

Cold standby recovery

In a cold standby disaster recovery scenario, you have to recover by setting up a new farm in your cold standby datacentre and restoring the backups that you have stored there. In this scenario, if your primary farm fails before you get to make the backups to ship out to the cold standby datacentre, you will lose all the data added or changed since your last backup.

Warm standby recovery

In a warm standby disaster recovery scenario, you have to create a duplicate farm in the warm standby datacentre and ensure that it is updated regularly by using full and incremental backups of the farm in the primary datacentre. This requires some continuous monitoring, server maintenance, SharePoint upgrades, and other data activity to keep the environment warm. In the event of a failure, you will lose all the data added or changed since your last backup.

Virtual warm standby environments

Virtualization provides a cost-effective option for a warm standby recovery solution. Typically, you can use Hyper-V or VMware as an in-house solution for recovery. This is explained in further detail in Chapter 4, Virtual Environment Backup and Restore Procedures. But even this has its downside. If it takes two days for the VMs or backups to get to the DR datacentre, or to upload all the VMs to the DR datacentre, your backups are now two days out of date.

You therefore have to make sure that the virtual images are created often enough to provide the level of farm configuration and content freshness that you must have for recovering the farm at the secondary DR site. You must also have an environment available in which you can host the VMs. We will dig a bit deeper into virtualization technologies later in this chapter.
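To see how image frequency and transfer time combine, consider the following minimal sketch. It assumes that images are created at a fixed interval and that each transfer takes the same amount of time; the function names and sample values are hypothetical.

```python
from datetime import timedelta


def worst_case_data_loss(image_interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Maximum age of the newest image available at the DR site.

    Just before the next image finishes arriving, the latest usable image
    was captured image_interval + transfer_time ago.
    """
    return image_interval + transfer_time


def meets_rpo(image_interval: timedelta, transfer_time: timedelta, rpo: timedelta) -> bool:
    """True if the worst-case data loss stays within the RPO."""
    return worst_case_data_loss(image_interval, transfer_time) <= rpo


# Hypothetical schedule: daily images that take two days to reach the DR datacentre.
print(worst_case_data_loss(timedelta(days=1), timedelta(days=2)))           # 3 days, 0:00:00
print(meets_rpo(timedelta(days=1), timedelta(days=2), timedelta(days=1)))   # False
print(meets_rpo(timedelta(hours=6), timedelta(hours=6), timedelta(days=1))) # True
```

Under these assumptions, shortening the transfer time matters as much as taking images more often, because both terms add directly to the data you can lose.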

Hot standby recovery

In a hot standby disaster recovery scenario, you have to create a duplicate farm in the hot standby datacentre, so that it can assume production operations almost immediately after the primary farm fails. This requires a third-party tool, such as Metalogix Replicator, for real-time synchronization.

Note

For more information on Metalogix Replicator, visit http://www.metalogix.com/Products/Replicator/Replicator-for-SharePoint.aspx.

Both the RTO and RPO approaches include shared and dedicated models. These are explained below.

Dedicated model

In a dedicated model, the infrastructure is dedicated to a single organization. Compared to other traditional models, this can offer a faster time for recovery, because the IT infrastructure is mirrored at the disaster recovery site and is ready to be called upon in the event of a disaster. While this model can reduce RTO because the hardware and software are preconfigured, it does not eliminate all delays. You still need to restore the data. This approach is costly because the hardware sits idle when not being used for disaster recovery. Some organizations use the DR infrastructure for development and testing, to mitigate the cost, but that introduces additional risk. When organizations start using their DR site for development or test, it becomes a huge problem because when the time comes to use it for an actual disaster, the farms are not the same; they are drastically different. There are solutions that were not maintained or documented correctly and now you are in a bind.

Note

There are also SharePoint license costs to consider. Yes, any servers that are receiving data still have to be fully licensed, even while they are idle.

Shared model

In a shared model, the infrastructure is shared among multiple organizations so it is more cost effective. After a disaster is declared, the hardware, the operating system, and the application software at the disaster site must be configured from the ground up to match the IT site that has declared a disaster. On top of that, the data restoration process must be completed. This can take hours or even days.

This is normally a service provided by the company that is managing your data operations.

Hybrid model

There is also a hybrid model, where a certain SharePoint technology, such as SQL Server, leverages a DR process from another application. This does reduce costs, but of course both DR plans need to be in sync. It can also become very complex: how do you separate the two, and what is the process when it comes to restoring? I personally don't like this model because of its complexity, and as a best practice it is never a good idea to add any other database to your SharePoint SQL Server.