Microsoft SharePoint 2013 Disaster Recovery Guide

By: Peter Ward

Overview of this book

Where does it all go wrong with disaster recovery? Why does a disaster recovery plan fail the business and cost IT staff their jobs or a promotion? This book is an easy-to-understand guide that explains how to get it right and why it often goes wrong. Given that Microsoft's SharePoint platform has become a mission-critical application where business operations just cannot run without complete uptime of this technology, disaster recovery is one of the most important topics when it comes to SharePoint. Yet support and an appropriate approach for this technology are still difficult to come by, and are often vulnerable to technical oversight and assumptions. Microsoft SharePoint 2013 Disaster Recovery Guide looks at SharePoint disaster recovery and breaks down the mystery and confusion that surround what is a vital activity in any technical deployment. This book provides a holistic approach with practical recipes that will help you to take advantage of the new 2013 functionality and cloud technologies. You will also learn how to plan, test, and deploy a disaster recovery environment using SharePoint, Windows Server, and SQL tools. We will also take a look at datasets and custom development. If you want an approach to disaster recovery that gives you peace of mind, then this is the book for you.

Disaster Recovery – cost versus speed

When choosing a DR approach, organizations rely on the level of service required, as measured by two recovery objectives:

Recovery time objective (RTO): This is the amount of time between an outage and the restoration of operations
Recovery point objective (RPO): This is the point in time to which data is restored; it reflects the amount of data that will ultimately be lost during the recovery process
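
As a minimal sketch of how these two objectives can be checked after an incident, the following Python snippet derives the actual recovery time and the data-loss window from three timestamps. The function name and the timestamps are hypothetical and chosen only for illustration; the results would then be compared against the RTO and RPO the business has agreed to.

```python
from datetime import datetime, timedelta
from typing import Tuple


def recovery_metrics(
    last_backup: datetime, outage: datetime, restored: datetime
) -> Tuple[timedelta, timedelta]:
    """Return (recovery_time, data_loss_window) for one incident.

    recovery_time    is what you compare against the RTO.
    data_loss_window is what you compare against the RPO.
    """
    if not last_backup <= outage <= restored:
        raise ValueError("expected last_backup <= outage <= restored")
    recovery_time = restored - outage      # outage until operations resume
    data_loss_window = outage - last_backup  # changes made after the last backup are lost
    return recovery_time, data_loss_window


# Hypothetical incident: backup at 01:00, outage at 14:30, service restored at 18:30.
recovery_time, data_loss = recovery_metrics(
    last_backup=datetime(2013, 5, 6, 1, 0),
    outage=datetime(2013, 5, 6, 14, 30),
    restored=datetime(2013, 5, 6, 18, 30),
)
print(f"Recovery time:    {recovery_time}")
print(f"Data-loss window: {data_loss}")
```

In this made-up incident the farm was down for 4 hours and 13.5 hours of changes were lost; whether that is acceptable depends entirely on the objectives set for the farm.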

The preceding objectives apply to SharePoint as well, and they have to be weighed against the amount of money the business is willing to spend. This is covered in depth in Chapter 2, Creating, Testing, and Maintaining the DR Plan, and Chapter 6, Working with Data Sizing and Data Structure.

With dedicated and shared DR models, organizations are often forced to make trade-offs between cost and speed. As the pressure to achieve high availability and reduce costs continues to increase, organizations can no longer accept such trade-offs. A bank, for example, cannot use a cold standby model just because it is cheaper; the C-level executives (your CIO, for instance) are going to want to know why it took 4 or 5 days to recover and why there was a loss of data that possibly cost your organization thousands of dollars. There is no set rule for this. The formula comes down to two questions: how much is your organization willing to pay, and how much data loss is acceptable?

Most organizations where SharePoint is mission critical use a hot standby; this is a duplicate farm in a DR datacentre. Depending on how much downtime is acceptable to your organization and how much time you want to spend on keeping both farms synchronized, you would choose one of the following options:

Have just three servers running and the rest turned off; in the case of a disaster, you would turn on the rest of the servers and add whatever solutions and patches need to be added.
Have all your servers live all the time; this is much faster but obviously more expensive.
Have all your servers live all the time and use a third-party tool, such as Metalogix Replicator, for real-time synchronization.

I was the lead architect for recovery.gov. They have 45 servers on the AWS cloud in one region and 45 servers in their DR region. Although all the servers are live, it is not an active-active environment; it is an active-passive environment.

In case of a disaster, they would need to fail over to their DR farm manually; this takes about an hour, a window that is acceptable to them. So you see, the decision is yours: what is an acceptable loss of data, and what is an acceptable amount of downtime?

While DR was originally intended for critical back-office processes, many organizations are now dependent on real-time enterprise applications like SharePoint, which handle everything from their internet site to their intranet and extranet, the primary interfaces for their clients and employees. A minute of downtime may cost them thousands of dollars.

Standby datacentres are required for scenarios where local redundant systems and backups cannot recover from the outage at the primary datacentre. A standby datacentre is often described as hot, warm, or cold, according to how long it takes to get a farm up and running there. Our definitions for these farm recovery datacentres are as follows:

Cold standby: A redundancy method that involves having one system as a backup for another identical primary system that can provide availability within hours or days.
Warm standby: A redundancy method that involves having one system running in the background of an identical primary system that can provide availability within minutes or hours.
Hot standby: A redundancy method that involves having one system running simultaneously with another identical primary system that can provide availability within seconds or minutes.
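
To illustrate how these definitions feed into a decision, here is a small Python sketch that picks the cheapest standby type able to meet a target recovery time. The worst-case figures in the table are assumptions made up for this example (the definitions above only give rough ranges), so substitute the numbers measured in your own DR tests.

```python
from datetime import timedelta
from typing import Optional

# Illustrative worst-case recovery times, ordered from cheapest to most expensive.
# These values are assumptions for the sketch, not figures from the text.
STANDBY_TYPES = [
    ("cold", timedelta(days=5)),
    ("warm", timedelta(hours=8)),
    ("hot", timedelta(minutes=15)),
]


def cheapest_standby(target_rto: timedelta) -> Optional[str]:
    """Return the cheapest standby type whose worst-case recovery meets target_rto."""
    for name, worst_case in STANDBY_TYPES:
        if worst_case <= target_rto:
            return name
    return None  # nothing in the table recovers fast enough


print(cheapest_standby(timedelta(days=7)))      # cold
print(cheapest_standby(timedelta(hours=12)))    # warm
print(cheapest_standby(timedelta(hours=1)))     # hot
print(cheapest_standby(timedelta(seconds=30)))  # None
```

The list is ordered by cost on purpose: the first match is the least expensive option that still satisfies the objective, which mirrors the cost versus speed trade-off discussed in this section.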

Each of these standby datacentres has an associated cost to operate and maintain.

Cold standby DR strategy: A business ships backups to an offsite storage site regularly, and has contracts in place for emergency server rentals.
Pros:
- The cheapest option to maintain, operationally.
Cons:
- The slowest option to recover.
- Often an expensive option to recover, because it requires that physical servers be configured correctly after a disaster has occurred.
- Some datacentres do not have the SharePoint expertise in house to deploy and configure your farm, so you will need a solution that facilitates this, such as Microsoft's System Center Data Protection Manager or a PowerShell script. You may still run into problems, such as the replacement hardware not being the same as the original, which can cause all sorts of issues and delays.
Warm standby DR strategy: A business ships/uploads backups or virtual machine images to local and regional disaster recovery farms.
Pros:
- Often fairly inexpensive to recover, because a virtual server farm can require little configuration upon recovery.
Cons:
- Can be very expensive and time consuming to maintain.
- You pay a lot of money in storage and transfer fees. For example, if a backup of one of your servers is 90 GB in size, the virtual machine image will also be 90 GB in size; multiply that by 6 or 10 servers, then add the cost of uploading that data every time you send the datacentre a new backup, the cost of having them load those images, and of course the cost of testing them at least once a month. (Remember: if you haven't tested it and had a successful restore, it is not a good DR plan; it's a shot in the dark.)
Hot standby DR strategy: A business runs multiple datacentres, but serves content and services through only one datacentre.
Pros:
- It is often fairly fast to recover. If you are using a third-party tool, such as Metalogix Replicator, that can synchronize two or more distant SharePoint farms in real time, you can ensure that SharePoint content is always available and up to date. Bi-directional replication syncs all your SharePoint content: documents, sites, applications, permissions, and workflows, with full metadata and versioning. Replicator can sync immediately after changes happen or on a regular schedule.
Cons:
- Can be very expensive to configure and maintain; for example, you have to add the cost of all the Microsoft licensing and of third-party tools like Replicator.
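
To put the warm standby storage example above into numbers, the following Python sketch multiplies the 90 GB image size by the number of servers. The figure of four uploads is a hypothetical schedule added for illustration; the text does not say how often images are shipped, and no prices are assumed.

```python
IMAGE_SIZE_GB = 90  # per-server image size taken from the warm standby example


def upload_volume_gb(servers: int, uploads: int = 1, image_size_gb: int = IMAGE_SIZE_GB) -> int:
    """Total data shipped to the DR datacentre for `uploads` full sets of images."""
    return servers * image_size_gb * uploads


for servers in (6, 10):
    per_upload = upload_volume_gb(servers)
    four_uploads = upload_volume_gb(servers, uploads=4)  # hypothetical: four uploads
    print(f"{servers} servers: {per_upload} GB per upload, {four_uploads} GB for four uploads")
```

A single full set of images is therefore 540 GB for 6 servers and 900 GB for 10 servers, before any storage, transfer, or testing fees are applied.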
Note
Whichever of the preceding DR solutions you decide to implement, there will probably be some data loss, as seen in the following examples, unless you are using a third-party tool such as Metalogix Replicator.

Cold standby recovery

In a cold standby disaster recovery scenario, you recover by setting up a new farm in your cold standby datacentre and restoring the backups that you have stored there. In this scenario, if your primary farm fails before you get to make the backups to ship out to the cold standby datacentre, you will lose all the data added or changed since your last backup.

Warm standby recovery

In a warm standby disaster recovery scenario, you have to create a duplicate farm in the warm standby datacentre and ensure that it is updated regularly by using full and incremental backups of the farm in the primary datacentre. This requires some continuous monitoring, server maintenance, SharePoint upgrades, and other data activity to keep the environment warm. In the event of a failure, you will lose all the data added or changed since your last backup.

Virtual warm standby environments

Virtualization provides a cost-effective option for a warm standby recovery solution. Typically, you can use Hyper-V or VMware as an in-house solution for recovery. This is explained in further detail in Chapter 4, Virtual Environment Backup and Restore Procedures. But even this has its downside. If it takes two days for the VMs or backups to be shipped or uploaded to the DR datacentre, your backups are already two days out of date when they arrive.

To limit this, you have to make sure that the virtual images are created often enough to provide the level of farm configuration and content freshness that you must have for recovering the farm at the secondary DR site. You must also have an environment available in which you can host the VMs. We will dig a bit deeper into virtualization technologies later in this chapter.
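
The effect of transfer time on data freshness can be sketched in a few lines of Python. This is a simplified model under stated assumptions: images are taken on a fixed schedule, each one takes the same time to reach the DR datacentre, and an image is usable only once it has fully arrived. The function name and the schedules shown are hypothetical.

```python
from datetime import timedelta


def worst_case_staleness(backup_interval: timedelta, transfer_time: timedelta) -> timedelta:
    """Worst-case age of the newest usable image at the DR site.

    Just before the next image finishes arriving, the newest usable image
    was taken one full interval plus one transfer time ago.
    """
    return backup_interval + transfer_time


two_day_transfer = timedelta(days=2)  # the two-day transfer from the text
print(worst_case_staleness(timedelta(days=7), two_day_transfer))  # weekly images
print(worst_case_staleness(timedelta(days=1), two_day_transfer))  # daily images
```

Under this model, weekly images with a two-day transfer can leave the DR site up to nine days behind, while daily images reduce that to three days; shortening either the interval or the transfer time is what improves the achievable RPO.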

Hot standby recovery

In a hot standby disaster recovery scenario, you have to create a duplicate farm in the hot standby datacentre, so that it can assume production operations almost immediately after the primary farm fails. This requires a third-party tool, such as Metalogix Replicator, for real-time synchronization.

Note

For more information on Metalogix Replicator, visit http://www.metalogix.com/Products/Replicator/Replicator-for-SharePoint.aspx.

Both the RTO and RPO approaches include shared and dedicated models. These are explained below.

Dedicated model

In a dedicated model, the infrastructure is dedicated to a single organization. Compared to other traditional models, this can offer a faster time for recovery, because the IT infrastructure is mirrored at the disaster recovery site and is ready to be called upon in the event of a disaster. While this model can reduce RTO because the hardware and software are preconfigured, it does not eliminate all delays. You still need to restore the data. This approach is costly because the hardware sits idle when not being used for disaster recovery. Some organizations use the DR infrastructure for development and testing to mitigate the cost, but that introduces additional risk. When organizations start using their DR site for development or testing, it becomes a huge problem: when the time comes to use it for an actual disaster, the DR farm and the production farm are no longer the same; they are drastically different. Solutions have been deployed that were not maintained or documented correctly, and now you are in a bind.

Note

There are also SharePoint license costs to consider. Yes, any servers that are receiving data must still be fully licensed, even while they are idle.

Shared model

In a shared model, the infrastructure is shared among multiple organizations so it is more cost effective. After a disaster is declared, the hardware, the operating system, and the application software at the disaster site must be configured from the ground up to match the IT site that has declared a disaster. On top of that, the data restoration process must be completed. This can take hours or even days.

This is normally a service provided by the company that is managing your data operations.

Hybrid model

There is also a hybrid model, where a SharePoint component such as SQL Server leverages the DR process of another application. This does reduce costs, but of course both DR plans need to be kept in sync. It can also become very complex: how do you separate the two, and what is the process when it comes to restoring? I personally don't like this model because of its complexity, and as a best practice it is never a good idea to add any other database to your SharePoint SQL Server.
