Microsoft Exchange Server 2013 High Availability

Before we delve into how to design and deploy a highly available Exchange 2013 infrastructure, we need to look at the architecture changes and improvements made over the previous editions of Exchange.

Note

This is not an extensive list of all the improvements introduced in Exchange 2013. Only the main changes are mentioned as well as those relevant for high availability.

Looking at the past

In the past, Exchange has often been architected and optimized with consideration to a few technological constraints. An example is the key constraint that led to the creation of different server roles in Exchange 2007: CPU performance. A downside of this approach is that server roles in Exchange 2007/2010 are tightly coupled together. They introduce version dependency, geo-affinity (which requires several roles to be present in a specific Active Directory site), session affinity (often requiring a complex and expensive load-balancing solution), and namespace complexity.

Today, memory and CPU are no longer the constraining factors as they are far less expensive. As such, the primary design goals for Exchange 2013 became failure isolation, improved hardware utilization, and simplicity to scale.

Let us start by having a quick look at how things have evolved since Exchange 2000.

Exchange 2000/2003

In Exchange 2000 and 2003, server tasks are distributed among frontend and backend servers, with frontend servers accepting client requests and proxying them for processing by the appropriate backend server. This includes proxying RPC-over-HTTP (known as Outlook Anywhere), HTTP/S (Outlook Web App (OWA)), POP, and IMAP clients. However, internal Outlook clients do not use the frontend servers as they connect directly to the backend servers using MAPI-over-RPC.

An advantage of this architecture over Exchange 5.5 is that it allows the use of a single namespace such as mail.domain.com. This way, users do not need to know the name of the server hosting their mailbox. Another advantage is the offloading of SSL encryption/decryption to the frontend servers, freeing up the backend servers from this processor-intensive task.

While the frontend server is a specially configured Exchange server, there is no special configuration option to designate a server as a backend server.

High availability was achieved on the frontend servers by using Network Load Balancing (NLB) and by configuring backend servers in an active/active or active/passive cluster.

Exchange 2007

While in Exchange 2003 the setup process installs all features regardless of which ones the administrators intend to use, in Exchange 2007, Microsoft dramatically changed the server roles architecture by splitting the Exchange functionality into five different server roles:

Mailbox server (MBX): This role is responsible for hosting mailbox and public folder data. This role also provides MAPI access for Outlook clients.
Client Access Server (CAS): This role is responsible for optimizing the performance of mailbox servers by hosting client protocols such as POP, IMAP, ActiveSync, HTTP/S, and Outlook Anywhere. It also provides the following services: Availability, Autodiscover, and web services. Compared to Exchange 2000 or 2003 frontend servers, this role is not just a proxy server. For example, it processes ActiveSync policies and does OWA segmentation, and it also renders the OWA User Interface. All client connections with the exception of Outlook (MAPI) use the CAS as the connection endpoint, offloading a significant amount of processing that occurs against backend servers in Exchange 2000 or 2003.
Unified Messaging Server: This role is responsible for connecting a Private Branch eXchange (PBX) telephony system to Exchange.
Hub Transport Server (HTS): This role is responsible for routing e-mails within the Exchange organization.
Edge Transport Server: This role is typically placed in the perimeter of an organization's network topology (DMZ) and is responsible for routing e-mails in and out of the Exchange organization.

Each of these server roles logically groups specific features and functions, allowing administrators to choose which ones to install on an Exchange server so that they can configure a server the way they intend to use it. This offers other advantages such as reduced attack surface, simpler installation, full customization of servers to support the business needs, and the ability to designate hardware according to the role since each role has different hardware requirements.

This separation of roles also means that high availability and resilience are achieved using different methods depending on the role: by load-balancing CASs (either using NLB or hardware/software load-balancing solutions), by deploying multiple Unified Messaging and Hub Transport Servers per Active Directory site, and at the Mailbox server level by using Local Continuous Replication, Standby Continuous Replication, or through the cluster technologies of Cluster Continuous Replication or Single Copy Cluster.

Exchange 2010

In terms of server roles, Exchange 2010 has the same architecture, but under the hood, it takes a step further by moving Outlook connections to the CAS role as well. This means there will be no more direct connections to Mailbox servers. This way, all data access occurs over a common and single path, bringing several advantages, such as the following:

Improved consistency
Better user experience during switch and fail-over scenarios as Outlook clients are connected to a CAS and not to the mailbox server hosting their mailbox
Support for more mailboxes per mailbox server
Support for more concurrent connections

The downside is that this change greatly increases the complexity involved in load-balancing CASs as these devices now need to load-balance RPC traffic as well.

To enable a quicker reconnect time to a different CAS when the server that a client is connected to fails, Microsoft introduced the Client Access array feature, which is typically an array of all CASs in the Active Directory (AD) site where the array is created. Instead of users connecting to the Fully Qualified Domain Name (FQDN) of a particular CAS, Outlook clients connect to the FQDN of the CAS array itself, which typically has a generic name such as outlook.domain.com.

Exchange 2010 includes many other changes to its core architecture. New features including shadow redundancy and transport dumpster provide increased availability and resilience, but the biggest change of all is the introduction of Database Availability Group (DAG)—the base component of high availability and site resilience of Exchange 2010. A DAG is a group of (up to 16) Mailbox servers that hosts databases and provides automatic database-level recovery by replicating database data between the members of the DAG. As each server in a DAG can host a copy of any database from any other server in the same DAG, each mailbox database is now a unique global object in the Exchange organization, while in Exchange 2007, for example, databases were only unique to the server hosting them. DAGs provide automatic recovery from failures that can go from a single database to an entire server.
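The behavior described above can be illustrated with a small toy model. This is only a sketch under stated assumptions, not how Exchange implements a DAG: the class name, the method names, and the rule "activate the first surviving copy in the list" are all hypothetical. The only facts taken from the text are the 16-member limit and the idea that any member can hold a copy of any database and take over after a failure.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set

MAX_DAG_MEMBERS = 16  # limit stated in the text


@dataclass
class DatabaseAvailabilityGroup:
    """Toy model of a DAG (hypothetical; real activation logic is far richer)."""
    members: Set[str] = field(default_factory=set)
    # database name -> servers holding a copy, in an assumed preference order
    copies: Dict[str, List[str]] = field(default_factory=dict)
    # database name -> server currently hosting the active copy
    active: Dict[str, str] = field(default_factory=dict)

    def add_member(self, server: str) -> None:
        if server not in self.members and len(self.members) >= MAX_DAG_MEMBERS:
            raise ValueError("a DAG holds at most 16 Mailbox servers")
        self.members.add(server)

    def add_copy(self, database: str, server: str) -> None:
        if server not in self.members:
            raise ValueError(f"{server} is not a member of this DAG")
        holders = self.copies.setdefault(database, [])
        if server not in holders:
            holders.append(server)
        # The first copy added becomes the active one.
        self.active.setdefault(database, server)

    def fail_server(self, server: str) -> None:
        """Remove a failed server and activate a surviving copy of each affected database."""
        self.members.discard(server)
        for database, holders in self.copies.items():
            if server in holders:
                holders.remove(server)
            if self.active.get(database) == server:
                if holders:
                    self.active[database] = holders[0]
                else:
                    # No copy left anywhere: the database is offline.
                    del self.active[database]


dag = DatabaseAvailabilityGroup()
for name in ("MBX1", "MBX2", "MBX3"):
    dag.add_member(name)
dag.add_copy("DB01", "MBX1")
dag.add_copy("DB01", "MBX2")
dag.fail_server("MBX1")
print(dag.active["DB01"])  # prints "MBX2"
```

Because the database is tracked by name across the whole group rather than per server, the lookup `dag.active["DB01"]` keeps working after the failure, which mirrors the point that a database is a global object rather than something tied to one server.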

The following diagram provides an overview of the evolution from Exchange 2003 to Exchange 2010:

Exchange 2013

While looking back at ways of improving Exchange, Microsoft decided to address three main drawbacks of Exchange 2010:

Functionality scattered across all different server roles, forcing HTS and CASs to be deployed in every Active Directory site where Mailbox servers are deployed.
Versioning between different roles, meaning a lower version of HTS or CAS should not communicate with a higher version of the Mailbox server. Versioning restrictions also mean that administrators cannot simply upgrade a single server role (such as a Mailbox server) without first upgrading all the others.
Geographical affinity, where a set of users served by a given Mailbox server is always served by a given set of CAS and HTS servers.

Enter Exchange 2013, which introduces major changes once more, and we seem to be back in the Exchange 2000/2003 era, only in a far improved way as one would expect. In order to address the issues just mentioned, Microsoft introduced impressive architectural changes and investments across the entire product. Hardware expansion was seen as a limiting boundary from a memory and disk perspective. At the same time, CPU power keeps increasing, which a separate role architecture does not take advantage of, thus introducing a potential for the consolidation of server roles once more.

The array of Client Access servers and Database Availability Group are now the only two basic building blocks, each providing high availability and fault tolerance, but now decoupled from one another. When compared with a typical Exchange 2010 design, the differences are clear:

To accomplish this, the number of server roles has been reduced to three, providing an increased simplicity to scale, isolation of failures, and improved hardware utilization:

The Mailbox server role hosts mailbox databases and handles all activity for a given mailbox. In Exchange 2013, it also includes the Transport service (virtually identical to the previous Hub Transport Server role), Client Access protocols, and the Unified Messaging role.
The Client Access Server role includes the new Front End Transport service and provides authentication, proxy, and redirection services without performing any data rendering. This role is now a thin and stateless server which never queues or stores any data. It provides the usual client-access protocols: HTTP (Outlook Anywhere, Web Services, and ActiveSync), IMAP, POP, and SMTP. MAPI is no longer provided as all MAPI connections are now encapsulated using RPC-over-HTTPS.
The Edge server role, added only in Service Pack 1, brings no additional features when compared to a 2010 Edge server. However, it is now only configurable via PowerShell in order to minimize its attack surface, as adding a user interface would require Internet Information Services, virtual directories, opening more ports, and so on.

In this new architecture, the CAS and the Mailbox servers are not as dependent on one another (role affinity) as in the previous versions of Exchange. This is because all mailbox-processing occurs only on the Mailbox server hosting the mailbox. As data rendering is performed local to the active database copy, administrators no longer need to be concerned about incompatibility of different versions between CAS and Mailbox servers (versioning). This also means that a CAS can be upgraded independently of Mailbox servers and in any order.

Encapsulating MAPI connections using RPC-over-HTTPS and changing the CAS role to perform pure connection-proxying means that advanced layer 7 load-balancing is no longer required. As connections are now stateless, CASs only accept connections and forward them to the appropriate mailbox server. Therefore, session affinity is not required at the load-balancer; only layer 4 TCP with source IP load-balancing is required. This means that if a CAS fails, the user session can simply be transferred to another CAS because there is no session affinity to maintain.
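To make the "no session affinity" point concrete, here is a minimal sketch. Everything in it is an assumption for illustration: the server names, the mailbox identifiers, the lookup table, and the use of a CRC32 hash of the client IP as a stand-in for source-IP load-balancing. It is not how any particular load-balancer or Exchange component is implemented.

```python
import zlib
from typing import List, Tuple

# Hypothetical lookup: mailbox -> Mailbox server hosting the active database copy.
ACTIVE_COPY_LOCATION = {
    "mailbox-a": "MBX1",
    "mailbox-b": "MBX2",
}


def pick_cas(client_ip: str, healthy_cas: List[str]) -> str:
    """Layer 4 style choice: only the source IP is inspected, never the session."""
    index = zlib.crc32(client_ip.encode("ascii")) % len(healthy_cas)
    return healthy_cas[index]


def proxy(cas: str, mailbox: str) -> Tuple[str, str]:
    """A stateless CAS only looks up where the mailbox is active and forwards there."""
    return cas, ACTIVE_COPY_LOCATION[mailbox]


pool = ["CAS1", "CAS2", "CAS3"]
first_cas, backend_before = proxy(pick_cas("10.0.0.25", pool), "mailbox-a")

# Simulate the failure of the CAS that was serving the client.
survivors = [cas for cas in pool if cas != first_cas]
second_cas, backend_after = proxy(pick_cas("10.0.0.25", survivors), "mailbox-a")

print(first_cas != second_cas, backend_before == backend_after)  # prints "True True"
```

The client ends up on a different CAS, yet the request still reaches the same Mailbox server, because the CAS keeps nothing about the session that would have to move with it.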

Note

All these improvements bring another great advantage: users no longer need to be serviced by CASs located within the same Active Directory site as that of the Mailbox servers hosting their mailboxes.

Other changes introduced in Exchange 2013 include the relegation of RPC. Now, all Outlook connections are established using RPC-over-HTTP (Outlook Anywhere). With this change, CASs no longer need the RPC Client Access service, which removes two of the namespaces normally required for a site-resilient solution. As we will see in Chapter 4, Achieving Site Resilience, a site-resilient Exchange infrastructure is deployed across two or more datacenters in such a way that it can withstand the failure of a datacenter and still continue to provide messaging services to the users.

Outlook clients no longer connect to the FQDN of a CAS or CAS Array. Outlook uses Autodiscover to create a new connection point made up of the mailbox GUID, the @ symbol, and the domain portion of the user's primary SMTP address (that is, a value of the form <mailbox GUID>@<domain>). Because the GUID does not change no matter where the mailbox is replicated, restored, or switched over to, there are no client notifications or changes. The GUID abstracts the backend database name and location from the client, resulting in a near elimination of the message "Your administrator has made a change to your mailbox. Please restart Outlook."
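The way this connection point is put together can be sketched in a few lines. The function name and the sample GUID and address below are made up for illustration; only the composition rule (mailbox GUID, then @, then the domain of the primary SMTP address) comes from the text.

```python
import uuid


def connection_point(mailbox_guid: uuid.UUID, primary_smtp_address: str) -> str:
    """Build <mailbox GUID>@<domain> from a GUID and a primary SMTP address."""
    local_part, separator, domain = primary_smtp_address.rpartition("@")
    if not separator or not local_part or not domain:
        raise ValueError("expected an address of the form user@domain")
    return f"{mailbox_guid}@{domain}"


# Hypothetical sample values.
guid = uuid.UUID("12345678-1234-5678-1234-567812345678")
print(connection_point(guid, "user@domain.com"))
# prints "12345678-1234-5678-1234-567812345678@domain.com"
```

Note that neither a server name nor a database name appears in the result, which is why moving the mailbox between servers or database copies does not change what the client connects to.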

The following diagram shows the client protocol architecture of Exchange 2013:

As shown in the preceding diagram, Exchange Unified Messaging (UM) works slightly differently from the other protocols. First, a Session Initiation Protocol (SIP) request is sent to the UM call router residing on a CAS, which answers the request and sends a SIP redirection to the caller, who then connects to the Mailbox server directly via SIP and Real-time Transport Protocol (RTP). This is because the real-time traffic in the RTP media stream is not suitable for proxying.

There are also other great improvements made to Exchange 2013 at the Mailbox server level. The following are some of them:

Improved archiving and compliance capabilities
Much better user experience across multiple devices provided by OWA
New modern public folders that take advantage of the DAG replication model
50 percent to 70 percent reduction in IOPS compared to Exchange 2010, and around 99 percent compared to Exchange 2003
A new search infrastructure called Search Foundation based on the FAST search engine
The Managed Availability feature and changes made to the Transport pipeline

In terms of high availability, only minor changes have been made to the mailbox component from Exchange 2010 as DAGs are still the technology used. Nonetheless, there have been some big improvements:

The Exchange Store has been fully rewritten
There is a separate process for each database that is running, which allows for isolation of storage issues down to a single database
Due to code enhancements around transaction logs and a deeper checkpoint on passive databases, failover times have been reduced

Another great improvement is in terms of site resilience. Exchange 2010 requires multiple namespaces in order for an Exchange environment to be resilient across different sites (such as two datacenters): Internet Protocol, OWA fallback, Autodiscover, RPC Client Access, SMTP, and a legacy namespace while upgrading from Exchange 2003 or Exchange 2007. It does allow the configuration of a single namespace, but it requires a Global Load Balancer and additional configuration at the Exchange level. With Exchange 2013, the minimum number of namespaces was reduced to just two: one for client protocols and one for Autodiscover. While coexisting with Exchange 2007, the legacy hostname is still required, but while coexisting with Exchange 2010, it is not.

This is explored in more depth in Chapter 4, Achieving Site Resilience.

By: Nuno Filipe M Mota, Nuno Mota