Building Hybrid Clouds with Azure Stack

Overview of this book

Azure Stack is all about closing the gap between on-premises and public cloud application deployment. Azure Stack is the logical progression of Microsoft Cloud Services toward a true hybrid cloud-ready application. This book provides an introduction to Azure Stack and the cloud-first approach. Starting with an introduction to the architecture of Azure Stack, the book will help you plan and deploy your Azure Stack. Next, you will learn about the network and storage options in Azure Stack and you'll create your own private cloud solution. Finally, you will understand how to integrate the public cloud using third-party resource providers. After reading the book, you will have a good understanding of the end-to-end process of designing, offering, and supporting cloud solutions for enterprises or service providers.

Microsoft Azure Stack


In May 2015, Microsoft formally announced a new solution that brings Azure to your datacenter. This solution was named Microsoft Azure Stack. To put it in one sentence: Azure Stack is the same technology, with the same APIs and portal as public Azure, but you can run it in your own datacenter or in that of your service provider. With Azure Stack, System Center is completely gone, because everything now works the way it does in Azure, and in Azure there is no System Center at all. This is the primary focus of this book.

The following diagram gives a current overview of the technical design of Azure Stack compared with Azure:

The one and only difference between Microsoft Azure Stack and Microsoft Azure is the cloud infrastructure. In Azure, there are thousands of servers that are part of the solution; with Azure Stack, the number is slightly smaller. That's why a cloud-inspired infrastructure based on Windows Server, Hyper-V, and Azure technologies forms the underlying technology stack. There is no System Center product in this stack anymore. This does not mean that it cannot be there (for example, SCOM for on-premises monitoring), but Azure Stack provides all the required functionality by itself.

For stability and functionality, Microsoft decided to provide Azure Stack as a so-called integrated system, so it will come to your door with the hardware stack included. The customer buys Azure Stack as a complete technology stack. At the general availability (GA) stage, the hardware OEMs are HPE, Dell EMC, and Lenovo. In addition to this, there is a single-host development kit available for download that can be run as a proof-of-concept solution on any type of hardware, as long as it meets the hardware requirements.

Technical design

Looking at the technical design a bit more in depth, there are some components that we need to dive deeper into:

The general basis of Azure Stack is Windows Server 2016 technology, which builds the cloud-inspired infrastructure:

  • Storage Spaces Direct (S2D)
  • VxLAN
  • Nano Server
  • Azure Resource Manager (ARM)

Storage Spaces Direct

Storage Spaces and Scale-Out File Server were technologies that came with Windows Server 2012. The initial versions lacked stability and had issues with the underlying hardware. The general concept was a shared storage setup using JBODs controlled by Windows Server 2012 Storage Spaces servers, with a Scale-Out File Server cluster acting as the single point of contact for storage:

With Windows Server 2016, the design is quite different and the concept relies on a shared-nothing model, even with local attached storage:

This is the storage design Azure Stack builds on as one of its main pillars.
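Azure Stack sets up Storage Spaces Direct automatically during deployment, but the following sketch shows how the same shared-nothing design is enabled on a plain Windows Server 2016 cluster (node and volume names are placeholders; requires the FailoverClusters and Storage modules):

# Create a cluster from nodes that only have local disks (no shared storage)
New-Cluster -Name S2DCluster -Node Node01, Node02, Node03, Node04 -NoStorage

# Enable Storage Spaces Direct - this claims the local disks and builds the storage pool
Enable-ClusterStorageSpacesDirect -CimSession S2DCluster

# Carve a resilient, cluster-shared ReFS volume out of the pool
New-Volume -CimSession S2DCluster -StoragePoolFriendlyName "S2D*" `
    -FriendlyName "Volume01" -FileSystem CSVFS_ReFS -Size 1TB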

VxLAN networking technology

With Windows Server 2012, Microsoft introduced Software-defined Networking (SDN) and the NVGRE technology. Hyper-V Network Virtualization supports Network Virtualization using Generic Routing Encapsulation (NVGRE) as the mechanism to virtualize IP addresses. In NVGRE, the virtual machine's packet is encapsulated inside another packet:

VxLAN comes as the new SDNv2 protocol; it is RFC compliant and is supported by most network hardware vendors by default. The Virtual eXtensible Local Area Network (VxLAN) RFC 7348 protocol has been widely adopted in the marketplace, with support from vendors such as Cisco, Brocade, Arista, Dell, and HP. The VxLAN protocol uses UDP as the transport:

Nano Server

Nano Server offers a minimal-footprint, headless version of Windows Server 2016. It completely excludes the graphical user interface, which makes it very small and easy to handle with regard to updates and security fixes, but it does not provide the GUI that Windows Server customers are used to.
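For reference, this is roughly how a Nano Server image is created from the Windows Server 2016 media using the NanoServerImageGenerator module that ships with it (a sketch; paths and the computer name are placeholders):

# The module is located in the NanoServer folder of the Windows Server 2016 media
Import-Module D:\NanoServer\NanoServerImageGenerator\NanoServerImageGenerator.psd1

# Build a Nano Server VHDX intended to run as a Hyper-V guest
New-NanoServerImage -Edition Standard -DeploymentType Guest `
    -MediaPath D:\ -BasePath C:\NanoBase -TargetPath C:\Nano\Nano01.vhdx `
    -ComputerName Nano01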

Azure Resource Manager

The Azure Resource Manager in Azure Stack is a 1:1, bit-for-bit copy of ARM from Azure, so it has the same update frequency and the same features that are available in Azure, too.

ARM is a consistent management layer that saves resources, dependencies, inputs, and outputs as an idempotent deployment in a JSON file called an ARM template. This template defines the shape of a deployment, whether it be VMs, databases, websites, or anything else. The goal is that once a template is designed, it can be run on any Azure-based cloud platform, including Azure Stack. ARM provides cloud consistency with the finest granularity; the only difference between the clouds is the region the template is deployed to and the corresponding REST endpoints.

ARM not only provides a template for a logical combination of resources within Azure, it also manages subscriptions and role-based access control (RBAC) and defines the gallery, metrics, and usage data, too. This means, quite simply, that everything that needs to be done with Azure resources should be done through ARM.

Azure Resource Manager does not just deploy a single virtual machine; it is responsible for setting up anything from one resource to a whole set of resources that together make up a specific service. ARM templates can even be nested, which means they can depend on each other.
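For example, deploying a finished template is the same operation on Azure and on Azure Stack; only the environment you are signed in to and the target location differ. A minimal sketch using the AzureRM cmdlets (the resource group name, template file, and the development kit location "local" are examples):

# Create a resource group and deploy a template into it
New-AzureRmResourceGroup -Name "MyResourceGroup" -Location "local"

New-AzureRmResourceGroupDeployment -Name "FirstDeployment" `
    -ResourceGroupName "MyResourceGroup" `
    -TemplateFile .\azuredeploy.json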

When working with ARM, you should know the following vocabulary:

  • Resource: A resource is a manageable item available in Azure.
  • Resource group: A resource group is the container of resources that fit together within a service.
  • Resource provider: A resource provider is a service that can be consumed within Azure.
  • Resource manager template: A resource manager template is the definition of a specific service.
  • Declarative syntax: Declarative syntax means that the template does not describe the steps needed to set up a resource; it only describes the desired result, and the resource itself is responsible for configuring itself to match that description.

To create your own ARM templates, you need to fulfill the following minimum requirements:

  • A text editor of your choice
  • Visual Studio Community edition
  • Azure SDK

Visual Studio Community edition is available for free from the internet. After setting these things up, you can start it and define your own templates:

Setting up a simple blank template looks like this:
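The blank template contains essentially nothing more than the empty top-level sections. Written out as a PowerShell here-string and saved to disk (the file name is just an example), it looks like this:

# The empty skeleton every ARM template starts from
$blankTemplate = @'
{
  "$schema": "https://schema.management.azure.com/schemas/2015-01-01/deploymentTemplate.json#",
  "contentVersion": "1.0.0.0",
  "parameters": {},
  "variables": {},
  "resources": [],
  "outputs": {}
}
'@

$blankTemplate | Set-Content -Path .\azuredeploy.json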

There are different ways to get a template that you can work on and modify it to fit your needs:

  • Visual Studio templates
  • Quick-start templates on GitHub
  • Azure ARM templates

You can export the ARM template directly from the Azure portal once the resource has been deployed:

After clicking on View template, the following opens up:
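The same export can also be scripted with the AzureRM PowerShell module once you are signed in to the environment (the resource group name and output path are examples):

# Download the generated template for an existing resource group
Export-AzureRmResourceGroup -ResourceGroupName "MyResourceGroup" -Path .\exported-template.json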

Note

For further reading on ARM basics, the Getting started with Azure Resource Manager document is a good place to begin: http://aka.ms/GettingStartedWithARM.

PowerShell Desired State Configuration

In the previous section, we talked about ARM and ARM templates that define resources, but they are unable to specify what a VM looks like inside, which software needs to be installed, and how that installation should be done. This is why we need to have a look at VM extensions. VM extensions define what should be done after the ARM deployment has finished. In general, the extension can be any kind of script. The best practice is to use PowerShell and its add-on called Desired State Configuration (DSC).

Quite similarly to ARM, DSC defines how the software needs to be installed and configured. The great thing about this concept is that it also monitors whether the desired state of a virtual machine changes (for example, because an administrator uninstalls or reconfigures something). If it does, DSC ensures within minutes that the original state is fulfilled again by rolling the machine back to the desired state:
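A minimal DSC configuration looks like the following sketch (the configuration, feature, and node names are only examples). Once it is compiled into a MOF file and applied, the Local Configuration Manager keeps checking the node against it and corrects any drift:

Configuration WebServer
{
    Import-DscResource -ModuleName PSDesiredStateConfiguration

    Node "localhost"
    {
        # Desired state: the IIS role must be present
        WindowsFeature IIS
        {
            Name   = "Web-Server"
            Ensure = "Present"
        }
    }
}

# Compile the configuration into a MOF file and apply it
WebServer -OutputPath .\WebServer
Start-DscConfiguration -Path .\WebServer -Wait -Verbose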

Azure Stack VMs

When Azure Stack is deployed, the following VMs are brought up on the Hyper-V hosts:

As of GA, Azure Stack consists of 13 VMs that each have their own function in making Azure Stack work. All of them are Core server instances configured with static resources (up to 8 GB of RAM and 4 vCPUs each). In multi-node environments, most of these VMs are redundant and load balanced using the Software Load Balancer (SLB).
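On the single-host development kit, you can quickly check these infrastructure VMs directly on the Hyper-V host (a sketch; requires the Hyper-V PowerShell module):

# List the Azure Stack infrastructure VMs and their state
Get-VM -Name AzS-* | Sort-Object Name | Format-Table Name, State, MemoryAssigned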

A resource provider adds features and functionality to Azure Stack using a predefined structure, API set, design, and VM.

AzS-ACS01

The ACS01 VM hosts the Azure Stack storage provider service and is therefore responsible for one of the most important resource providers. As the underlying storage technology is Storage Spaces Direct, this VM manages it.

If a tenant creates a new storage resource, a storage account is added to a resource group. The storage account then manages the different storage service types on the physical host, such as blob, page, and table storage, SOFS, ReFS cluster shared volumes, virtual disks, and Storage Spaces Direct. In addition, the storage account is the place to set up security, too. It's possible to grant temporary (token-based) and long-term (key-based) access.

When it comes to the roles for storage management, Azure Stack provides the following three levels:

  • Storage Tenant Administrator (consumer of storage services).
  • Storage Developer (developer of cloud-storage-based apps).
  • Storage Service Provider (provides storage services for tenants on a shared infrastructure and can be divided into two separate roles):
    • Storage Fabric Administrator (responsible for fabric storage lifecycle)
    • Storage Service Administrator (responsible for cloud storage lifecycle)

Azure consistent storage can be managed with the following:

  • REST APIs
  • PowerShell cmdlets
  • A modern UI
  • Other tools (scripts and third-party tools)

Storage always needs to be a part of the tenant offer. It's one of the necessary pillars for providing resources within Azure Stack.
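From the tenant side, these storage services are consumed exactly as in public Azure, for example with the Azure Storage cmdlets (the account name, key variable, container, and file are placeholders):

# $storageKey holds one of the keys of an existing storage account
$ctx = New-AzureStorageContext -StorageAccountName "mystorageaccount" `
    -StorageAccountKey $storageKey

# Create a container and upload a blob into it
New-AzureStorageContainer -Name "templates" -Context $ctx
Set-AzureStorageBlobContent -File .\azuredeploy.json -Container "templates" -Context $ctx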

AzS-ADFS01

The ADFS01 VM provides the technical basis for Active Directory Federation Services (ADFS or AD FS), which provides an authentication and authorization model for Azure Stack. Specifically, if a deployment does not rely on Azure AD, there needs to be a way to authenticate and authorize users from other Active Directory domains.

This VM is the most important one in disconnected scenarios, because it is the ADFS target for internal Active Directory domains that are connected as identity providers.

AzS-SQL01

The SQL01 VM provides the complete SQL services for Azure Stack. A lot of services need to store data (for example, offers, tenant plans, and ARM templates), and this is the place where that data is stored. Compared to other products in the past (such as Windows Azure Pack (WAP)), there is no high load on this service because it stores only internal data for infrastructure roles.

AzS-BGPNAT01

The BGPNAT01 VM provides NAT and VPN access based on the BGP routing protocol, which is the default for Azure, too. This VM does not exist in multi-node deployments, where it is replaced by the TOR switch (TOR stands for Top of the Rack). As a tenant is able to deploy a VPN device in its Azure Stack-based cloud and connect it to another on-premises or off-premises networking environment, all of that traffic goes through this VM. Compared to other designs, this VM takes the place of the top-of-rack switch and requires the following features:

  • Border Gateway Protocol (BGP): This is the internet protocol for connecting autonomous systems and allowing communication between them.
  • Data Center Bridging (DCB): This is a technology for the Ethernet LAN communication protocol, especially for datacenter environments, for example, for clustering and SAN. It consists of the following subsets:
    • Enhanced Transmission Selection (ETS): This provides a framework for assigning bandwidth priorities to frames.
    • Priority Flow Control (PFC): This provides a link-level flow-control technology for each frame priority.
    • Switch Independent Teaming (SIT): This is a teaming mode launched with Windows Server 2012. The teaming configuration will work with any Ethernet switch, even non-intelligent switches, because the operating system is responsible for the overall technology.

AzS-CA01

CA01 runs the certificate authority services for deploying and controlling the certificates used for authentication within Azure Stack. As all communication is secured using certificates, this service is mandatory and needs to work properly. Each certificate is renewed every 30 days, completely within the Azure Stack management environment.

AzS-DC01

As the complete Azure Stack environment runs in a dedicated Active Directory domain, this VM is the source for all Azure Stack internal authentication and authorization. As there is no other domain controller available, it is responsible for the Flexible Single Master Operation (FSMO) roles and the global catalog, too. It provides the Microsoft Graph resource provider, which is a REST endpoint to Active Directory. Finally, it is the VM running the DHCP and DNS services for the Azure Stack environment.

AzS-ERCS01

In case of an issue with Azure Stack itself (a so-called break-the-cloud scenario), it may be necessary to receive support from Microsoft. For this purpose, the AzS-ERCS01 VM provides the possibility to connect to an Azure Stack deployment externally using Just Enough Administration (JEA) and Just in Time Administration (JIT).
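Access goes through a constrained PowerShell session on the ERCS VM, known as the privileged endpoint. A hedged sketch (the computer name and credential are examples; in practice you would typically target the VM's IP address):

$cred = Get-Credential -Message "Enter the CloudAdmin credentials"

# Open a JEA-restricted session on the privileged endpoint (ERCS VM)
$session = New-PSSession -ComputerName "AzS-ERCS01" `
    -ConfigurationName PrivilegedEndpoint -Credential $cred

Enter-PSSession -Session $session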

AzS-Gwy01

The AzS-Gwy01 VM is responsible for the site-to-site VPN connections of tenant networks, providing network connectivity between them. It is one of the most important VMs for tenant connectivity.

AzS-NC01

NC01 is responsible for the network controller services. The network controller is based on the SDN capabilities of Windows Server 2016. It is the central control plane for all networking, provides network fault tolerance, and is the key to bringing your own address space for IP addressing (both VxLAN and NVGRE are supported, but VxLAN is the preferred one).

Azure Stack uses virtual IP addressing for the following services:

  • Azure Resource Manager
  • Portal UI (whether admin or tenant)
  • Storage
  • ADFS and Graph API
  • Key vault
  • Site-to-site endpoints

The network controller (or network resource provider) makes sure that all communication goes its predefined way and is aware of security, priority, high availability, and flexibility.

In addition, it is responsible for all VMs that are part of the networking stack of Azure Stack:

  • AzS-BGPNAT01
  • AzS-Gwy01
  • AzS-SLB01
  • AzS-Xrp01

AzS-SLB01

SLB01 is the VM responsible for all load balancing. With the former product, Azure Pack, there was no real load balancer available as Windows load balancing always had its issues with network devices at the tenant side. Therefore, the only solution for this was adding a third-party load balancer.

With Microsoft Azure, a software load balancer was always present, and SLB01 is the same one running in public Azure, just brought to Azure Stack. It is responsible for tenant load balancing, but also provides high availability for the Azure Stack infrastructure services. As expected, providing the SLB to Azure Stack cloud instances means deploying the corresponding ARM template. The underlying technology is hash-based load balancing. By default, a 5-tuple hash is used, and it contains the following:

  • Source IP
  • Source port
  • Destination IP
  • Destination port
  • Protocol type

Stickiness is only provided within one transport session, and packets of a TCP or UDP session will always be forwarded to the same instance behind the load balancer. The following chart shows an overview of the hash-based traffic distribution:
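Conceptually, hash-based distribution works as in the following sketch: the five tuple members are hashed together and the result, modulo the number of instances, selects the backend. This is only an illustration of the idea, not the actual SLB implementation:

# Illustrative only: map a 5-tuple to one of the backend instances
$backends = @("10.0.0.4", "10.0.0.5", "10.0.0.6")

function Select-Backend {
    param($SrcIp, $SrcPort, $DstIp, $DstPort, $Protocol)

    # Hash all five tuple members together
    $tuple = "$SrcIp|$SrcPort|$DstIp|$DstPort|$Protocol"
    $bytes = [System.Text.Encoding]::UTF8.GetBytes($tuple)
    $hash  = [System.BitConverter]::ToUInt32(
                 [System.Security.Cryptography.MD5]::Create().ComputeHash($bytes), 0)

    # The same tuple always maps to the same backend instance
    return $backends[$hash % $backends.Count]
}

Select-Backend -SrcIp "192.168.1.10" -SrcPort 50123 -DstIp "10.0.0.100" -DstPort 443 -Protocol "TCP"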

AzS-WASP01

The WASP01 VM is responsible for the Azure Stack tenant portal and runs the Azure Resource Manager services for it.

AzS-WAS01

The WAS01 VM is responsible for running the Azure Stack administrative portal, which you already know from Azure (codenamed Ibiza). In addition, your Azure Resource Manager instance runs on this VM.

This ARM instance is responsible for the design of the services provided in your Azure Stack instance. ARM makes sure that resources are deployed the way you designed your templates and that they keep running that way throughout their lifecycle.

AzS-XRP01

The XRP01 VM is responsible for the core resource providers: compute, storage, and network. It holds the registration of these providers and knows how they interact with each other; therefore, it can be called the heart of Azure Stack, too.

Services summary

As you have seen, these VMs provide the management environment of Azure Stack, and they are each available in only one instance. But we all know that scalability means deploying more instances of each service, and as there is already a built-in software load balancer, the product itself is designed for scale. Another way to scale is to implement a second Azure Stack integrated system in your environment and provide a place familiar to Azure users. So there are two ways to scale, and the question is which scale unit we need: if we need more performance, scaling out with more VMs providing the same services is a good choice; the other option is to scale with a second region, which provides geo-redundancy.

(Re)starting an Azure Stack environment

As the Azure Stack integrated system is a set of VMs, we need to talk about what to do when restarting the entire environment. By default, each VM is put into a saved state when the environment is shut down. In general, this should not be a problem, because when the environment restarts, the VMs recover from the saved state, too. However, if any VMs are delayed in starting, the environment may run into issues, and since a complete restart is not always a good idea, the following boot order is the best practice. Between each of these VMs, there should be a delay of 60 seconds. The AD domain controller itself reboots with the host machine:

  • AzS-BGPNAT01
  • AzS-NC01
  • AzS-SLB01
  • AzS-Gwy01
  • AzS-SQL01
  • AzS-ADFS01
  • AzS-CA01
  • AzS-ACS01
  • AzS-WAS01
  • AzS-WASP01
  • AzS-Xrp01
  • AzS-ERCS01

The shutdown sequence is the other way round.
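A minimal sketch of such a script, run on the Azure Stack host with the Hyper-V module available (the VM names are the ones listed above):

# Boot order as recommended above, with 60 seconds between each VM
$bootOrder = "AzS-BGPNAT01", "AzS-NC01", "AzS-SLB01", "AzS-Gwy01",
             "AzS-SQL01", "AzS-ADFS01", "AzS-CA01", "AzS-ACS01",
             "AzS-WAS01", "AzS-WASP01", "AzS-Xrp01", "AzS-ERCS01"

foreach ($vm in $bootOrder) {
    Start-VM -Name $vm
    Start-Sleep -Seconds 60
}

# For shutdown, walk the same list backwards and use Stop-VM instead:
# foreach ($vm in $bootOrder[($bootOrder.Count - 1)..0]) { Stop-VM -Name $vm; Start-Sleep -Seconds 60 }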

Note

If you want to make it easy for yourself, set up a PowerShell script for shutting down or restarting the VMs. Thanks to Daniel Neumann (TSP, Microsoft Germany), there is a good script available on his blog at http://www.danielstechblog.info/shutdown-and-startup-order-for-the-microsoft-azure-stack-tp2-vms/.

Resource providers

The main concept behind Azure Stack's extensibility is that each feature is delivered as a web service called a resource provider. This concept makes the product itself quite easy to maintain and extend:

Regarding the logical design of Azure Stack shown in the preceding diagram, there are the following resource providers in the product today. The three main ones are as follows:

  • Storage Resource Provider (SRP)
  • Compute Resource Provider (CRP)
  • Network Resource Provider (NRP)

We need to differentiate between them and the following additional resource providers:

  • Fabric Resource Provider (FRP)
  • Health Resource Provider (HRP)
  • Update Resource Provider (URP)

Finally, we have the third-party resource providers. To make sure that each third-party resource provider acts as intended, Microsoft provides a REST API and a certification process for it with Azure Stack.