Optimizing Microsoft Azure Workloads

By: Rithin Skaria

Overview of this book

It’s easy to learn and deploy resources in Microsoft Azure without worrying about resource optimization. However, for production or mission-critical workloads, it’s crucial that you follow best practices for resource deployment to attain security, reliability, operational excellence, and performance. Apart from these aspects, you need to account for cost, as it’s a leading driver of almost every organization’s cloud transformation. In this book, you’ll learn to leverage the Microsoft Well-Architected Framework to optimize your workloads in Azure. The framework is a set of recommended practices developed by Microsoft, built on five aligned pillars: cost optimization, performance efficiency, reliability, operational excellence, and security. You’ll explore each of these pillars and discover how to perform an assessment to determine the quality of your existing workloads. Throughout the book, you’ll uncover different design patterns and procedures related to each of the Well-Architected Framework pillars. By the end of this book, you’ll be well equipped to collect and assess data from an Azure environment and make the improvements your Azure workloads need.
Table of Contents (14 chapters)

Part 1: Well-Architected Framework Fundamentals
Part 2: Exploring the Well-Architected Framework Pillars and Their Principles
Part 3: Assessment and Recommendations

What are the pillars of the WAF?

As you read in the previous section, Microsoft has organized its optimization guidance around five pillars of architectural excellence. Even though there is a dedicated chapter for each pillar, let’s cover some key concepts related to each of them for now.

The following figure shows the five pillars of the WAF:

Figure 1.1 – The five pillars of the WAF

We will start with the first pillar, cost optimization.

Cost optimization

One of the main reasons organizations adopt the cloud is its cost-effectiveness. The total cost of ownership (TCO) is much lower in the cloud, as the end customer doesn’t need to purchase physical servers or set up data centers. Thanks to the agility of the cloud, they can deploy, scale, and decommission resources as required. With the help of the Azure TCO calculator (https://azure.microsoft.com/en-us/pricing/tco/calculator/), customers can estimate cost savings before migrating to Azure. Once they have migrated, the journey doesn’t end there; most migrations follow the lift-and-shift strategy, where the workloads are deployed at a similar size to on-premises. The challenge is that on-premises, there is no per-VM or per-server cost, as the customer makes a capital investment and purchases the servers outright; the only recurring costs are licensing, maintenance, electricity, cooling, and labor. In Azure, the cost is pay-as-you-go: for n hours, you pay n times the per-hour cost, and the price of a server varies with size and location. If the servers were wrongly sized on-premises, then during the migration we replicate that mistake in the cloud. With servers running underutilized, you pay extra every hour, every day, and every month. For this reason, we need cost optimization after migration.
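
To make the pay-as-you-go arithmetic concrete, here is a minimal sketch in Python; the hourly rates and VM sizes are hypothetical placeholders, not actual Azure prices (check the Azure pricing calculator for real figures):

```python
# Illustrative pay-as-you-go math: n hours times the per-hour price.
# All rates below are made-up placeholders, not real Azure prices.

HOURS_PER_MONTH = 730  # average hours in a month

def monthly_cost(hourly_rate: float, hours: float = HOURS_PER_MONTH) -> float:
    """Pay-as-you-go: hours consumed times the per-hour price."""
    return hourly_rate * hours

oversized = monthly_cost(0.40)    # e.g., an 8-vCPU VM lifted and shifted as-is
right_sized = monthly_cost(0.10)  # e.g., a 2-vCPU VM matching actual load

print(f"Oversized:     ${oversized:,.2f}/month")
print(f"Right-sized:   ${right_sized:,.2f}/month")
print(f"Monthly waste: ${oversized - right_sized:,.2f}")
```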

It’s recommended that organizations conduct cost reviews every quarter to understand anomalies, plan budgets, and forecast usage. With the help of cost optimization, we will find underutilized and idle resources, often referred to as waste, and eliminate them. Eliminating this waste improves the cost profile of your workloads and results in cost savings. In Chapter 3, Implementing Cost Optimization, we will assess a demo Azure environment and see how to develop a remediation plan. Once we figure out the weak points in our infrastructure, we can resize resources, eliminate them, or enforce policies for cost optimization.
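
As a rough illustration of this kind of waste detection, the following sketch flags VMs whose average CPU sits below a threshold; the VM names, CPU figures, and threshold are hypothetical stand-ins for data you would export from Azure Monitor:

```python
# Flag likely-idle VMs from average CPU utilization.
# The data and threshold are hypothetical, for illustration only.

UNDERUTILIZED_THRESHOLD = 5.0  # average CPU %, an illustrative cut-off

avg_cpu_by_vm = {            # hypothetical 30-day averages per VM
    "web-vm-01": 62.0,
    "batch-vm-02": 3.1,      # likely idle: candidate for resizing
    "test-vm-03": 0.4,       # likely forgotten: candidate for deletion
}

waste = {
    vm: cpu
    for vm, cpu in avg_cpu_by_vm.items()
    if cpu < UNDERUTILIZED_THRESHOLD
}

for vm, cpu in waste.items():
    print(f"{vm}: avg CPU {cpu}% -> review for resize or decommission")
```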

Operational excellence

Operational excellence covers the operations and processes required to run a production application. When deploying our applications to our resources, we need to make sure that we have a reliable, predictable, and repeatable deployment process. In Azure, we can automate the deployment process, which reduces the risk of human error. Bug fixes can be deployed easily if we have a fast and reliable deployment pipeline. Most importantly, whenever there is an issue post-deployment, we can always roll back to the last known good configuration.
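
The following is a simplified sketch of that deploy-and-roll-back flow; deploy() and health_check() are hypothetical stand-ins for the stages a real pipeline (for example, in Azure DevOps or GitHub Actions) would provide, not an actual API:

```python
# A simplified deploy-with-rollback flow. The helpers below are
# hypothetical stand-ins for real pipeline stages.

def deploy(version: str) -> None:
    print(f"Deploying {version}...")  # stand-in for the actual deployment step

def health_check() -> bool:
    return True  # stand-in for post-deployment smoke tests / probes

def release(new_version: str, last_known_good: str) -> str:
    """Deploy new_version; fall back to last_known_good if checks fail."""
    deploy(new_version)
    if health_check():
        return new_version       # this release becomes the new known-good
    deploy(last_known_good)      # roll back to the last known good config
    return last_known_good

current = release("v2.1.0", last_known_good="v2.0.3")
print(f"Running version: {current}")
```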

In Chapter 4, Achieving Operational Excellence, we will learn about key topics related to operational excellence. For the time being, let’s name the topics and explore them later. The key topics are application design, monitoring, app performance management, code deployment, infrastructure provisioning, and testing.

Operational excellence mainly concentrates on DevOps patterns for application deployment and processes related to deployment. This includes guidance on application design and the build process, as well as automating deployments using DevOps principles.

Performance efficiency

As we saw in the case of cost optimization, we scale workloads to meet demand with the help of autoscaling; this ability to scale is what the performance efficiency pillar covers. In Azure, we can define the minimum number of instances that is adequate to run our application during non-peak hours. For peak hours, we can define an autoscaling policy by which the number of instances is increased. The increase can be controlled by a metric (CPU, memory, and so on) or a schedule. We can also define a maximum number of instances, so that scaling stops at a certain point to keep billing under control. To be honest, this autoscaling scenario was not at all possible before the cloud. Earlier, administrators used to create oversized instances that could handle both peak and non-peak hours. With Azure, this has changed; the advantage here is that Azure collects all the metrics out of the box, and we can easily identify bottlenecks.
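
As a rough sketch of such a metric-driven scaling rule, consider the following; the thresholds and instance bounds are hypothetical, and in Azure you would declare equivalent rules in an autoscale setting rather than code them yourself:

```python
# A metric-driven scale-out/scale-in decision, bounded by min/max instances.
# Thresholds and bounds are illustrative, not Azure defaults.

MIN_INSTANCES = 2    # enough to serve non-peak traffic
MAX_INSTANCES = 10   # upper bound to keep billing under control

def desired_instances(current: int, avg_cpu_percent: float) -> int:
    """Scale out above 70% CPU, scale in below 30%, within the bounds."""
    if avg_cpu_percent > 70:
        current += 1
    elif avg_cpu_percent < 30:
        current -= 1
    return max(MIN_INSTANCES, min(MAX_INSTANCES, current))

print(desired_instances(current=2, avg_cpu_percent=85))  # -> 3 (scale out)
print(desired_instances(current=3, avg_cpu_percent=12))  # -> 2 (scale in)
```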

Proper planning is required to define the scaling requirements. In Azure, how we define scaling varies from resource to resource. Some resource tiers don’t offer autoscaling, so you must go with manual scaling, while others support both automatic and manual scaling. One thing to note here is that performance efficiency is not only about autoscaling; it also includes data performance, content delivery performance, caching, and background jobs. Thus, this pillar deals with the overall performance efficiency of our application.

In Chapter 5, Improving Applications with Performance Efficiency, we will take a deep dive into performance patterns, practices, and performance checklists.

Reliability

The word “reliability” means consistent performance; in this context, it means that the application keeps operating through failures thanks to redundancy. When we build and deploy our applications in Azure, we need to make sure that they are reliable. In an on-premises environment, we use different redundancy techniques to make sure that our application and data are available even if there is a failure. For example, we use a Redundant Array of Independent Disks (RAID) on-premises, where data is replicated across multiple disks to increase reliability.

In Azure or any other cloud, the first and foremost thing we need to accept is that failures can happen; the cloud is not completely failproof. Keeping this in mind, we need to design our applications reliably by making use of different cloud features. Incorporating these techniques into the design avoids a single point of failure (SPOF).

The level of reliability is often driven by the service-level agreement (SLA) required by the application or its end users. For example, a single VM with a premium disk offers 99.9% uptime, but if a failure happens on the host server in the Azure data center, your VM will face downtime. Here, we can leverage availability sets or availability zones, which let you deploy multiple VMs across fault domains/update domains or zones. By doing so, the SLA can be increased to 99.95% for availability sets and 99.99% for availability zones. Always keep in mind that to get these SLAs, you need at least two VMs deployed across the availability set or zones. Earlier, we read that the pillars of the WAF are interconnected and work hand in hand. In this case, if you want to increase reliability, you need to deploy multiple instances of your application, which essentially means your costs will increase. Remember that these pillars work hand in hand, and sometimes there will be trade-offs, as in this scenario.
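
To see why extra instances help, here is a simplified availability model; it assumes independent failures and is only illustrative, since Azure’s 99.95% and 99.99% figures are contractual SLAs, not outputs of this formula:

```python
# Composite availability of n independent instances: at least one is up.
# A simplified model for intuition only; real SLAs are contractual and
# account for correlated failures, so they are lower than this math suggests.

def composite_availability(per_instance: float, instances: int) -> float:
    """Probability that at least one of n independent instances is up."""
    return 1 - (1 - per_instance) ** instances

for n in (1, 2, 3):
    a = composite_availability(0.999, n)  # 99.9% per single VM
    print(f"{n} instance(s): {a:.5%} availability, ~{n}x compute cost")
```

The same snippet also shows the trade-off in the cost column: each extra instance multiplies compute spend, which is exactly the tension between the reliability and cost optimization pillars described above.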

Security

Security in public clouds was, and always will be, a concern for enterprise customers because of the complexity involved and the new types of attacks that adversaries keep devising. Coping with these attacks is always a challenge, and finding the right skills to mitigate them is not easy for organizations. In Azure, we follow the shared responsibility model; the model defines the responsibilities of Microsoft and its customers based on the type of service. With Infrastructure-as-a-Service (IaaS) solutions such as VMs, more responsibility lies with the customer, and Microsoft is responsible for the security of the underlying infrastructure. The responsibility shifts more toward Microsoft if you choose a Platform-as-a-Service (PaaS) solution.

It’s very important to leverage the different security options provided by Azure to improve the security of our workloads. In the security pillar, we will assess workloads and make sure they align with the security best practices outlined by Microsoft. As we progress, in Chapter 7, Leveraging the Security Pillar, we will take a holistic approach to security and learn how to build secure applications.