Designing and Implementing Microsoft DevOps Solutions AZ-400 Exam Guide - Second Edition

By : Subhajit Chatterjee, Swapneel Deshpande, Henry Been, Maik van der Gaag

Overview of this book

The AZ-400 Designing and Implementing Microsoft DevOps Solutions certification helps DevOps engineers and administrators get to grips with practices such as continuous integration and continuous delivery (CI/CD), containerization, and zero-downtime deployments using Azure DevOps Services. This new edition is updated with advanced topics such as site reliability engineering (SRE), continuous improvement, and planning your cloud transformation journey. The book begins with the basics of CI/CD and automated deployments, and then moves ahead to show you how to apply configuration management and Infrastructure as Code (IaC) along with managing databases in DevOps scenarios. As you make progress, you’ll explore how to fit security and compliance into DevOps and find out how to instrument applications and gather metrics to understand application usage and user behavior. This book will also help you implement a container build strategy and manage Azure Kubernetes Service (AKS). Lastly, you’ll discover quick tips and tricks to confidently apply effective DevOps practices and learn to create your own Azure DevOps organization. By the end of this DevOps book, you'll have gained the knowledge needed to ensure seamless application deployments and business continuity.
Table of Contents (27 chapters)
Part 1 – Digital Transformation through DevOps
Part 2 – Getting to Continuous Delivery
Part 3 – Expanding Your DevOps Pipeline
Part 4 – Closing the Loop
Part 5 – Advanced Topics

What is DevOps?

If you were to list all of the different definitions and descriptions of DevOps, there would be many. However, as different as these might be, they most likely share several concepts. These are collaboration, continuous delivery of business value, and breaking down silos.

With all the technical discussion in the rest of this book, it is important not to overlook the value proposition for adopting DevOps – namely, that it will help you improve the way that you continuously deliver value to your end users. To do this, you must decrease the time between starting work on a new feature and the first user using it in production. This means that you not only have to write the software but also deliver and operate it.

Over the last decade, the way we write software has fundamentally changed. More and more companies are adopting an agile way of working to increase the efficiency of their software development, and many teams now work in short iterations or sprints to create new increments of a product in quick succession. However, creating potentially shippable increments faster and faster does not create any value by itself. Only when each new version of your software is also released to production and used by your end users does it start delivering value.

In traditional organizations, developers and operators are often located in different departments, and taking software into production includes a hand-off, often with a formal ceremony around it. In such an organization, it can be hard to accelerate delivery to production enough to keep pace with the speed at which development can create new versions.

In addition, the development and operations departments often have conflicting goals. While a development department is rewarded for creating many changes as fast as possible, an operations department is rewarded for limiting downtime and preventing issues. The latter is often best achieved by making as few changes as possible. The conflict here is clear – each department optimizes for its own subgoal, as shown in the following diagram:

Figure 1.1 – Conflicting goals between development and operations

This defeats the purpose of these subgoals, which derive from the shared, overarching goal of quickly taking in new versions while maintaining stability. This conflict between development and operational goals is precisely one of the things that should disappear in a DevOps culture. In such a culture, development and operations teams should work together on delivering new versions to production in a fast and reliable manner and share responsibility for both subgoals.

While it is good to know that DevOps is a cultural movement, tools and automation are an important part of that culture. In this book, we will focus on these tools and how to use them to implement many of the practices that come with a DevOps culture. In other words, this book will be mostly about the products and processes associated with DevOps. If you want to learn more about the cultural side of things and the people, there are many other books you can read. A very good read is The Phoenix Project: A Novel About IT, DevOps, And Helping Your Business Win, by Gene Kim.

The rest of this section will explore the relationship between DevOps and Agile to see how they complement each other. The focus will be on agile techniques and practices for work management. We will also discuss the goals and benefits of a DevOps culture.

The relationship between DevOps and Agile

If you take a look at Agile, you may notice that part of it focuses on business value and on shortening the time it takes to deliver new business value. From that perspective, adopting DevOps is a logical next step after Agile. Agile advocates that the software development team’s responsibilities should extend forward by engaging with users and other stakeholders to deliver valuable and potentially shippable products more quickly. DevOps is not just about creating something that could be shipped, but about actually shipping it as well. With Agile and DevOps combined, you can create an end-to-end, continuous flow of value to your users.

Everyone involved will need a common approach to managing the work to be done. In the next section, you will find some pointers on how to incorporate operational concerns into the way you manage your work.

Agile work management

When you are starting to increase the collaboration between development and operations, you will quickly notice that they have to cope with different types of work. In development, a large part of the work is planned: user stories and bugs that are picked up from a backlog. On the other hand, for operations, a large part of their work is unplanned. They respond to warnings and alerts from systems and requests or tickets from users or developers.

Integrating these two, especially if developers and operators are located on the same team, can be challenging. To learn how to deal with this, let’s explore the following approach:

  1. First, switch to a flow-based way of working for developers.
  2. Next, allow for operations to also list their work in the same work management system as developers using synchronizations. You can also choose to implement fastlaning, a way to expedite urgent work.
  3. Finally, you may choose to decommission existing ticketing tools for operations if possible.

Fastlaning is an approach to organizing work that accommodates both planned and unplanned work by visualizing two separate lanes of work. To do this, the Scrum board is extended with a Kanban-like board at the top. This is the fast lane. Urgent but unplanned work is added to the Kanban board. Any work that’s added to this lane is picked up by the team with the highest priority. Only when no work remains in the fast lane is planned work from the Scrum board picked up. Whenever new work is added to the fast lane, it takes priority again. Often, there is an agreement that any work in progress is finished before switching to work in the fast lane.
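The pull rules described above can be sketched in a few lines of code. This is a minimal illustration, not how Azure Boards implements swim lanes. The `FastLaneBoard` class and its method names are hypothetical. The sketch assumes the team works on one item at a time and follows the agreement to finish work in progress before switching lanes.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class FastLaneBoard:
    """Hypothetical two-lane board: an expedite (fast) lane above the regular, planned lane."""

    fast_lane: deque = field(default_factory=deque)     # urgent, unplanned work
    regular_lane: deque = field(default_factory=deque)  # planned backlog work
    in_progress: Optional[str] = None                   # the one item being worked on

    def add_urgent(self, item: str) -> None:
        self.fast_lane.append(item)

    def add_planned(self, item: str) -> None:
        self.regular_lane.append(item)

    def finish_current(self) -> None:
        """Mark the item in progress as done, freeing the team for new work."""
        self.in_progress = None

    def pull_next(self) -> Optional[str]:
        """Return the item to work on, pulling a new one only when nothing is in progress."""
        if self.in_progress is not None:
            # Agreement: finish work in progress before switching to the fast lane.
            return self.in_progress
        if self.fast_lane:
            # Anything in the fast lane always takes priority.
            self.in_progress = self.fast_lane.popleft()
        elif self.regular_lane:
            # Planned work is only pulled when the fast lane is empty.
            self.in_progress = self.regular_lane.popleft()
        return self.in_progress


board = FastLaneBoard()
board.add_planned("Story 1")
board.add_planned("Story 2")
print(board.pull_next())  # Story 1
board.add_urgent("Production incident")
print(board.pull_next())  # Story 1 (still in progress)
board.finish_current()
print(board.pull_next())  # Production incident
```

Notice that the urgent item does not interrupt "Story 1". It is simply the next item pulled once the current work is finished.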

Important Note

Dependency management is also an important aspect of agile work planning. To account for dependencies, teams often use the priority attribute of work items to mark which work is more important in the short term.

Switching to a flow-based methodology

The first thing to consider when switching to a flow-based methodology is transitioning the way developers work from batch-wise to flow-based. An example of a batch-wise way of working is Scrum. If you are using the Scrum framework, you are used to picking up a batch of work every 2 to 4 weeks and focusing on completing all of that work within that time window. Only when that batch is done do you deliver a potentially shippable product.

When changing to a flow-based approach, you try to focus not on a batch, but just on one thing. You work on that one work item and drive it completely until it’s done before you start on the next. This way, there is no longer a sprint backlog, only a product backlog. The advantage of this approach is that you no longer decide which work to perform upfront; whenever you are free to start on new work, you can pick up the next item from the backlog. In an environment where priorities shift quickly, this allows you to react to change quickly.

These changes to the way developers organize their work make it easier to include operations in work management, but there is also another benefit. When developers are focusing on getting a single work item done instead of a whole sprint at once, you can also increase the number of times you can deliver a small portion of value to your users.

Synchronizing work items to one system

Once the development team has changed the way it organizes its work, it should be easier for operators to also list their planned work on the shared backlog and pull work from that backlog when they have time to work on it. They now also have a place where they can list their unplanned work.

However, there may still be an existing ticketing system where requests for operations are dropped by users or automatically created by monitoring tools. Azure DevOps has a great API that you could use to rework this integration so that work items are created directly in Azure DevOps. However, you may first choose to create a synchronization between your existing ticketing tool and Azure Boards. There are many integration options available, and there is a lot of ongoing work in this area. With a synchronization in place, operators can slowly move from their old tool to the new one, since the two are kept in sync. Of course, the goal is for them to move over completely to the same tool as the developers.
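The direct integration mentioned above can be sketched against the Azure DevOps Work Items REST API. This is a minimal sketch that only builds the request, without sending it. It assumes the `POST https://dev.azure.com/{organization}/{project}/_apis/wit/workitems/${type}` endpoint, `api-version=7.0`, a JSON Patch body, and a personal access token (PAT) sent via basic authentication. The organization, project, and function name are hypothetical. Check the current REST API reference before relying on this.

```python
import base64
import json
import urllib.parse
import urllib.request


def build_create_work_item_request(organization: str, project: str, work_item_type: str,
                                   title: str, description: str,
                                   pat: str) -> urllib.request.Request:
    """Build (but do not send) a request that creates a work item from an incoming ticket."""
    # The work item type goes into the URL path prefixed with '$', e.g. '$Bug'.
    url = (
        f"https://dev.azure.com/{urllib.parse.quote(organization)}/"
        f"{urllib.parse.quote(project)}/_apis/wit/workitems/"
        f"${urllib.parse.quote(work_item_type)}?api-version=7.0"
    )
    # The endpoint expects a JSON Patch document listing the fields to set.
    patch = [
        {"op": "add", "path": "/fields/System.Title", "value": title},
        {"op": "add", "path": "/fields/System.Description", "value": description},
    ]
    # A PAT is sent as basic auth with an empty user name (":<pat>", base64-encoded).
    token = base64.b64encode(f":{pat}".encode()).decode()
    return urllib.request.Request(
        url,
        data=json.dumps(patch).encode(),
        method="POST",
        headers={
            "Content-Type": "application/json-patch+json",
            "Authorization": f"Basic {token}",
        },
    )


req = build_create_work_item_request(
    organization="contoso",          # hypothetical organization
    project="Operations",            # hypothetical project
    work_item_type="Bug",            # available types depend on the process template
    title="Disk almost full on web01",
    description="Raised automatically by a monitoring alert",
    pat="<personal-access-token>",
)
# urllib.request.urlopen(req) would send it; omitted so the sketch has no side effects.
```

A monitoring tool or ticketing system could call such a function for each new ticket. This can be a stepping stone before the old tool is decommissioned.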

Fastlaning

With the work of developers and operators in the same work management tool, you will notice that you have a mix of planned and unplanned, often urgent, work in the system. To ensure that urgent work gets the attention and precedence it deserves, you can introduce what is called a fast lane to your sprint board. The following screenshot shows an example of an Azure board that has been set up for fastlaning production issues:

Figure 1.2 – Azure Board setup depicting the fast lane

The horizontal split in this board separates the fast lane from the regular lane. Tasks in the regular lane are only worked on when there is no work to be picked up in the fast lane.

You can find instructions on how to configure swim lanes in your Azure (Kanban) boards for expediting work at https://docs.microsoft.com/en-us/azure/devops/boards/boards/expedite-work?view=azure-devops.

Decommissioning other work management tools

After creating a shared work management system between development and operations, there is an opportunity to increase the amount of collaboration between them. When this collaboration is taking off, old ticketing systems that were used by operations may now slowly be decommissioned over time. Integrations from monitoring tools can be shifted to the new shared tools, and the number of tickets between developers and operators should slowly decrease as they find new ways of working together.

Important Note

Azure DevOps allows you to customize work item templates, as well as define life cycle states. Using this feature, teams can easily model their work item template types based on any existing taxonomy they might be using in their existing tools. This significantly reduces the learning curve in the adoption of the new shared work management tool. For more information on this, go to https://docs.microsoft.com/en-us/azure/devops/boards/backlogs/work-item-template?view=azure-devops&tabs=browser#manage-work-item-templates.

Goals and benefits of a DevOps culture

At this point, you might be wondering about the point of it all. What are the benefits of DevOps, and what’s in it for you, your colleagues, and your organization? The most common goal of adopting DevOps is to reduce cycle time. Cycle time is the time between starting work on a new feature and the moment that the first user can use it. Automation, the means by which this is achieved, also serves the goals of a lower change failure rate, a lower mean time to repair (MTTR), and less planned downtime.

In addition, there may be other benefits, such as increased employee satisfaction, less burnout and stress, and better employee retention. These are attributed to removing the opposing goals of developers and operators.

For a while, there was doubt about whether DevOps works, whether these goals can be met, and whether the extra benefits can be achieved since this was only shown using case studies. The downside of this is that case studies are often only available for successful cases, not for unsuccessful ones. This all changed in 2018 when the book Accelerate came out. This book shows, based on years of quantitative research, that modern development practices such as DevOps contribute to reaching IT goals and organizational goals.

Measuring results

To measure where you currently stand as a team or organization and the impact DevOps has on you, there are several metrics that you can start recording. As always, when working with metrics or key performance indicators (KPIs), make sure that you do not encourage people to game the system by looking only at the numbers. Several interesting metrics are detailed in the following sections and if you go over them, you will notice that they are all about encouraging flow.

Cycle time and lead time

Cycle time and lead time are metrics that come from Lean and Kanban and are used to measure the time needed to realize a change. Cycle time is the amount of time between starting work on a feature and users being able to use that feature in production. The lower the cycle time, the quicker you can react to changing requirements or insights. Lead time, as used here, is the amount of time between requesting a feature and starting to realize it: the time between adding work to the backlog and starting to implement it.

When you add cycle time and lead time together, you are calculating another metric, known as the time to market. This is often an important business metric when developing software. Hence, minimizing both cycle time and lead time will have a business impact.
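The arithmetic can be made concrete with a small sketch. It follows the definitions used in this section, where lead time runs from adding an item to the backlog until implementation starts. Many Kanban sources instead define lead time as running from request to delivery, which would already include cycle time. The `WorkItem` class and its field names are hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class WorkItem:
    """Hypothetical work item with the three timestamps needed for these metrics."""

    requested: datetime  # added to the backlog
    started: datetime    # implementation began
    released: datetime   # first user can use it in production

    @property
    def lead_time(self) -> timedelta:
        # As defined in this section: waiting time from request until work starts.
        return self.started - self.requested

    @property
    def cycle_time(self) -> timedelta:
        # From starting work until users can use the feature in production.
        return self.released - self.started

    @property
    def time_to_market(self) -> timedelta:
        return self.lead_time + self.cycle_time


item = WorkItem(requested=datetime(2023, 3, 1),
                started=datetime(2023, 3, 6),
                released=datetime(2023, 3, 9))
print(item.lead_time.days, item.cycle_time.days, item.time_to_market.days)  # 5 3 8
```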

The amount of work in progress

Another thing you can measure is the amount of work in progress at any point in time. DevOps focuses on the flow of value to the user. This implies that everyone should, if possible, be doing only one thing at a time and finish it before moving on to something else. This reduces the time spent on task switching and the time that work sits unfinished. Measuring how many things a team works on in parallel, and reporting on it, can encourage the team to keep that number low.

You can even go as far as putting actual limits on the amount of work that can be in progress. The following is a small part of Figure 1.2, showing that these work-in-progress limits can even be shown in the tool:

Figure 1.3 – Azure Boards depicting limits for each stage

The goal is to have as little work in progress at the same time as possible.
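The idea behind a work-in-progress limit can be sketched as a board column that refuses new work once it is full. The `WipLimitedColumn` class is hypothetical. It is stricter than a board that only highlights a column when its limit is exceeded, because here the pull is rejected outright.

```python
class WipLimitedColumn:
    """Hypothetical board column that refuses new work once its WIP limit is reached."""

    def __init__(self, name: str, limit: int) -> None:
        if limit < 1:
            raise ValueError("WIP limit must be at least 1")
        self.name = name
        self.limit = limit
        self.items: list[str] = []

    def can_pull(self) -> bool:
        return len(self.items) < self.limit

    def pull(self, item: str) -> bool:
        """Move an item into this column if the limit allows; return whether it moved."""
        if not self.can_pull():
            return False
        self.items.append(item)
        return True

    def complete(self, item: str) -> None:
        # Finishing work frees capacity, so the next item can be pulled.
        self.items.remove(item)


develop = WipLimitedColumn("Develop", limit=2)
print(develop.pull("Story A"), develop.pull("Story B"), develop.pull("Story C"))  # True True False
develop.complete("Story A")
print(develop.pull("Story C"))  # True
```

The only way to start something new is to finish something first, which is exactly the behavior the limit is meant to encourage.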

Mean time to recovery

The third metric is the mean time to recovery. How long does it take you to restore a service in case of a (partial) outage? In the past, companies focused on increasing the mean time between failures. This used to be the main indicator of the stability of a product. However, this metric encourages limiting the number of changes going to production. The unwanted consequence is often that outages, though they might be rare, last long and are hard to fix.

Measuring the mean time to recovery shifts the attention to how quickly you can remediate an outage. If you can fix outages quickly, you achieve the same end goal as with rare failures, namely minimizing the amount of downtime, without sacrificing the rate of change. The goal is to minimize the time to recovery.
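As a worked example, mean time to recovery is the average duration of the recorded outages, from the start of each outage until service is restored. This is a minimal sketch. The function name and the (start, restored) tuple format are assumptions for illustration.

```python
from datetime import datetime, timedelta


def mean_time_to_recovery(outages: list[tuple[datetime, datetime]]) -> timedelta:
    """Average of (restored - started) over all recorded outages."""
    if not outages:
        raise ValueError("no outages recorded")
    total = sum((restored - started for started, restored in outages), timedelta())
    return total / len(outages)


outages = [
    (datetime(2023, 5, 2, 10, 0), datetime(2023, 5, 2, 10, 30)),  # 30 minutes
    (datetime(2023, 5, 9, 14, 0), datetime(2023, 5, 9, 15, 30)),  # 90 minutes
]
print(mean_time_to_recovery(outages))  # 1:00:00
```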

Change rate and change failure rate

Finally, you can measure the number of changes that are delivered to production and the percentage of those changes that are not successful. Increasing the rate of change implies that you are delivering value to your users more often, hence realizing a flow of value. Also, by measuring not just the number of failures but the percentage of changes that fail, you encourage many small, successful changes instead of limiting the overall number of changes.
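Both metrics fall out of a simple deployment history: the change rate is the number of changes per period, and the change failure rate is the fraction of those changes that failed. This is a minimal sketch. The `Deployment` record, its `failed` flag, and the choice of changes per day as the unit are assumptions for illustration.

```python
from dataclasses import dataclass


@dataclass
class Deployment:
    change_id: str
    failed: bool  # hypothetical flag: True if the change caused a production failure


def change_metrics(deployments: list[Deployment], period_days: int) -> tuple[float, float]:
    """Return (changes delivered per day, change failure rate as a fraction)."""
    if period_days <= 0:
        raise ValueError("period_days must be positive")
    total = len(deployments)
    failures = sum(1 for d in deployments if d.failed)
    change_rate = total / period_days
    # A failure *rate* rather than a failure count: many small successful changes keep it low.
    failure_rate = failures / total if total else 0.0
    return change_rate, failure_rate


history = [Deployment(f"change-{i}", failed=(i % 10 == 0)) for i in range(20)]
print(change_metrics(history, period_days=10))  # (2.0, 0.1)
```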

Your goal should be to increase the rate of change while lowering the change failure rate. Apart from the four major KPIs listed in this section, many other metrics may be useful in measuring your DevOps maturity. All these metrics must be linked back to the important business objectives and key results (OKRs) that are expected. You can find more information about OKRs here: https://docs.microsoft.com/en-us/azure/cloud-adoption-framework/strategy/business-outcomes/okr.

A representative sample, for illustration purposes, is depicted in the following table:

Objective: Faster time to market
Key results:
  • Deployment Frequency: Every week
  • Deployment Time <= 4 hours
  • Lead Time (Major Releases): Once every quarter

Objective: Increase the business value that’s been realized while maintaining or reducing costs
Key results:
  • CI/CD processes: 100% automated
  • Resource Utilization (95th percentile): 80%
  • Dashboards for monitoring both Health and Costs

Objective: Predictable and quality delivery and faster correction with fewer defects
Key results:
  • High Availability > 99.9%
  • RTO < 1 hour, RPO < 15 mins

Objective: Better processes across IT, automation, teamwork, and culture
Key results:
  • MTTR < 1 hour
  • Lead Time (Bugs) < 8 hours
  • Scaled Agile: Feature Teams > 5
  • Technical Debt < 1 week

Objective: Improved customer engagement and ability to quickly respond to market demands
Key results:
  • CSAT: 4 or above
  • Product Planning: 50% of the backlog focuses on Customer Feedback

Table 1.1 – Using the OKR approach for your DevOps maturity

At this point, you might be wondering, how do I help my organization foster this culture and reap all of these benefits? The next section will answer this.