A few years ago, when I was in the middle of a manual deployment at 4 A.M., I remember asking myself "there has to be a better way". Tools were not mature enough, and the majority of the companies did not consider IT the core of their business. Then, a change happened: DevOps tools started to do well in the open source community and companies started to create continuous delivery pipelines. Some of them were successful, but a big majority of them failed for two reasons:
- Release management process
- Failure in the organizational alignment
We will talk about organizational alignment later on in this chapter. For now, we are going to focus on the release management process as it needs to be completely different from the traditional release management in order to facilitate the software life cycle.
In the preceding section, we talked about different phases:
We also explained how it works well with gigantic software where we group features into big releases that get executed in a big bang style with all or nothing deployments.
The first try to fit this process into smaller software components was what everyone calls agile, but no one really knew what it was.
In the traditional release management, one of the big problems was the communication: chains of people passing on messages and information, as we've seen, never ends well.
Agile encourages shorter communication strings: the stakeholders are supposed to be involved in the software development management, from the definition of requirements to the verification (testing) of the same software. This has an enormous advantage: teams never build features that are not required. If deadlines need to be met, the engineering team sizes down the final product sacrificing functionality but not quality.
Deliver early and deliver often is the mantra of agile, which basically means defining an Minimum Viable Product (MVP) and delivering it as soon as it is ready in order to deliver value to the customers of your application and then delivering new features as required. With this method, we are delivering value since the first release and getting feedback very early on in the product life.
In order to articulate this way of working, a new concept was introduced: the sprint. A sprint is a period of time (usually 2 weeks) with a set of functionalities that are supposed to be delivered at the end of it into production so that we achieve different effects:
- Customers are able to get value very often
- Feedback reaches the development team every 2 weeks so that corrective actions can be carried on
- The team becomes predictable and savvy with estimates
This last point is very important: if our estimates are off by 10% in a quarter release, it means that we are off by two weeks, whereas in a two weeks sprint, we are off only by 1 day, which, over time, with the knowledge gained sprint after sprint, means the team will be able to adjust due to the fact that the team builds a database of features and time spent on them so that we are able to compare new features against the already developed ones.
These features aren't called features. They are called stories. A story is, by definition, a well-defined functionality with all the info for the development team captured before the sprint starts, so once we start the development of the sprint, developers can focus on technical activities instead of focusing on resolving unknowns in these features.
Not all the stories have the same size, so we need a measurement unit: the story points. Usually, story points do not relate to a time-frame but to the complexity of it. This allows the team to calculate how many story points can be delivered at the end of the sprint, so with time, they get better at the estimates and everybody gets their expectations satisfied.
At the end of every sprint, the team is supposed to release the features developed, tested, and integrated into production in order to move to the next sprint.
The content of the sprints are selected from a backlog that the team is also maintaining and preparing as they go.
The main goal is to meet everyone's expectations by keeping the communication open and be able to predict what is being delivered and when and what is needed for it.
There are several ways of implementing the agile methodologies in our software product. The one explained earlier is called Scrum, but if you look into other development methodologies, you'll see that they all focus on the same concept: improving the communication across different actors of the same team.
If you are interested in Scrum, there is more info at https://en.wikipedia.org/wiki/Scrum_(software_development).
As explained earlier, if we follow the Scrum methodology, we are supposed to deliver a new version every 2 weeks (the duration of a sprint in the majority of the cases), which has a dramatic impact on the resources consumed. Let's do the maths: quarter versus bi-weekly releases:
- In quarter releases, we release only four times a year in addition to emergency releases to fix problems found in production.
- In bi-weekly releases, we release once every 2 weeks in addition to emergency releases. This means 26 releases a year (52 weeks roughly) in addition to emergency releases.
For the sake of simplicity, let's ignore the emergency releases and focus on business as usual in our application. Let's assume this takes us 10 hours to prepare and release our software:
- Quarter releases: 10 x 4 = 40 hours a year
- Bi-weekly releases: 10 x 26 = 260 hours a year
As of now, releasing software is always the same activity, no matter whether we do it every quarter or every day. The implication is the same (roughly), so we have a big problem: our bi-weekly release is consuming a lot of time and it gets worse if we need to release fixes for problems that have been overlooked in QA.
There is only one solution for this: automation. As mentioned earlier, up until 2 years ago (around 2015) the tools to orchestrate automatic deployments weren't mature enough. Bash scripts were common but weren't ideal as bash is not designed to alter the state of production servers.
The first few tools to automate deployments were frameworks to manage the state of servers: Capistrano or Fabric wrapped
ssh access and state management in a set of commands on Ruby and Python, which allowed the developers to create scripts that, depending on the state of the servers, were executing different steps to achieve a goal: deploying a new version.
These frameworks were a good step forward, but there were bigger problems with them: a solution across different companies usually solves the same problem in different ways, which implies that DevOps (developers + ops) engineers need to learn how to handle this in every single company.
The real change came with Docker and orchestration platforms, such as Kubernetes or Docker Swarm. In this book, we will look at how to use them, particularly Kubernetes, to reduce the deployment time from 10 hours (or hours in general) to a simple click, so our 260 hours a year become a few minutes for every release.
This also has a side-effect, which is related to what we explained earlier in this chapter: from a very risky release (remember, 85.38% of success) with a lot of stress, we are moving toward a release that can be patched in minutes, so releasing a bug, even though it is bad, has a reduced impact due to the fact that we can fix it within minutes or even roll back within seconds. We will look at how to do this in Chapter 8, Release Management – Continuous Delivery.
Once we are aligned with these practices, we can even release individual items to production: once a feature is ready, if the deployment is automated and it gets reduced to a single click, why not just roll out the stories as they are completed?