The DevOps 2.2 Toolkit

By: Viktor Farcic

Overview of this book

Building on The DevOps 2.0 Toolkit and The DevOps 2.1 Toolkit: Docker Swarm, Viktor Farcic continues his exploration of Docker, this time recording his journey into two new topics: self-adaptive and self-healing systems. The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters is the latest book in Viktor Farcic’s series that helps you build a full DevOps Toolkit. This book looks at Docker, the tool designed to make it easier to create and run applications using containers. In this latest entry, Viktor combines theory with a hands-on approach to guide you through the process of creating self-adaptive and self-healing systems. He covers a wide range of emerging topics, including what exactly self-adaptive and self-healing systems are, how to choose a solution for metrics storage and query, how to create cluster-wide alerts, and what a successful self-sufficient system blueprint looks like. Work with Viktor and dive into the creation of self-adaptive and self-healing systems within Docker.
Table of Contents (18 chapters)

What is a self-healing system?

A self-healing system needs to be adaptive. Without the capability to adapt to changes in the environment, a system cannot self-heal. While adaptation is more permanent or longer lasting, healing is a temporary action. Take the number of requests as an example. Let's imagine that it increased permanently because we now have more users, or because the new design of the UI is so good that users are spending more time using our frontend. As a result of such an increase, our system needs to adapt and permanently (or, at least, for a longer period) increase the number of replicas of our services. That increase should match the minimum expected load. Maybe we run five replicas of our shopping cart, and that was enough in most circumstances but, since our number of users increased, the number of instances of the shopping cart needs to increase to, let's say, ten replicas. It does not need to be a fixed number. It can, for example, vary from seven (lowest expected load) to twelve (highest expected load).
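The replica range described above can be sketched in a few lines of Python. This is only an illustration, not the book's implementation: the function name `desired_replicas`, the `capacity_per_replica` parameter, and the idea of dividing load by per-replica capacity are assumptions made for the example. Only the bounds of seven and twelve come from the text.

```python
import math


def desired_replicas(requests_per_second: float,
                     capacity_per_replica: float,
                     min_replicas: int = 7,
                     max_replicas: int = 12) -> int:
    """Return a replica count for the observed load, kept within the expected range.

    capacity_per_replica is a hypothetical figure: how many requests per
    second a single replica is assumed to handle comfortably.
    """
    if capacity_per_replica <= 0:
        raise ValueError("capacity_per_replica must be positive")
    # Replicas needed to serve the current load, rounded up.
    needed = math.ceil(requests_per_second / capacity_per_replica)
    # Never go below the lowest expected load or above the highest.
    return max(min_replicas, min(max_replicas, needed))
```

With an assumed capacity of 50 requests per second per replica, a load of 500 requests per second would yield ten replicas, while very low or very high loads would be held at seven and twelve respectively.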

Self-healing is a reaction to the unexpected and is temporary in nature. Take us (humans) as an example. When a virus attacks us, our body reacts and fights back. Once the virus is annihilated, the state of emergency ceases and we go back to the normal state. The emergency starts when the virus enters and ends once it is removed. A side effect is that we might adapt during the process and permanently build a better immune system. We can apply the same logic to our clusters. We can create processes that react to external threats and execute reactive measures. Some of those measures will be removed as soon as the threat is gone, while others might result in permanent changes to our system.

Self-healing does not always work. Both we (humans) and software systems sometimes need external help. If all else fails, and we cannot heal ourselves and eliminate the problem internally, we might go to a doctor. Similarly, if a cluster cannot fix itself, it should send a notification to an operator who will, hopefully, be able to fix the problem, write a post-mortem, and improve the system so that the next time the same problem occurs, it can heal itself.

This need for external help outlines an effective way to build a self-healing system. We cannot predict all the combinations that might occur in a system. However, what we can do is make sure that when the unexpected happens, it is not unexpected for long. A good engineer will try to make himself obsolete. He will try to perform the same action only once, and the only way to accomplish that is through an ever-increasing level of automation. Everything that is expected should be scripted and fall into the self-adapting and self-healing processes executed by the system. We should react only when the unexpected happens.
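One way to picture this loop is a registry of known problems and their automated fixes, with a notification to an operator as the fallback. The sketch below is an assumption made for illustration only: the `SelfHealer` class, its method names, and the problem name `disk_full` are hypothetical and do not come from the book or from any Docker API.

```python
from typing import Callable, Dict

# A remediation is a hypothetical automated fix that reports whether it worked.
Remediation = Callable[[], bool]


class SelfHealer:
    """Runs a scripted fix for known problems; escalates unknown ones."""

    def __init__(self, notify: Callable[[str], None]) -> None:
        self.remediations: Dict[str, Remediation] = {}
        self.notify = notify  # e.g. a function that alerts an operator

    def register(self, problem: str, fix: Remediation) -> None:
        # Called after a post-mortem, once the fix has been scripted.
        self.remediations[problem] = fix

    def handle(self, problem: str) -> str:
        fix = self.remediations.get(problem)
        if fix is not None and fix():
            return "healed"
        # Unknown problem, or the scripted fix failed: ask a human for help.
        self.notify(f"cannot self-heal: {problem}")
        return "escalated"
```

The first time `handle("disk_full")` is called, no fix is registered, so the operator is notified. Once the operator scripts a fix and registers it, the same problem is handled without human involvement, which is the sense in which the unexpected stops being unexpected.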