Book Image

The DevOps 2.2 Toolkit

By : Viktor Farcic
Book Image

The DevOps 2.2 Toolkit

By: Viktor Farcic

Overview of this book

Building on The DevOps 2.0 Toolkit and The DevOps 2.1 Toolkit: Docker Swarm, Viktor Farcic brings his latest exploration of the Docker technology as he records his journey to explore two new programs, self-adaptive and self-healing systems within Docker. The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters is the latest book in Viktor Farcic’s series that helps you build a full DevOps Toolkit. This book in the series looks at Docker, the tool designed to make it easier in the creation and running of applications using containers. In this latest entry, Viktor combines theory with a hands-on approach to guide you through the process of creating self-adaptive and self-healing systems. Within this book, Viktor will cover a wide-range of emerging topics, including what exactly self-adaptive and self-healing systems are, how to choose a solution for metrics storage and query, the creation of cluster-wide alerts and what a successful self-sufficient system blueprint looks like. Work with Viktor and dive into the creation of self-adaptive and self-healing systems within Docker.
Table of Contents (18 chapters)

Self-Healing Applied to Services

The job of a system that self-heals services is to make sure that they are (almost) always running according to the design. Such a system needs to monitor the state of the cluster and continuously ensure that all the services are running the specified number of replicas. If one of them stops, the system should start a new one. If a whole node goes does, all the replicas that were running on that node should be scheduled to run across the healthy nodes. As long as the capacity of the cluster can host all the replicas, such a system should be able to maintain the defined specifications.

Having a system that self-heals services does not mean that it provides high-availability. If a replica stops being operational, the system will bring it back into the running state. However, there will be a (very) short period between a failure and until the system...