The DevOps 2.2 Toolkit

By: Viktor Farcic

Overview of this book

Building on The DevOps 2.0 Toolkit and The DevOps 2.1 Toolkit: Docker Swarm, Viktor Farcic continues his exploration of Docker, this time recording his journey into two new concepts: self-adaptive and self-healing systems. The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters is the latest book in Viktor Farcic’s series that helps you build a full DevOps toolkit. It looks at Docker, the tool designed to make it easier to create and run applications using containers. Viktor combines theory with a hands-on approach to guide you through the process of creating self-adaptive and self-healing systems. The book covers a wide range of emerging topics, including what exactly self-adaptive and self-healing systems are, how to choose a solution for metrics storage and query, how to create cluster-wide alerts, and what a successful self-sufficient system blueprint looks like. Work with Viktor and dive into the creation of self-adaptive and self-healing systems within Docker.

Self-Healing Applied to Services

The job of a system that self-heals services is to make sure that they are (almost) always running according to the design. Such a system needs to monitor the state of the cluster and continuously ensure that every service is running the specified number of replicas. If a replica stops, the system should start a new one. If a whole node goes down, all the replicas that were running on that node should be rescheduled across the healthy nodes. As long as the cluster has enough capacity to host all the replicas, such a system should be able to maintain the defined specifications.
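The behavior described above can be pictured as a reconciliation loop that compares the desired number of replicas with what is actually running. The following is only a minimal sketch of that idea, not how Docker Swarm or any other scheduler is implemented. The `Node` class, its `capacity` field, and the `reconcile` function are hypothetical names introduced purely for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical model of a cluster node (an assumption, not a Docker API)."""
    name: str
    capacity: int                # assumed: maximum number of replicas the node can host
    healthy: bool = True
    replicas: list = field(default_factory=list)  # names of services running here


def reconcile(nodes, desired):
    """Move the running state toward the desired replica counts.

    `desired` maps a service name to its specified number of replicas.
    Returns a dict of service name -> replicas that could not be placed
    because the healthy nodes ran out of capacity.
    """
    # Replicas that were running on a failed node are considered lost.
    for node in nodes:
        if not node.healthy:
            node.replicas.clear()

    healthy = [n for n in nodes if n.healthy]
    unplaced = {}
    for service, want in desired.items():
        running = sum(n.replicas.count(service) for n in healthy)
        for _ in range(want - running):
            # Only nodes with spare capacity can accept a new replica.
            candidates = [n for n in healthy if len(n.replicas) < n.capacity]
            if not candidates:
                unplaced[service] = unplaced.get(service, 0) + 1
                continue
            # Prefer the node with the most free capacity.
            target = max(candidates, key=lambda n: n.capacity - len(n.replicas))
            target.replicas.append(service)
    return unplaced


if __name__ == "__main__":
    nodes = [Node("node-1", 3), Node("node-2", 3), Node("node-3", 3)]
    desired = {"web": 3, "db": 1}

    reconcile(nodes, desired)       # initial scheduling
    nodes[0].healthy = False        # simulate a node going down
    print(reconcile(nodes, desired))
    for n in nodes:
        print(n.name, n.healthy, n.replicas)
```

In this sketch, calling `reconcile` again after a node is marked unhealthy places the lost replicas on the remaining nodes. If the healthy nodes cannot host everything, the shortfall is reported instead of being silently ignored, which mirrors the capacity caveat in the paragraph above.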

Having a system that self-heals services does not mean that it provides high availability. If a replica stops being operational, the system will bring it back into the running state. However, there will be a (very) short period between the failure and the moment the system...