The DevOps 2.2 Toolkit

By: Viktor Farcic

Overview of this book

Building on The DevOps 2.0 Toolkit and The DevOps 2.1 Toolkit: Docker Swarm, Viktor Farcic continues his exploration of Docker, recording his journey into two new topics: self-adaptive and self-healing systems within Docker. The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters is the latest book in Viktor Farcic’s series that helps you build a full DevOps Toolkit. This book in the series looks at Docker, the tool designed to make it easier to create and run applications using containers. In this latest entry, Viktor combines theory with a hands-on approach to guide you through the process of creating self-adaptive and self-healing systems. Within this book, Viktor covers a wide range of emerging topics, including what exactly self-adaptive and self-healing systems are, how to choose a solution for metrics storage and query, how to create cluster-wide alerts, and what a successful self-sufficient system blueprint looks like. Work with Viktor and dive into the creation of self-adaptive and self-healing systems within Docker.

Preface

Introduction to Self-Adapting and Self-Healing Systems

What is a self-adaptive system?

What is a self-healing system?

Choosing a Solution for Metrics Storage and Query

Non-dimensional versus dimensional metrics

Deploying and Configuring Prometheus

Deploying Prometheus stack

Designing a more dynamic monitoring solution

Deploying Docker Flow Monitor

Integrating Docker Flow Monitor with Docker Flow Proxy

Scraping Metrics

Creating the cluster and deploying services

Deploying exporters

Exploring exporter metrics

Querying metrics

Updating service constraints

Using memory reservations and limits in Prometheus

Defining Cluster-Wide Alerts

Creating the cluster and deploying services

Creating alerts based on metrics

Defining multiple alerts for a service

Postponing alerts firing

Defining additional alert information through labels and annotations

Using shortcuts to define alerts

Alerting Humans

Creating the cluster and deploying services

Setting up Alertmanager

Using templates in Alertmanager configuration

Alerting the System

The four quadrants of a dynamic and self-sufficient system

Self-Healing Applied to Services

Creating the cluster and deploying services

Using Docker Swarm for self-healing services

Is it enough to have self-healing applied to services?

Self-Adaptation Applied to Services

Choosing the tool for scaling

Creating the cluster and deploying services

Preparing the system for alerts

Creating a scaling pipeline

Preventing the scaling disaster

Notifying humans that scaling failed

Integrating Alertmanager with Jenkins

Painting the Big Picture – The Self-Sufficient System Thus Far

Developer's role in the system

Continuous deployment role in the system

Service configuration role in the system

Proxy role in the system

Metrics role in the system

Alerting role in the system

Scheduler role in the system

Cluster role in the system

Instrumenting Services

Defining requirements behind service specific metrics

Differentiating services based on their types

Choosing instrumentation type

Creating the cluster and deploying services

Instrumenting services using counter

Instrumenting services using gauges

Instrumenting services using histograms and summaries

Self-Adaptation Applied to Instrumented Services

Setting up the objectives

Creating the cluster and deploying services

Scraping metrics from instrumented services

Querying metrics from instrumented services

Firing alerts based on instrumented metrics

Scaling services automatically

Sending error notifications to Slack

Setting Up a Production Cluster

Creating a Docker for AWS cluster

Deploying services

Securing services

Persisting state

Alternatives to CloudStor volume driver

Setting up centralized logging

Extending the capacity of the cluster

Self-Healing Applied to Infrastructure

Automating cluster setup

Exploring fault tolerance

Self-Adaptation Applied to Infrastructure

Creating a cluster

Scaling nodes manually

Creating scaling job

Scaling cluster nodes automatically

Rescheduling services after scaling nodes

Scaling nodes when replica state is pending

Blueprint of a Self-Sufficient System

Infrastructure tasks

Logic matters, tools might vary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Is it enough to have self-healing applied to services?

Self-healing applied to services is only the beginning. It is by no means enough. The system, as it is now, is far from being autonomous. At best, it can recuperate from a few types of failures. If one replica of a service goes down, Swarm will do the right thing. Even a simultaneous failure of a few replicas should not be a cause for alarm. However, self-healing applied to services does not, by itself, account for many common circumstances.

Let us imagine that a cluster is sized so that around 80 percent of its CPU and memory is utilized. Such a number, more or less, provides a good balance between having too many unused resources and under-provisioning our cluster. With greater resource utilization, we run the risk that even a failure of a single node would mean that there are no available resources...
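
The arithmetic behind that risk can be sketched in a few lines of Python. This is only an illustration under simplifying assumptions that are not part of the text: all nodes are identical, and the load of a failed node can be spread evenly across the survivors, which a real scheduler does not guarantee. The function names are hypothetical.

```python
def utilization_after_failure(nodes: int, utilization: float, failed: int = 1) -> float:
    """Return the utilization the surviving nodes would need to carry the same load.

    Assumption (not from the text): nodes are identical and the load can be
    redistributed evenly among the nodes that are still running.
    """
    if not 0 < failed < nodes:
        raise ValueError("failed must be between 1 and nodes - 1")
    # Total load stays the same; it is divided among fewer nodes.
    return utilization * nodes / (nodes - failed)


def survives_failure(nodes: int, utilization: float, failed: int = 1) -> bool:
    """True if the survivors could hold the whole load without exceeding 100 percent."""
    return utilization_after_failure(nodes, utilization, failed) <= 1.0


if __name__ == "__main__":
    # Compare a few hypothetical cluster sizes, all running at 80 percent.
    for nodes in (3, 5, 10):
        needed = utilization_after_failure(nodes, 0.80)
        print(f"{nodes} nodes at 80%: survivors would need {needed:.0%}")
```

Under these assumptions, the smaller the cluster, the less headroom a given utilization leaves: at 80 percent, losing one node of three would require the remaining two to run above their capacity, while a ten-node cluster would still have room to spare.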