The DevOps 2.2 Toolkit

By: Viktor Farcic

Overview of this book

Building on The DevOps 2.0 Toolkit and The DevOps 2.1 Toolkit: Docker Swarm, Viktor Farcic continues his exploration of Docker, this time recording his journey into two new areas: self-adaptive and self-healing systems. The DevOps 2.2 Toolkit: Self-Sufficient Docker Clusters is the latest book in Viktor Farcic’s series that helps you build a full DevOps toolkit. It looks at Docker, a tool designed to make it easier to create and run applications using containers. Viktor combines theory with a hands-on approach to guide you through the process of creating self-adaptive and self-healing systems. He covers a wide range of topics, including what exactly self-adaptive and self-healing systems are, how to choose a solution for metrics storage and query, how to create cluster-wide alerts, and what a successful self-sufficient system blueprint looks like. Work with Viktor and dive into the creation of self-adaptive and self-healing systems within Docker.

Preface

Introduction to Self-Adapting and Self-Healing Systems

What is a self-adaptive system?

What is a self-healing system?

Choosing a Solution for Metrics Storage and Query

Non-dimensional versus dimensional metrics

Deploying and Configuring Prometheus

Deploying Prometheus stack

Designing a more dynamic monitoring solution

Deploying Docker Flow Monitor

Integrating Docker Flow Monitor with Docker Flow Proxy

Scraping Metrics

Creating the cluster and deploying services

Deploying exporters

Exploring exporter metrics

Querying metrics

Updating service constraints

Using memory reservations and limits in Prometheus

Defining Cluster-Wide Alerts

Creating the cluster and deploying services

Creating alerts based on metrics

Defining multiple alerts for a service

Postponing alerts firing

Defining additional alert information through labels and annotations

Using shortcuts to define alerts

Alerting Humans

Creating the cluster and deploying services

Setting up Alertmanager

Using templates in Alertmanager configuration

Alerting the System

The four quadrants of a dynamic and self-sufficient system

Self-Healing Applied to Services

Creating the cluster and deploying services

Using Docker Swarm for self-healing services

Is it enough to have self-healing applied to services?

Self-Adaptation Applied to Services

Choosing the tool for scaling

Creating the cluster and deploying services

Preparing the system for alerts

Creating a scaling pipeline

Preventing the scaling disaster

Notifying humans that scaling failed

Integrating Alertmanager with Jenkins

Painting the Big Picture – The Self-Sufficient System Thus Far

Developer's role in the system

Continuous deployment role in the system

Service configuration role in the system

Proxy role in the system

Metrics role in the system

Alerting role in the system

Scheduler role in the system

Cluster role in the system

Instrumenting Services

Defining requirements behind service specific metrics

Differentiating services based on their types

Choosing instrumentation type

Creating the cluster and deploying services

Instrumenting services using counter

Instrumenting services using gauges

Instrumenting services using histograms and summaries

Self-Adaptation Applied to Instrumented Services

Setting up the objectives

Creating the cluster and deploying services

Scraping metrics from instrumented services

Querying metrics from instrumented services

Firing alerts based on instrumented metrics

Scaling services automatically

Sending error notifications to Slack

Setting Up a Production Cluster

Creating a Docker for AWS cluster

Deploying services

Securing services

Persisting state

Alternatives to CloudStor volume driver

Setting up centralized logging

Extending the capacity of the cluster

Self-Healing Applied to Infrastructure

Automating cluster setup

Exploring fault tolerance

Self-Adaptation Applied to Infrastructure

Creating a cluster

Scaling nodes manually

Creating scaling job

Scaling cluster nodes automatically

Rescheduling services after scaling nodes

Scaling nodes when replica state is pending

Blueprint of a Self-Sufficient System

Infrastructure tasks

Logic matters, tools might vary

Other Books You May Enjoy

Leave a review - let other readers know what you think

Preventing the scaling disaster

At first glance, the script we created works correctly, doesn't it? I've seen similar scripts in other places, and there is only one thing I have to say: do not run this pipeline in production! It is too dangerous. It can easily crash your entire cluster or make your service disappear. Can you guess why?

Let us imagine the following situation. Prometheus detects that a certain threshold is reached (for example, for memory utilization or response time) and sends a notification to Alertmanager. Alertmanager, in turn, sends a build request to Jenkins, which scales the service by increasing the number of replicas by one. So far, so good.
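The chain of events in that situation can be modelled in a few lines of Python. This is only a sketch for reasoning about the flow, not the pipeline itself: the `Service` class, the function names, the service name, and the numbers are all hypothetical stand-ins, and no real Prometheus, Alertmanager, or Jenkins API is called.

```python
from dataclasses import dataclass


@dataclass
class Service:
    """Hypothetical stand-in for a service running in the cluster."""
    name: str
    replicas: int


def prometheus_check(metric_value: float, threshold: float) -> bool:
    """Stand-in for an alert rule: True once the metric reaches the threshold."""
    return metric_value >= threshold


def jenkins_scale_job(service: Service) -> int:
    """Stand-in for the Jenkins build: increase the replicas by exactly one."""
    service.replicas += 1
    return service.replicas


def alertmanager_notify(service: Service) -> int:
    """Stand-in for Alertmanager forwarding the alert as a build request."""
    return jenkins_scale_job(service)


# Assumed example values: three replicas, memory utilization at 92%
# against an 80% threshold.
service = Service(name="my-service", replicas=3)
if prometheus_check(metric_value=0.92, threshold=0.8):
    alertmanager_notify(service)

print(service.replicas)
```

With these assumed values the alert fires once and the service ends up with one more replica than it started with.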

What happens if scaling does not resolve the problem? What if the threshold reached in Prometheus persists? After a while, the process will be repeated, and the service will be scaled up one more time. That might...
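The repetition described above can be illustrated with a small simulation. It is a sketch under simplifying assumptions: the metric is taken to stay at the same value no matter how many replicas run, each loop iteration stands for one alert evaluation, and the function name and numbers are hypothetical.

```python
def simulate_repeated_alerts(start_replicas, evaluations, metric_value, threshold):
    """Return the replica count after each evaluation of a persistent alert."""
    replicas = start_replicas
    history = [replicas]
    for _ in range(evaluations):
        # Assumption: scaling did not change the metric, so the threshold
        # is still reached and the service is scaled up one more time.
        if metric_value >= threshold:
            replicas += 1
        history.append(replicas)
    return history


history = simulate_repeated_alerts(
    start_replicas=3, evaluations=5, metric_value=0.92, threshold=0.8
)
print(history)
```

Nothing in this sketch stops the count from growing: as long as the condition persists, every evaluation adds another replica.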