Practical Site Reliability Engineering

By: Pethuru Raj Chelliah, Shreyash Naithani, Shailender Singh

Overview of this book

Site reliability engineering (SRE) is being touted as one of the most capable paradigms for building and maintaining next-generation, high-quality software solutions. This book starts by introducing you to the SRE paradigm and covers the need for highly reliable IT platforms and infrastructures. As you make your way through the next set of chapters, you will learn to develop microservices using Spring Boot and make use of RESTful frameworks. You will also learn about GitHub for deployment, containerization, and Docker containers. Practical Site Reliability Engineering teaches you to set up and sustain containerized cloud environments, and also covers architectural and design patterns, reliability implementation techniques such as reactive programming, and languages such as Ballerina and Rust. In the concluding chapters, you will become well versed in service mesh solutions such as Istio and Linkerd, and understand service resilience test practices, API gateways, and edge/fog computing. By the end of this book, you will have gained experience working with SRE concepts and be able to deliver highly reliable apps and services.
10
Containers, Kubernetes, and Istio Monitoring

Prometheus


Prometheus is an open source monitoring tool that was originally built at SoundCloud in 2012, inspired by Google's Borgmon. It is written in Go. According to The New Stack's 2017 survey, Prometheus is one of the most widely used tools for monitoring Kubernetes clusters. What sets Prometheus apart from other open source monitoring systems is its simple, text-based metrics format, which makes it easy to collect metrics from other systems. It also has a multidimensional data model, in which each time series is identified by a metric name and a set of key-value labels, and a rich, concise query language (PromQL). Using Prometheus, we can monitor every level of the stack: nodes, container-scheduling systems, and even routers and switches. With large applications and a fast-moving infrastructure, the jobs that we run change rapidly, and we may have to deploy them around 100 times a day. In this case, Prometheus is very useful because it can discover services automatically. If we have a dynamic infrastructure, we can use Prometheus to detect early failures...
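
To make the text-based format and the multidimensional data model more concrete, here is a minimal Python sketch that renders one metric in the Prometheus text exposition format. It is an illustration only, not the official client library: the function names (`escape_label_value`, `render_sample`, `render_metric`) and the sample metric `http_requests_total` are assumptions chosen for this example.

```python
# Illustrative sketch of the Prometheus text exposition format.
# All function names and the sample metric are hypothetical; a real
# service would normally use an official Prometheus client library.

def escape_label_value(value: str) -> str:
    """Escape backslash, double quote, and newline in a label value."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
    )


def render_sample(name: str, labels: dict, value) -> str:
    """Render one sample line: name{label="value",...} value."""
    if labels:
        pairs = ",".join(
            f'{key}="{escape_label_value(val)}"'
            for key, val in sorted(labels.items())
        )
        return f"{name}{{{pairs}}} {value}"
    return f"{name} {value}"


def render_metric(name: str, help_text: str, metric_type: str, samples) -> str:
    """Render HELP and TYPE comment lines followed by every sample."""
    lines = [
        f"# HELP {name} {help_text}",
        f"# TYPE {name} {metric_type}",
    ]
    for labels, value in samples:
        lines.append(render_sample(name, labels, value))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    # Each distinct label combination is a separate time series.
    print(
        render_metric(
            "http_requests_total",
            "Total HTTP requests.",
            "counter",
            [
                ({"method": "get", "code": "200"}, 1027),
                ({"method": "post", "code": "500"}, 3),
            ],
        ),
        end="",
    )
```

Each line of the output is plain text that a Prometheus server could scrape over HTTP, and the labels (`method`, `code`) are the dimensions that queries can later filter or aggregate on.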