
Google Cloud for DevOps Engineers

By: Sandeep Madamanchi

Overview of this book

DevOps is a set of practices that help remove barriers between developers and system administrators, and Google implements it through site reliability engineering (SRE). With the help of this book, you'll explore the evolution of DevOps and SRE before delving into SRE technical practices such as SLAs, SLOs, SLIs, and error budgets, which are critical to building reliable software faster and to balancing new feature deployment with system reliability. You'll then explore SRE cultural practices such as incident management and being on call, and learn the building blocks for forming SRE teams. The second part of the book focuses on Google Cloud services for implementing DevOps via continuous integration and continuous delivery (CI/CD). You'll learn how to add source code via Cloud Source Repositories, build code to create deployment artifacts via Cloud Build, and push those artifacts to Container Registry. Moving on, you'll understand the need for container orchestration via Kubernetes, learn Kubernetes essentials, apply them via Google Kubernetes Engine (GKE), and secure the GKE cluster. Finally, you'll explore Cloud Operations to monitor, alert, debug, trace, and profile deployed applications. By the end of this SRE book, you'll be well-versed in the key concepts necessary for gaining the Professional Cloud DevOps Engineer certification, with the help of mock tests.
Table of Contents (17 chapters)
Section 1: Site Reliability Engineering – A Prescriptive Way to Implement DevOps
Section 2: Google Cloud Services to Implement DevOps via CI/CD
Appendix: Getting Ready for Professional Cloud DevOps Engineer Certification

Alerting

SLIs are quantitative measurements at a given point in time, and SLOs use SLIs to reflect the reliability of the system. SLIs are captured or represented in the form of metrics. Monitoring systems evaluate these metrics against a specific set of policies. These policies represent the target SLOs over a period of time and are referred to as alerting rules.
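To make the relationship between SLIs, SLOs, and alerting rules concrete, here is a minimal Python sketch; it is not a Google Cloud API. It assumes a request-based availability SLI in which each metric sample records good and total request counts. `MetricSample`, `AlertingRule`, `sli`, and `is_violated` are hypothetical names chosen for illustration. The rule counts as violated when the SLI computed over the rule's window falls below the SLO target.

```python
from dataclasses import dataclass


@dataclass
class MetricSample:
    """One point of a hypothetical request-count metric."""
    timestamp: int  # seconds since epoch
    good: int       # requests that met the SLI definition (e.g., non-5xx)
    total: int      # all requests observed in this sample


@dataclass
class AlertingRule:
    """Hypothetical policy expressing an SLO target over a time window."""
    name: str
    slo_target: float    # e.g., 0.999 for 99.9%
    window_seconds: int  # how far back the rule looks


def sli(samples: list[MetricSample], start: int, end: int) -> float:
    """Ratio of good to total events for samples with start <= timestamp < end."""
    in_window = [s for s in samples if start <= s.timestamp < end]
    total = sum(s.total for s in in_window)
    good = sum(s.good for s in in_window)
    # Assumption of this sketch: a window with no traffic counts as compliant.
    return 1.0 if total == 0 else good / total


def is_violated(rule: AlertingRule, samples: list[MetricSample], now: int) -> bool:
    """True when the SLI over the rule's window is below the SLO target."""
    return sli(samples, now - rule.window_seconds, now) < rule.slo_target


# Usage: one hour of per-minute samples, each with 995 good out of 1,000 requests.
samples = [MetricSample(t, good=995, total=1000) for t in range(0, 3600, 60)]
rule = AlertingRule("availability-99.9", slo_target=0.999, window_seconds=3600)
print(is_violated(rule, samples, now=3600))  # True: the SLI is 0.995
```

Treating a window with no traffic as fully compliant (an SLI of 1.0) is a simplifying assumption; a real monitoring system may handle missing data differently.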

Alerting is the process of evaluating alerting rules, which track SLOs, and of notifying someone or performing certain actions when those rules are violated. In other words, alerting converts SLOs into actionable alerts on significant events. Alerts can then be sent to an external application, a ticketing system, or a person.

Common scenarios for triggering alerts include, but are not limited to, the following:

  • The service or system is down.
  • SLOs or SLAs are not met.
  • Immediate human intervention is required to change something.

As discussed previously, SLOs represent an achievable target, and error...