Infrastructure Monitoring with Amazon CloudWatch

By: Ewere Diagboya

Overview of this book

CloudWatch is Amazon’s monitoring and observability service, designed to help those in the IT industry who are interested in optimizing resource utilization, visualizing operational health, and eventually increasing infrastructure performance. This book helps IT administrators, DevOps engineers, network engineers, and solutions architects make optimum use of this cloud service for effective infrastructure productivity. You’ll start with a brief introduction to monitoring and Amazon CloudWatch and its core functionalities. Next, you’ll get to grips with CloudWatch features and their usability. Once the book has helped you develop your foundational knowledge of CloudWatch, you’ll be able to build your practical skills in monitoring and alerting on various AWS services, such as EC2, EBS, RDS, ECS, EKS, DynamoDB, AWS Lambda, and ELB, with the help of real-world use cases. As you progress, you’ll also learn how to use CloudWatch to detect anomalous behavior, set alarms, visualize logs and metrics, define automated actions, and rapidly troubleshoot issues. Finally, the book will take you through monitoring AWS billing and costs. By the end of this book, you’ll be capable of making decisions that enhance your infrastructure performance and maintain it at its peak.
Table of Contents (16 chapters)
Section 1: Introduction to Monitoring and Amazon CloudWatch
Section 2: AWS Services and Amazon CloudWatch

Best methods used in monitoring

As noted in the introduction to this chapter, monitoring is often an afterthought in application development and deployment, even though it is a major process and a core part of the Site Reliability Engineer (SRE) role. A major purpose of this role is to ensure that systems maintain high availability and reliability, and one of the pillars of doing so is proper monitoring and observability of the system. The SRE role goes beyond configuring monitoring tools: SREs bring a lot of automation to their work, so some programming or scripting knowledge is needed to be a good SRE.

SREs are also involved in designing and building the process for how incidents, escalations, and downtime are handled in the system. They are the ones who work with the business and other departments to set service-level agreements (SLAs), service-level objectives (SLOs), and service-level indicators...