Book Image

AWS Observability Handbook

By : Phani Kumar Lingamallu, Fabio Braga de Oliveira
Book Image

AWS Observability Handbook

By: Phani Kumar Lingamallu, Fabio Braga de Oliveira

Overview of this book

As modern application architecture grows increasingly complex, identifying potential points of failure and measuring end user satisfaction, in addition to monitoring application availability, is key. This book helps you explore AWS observability tools that provide end-to-end visibility, enabling quick identification of performance bottlenecks in distributed applications. You’ll gain a holistic view of monitoring and observability on AWS, starting from observability basics using Amazon CloudWatch and AWS X-Ray to advanced ML-powered tools such as AWS DevOps Guru. As you progress, you'll learn about AWS-managed open source services such as AWS Distro for OpenTelemetry (ADOT) and AWS managed Prometheus, Grafana, and the ELK Stack. You’ll implement observability in EC2 instances, containers, Kubernetes, and serverless apps and grasp UX monitoring. With a fair mix of concepts and examples, this book helps you gain hands-on experience in implementing end-to-end AWS observability in your applications and navigating and troubleshooting performance issues with the help of use cases. You'll also learn best practices and guidelines, such as how observability relates to the Well-Architected Framework. By the end of this AWS book, you’ll be able to implement observability and monitoring in your apps using AWS’ native and managed open source tools in real-world scenarios.
Table of Contents (22 chapters)
1
Part 1: Getting Started with Observability on AWS
6
Part 2: Automated and Machine Learning-Powered Observability on AWS
11
Part 3: Open Source Managed Services on AWS
15
Part 4: Scaled Observability and Beyond

Benefits of observability

Adopting observability to analyze system performance used to be the job of sysadmins and ops teams, who cared most about the mean time to detect (MTTD) and mean time to resolve (MTTR). Today, more job roles than ever need to use observability data. With the rise of DevOps, CI/CD, and Agile methods, developers are often directly responsible for the performance of their apps in production. SREs and DevOps staff care about meeting service-level indicators (SLIs) and service-level objectives (SLOs). Information about systems and workloads is also used by business leaders in making decisions about capacity, spending, risk, and end user experience. Each stakeholder in an organization has different needs for what is monitored and how the resulting data is analyzed, reported, and displayed. Let’s try to understand the benefits of observability in the real world for different personas.

Understanding application health and performance to improve customer experience

The main observability goal is to know what is going on anywhere in your system to ensure the best possible experience for your end users. You want to detect problems quickly, investigate them efficiently, and remediate them as soon as possible to minimize downtime and other disruptions to your customers.

Improving developer productivity

Traditional debugging by analyzing logs or instrumenting breakpoints into code is tedious, repetitive, and time-consuming. It doesn’t scale well for production applications or those built using microservice or serverless architectures. To analyze performance across distributed applications, developers need to correlate metrics and traces to identify user impact from any source and to find broken or expensive code paths as quickly as possible. And they need to do all this without having to re-instrument their code when they want to add new observability tools to their kit.

Getting more insight with visualizations

Observability, especially at scale, can generate huge volumes of data that become difficult for humans to parse. Visualization tools help humans make sense of data by correlating observability data into intuitive graphic displays. However, having a bunch of graphs, charts, and more scattered across multiple tools and displays becomes a problem. It’s essential to centralize visual data into a single dashboard, giving you a unified view of your system’s critical information and performance.

Digital eperience monitoring

Digital Experience Monitoring (DEM) correlates infrastructure and operations metrics with business outcomes by focusing on the end user experience. It seeks to reduce the MTTR in the event of client-side performance issues by monitoring the client-side performance on web and mobile applications in real time. Resolution is assisted by the relevant debugging data such as error messages, stack traces, and user sessions to fix performance issues such as JavaScript errors, crashes, and latencies.

Controlling cost and planning capacity

A key advantage to operating in the cloud is that you can scale quickly to meet demand during peak load times. However, unplanned and uncontrolled growth can result in unexpected costs. Observability can help you find performance improvements, such as reducing the CPU footprint. Across a fleet of thousands or hundreds of thousands of instances, a slight percentage performance improvement in how much CPU an application uses can save millions of dollars. Similarly, by using observability to understand and predict your future capacity needs, you can take advantage of the cost savings available from reserve and spot pricing and avoid cost surprises.