Observability with Grafana

By: Rob Chapman, Peter Holmes

Overview of this book

To overcome application monitoring and observability challenges, Grafana Labs offers a modern, highly scalable, cost-effective Loki, Grafana, Tempo, and Mimir (LGTM) stack along with Prometheus for the collection, visualization, and storage of telemetry data. Beginning with an overview of observability concepts, this book teaches you how to instrument code and monitor systems in practice using standard protocols and Grafana libraries. As you progress, you’ll create a free Grafana Cloud instance and deploy a demo application to a Kubernetes cluster to delve into the implementation of the LGTM stack. You’ll learn how to connect Grafana Cloud to AWS, GCP, and Azure to collect infrastructure data, build interactive dashboards, make use of service level indicators and objectives to produce great alerts, and leverage the AI & ML capabilities to keep your systems healthy. You’ll also explore real user monitoring with Faro and performance monitoring with Pyroscope and k6. Advanced concepts like architecting a Grafana installation, using automation and infrastructure as code tools for DevOps processes, troubleshooting strategies, and best practices to avoid common pitfalls will also be covered. After reading this book, you’ll be able to use the Grafana stack to deliver amazing operational results for the systems your organization uses.

Preface

Who this book is for

What this book covers

To get the most out of this book

Download the example code files

Download a free PDF copy of this book

Part 1: Get Started with Grafana and Observability

Chapter 1: Introducing Observability and the Grafana Stack

Observability in a nutshell

Telemetry types and technologies

Introducing the user personas of observers

Introducing the Grafana stack

Alternatives to the Grafana stack

Deploying the Grafana stack

Summary

Chapter 2: Instrumenting Applications and Infrastructure

Common log formats

Exploring metric types and best practices

Tracing protocols and best practices

Using libraries to instrument efficiently

Infrastructure data technologies

Summary

Chapter 3: Setting Up a Learning Environment with Demo Applications

Technical requirements

Introducing Grafana Cloud

Installing the prerequisite tools

Installing the OpenTelemetry Demo application

Exploring telemetry from the demo application

Troubleshooting your OpenTelemetry Demo installation

Summary

Part 2: Implement Telemetry in Grafana

Chapter 4: Looking at Logs with Grafana Loki

Technical requirements

Updating the OpenTelemetry demo application

Introducing Loki

Understanding LogQL

Exploring Loki’s architecture

Tips, tricks, and best practices

Summary

Chapter 5: Monitoring with Metrics Using Grafana Mimir and Prometheus

Technical requirements

Updating the OpenTelemetry demo application

Introducing PromQL

Exploring data collection and metric protocols

Understanding data storage architectures

Using exemplars in Grafana

Summary

Chapter 6: Tracing Technicalities with Grafana Tempo

Technical requirements

Updating the OpenTelemetry Demo application

Introducing Tempo and the TraceQL query language

Exploring tracing protocols

Understanding the Tempo architecture

Summary

Chapter 7: Interrogating Infrastructure with Kubernetes, AWS, GCP, and Azure

Technical requirements

Monitoring Kubernetes using Grafana

Visualizing AWS telemetry with Grafana Cloud

Monitoring GCP using Grafana

Monitoring Azure using Grafana

Best practices and approaches

Summary

Part 3: Grafana in Practice

Chapter 8: Displaying Data with Dashboards

Technical requirements

Creating your first dashboard

Developing your dashboard further

Using visualizations in Grafana

Developing a dashboard purpose

Advanced dashboard techniques

Managing and organizing dashboards

Case study – an overall system view

Summary

Chapter 9: Managing Incidents Using Alerts

Technical requirements

Being alerted versus being alarmed

Writing great alerts using SLIs and SLOs

Grafana Alerting

Grafana OnCall

Grafana Incident

Summary

Chapter 10: Automation with Infrastructure as Code

Technical requirements

Benefits of automating Grafana

Introducing the components of observability systems

Automating collection infrastructure with Helm or Ansible

Getting to grips with the Grafana API

Managing dashboards and alerts with Terraform or Ansible

Summary

Chapter 11: Architecting an Observability Platform

Architecting your observability platform

Developing a proof of concept

Setting the right access levels

Sending telemetry to other consumers

Summary

Part 4: Advanced Applications and Best Practices of Grafana

Chapter 12: Real User Monitoring with Grafana

Introducing RUM

Setting up Grafana Frontend Observability

Exploring Web Vitals

Pivoting from frontend to backend data

Enhancements and custom configurations

Summary

Chapter 13: Application Performance with Grafana Pyroscope and k6

Using Pyroscope for continuous profiling

Using k6 for load testing

Summary

Chapter 14: Supporting DevOps Processes with Observability

Introducing the DevOps life cycle

Using Grafana for fast feedback during the development life cycle

Using Grafana to monitor infrastructure and platforms

Summary

Chapter 15: Troubleshooting, Implementing Best Practices, and More with Grafana

Best practices and troubleshooting for data collection

Best practices and troubleshooting for the Grafana stack

Avoiding pitfalls of observability

Future trends in application monitoring

Summary

Index

Introducing the Grafana stack

Grafana was born in 2013 when a developer was looking for a new user interface to display metrics from Graphite. Initially forked from Kibana, the Grafana project was developed to make it easy to build quick, interactive dashboards that were valuable to organizations. In 2014, Grafana Labs was formed with the core value of building a sustainable business with a strong commitment to open source projects. From that foundation, Grafana has grown into a strong company supporting more than 1 million active installations. Grafana Labs is a huge contributor to open source projects, from their own tools to widely adopted technologies such as Prometheus, and recent initiatives with a lot of traction such as OpenTelemetry.

Grafana offers many tools, which we’ve grouped into the following categories:

The core Grafana stack: LGTM and the Grafana Agent
Grafana Enterprise plugins
Incident response tools
Other Grafana tools

Let’s explore these tools in the following sections.

The core Grafana stack

The core Grafana stack consists of Mimir, Loki, Tempo, and Grafana; the acronym LGTM is often used to refer to this tech stack.

Mimir

Mimir is a Time Series Database (TSDB) for the storage of metric data. It uses low-cost object storage such as S3, GCS, or Azure Blob Storage. First announced for general availability in March 2022, Mimir is the newest of the four products we’ll discuss here, although it’s worth highlighting that Mimir was initially forked from another project, Cortex, which was started in 2016. Parts of Cortex also form the core of Loki and Tempo.

Mimir is a fully Prometheus-compatible solution that addresses the common scalability problems encountered with storing and searching huge quantities of metric data. In 2021, Mimir was load tested to 1 billion active time series. An active time series is a metric with a value and a unique set of labels that has reported a sample in the last 20 minutes. We will explore Mimir and Prometheus in much greater detail in Chapter 5.
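To make the definition of an active time series concrete, here is a minimal Python sketch. It is an illustration only and not how Mimir is implemented: the `series_key` and `count_active_series` helpers, the sample tuples, and the metric names are all hypothetical, and the 20-minute window is simply taken from the definition above.

```python
from datetime import datetime, timedelta, timezone

# Window from the definition above; treated here as a fixed assumption.
ACTIVE_WINDOW = timedelta(minutes=20)


def series_key(name, labels):
    """Identify a series by its metric name plus its full, order-independent label set."""
    return (name, frozenset(labels.items()))


def count_active_series(samples, now):
    """Count distinct series with at least one sample inside the active window.

    `samples` is an iterable of (metric_name, labels_dict, timestamp) tuples.
    """
    cutoff = now - ACTIVE_WINDOW
    active = {
        series_key(name, labels)
        for name, labels, ts in samples
        if cutoff <= ts <= now
    }
    return len(active)


now = datetime(2024, 1, 1, 12, 0, tzinfo=timezone.utc)
samples = [
    ("http_requests_total", {"method": "GET", "status": "200"}, now - timedelta(minutes=1)),
    ("http_requests_total", {"method": "GET", "status": "200"}, now - timedelta(minutes=2)),
    ("http_requests_total", {"method": "POST", "status": "500"}, now - timedelta(minutes=5)),
    ("http_requests_total", {"method": "GET", "status": "404"}, now - timedelta(minutes=45)),
]
active_count = count_active_series(samples, now)
print(active_count)
```

The two GET/200 samples belong to the same series, and the GET/404 series last reported 45 minutes ago, so only two series count as active. Because every distinct combination of label values is its own series, a label with many possible values raises the active series count quickly.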

Loki

Loki is a set of components that offer a full-featured logging stack. Loki uses lower-cost object storage such as S3 or GCS, and only indexes label metadata. Loki entered general availability in November 2019.

Log aggregation tools typically use two data structures to store log data: an index that contains references to the location of the raw data paired with searchable metadata, and the raw data itself stored in a compressed form. Loki differs from a lot of other log aggregation tools by keeping the index data relatively small and scaling the search functionality by using horizontal scaling of the querying component. The process of selecting the best index fields is one we will cover in Chapter 4.
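The split between a small label index and compressed raw data can be sketched in Python. This is a toy model under stated assumptions, not Loki's implementation: the `TinyLogStore` class and its methods are hypothetical, `zlib` stands in for whatever compression a real system uses, and a single in-process loop stands in for the horizontally scaled query component.

```python
import zlib


class TinyLogStore:
    """Toy log store: a small label index pointing at compressed chunks of raw lines."""

    def __init__(self):
        self._chunks = []   # compressed raw log data, one blob per chunk
        self._index = {}    # (label, value) -> set of chunk ids

    def append(self, labels, lines):
        """Compress a batch of lines and index it by its labels only."""
        chunk_id = len(self._chunks)
        self._chunks.append(zlib.compress("\n".join(lines).encode("utf-8")))
        for pair in labels.items():
            self._index.setdefault(pair, set()).add(chunk_id)

    def query(self, selector, contains=""):
        """Pick chunks via the label index, then scan their decompressed contents."""
        candidate_sets = [self._index.get(pair, set()) for pair in selector.items()]
        chunk_ids = set.intersection(*candidate_sets) if candidate_sets else set()
        matches = []
        for chunk_id in sorted(chunk_ids):
            text = zlib.decompress(self._chunks[chunk_id]).decode("utf-8")
            matches.extend(line for line in text.split("\n") if contains in line)
        return matches


store = TinyLogStore()
store.append({"app": "checkout", "env": "prod"}, ["GET /cart 200", "POST /pay 500 timeout"])
store.append({"app": "checkout", "env": "dev"}, ["POST /pay 200"])
store.append({"app": "search", "env": "prod"}, ["GET /q 500"])
results = store.query({"app": "checkout", "env": "prod"}, contains="500")
print(results)
```

The log line content never enters the index, so the index stays small. The cost moves to query time, where every selected chunk has to be decompressed and scanned, which is the part a real system spreads across many queriers.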

Tempo

Tempo is a storage backend for high-scale distributed trace telemetry, with the aim of sampling 100% of the read path. Like Loki and Mimir, it leverages lower-cost object storage such as S3, GCS, or Azure Blob Storage. Tempo went into general availability in June 2021.

When Tempo 1.0 was released, it was tested at a sustained ingestion of more than 2 million spans per second (about 350 MB per second). Tempo also offers the ability to generate metrics from spans as they are ingested; these metrics can be written to any backend that supports Prometheus remote write. Tempo is explored in detail in Chapter 6.
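The idea of deriving metrics from spans can be sketched in a few lines of Python. This illustrates the concept only and is not Tempo's implementation or configuration: the `Span` class, the `span_metrics` function, and the metric field names are hypothetical, and sending the results to a remote write endpoint is left out.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Span:
    service: str
    name: str
    duration_ms: float
    is_error: bool = False


def span_metrics(spans):
    """Aggregate spans into per-(service, span name) call, error, and duration totals."""
    metrics = defaultdict(lambda: {"calls": 0, "errors": 0, "duration_ms_sum": 0.0})
    for span in spans:
        entry = metrics[(span.service, span.name)]
        entry["calls"] += 1
        entry["errors"] += int(span.is_error)
        entry["duration_ms_sum"] += span.duration_ms
    return dict(metrics)


spans = [
    Span("checkout", "POST /pay", 120.0),
    Span("checkout", "POST /pay", 480.0, is_error=True),
    Span("cart", "GET /cart", 35.0),
]
metrics = span_metrics(spans)
print(metrics[("checkout", "POST /pay")])
```

Here the two `POST /pay` spans collapse into one entry with two calls, one error, and 600 ms of total duration. Rate, error, and duration figures of this kind are cheap to store and query as metrics even when the underlying traces are very large in volume.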

Grafana

Grafana has been a staple for fantastic visualization of data since 2014. It aims to connect to a huge variety of data sources, from TSDBs to relational databases and even other observability tools. Grafana has over 150 data source plugins available. Grafana has a huge community using it for many different purposes. This community supports over 6,000 dashboards, which means there is a starting place for most available technologies with minimal time to value.

Grafana Agent

Collecting telemetry from many places is one of the fundamental aspects of observability. Grafana Agent is a collection of tools for collecting logs, metrics, and traces. There are many other collection tools that Grafana integrates well with. Different collection tools offer different advantages and disadvantages, which is not a topic we will explore in this book. We will highlight other tools in the space later in this chapter and in Chapter 2 to give you a starting point for learning more about this topic. We will also briefly discuss architecting a collection infrastructure in Chapter 11.

The Grafana stack is a fantastic group of open source software for observability. The commitment of Grafana Labs to open source is supported by great enterprise plugins. Let’s explore them now.

Grafana Enterprise plugins

As part of their Cloud Pro, Cloud Advanced, and Enterprise license offerings, Grafana offers Enterprise plugins. These are part of any paid subscription to Grafana.

The Enterprise data source plugins allow organizations to read data from many other storage tools they may use, from software development tools such as GitLab and Azure DevOps to business intelligence tools such as Snowflake, Databricks, and Looker. Grafana also offers tools to read data from many other observability tools, which enables organizations to build comprehensive operational coverage while offering individual teams a choice of the tools they use.

Alongside the data source plugins, Grafana offers premium tools for logs, metrics, and traces. These include access policies and tokens for log data to secure sensitive information, in-depth health monitoring for the ingest and storage of cloud stacks, and management of tenants.

Grafana incident response and management

Grafana offers three products in the incident response and management (IRM) space:

At the foundation of IRM are alerting rules, which can notify via messaging apps, email, or Grafana OnCall
Grafana OnCall offers an on-call schedule management system that centralizes alert grouping and escalation routing
Finally, Grafana Incident offers a chatbot functionality that can set up necessary incident spaces, collect timelines for a post-incident review process, and even manage the incident directly from a messaging service

These tools are covered in more detail in Chapter 9. Now let’s take a look at some other important Grafana tools.

Other Grafana tools

Grafana Labs continues to be a leader in observability and has acquired several companies in this space to release new products that complement the tools we’ve already discussed. Let’s discuss some of these tools now.

Faro

Grafana Faro is a JavaScript agent that can be added to frontend web applications. The project allows for real user monitoring (RUM) by collecting telemetry from a browser. By adding RUM into an environment where backend applications and infrastructure are instrumented, observers gain the ability to traverse data from the full application stack. Faro supports the collection of the five core web vitals out of the box, as well as several other signals of interest. Faro entered general availability in November 2022. We cover Faro in more detail in Chapter 12.

k6

k6 is a load testing tool that provides both a packaged tool to run in your own infrastructure and a cloud Software as a Service (SaaS) offering. Load testing, especially as part of a CI/CD pipeline, really enables teams to see how their application will perform under load, and evaluate optimizations and refactoring. Paired with other detailed analysis tools such as Pyroscope, the level of visibility and accessibility to non-technical members of the team can be astounding. The project started back in 2016 and was acquired by Grafana Labs in June 2021. The goal of k6 is to make performance testing easy and repeatable. We’ll explore k6 in Chapter 13.

Pyroscope

Pyroscope is a recent acquisition of Grafana Labs, joining in March 2023. Pyroscope is a tool that enables teams to engage in the continuous profiling of system resource use by applications (CPU, memory, etc.). Pyroscope advertises that with a minimal overhead of ~2-5% of performance, it can collect samples as frequently as every 10 seconds. Phlare is a Grafana Labs project started in 2022, and the two projects have now merged. We discuss Pyroscope in more detail in Chapter 13.

Now that you know the different tools available from Grafana Labs, let’s look at some alternatives that are available.
