Mastering Distributed Tracing

By: Yuri Shkuro

Overview of this book

Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable. Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding, and profiling, complex systems.

Distributed tracing

As soon as we start building a distributed system, traditional monitoring tools begin to struggle to provide observability for the whole system, because they were designed to observe a single component, such as a program, a server, or a network switch. The story of a single component may well be very interesting, but it tells us very little about the story of a request that touches many of those components. We need to know what happens to that request in all of them, end-to-end, if we want to understand why a system is behaving pathologically. In other words, we first want a macro view.

At the same time, once we get that macro view and zoom in to a particular component that seems to be at fault for the failure or performance problems with our request, we want a micro view of what exactly happened to that request in that component. Most other tools cannot tell us that either, because they only observe what "generally" happens in the component as a whole: for example, how many requests per second it handles (metrics), what events occurred on a given thread (logs), or which threads are on and off CPU at a given point in time (profilers). They lack the granularity or context to observe a specific request.

Distributed tracing takes a request-centric view. It captures the detailed execution of causally related activities performed by the components of a distributed system as it processes a given request. In Chapter 3, Distributed Tracing Fundamentals, I will go into more detail on how exactly it works, but in a nutshell:

Tracing infrastructure attaches contextual metadata to each request and ensures that metadata is passed around during the request execution, even when one component communicates with another over a network.
At various trace points in the code, the instrumentation records events annotated with relevant information, such as the URL of an HTTP request or an SQL statement of a database query.
Recorded events are tagged with the contextual metadata and explicit causality references to prior events.
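
The three steps above can be illustrated with a minimal sketch. This is not how Jaeger or any OpenTracing library implements them; the header names (x-trace-id, x-parent-id), the helper functions, the in-memory RECORDED list, and the example URL and SQL text are all assumptions made up for illustration. A plain function call stands in for the network hop between two components.

```python
import uuid
from dataclasses import dataclass
from typing import Dict, List, Optional

# Stand-in for a tracing backend; a real system would ship events elsewhere.
RECORDED: List[dict] = []


@dataclass(frozen=True)
class Context:
    """Contextual metadata that travels with the request."""
    trace_id: str
    parent_id: Optional[str] = None  # event that causally precedes the next one


def inject(ctx: Context) -> Dict[str, str]:
    """Serialize the context into request headers (hypothetical header names)."""
    headers = {"x-trace-id": ctx.trace_id}
    if ctx.parent_id is not None:
        headers["x-parent-id"] = ctx.parent_id
    return headers


def extract(headers: Dict[str, str]) -> Context:
    """Rebuild the context on the receiving side of a network call."""
    return Context(headers["x-trace-id"], headers.get("x-parent-id"))


def record(ctx: Context, component: str, name: str, **annotations) -> Context:
    """Record an annotated event tagged with the context and its causal parent."""
    event_id = uuid.uuid4().hex[:8]
    RECORDED.append({
        "trace_id": ctx.trace_id,
        "event_id": event_id,
        "parent_id": ctx.parent_id,
        "component": component,
        "name": name,
        "annotations": annotations,
    })
    # Events recorded later with the returned context point back to this one.
    return Context(ctx.trace_id, event_id)


def backend(headers: Dict[str, str]) -> None:
    ctx = extract(headers)
    record(ctx, "backend", "db-query", sql="SELECT * FROM drivers")


def frontend() -> None:
    ctx = Context(trace_id=uuid.uuid4().hex)
    ctx = record(ctx, "frontend", "http-request", url="/dispatch")
    backend(inject(ctx))  # stands in for a call over the network


frontend()
```

After frontend() returns, RECORDED holds two events that share one trace ID, and the backend event names the frontend event as its parent, which is the causality reference a tracing backend needs later.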

That deceptively simple technique allows the tracing infrastructure to reconstruct the whole path of the request, through the components of a distributed system, as a graph of events and causal edges between them, which we call a "trace." A trace allows us to reason about how the system was processing the request. Individual graphs can be aggregated and clustered to infer patterns of behavior in the system. Traces can be displayed using various forms of visualization, including Gantt charts (Figure 1.7) and graph representations (Figure 1.8), to give our visual cortex cues for finding the root cause of performance problems:

Figure 1.7: Jaeger UI view of a single request to the HotROD application, further discussed in Chapter 2. In the bottom half, one of the spans (named GetDriver from service redis, with a warning icon) is expanded to show additional information, such as tags and span logs.

Figure 1.8: Jaeger UI view of two traces A and B being compared structurally in the graph form (best viewed in color). Light/dark green colors indicate services that were encountered more/only in trace B, and light/dark red colors indicate services encountered more/only in trace A.
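
To make the reconstruction step concrete, here is a small sketch of how recorded events might be turned back into a graph and rendered as an indented tree. The event records, their field names, and the service and operation names are hypothetical; a real tracing backend stores richer data (timestamps, tags, logs) and handles events that arrive late or out of order.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

# Hypothetical recorded events, deliberately out of order and mixing two traces.
EVENTS = [
    {"trace_id": "t1", "event_id": "c", "parent_id": "a", "name": "redis GetDriver"},
    {"trace_id": "t1", "event_id": "a", "parent_id": None, "name": "frontend /dispatch"},
    {"trace_id": "t2", "event_id": "x", "parent_id": None, "name": "frontend /health"},
    {"trace_id": "t1", "event_id": "b", "parent_id": "a", "name": "customer SQL SELECT"},
]


def build_trace(events: List[dict], trace_id: str) -> Tuple[List[dict], Dict[str, List[dict]]]:
    """Select one trace and index its causal edges by parent event id."""
    children: Dict[str, List[dict]] = defaultdict(list)
    roots: List[dict] = []
    for event in events:
        if event["trace_id"] != trace_id:
            continue  # the shared trace id is what correlates events
        if event["parent_id"] is None:
            roots.append(event)
        else:
            children[event["parent_id"]].append(event)
    return roots, children


def render(events: List[dict], trace_id: str) -> List[str]:
    """Walk the graph from its roots and indent each event by causal depth."""
    roots, children = build_trace(events, trace_id)
    lines: List[str] = []

    def walk(event: dict, depth: int) -> None:
        lines.append("  " * depth + event["name"])
        for child in children[event["event_id"]]:
            walk(child, depth + 1)

    for root in roots:
        walk(root, 0)
    return lines


print("\n".join(render(EVENTS, "t1")))
```

The same parent-to-children index is a natural starting point for the aggregation mentioned above, for example counting how often one service calls another across many traces.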

By taking a request-centric view, tracing helps to illuminate different behaviors of the system. Of course, as Bryan Cantrill said in his KubeCon talk, just because we have tracing, it doesn't mean that we have eliminated performance pathologies in our applications. We still need to know how to use this powerful tool to ask the sophisticated questions it now makes possible. Fortunately, distributed tracing is able to answer all the questions we posed in the section The observability challenge of microservices.
