Mastering Distributed Tracing

By: Yuri Shkuro

Overview of this book

Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable. Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding, and profiling, complex systems.

Three pillars of observability

If you have followed the application monitoring and performance management space in the past few years, whether at conferences or in the news and tech blogs, you have probably heard the term "three pillars of observability" used to refer to metrics, logs, and distributed tracing. While some people have strong, very-amusing-to-read (strong language warning!), and partially justified objections [1], [2] to this term, we can look at these three areas as different approaches to recording events occurring in an application. In the end, all of these signals are collected by instrumentation in the code, triggered by events we deem worthy of recording.
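
To make the idea concrete, here is a toy sketch of a single event (handling a request) being recorded in all three ways. It uses only the Python standard library and is not taken from the book or from any tracing library: the names `handle_request`, `metrics`, `spans`, and `span` are hypothetical, and a real system would use a metrics client and a tracer instead of the in-memory structures shown here.

```python
import logging
import time
from collections import Counter
from contextlib import contextmanager

logging.basicConfig(level=logging.INFO)
log = logging.getLogger("hello")

# Hypothetical in-memory stand-ins for real telemetry backends.
metrics = Counter()  # metric: an aggregated count per event name
spans = []           # trace: named, timed records of operations


@contextmanager
def span(name):
    """Time the wrapped operation (a toy stand-in for a tracer span)."""
    start = time.monotonic()
    try:
        yield
    finally:
        spans.append({"name": name, "duration_s": time.monotonic() - start})


def handle_request(user):
    """One event, recorded three ways by instrumentation in the code."""
    with span("handle_request"):                        # tracing
        metrics["requests_total"] += 1                  # metrics
        log.info("handling request for user=%s", user)  # logs
        return f"Hello, {user}!"
```

Calling `handle_request("alice")` increments one counter, writes one log line, and appends one span record. The same event yields three signals that differ in how much detail each one keeps.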

Ideally, when we troubleshoot a performance problem, we would like to know as much as possible about what the application was doing at the time, by recording all possible events. The main challenge we face is the cost of collecting and reporting all that telemetry. The three "pillars" primarily differ in their...