Mastering Distributed Tracing

By: Yuri Shkuro

Overview of this book

Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable. Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding and profiling complex systems.

Why this book?


When I began studying distributed tracing after joining Uber, there was not a lot of information out there. The Dapper paper gave the foundational overview and the technical report by Raja Sambasivan and others [13] provided a very useful historical background. But there was little in the way of a recipe book that would answer more practical questions, such as:

  • Where do I start with tracing in a large organization?

  • How do I drive adoption of tracing instrumentation across existing systems?

  • How does the instrumentation even work? What are the basics? What are the recommended patterns?

  • How do I get the most benefit and return on investment from tracing?

  • What do I do with all that tracing data?

  • How do I operate a tracing backend in real production and not in a toy application?

In early 2018, I realized that I had pretty good answers to these questions, while most people who were just starting to look into tracing still didn't, and no comprehensive guide had been published anywhere. Even the basic instrumentation steps are often confusing to people if they do not understand the underlying concepts, as evidenced by the many questions posted in the Jaeger and OpenTracing chat rooms.

When I gave the OpenTracing tutorial at the Velocity NYC conference in 2017, I created a GitHub repository that contained step-by-step walkthroughs for instrumentation, from a basic "Hello, World!" program to a small microservices-based application. The tutorials were repeated in several programming languages (I originally created ones for Java, Go, and Python, and other people later added more, for Node.js and C#). I have seen time and again how even these simplest of tutorials help people learn the ropes:

Figure 1.9: Feedback about a tutorial

So I thought that maybe I should write a book that would not just cover the instrumentation tutorials, but also give a comprehensive overview of the field, from its history and fundamentals to practical advice about where to start and how to get the most benefit from tracing. To my surprise, Andrew Waldron from Packt Publishing reached out to me offering to do exactly that. The rest is history, or rather, this book.

One thing that made me reluctant to start writing was the pace of change in this area. The boom of microservices and serverless created a big gap in the observability solutions that can address the challenges posed by these architectural styles, and tracing is receiving a lot of renewed interest, even though the basic idea of distributed tracing systems is not new. Accordingly, a lot is changing, and there was a risk that anything I wrote would quickly become obsolete. It is possible that in the future, OpenTracing might be replaced by some more advanced API. However, the thought that made me push through was that this book is not about OpenTracing or Jaeger. I use them as examples because they are the projects most familiar to me. The ideas and concepts introduced throughout the book are not tied to these projects. If you decide to instrument your applications with Zipkin's Brave library, or with OpenCensus, or even with some vendor's proprietary API, the fundamentals of instrumentation and distributed tracing mechanics are going to be the same, and the advice I give in the later chapters about practical applications and the adoption of tracing will apply equally.