Mastering Distributed Tracing

By: Yuri Shkuro

Overview of this book

Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable. Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding, and profiling, complex systems.
Table of Contents (21 chapters)
Mastering Distributed Tracing
Contributors
Preface
Other Books You May Enjoy
Leave a review - let other readers know what you think
Afterword
Index

Preface

Distributed tracing, also known as end-to-end tracing, is not a new idea, but it has recently begun receiving a lot of attention as a must-have observability tool for complex distributed systems. Unlike most other tools, which only monitor individual components of the architecture, like a process or a server, tracing plays a rather unique role by being able to observe the end-to-end execution of individual requests, or transactions, following them across process and network boundaries. With the rise of such architectural patterns as microservices and functions-as-a-service (or FaaS, or serverless), distributed tracing is becoming the only practical way of managing the complexity of modern architectures.

The book you are about to read is based on my personal experiences of being a technical lead for the tracing team at Uber Technologies. During that time, I have seen the engineering organization grow from a few hundred to several thousand engineers, and the complexity of Uber's microservices-based architecture increase from a few hundred microservices, when we first rolled out Jaeger, our distributed tracing platform, to the several thousand microservices we have today. As most practitioners of distributed tracing would tell you, building a tracing system is "the easy part"; getting it widely adopted in a large organization is a different challenge altogether, one that unfortunately does not have easy-to-follow recipes. This book is my attempt to provide an end-to-end overview of the problem space, including the history and theoretical underpinning of the technology, the ways to address instrumentation and organizational adoption challenges, the standards emerging in the industry for instrumentation and data formats, and practical suggestions for deploying and operating a tracing infrastructure in real-world scenarios.

The book is not intended as reference material or a tutorial for any particular technology. Instead, I want you to gain an understanding of the underlying principles and trade-offs of distributed tracing and its applications. Equipped with these fundamentals, you should be able to navigate this fairly complex area of technology and find effective ways to apply it to your own use cases and your systems.

Who this book is for

It is my hope that this book may be useful to a wide range of audiences, from beginners who know very little about distributed tracing to active practitioners who are looking to expand their knowledge and find ways to extract more value from their tracing platforms. Different parts of the book may be of interest to these groups of readers:

  • Application developers, SREs, and DevOps, who are the end users of distributed tracing. This group is generally less interested in how tracing infrastructure and instrumentation work; they are more interested in what the technology can do for their day-to-day work. The book provides many examples of the benefits of distributed tracing, from the simplest use cases of "let's look at one trace and see what performance problems it can help us discover" to advanced data mining scenarios of "how do we process the vast amounts of tracing data we are collecting and gain insights into the behaviors of our distributed system that cannot be inferred from individual transactions."

  • Framework and infrastructure developers, who are building libraries and tools for other developers and want to make those tools observable through integration with distributed tracing. This group would benefit from the thorough review of the instrumentation techniques and patterns, and the discussion of the emerging standards for tracing.

  • Engineering managers and executives, who have the "power of the purse" and need to understand and be convinced of the value that tracing provides to an organization.

  • Finally, the tracing teams, that is, engineers tasked with building, deploying, and operating tracing infrastructure in an organization. This group must deal with many challenges, both technical and organizational, if it wants to scale its technology and its own efforts to amplify the impact of tracing on the organization at large.

What this book covers

Part I, Introduction, provides a general introduction to the area of distributed tracing.

Chapter 1, Why Distributed Tracing, frames the observability problem that distributed tracing aims to solve and explains why other monitoring tools fall short when it comes to troubleshooting pathological behavior in complex distributed systems. The chapter includes a brief history of my personal experience with tracing and an explanation of why I felt that writing this book would be a useful contribution to the industry.

Chapter 2, Take Tracing for a HotROD Ride, dives in with an easy-to-run, hands-on example used to illustrate the core features, benefits, and capabilities of distributed tracing, using Jaeger, an open source tracing platform; the OpenTracing instrumentation; and a demo application, HotROD (Rides on Demand).

Chapter 3, Distributed Tracing Fundamentals, reviews the basic operating principles of end-to-end tracing, such as causality tracking and metadata propagation, and various design decisions historically taken by different implementations that affect the types of problems a given tracing architecture is able to solve. It introduces the reader to two different tracing models: the more expressive event model and the more popular span model.

Part II, Data Gathering Problem, is dedicated to discussions about the different ways of getting tracing data out of the applications, through manual and automatic (agent-based) instrumentation, for both RPC-style and asynchronous (for example, using message queues) applications.

Chapter 4, Instrumentation Basics with OpenTracing, provides a step-by-step guide to manually instrumenting a simple "hello, world" style application for tracing, as it is being evolved from a monolith to a microservices-based system. Three parallel sets of examples are provided in popular programming languages: Go, Java, and Python. The chapter teaches the fundamentals of tracing instrumentation using the OpenTracing API; however, the general patterns are applicable to other instrumentation APIs as well. In the final exercises, the chapter introduces an automatic (agent-based) instrumentation style that requires little, if any, actual code changes in the application.

Chapter 5, Instrumentation of Asynchronous Applications, continues the lessons from Chapter 4 and applies them to an "online chat" type of application built around asynchronous messaging using Apache Kafka.

Chapter 6, Tracing Standards and Ecosystem, explores the often confusing ecosystem of the tracing industry at large, including the emerging standards such as OpenTracing, W3C Trace Context, and OpenCensus. It provides a useful taxonomy of how to think about the different commercial and open source projects and their positions in relation to each other.

Chapter 7, Tracing with Service Mesh, uses the service mesh Istio, running on Kubernetes, to trace an application, and compares the results with tracing an application that is natively instrumented for tracing via the OpenTracing API. It reviews the pros and cons of each approach.

Chapter 8, All About Sampling, explains why tracing platforms are often required to sample transactions and provides an in-depth review of different sampling techniques, from consistent head-based sampling strategies (probabilistic, rate limiting, adaptive, and so on) to the emerging favorite, tail-based sampling.
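
As a rough illustration of what "consistent head-based probabilistic sampling" can mean, the following Go sketch derives the sampling decision from the trace ID alone, so every service that sees the same trace ID reaches the same decision without coordination. This is a minimal sketch under stated assumptions, not the implementation of any particular tracing system: the ProbabilisticSampler type, its methods, and the 63-bit masking constant are hypothetical choices, and trace IDs are assumed to be uniformly random 64-bit integers. Rate limiting, adaptive, and tail-based sampling are not shown.

```go
package main

import "fmt"

// maxTraceID masks a trace ID down to 63 bits so that the boundary
// computed below always fits in a uint64 without overflow.
const maxTraceID = ^(uint64(1) << 63)

// ProbabilisticSampler is a hypothetical head-based sampler: the decision
// depends only on the trace ID, so it is consistent across services.
type ProbabilisticSampler struct {
	boundary uint64
}

// NewProbabilisticSampler returns a sampler that keeps roughly the given
// fraction of traces. Rates outside [0, 1] are clamped.
func NewProbabilisticSampler(rate float64) ProbabilisticSampler {
	if rate < 0 {
		rate = 0
	}
	if rate > 1 {
		rate = 1
	}
	return ProbabilisticSampler{boundary: uint64(rate * float64(maxTraceID))}
}

// IsSampled reports whether the trace with this ID should be recorded.
func (s ProbabilisticSampler) IsSampled(traceID uint64) bool {
	return traceID&maxTraceID < s.boundary
}

func main() {
	traceID := uint64(0x5af3107a4000)

	// Two services configured with the same rate decide independently,
	// yet agree, because both look only at the trace ID.
	frontend := NewProbabilisticSampler(0.1)
	backend := NewProbabilisticSampler(0.1)
	fmt.Println("frontend samples trace:", frontend.IsSampled(traceID))
	fmt.Println("backend samples trace: ", backend.IsSampled(traceID))
}
```

In practice the decision is usually made once, at the start of the trace, and then propagated with the request; deriving it from the trace ID is simply one way to keep decisions consistent if they are ever re-evaluated downstream.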

Part III, Getting Value from Tracing, talks about the different ways engineers and organizations can benefit from adopting a distributed tracing solution.

Chapter 9, Turning the Lights On, gives examples of the core value proposition of tracing, covering features that are commonly available in most tracing solutions, such as service graphs, critical path analysis, performance analysis with trace patterns, latency histograms and exemplars, and long-term profiling techniques.

Chapter 10, Distributed Context Propagation, steps back to discuss context propagation, a technology that underpins most existing tracing infrastructures. It covers Tracing Plane from Brown University, which implements a general-purpose, tool-agnostic framework for context propagation, or "baggage," and covers a number of useful techniques and tools for observability and chaos engineering that have been built on top of context propagation and tracing.

Chapter 11, Integration with Metrics and Logs, shows how all is not lost for traditional monitoring tools, and how combining them with tracing infrastructure gives them new capabilities and makes them more useful in microservices environments.

Chapter 12, Gathering Insights with Data Mining, begins with the basics of data mining and feature extraction from tracing data, followed by a practical example involving the Jaeger backend, Apache Kafka, Elasticsearch, Kibana, an Apache Flink data mining job, and a microservices simulator, microsim. It ends with a discussion of further evolution of data mining techniques, such as inferring and observing trends, and historical and ad hoc data analysis.

Part IV, Deploying and Operating Tracing Infrastructure, completes the book with an assortment of practical advice to the tracing teams about implementing and operating tracing platforms in large organizations.

Chapter 13, Implementing Tracing in Large Organizations, discusses how to overcome many technical and organizational challenges that often prevent wide adoption of distributed tracing in enterprises or full realization of its value.

Chapter 14, Under the Hood of a Distributed Tracing System, starts with a brief discussion of build versus buy considerations, then goes deep into many technical details of the architecture and deployment modes of a tracing platform, such as multi-tenancy, security, operation in multiple data centers, monitoring, and resiliency. The Jaeger project is used to illustrate many architectural decisions, yet overall the content is applicable to most tracing infrastructures.

To get the most out of this book

The book is intended for a wide range of audiences interested in solving the observability challenges in complex distributed systems. Some familiarity with the existing monitoring tools, such as metrics, is useful, but not required. Most code examples are written in Java, so a basic ability to read Java code is required.

The included exercises make heavy use of Docker and docker-compose to bring up various third-party dependencies, such as MySQL and Elasticsearch databases, Kafka and Zookeeper, and various observability tools like Jaeger, Kibana, Grafana, and Prometheus. A working installation of Docker is required to run most of the examples.

I strongly advise you not only to try running and playing with the provided examples, but also to try adapting them to your own applications and use cases. I have seen time and again how engineers find silly mistakes and inefficiencies simply by looking at a sample trace from their application. It is often surprising how much more visibility into the system behavior is provided by tracing. If this is your first time dealing with this technology, then instrumenting your own application, instead of running the provided abstract examples, is the most effective way to learn and appreciate tracing.

Download the example code files

You can download the example code files for this book from your account at http://www.packt.com. If you purchased this book elsewhere, you can visit http://www.packt.com/support and register to have the files emailed directly to you.

You can download the code files by following these steps:

  1. Log in or register at http://www.packt.com.

  2. Select the SUPPORT tab.

  3. Click on Code Downloads & Errata.

  4. Enter the name of the book in the Search box and follow the on-screen instructions.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows

  • Zipeg / iZip / UnRarX for Mac

  • 7-Zip / PeaZip for Linux

The code bundle for the book is also hosted on GitHub at https://github.com/PacktPublishing/Mastering-Distributed-Tracing. In case there's an update to the code, it will be updated on the existing GitHub repository.

We also have other code bundles from our rich catalog of books and videos available at https://github.com/PacktPublishing/. Check them out!

Download the color images

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. You can download it here: https://www.packtpub.com/sites/default/files/downloads/9781788628464_ColorImages.pdf.

Conventions used

There are a number of text conventions used throughout this book.

CodeInText: Indicates code words in text, database table names, folder names, filenames, file extensions, pathnames, dummy URLs, user input, and Twitter handles. For example: "Mount the downloaded WebStorm-10*.dmg disk image file as another disk in your system."

A block of code is set as follows:

type SpanContext struct {
        traceID TraceID
        spanID  SpanID
        flags   byte
        baggage map[string]string
        debugID string
}

When we wish to draw your attention to a particular part of a code block, the relevant lines or items are set in bold:

type SpanContext struct {
        traceID TraceID
        spanID  SpanID
        flags   byte
        baggage map[string]string
        debugID string
}

Any command-line input or output is written as follows:

$ go run ./exercise1/hello.go
Listening on http://localhost:8080/

Bold: Indicates a new term, an important word, or words that you see on the screen. For example, words in menus or dialog boxes appear in the text like this: "Select System info from the Administration panel."

Note

Warnings or important notes appear like this.

Tip

Tips and tricks appear like this.

Get in touch

Feedback from our readers is always welcome.

General feedback: If you have questions about any aspect of this book, mention the book title in the subject of your message and email us at .

Errata: Although we have taken every care to ensure the accuracy of our content, mistakes do happen. If you have found a mistake in this book, we would be grateful if you would report it to us. Please visit http://www.packt.com/submit-errata, select your book, click on the Errata Submission Form link, and enter the details.

Piracy: If you come across any illegal copies of our works in any form on the Internet, we would be grateful if you would provide us with the location address or website name. Please contact us at with a link to the material.

If you are interested in becoming an author: If there is a topic that you have expertise in and you are interested in either writing or contributing to a book, please visit http://authors.packtpub.com.

Reviews

Please leave a review. Once you have read and used this book, why not leave a review on the site that you purchased it from? Potential readers can then see and use your unbiased opinion to make purchase decisions, we at Packt can understand what you think about our products, and our authors can see your feedback on their book. Thank you!

For more information about Packt, please visit packt.com.