Mastering Distributed Tracing

By: Yuri Shkuro

Overview of this book

Mastering Distributed Tracing will equip you to operate and enhance your own tracing infrastructure. Through practical exercises and code examples, you will learn how end-to-end tracing can be used as a powerful application performance management and comprehension tool. The rise of Internet-scale companies, like Google and Amazon, ushered in a new era of distributed systems operating on thousands of nodes across multiple data centers. Microservices increased that complexity, often exponentially. It is harder to debug these systems, track down failures, detect bottlenecks, or even simply understand what is going on. Distributed tracing focuses on solving these problems for complex distributed systems. Today, tracing standards have developed and we have much faster systems, making instrumentation less intrusive and data more valuable. Yuri Shkuro, the creator of Jaeger, a popular open-source distributed tracing system, delivers end-to-end coverage of the field in Mastering Distributed Tracing. Review the history and theoretical foundations of tracing; solve the data gathering problem through code instrumentation, with open standards like OpenTracing, W3C Trace Context, and OpenCensus; and discuss the benefits and applications of a distributed tracing infrastructure for understanding, and profiling, complex systems.

Microservices and cloud-native applications


In the last decade, we saw a significant shift in how modern, internet-scale applications are being built. Cloud computing (infrastructure as a service) and containerization technologies (popularized by Docker) enabled a new breed of distributed system designs commonly referred to as microservices (and their next incarnation, FaaS, or functions-as-a-service). Successful companies like Twitter and Netflix have been able to leverage them to build highly scalable, efficient, and reliable systems, and to deliver more features faster to their customers.

While there is no official definition of microservices, a certain consensus has evolved over time in the industry. Martin Fowler, the author of many books on software design, argues that microservices architectures exhibit the following common characteristics [1]:

  • Componentization via (micro)services: The componentization of functionality in a complex application is achieved via services, or microservices, that are independent processes communicating over a network. The microservices are designed to provide fine-grained interfaces and to be small in size, autonomously developed, and independently deployable.

  • Smart endpoints and dumb pipes: The communications between services utilize technology-agnostic protocols such as HTTP and REST, as opposed to smart mechanisms like the Enterprise Service Bus (ESB).

  • Organized around business capabilities: The services are organized around business functions ("user profile service" or "fulfillment service") rather than technologies. The development process treats the services as continuously evolving products, not projects that are considered complete once delivered.

  • Decentralized governance: Allows different microservices to be implemented using different technology stacks.

  • Decentralized data management: Manifests in the decisions for both the conceptual data models and the data storage technologies being made independently between services.

  • Infrastructure automation: The services are built, released, and deployed with automated processes, utilizing automated testing, continuous integration, and continuous deployment.

  • Design for failure: The services are always expected to tolerate failures of their dependencies and either retry the requests or gracefully degrade their own functionality (see the sketch after this list).

  • Evolutionary design: Individual components of a microservices architecture are expected to evolve independently, without forcing upgrades on the components that depend on them.
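
To make the "design for failure" principle more concrete, here is a minimal sketch in Go (not taken from the book; the recommendation service, retry count, and backoff values are invented for illustration) that retries a failing dependency a few times and then degrades gracefully to a generic response instead of failing the whole request:

    package main

    import (
        "errors"
        "fmt"
        "time"
    )

    // fetchRecommendations stands in for a call to a downstream service
    // that may be temporarily unavailable.
    func fetchRecommendations(userID string) ([]string, error) {
        return nil, errors.New("recommendation service unavailable")
    }

    // recommendationsOrDefault retries the dependency a few times and then
    // degrades gracefully by returning a non-personalized default list.
    func recommendationsOrDefault(userID string) []string {
        for attempt := 1; attempt <= 3; attempt++ {
            if recs, err := fetchRecommendations(userID); err == nil {
                return recs
            }
            time.Sleep(time.Duration(attempt) * 100 * time.Millisecond) // simple backoff
        }
        return []string{"top-sellers"} // fallback instead of an error
    }

    func main() {
        fmt.Println(recommendationsOrDefault("user-42"))
    }

In a production system, the retry budget, backoff policy, and fallback behavior would be tuned per dependency, often with a circuit breaker in front of the call.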

Because of the large number of microservices involved in building modern applications, rapid provisioning, rapid deployment via decentralized continuous delivery, strict DevOps practices, and holistic service monitoring are necessary to effectively develop, maintain, and operate such applications. The infrastructure requirements imposed by the microservices architectures spawned a whole new area of development of infrastructure platforms and tools for managing these complex cloud-native applications. In 2015, the Cloud Native Computing Foundation (CNCF) was created as a vendor-neutral home for many emerging open source projects in this area, such as Kubernetes, Prometheus, Linkerd, and so on, with a mission to "make cloud-native computing ubiquitous."

"Cloud native technologies empower organizations to build and run scalable applications in modern, dynamic environments such as public, private, and hybrid clouds. Containers, service meshes, microservices, immutable infrastructure, and declarative APIs exemplify this approach.

These techniques enable loosely coupled systems that are resilient, manageable, and observable. Combined with robust automation, they allow engineers to make high-impact changes frequently and predictably with minimal toil."

-- Cloud Native Computing Foundation Charter [2]

At the time of writing, the list of graduated and incubating projects at CNCF [3] contained 20 projects (Figure 1.1). They all share a single common theme: providing a platform for efficient deployment and operation of cloud-native applications. Observability tools occupy an arguably disproportionate share of those slots (20 percent):

  • Prometheus: A monitoring and alerting platform

  • Fluentd: A logging data collection layer

  • OpenTracing: Vendor-neutral APIs and instrumentation for distributed tracing (see the sketch after this list)

  • Jaeger: A distributed tracing platform
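
As a rough illustration of what vendor-neutral tracing instrumentation looks like, here is a minimal sketch in Go using the opentracing-go library (the operation and service names are arbitrary; assume a concrete tracer, such as Jaeger's, has been registered with opentracing.SetGlobalTracer, otherwise the default no-op tracer is used):

    package main

    import (
        "context"

        "github.com/opentracing/opentracing-go"
    )

    // handleRequest starts a span for this operation and propagates it via the
    // context, so calls made with ctx can join the same trace as child spans.
    func handleRequest(ctx context.Context) {
        span, ctx := opentracing.StartSpanFromContext(ctx, "handle-request")
        defer span.Finish()

        span.SetTag("component", "user-profile-service") // illustrative service name
        _ = ctx // pass ctx to downstream calls so the trace context propagates
    }

    func main() {
        handleRequest(context.Background())
    }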

CNCF sandbox projects, the third category not shown in Figure 1.1, include two more monitoring-related projects: OpenMetrics and Cortex. Why is observability in such high demand for cloud-native applications?

Figure 1.1: Graduated and incubating projects at CNCF as of January 2019. Project names and logos are registered trademarks of the Linux Foundation.