Modern Distributed Tracing in .NET

By: Liudmila Molkova

Overview of this book

As distributed systems become more complex and dynamic, their observability needs to grow to support holistic performance analysis, usage analysis, and debugging. Distributed tracing brings structure, correlation, causation, and consistency to your telemetry, allowing you to answer arbitrary questions about your system and creating a foundation on which observability vendors can build visualizations and analytics. Modern Distributed Tracing in .NET is your comprehensive guide to observability, focusing on tracing and performance analysis using a combination of telemetry signals and diagnostic tools. You’ll begin by learning how to instrument your apps both automatically and manually in a vendor-neutral way. Next, you’ll explore how to produce useful traces and metrics for typical cloud patterns, gain insights into your system, and investigate functional, configuration, and performance issues. The book is filled with instrumentation examples that show you how to enrich auto-generated telemetry or produce your own to get the level of detail your system needs, and how to control costs with sampling, aggregation, and verbosity. By the end of this book, you’ll be ready to adopt and leverage tracing and other observability signals and tools, and to tailor them to your needs as your system evolves.
Table of Contents (23 chapters)
Part 1: Introducing Distributed Tracing
Part 2: Instrumenting .NET Applications
Part 3: Observability for Common Cloud Scenarios
Part 4: Implementing Distributed Tracing in Your Organization

Preface

If you have worked on distributed applications, infrastructure, or client libraries, you’ve likely encountered numerous ways in which distributed systems can break.

For example, a default retry policy on your service can bring it down along with all its dependencies. Race conditions can lead to deadlocks under certain loads or result in data leaks between user accounts. User operations that usually take milliseconds can slow down significantly while service dashboards show no signs of other issues. Functional problems can cause obscure and inexplicable effects on the user’s end.

When working with distributed applications, we rely on telemetry to assess their performance and functionality. We need even more telemetry to identify and mitigate issues.

In the past, we relied on custom logs and metrics collected using vendor-specific SDKs. We built custom parsers, processing pipelines, and reporting tools to make telemetry usable.

However, as applications have become more complex, we require better and more user-friendly approaches to understand what is happening in our systems. Personally, I find it unproductive to read through megabytes of logs or visually detect anomalies in metrics.

Distributed tracing is a technique that allows us to trace operations throughout the entire system. It adds correlation and causation to our telemetry, enabling us to retrieve all the relevant data describing a specific operation or to find all operations that share a given context, such as a requested resource or a user identifier.

Distributed tracing alone is not enough; we need other telemetry signals such as metrics, events, logs, and profiles, as well as libraries to collect and export them to observability backends. Fortunately, we have OpenTelemetry for this purpose. OpenTelemetry is a cloud-native, vendor-neutral telemetry platform available in multiple programming languages. It offers the core components necessary to collect custom data, along with instrumentation libraries for common technologies. OpenTelemetry standardizes telemetry formats for different signals, ensuring correlation, consistency, and structure in the collected data.

By leveraging consistent and structured telemetry, different observability vendors can provide tools such as service maps, trace visualizations, error classification, and detection of common properties contributing to failures. This essentially allows us to automate the error-prone and tedious parts of performance analysis that humans struggle with. Monitoring and debugging techniques can now become standardized practices across the industry, no longer relying on tribal knowledge, runbooks, or outdated documentation.

Modern Distributed Tracing in .NET explores all aspects of telemetry collection in .NET applications, with a focus on distributed tracing and performance analysis. It begins with an overview of observability challenges and solutions, and then delves into the built-in monitoring capabilities offered by modern .NET applications. These capabilities become even more powerful when used alongside OpenTelemetry. While shared OpenTelemetry instrumentation libraries can take us a long way, sometimes we still need to write custom instrumentation. The book shows how to collect custom traces, metrics, and logs while considering performance impact and verbosity. It also covers the instrumentation of common cloud patterns such as network calls, messaging, and database interactions. Finally, it discusses the organizational and technical aspects of implementing and evolving observability in existing systems.

The observability field is still relatively new and rapidly evolving, which means there are often multiple solutions available for almost any problem. This book aims to explain fundamental observability concepts and to provide several possible solutions to common problems while highlighting the associated trade-offs. It also helps you gain practical skills to implement and leverage tracing and observability.

I hope you find the provided examples useful and use them as a playground for experimentation. I encourage you to explore new and creative approaches to making distributed systems more observable and to share your findings with the community!