Modern Distributed Tracing in .NET

By: Liudmila Molkova

Overview of this book

As distributed systems become more complex and dynamic, their observability needs to grow to aid the development of holistic solutions for performance or usage analysis and debugging. Distributed tracing brings structure, correlation, causation, and consistency to your telemetry, thus allowing you to answer arbitrary questions about your system and creating a foundation for observability vendors to build visualizations and analytics. Modern Distributed Tracing in .NET is your comprehensive guide to observability that focuses on tracing and performance analysis using a combination of telemetry signals and diagnostic tools. You'll begin by learning how to instrument your apps automatically as well as manually in a vendor-neutral way. Next, you’ll explore how to produce useful traces and metrics for typical cloud patterns and get insights into your system and investigate functional, configurational, and performance issues. The book is filled with instrumentation examples that help you grasp how to enrich auto-generated telemetry or produce your own to get the level of detail your system needs, along with controlling your costs with sampling, aggregation, and verbosity. By the end of this book, you'll be ready to adopt and leverage tracing and other observability signals and tools and tailor them to your needs as your system evolves.
Table of Contents (23 chapters)
Part 1: Introducing Distributed Tracing
Part 2: Instrumenting .NET Applications
Part 3: Observability for Common Cloud Scenarios
Part 4: Implementing Distributed Tracing in Your Organization

Chapter 5 – Configuration and Control Plane

  1. We’d need tail-based sampling, which is applied after a span or trace ends, when we know its duration and whether there were any failures. Tail-based sampling can’t be done inside the process: our applications are distributed and run as multiple instances, so no single process sees the whole trace. Instead, we can use the tail-based sampling processor in the OpenTelemetry Collector, which buffers traces and then samples them based on latency or status codes.

If we only capture suspicious traces, we will not have a baseline anymore – we won’t be able to use traces to observe normal system behavior, build analytics, and so on. So, we should additionally capture a percentage or rate of random traces – if we mark them somehow, we can analyze them separately from problematic traces to create unbiased analytics.
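The decision described above can be sketched in a few lines. This is only an illustration of the logic, not the OpenTelemetry Collector's implementation: the `CompletedTrace` type, the `tail_sample` function, the threshold, the probability, and the reason labels are all hypothetical names and values chosen for this example. The random baseline draw is made independently of the trace outcome, so the traces marked `baseline` stay representative of all traffic.

```python
import random
from dataclasses import dataclass


@dataclass
class CompletedTrace:
    """Hypothetical summary of a trace once all of its spans have ended."""
    trace_id: str
    duration_ms: float
    has_error: bool


def tail_sample(trace, latency_threshold_ms=1000.0,
                baseline_probability=0.01, rng=random.random):
    """Return the list of reasons to keep a trace (empty list means drop it).

    The threshold and probability are illustrative defaults, not recommendations.
    """
    reasons = []
    # Draw the baseline first and independently of the outcome, so the
    # "baseline" subset is an unbiased random sample of all traces.
    if rng() < baseline_probability:
        reasons.append("baseline")
    # Suspicious traces are always kept and labeled with why.
    if trace.has_error:
        reasons.append("error")
    if trace.duration_ms >= latency_threshold_ms:
        reasons.append("slow")
    return reasons


if __name__ == "__main__":
    trace = CompletedTrace("trace-1", duration_ms=2500.0, has_error=False)
    reasons = tail_sample(trace)
    print("keep" if reasons else "drop", reasons)
```

Recording the reasons on the exported trace (for example, as an attribute) is one way to mark baseline traces so they can be analyzed separately from the problematic ones.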

It’s always a good idea to rate-limit all traces, so we don’t overload the telemetry pipeline with traffic bursts.
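One common way to rate-limit is a token bucket, sketched below under stated assumptions. The `TraceRateLimiter` class, its parameters, and the injectable `clock` are hypothetical names introduced for this example; the source does not prescribe a particular algorithm, and a real pipeline would typically rely on the rate-limiting support of its collector or backend.

```python
import time


class TraceRateLimiter:
    """Token bucket: admits `rate_per_sec` traces per second on average,
    with short bursts of up to `burst` traces."""

    def __init__(self, rate_per_sec, burst, clock=time.monotonic):
        self.rate = float(rate_per_sec)
        self.capacity = float(burst)
        self.clock = clock            # injectable so the limiter is easy to test
        self.tokens = float(burst)    # start with a full bucket
        self.last = clock()

    def try_acquire(self):
        """Return True if one more trace may be exported right now."""
        now = self.clock()
        # Refill in proportion to elapsed time, never above capacity.
        self.tokens = min(self.capacity,
                          self.tokens + (now - self.last) * self.rate)
        self.last = now
        if self.tokens >= 1.0:
            self.tokens -= 1.0
            return True
        return False


if __name__ == "__main__":
    limiter = TraceRateLimiter(rate_per_sec=100, burst=20)
    admitted = sum(limiter.try_acquire() for _ in range(1000))
    print(f"admitted {admitted} of 1000 traces in a burst")
```

Applying the limiter after the sampling decision caps what reaches the telemetry pipeline during a traffic burst, at the cost of dropping some traces that would otherwise have been kept.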

In addition...