Sampling is used by tracing systems to reduce the performance overhead on the traced applications and to control the amount of data that needs to be stored in the tracing backends. There are two important sampling techniques: head-based consistent sampling, which makes the sampling decision at the beginning of the request execution, and tail-based sampling, which makes the sampling decision after the execution has completed.
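The difference between the two techniques can be illustrated with a minimal sketch. It is not taken from any particular tracing system: the `Span` type, the function names, the 64-bit trace ID range, and the latency threshold are all assumptions made for illustration. The head-based function decides from the trace ID alone, so the decision is available before any work is done and is the same wherever it is evaluated. The tail-based function needs the finished spans, which lets it consider errors and latency.

```python
from dataclasses import dataclass
from typing import Sequence

# Assumption for this sketch: trace IDs are unsigned 64-bit integers.
MAX_TRACE_ID = 2**64


@dataclass
class Span:
    """Hypothetical, simplified span record used only in this example."""
    trace_id: int
    duration_ms: float
    error: bool = False


def head_sample(trace_id: int, rate: float) -> bool:
    """Head-based decision, made when the request starts.

    It depends only on the trace ID, so every evaluation for the same
    trace yields the same answer (a consistent decision).
    """
    return trace_id < rate * MAX_TRACE_ID


def tail_sample(spans: Sequence[Span], latency_threshold_ms: float) -> bool:
    """Tail-based decision, made after the request has finished.

    All spans are already collected, so the decision can keep traces
    that contain an error or an unusually slow span.
    """
    return any(s.error or s.duration_ms > latency_threshold_ms for s in spans)
```

The trade-off is visible in the signatures: `head_sample` needs nothing but an ID, so unsampled requests incur almost no tracing cost, while `tail_sample` requires every span to be recorded and held until the trace completes.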
Most existing tracing systems implement head-based sampling, which imposes minimal overhead on the applications. Various sampling algorithms can be used to tune the sampling behavior and the impact on the tracing backend. Jaeger implements adaptive sampling, which reduces the operational burden for the tracing teams and provides more equitable handling of endpoints with vastly different traffic volumes. A few commercial and open source tail-based sampling solutions have emerged as well.
This concludes the part of the book dedicated to the data gathering problem in distributed...