In a microservices architecture, a single request can pass through several services and result in writes to several data stores and event queues. When debugging a production incident, it isn't always clear which system a problem originates in. Because metrics and logs are typically scoped to a single service, they show only a small part of the picture. Sometimes we need to zoom out and look at the complete life cycle of a request, from the user agent to a terminal service and back again.
In 2010, engineers at Google published a paper describing Dapper (https://research.google.com/archive/papers/dapper-2010-1.pdf), a large-scale distributed systems tracing infrastructure. The paper described how Google had been using an internally developed tracing system to observe system behavior and debug performance issues. This work inspired others, including engineers at Twitter who, in 2012, introduced an open source distributed tracing system...