If you have followed the application monitoring and performance management space in the past few years, whether at conferences or in news and tech blogs, you have probably heard the term "three pillars of observability" used to refer to metrics, logs, and distributed tracing. While some people have strong, partially justified, and very amusing to read (strong language warning!) objections to this term [1], [2], we can view these three areas as different approaches to recording events that occur in applications. Ultimately, all of these signals are collected by instrumentation in the code, triggered by events we deem worthy of recording.
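To make the "different approaches to recording events" framing concrete, here is a minimal sketch in which a single instrumentation point records one event as a metric, a log record, and a trace span. The `Telemetry` container and `handle_request` function are hypothetical, purely in-memory stand-ins, not the API of any real observability library.

```python
import time
import uuid
from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Telemetry:
    """Hypothetical in-memory sinks for the three signal types."""
    metrics: Counter = field(default_factory=Counter)  # aggregated counts per key
    logs: list = field(default_factory=list)           # one text record per event
    spans: list = field(default_factory=list)          # one timed record per event, tagged with a trace ID


def handle_request(telemetry: Telemetry, endpoint: str, trace_id: str) -> None:
    """Hypothetical instrumented handler: one event, recorded three ways."""
    start = time.monotonic()
    # ... the application's real work would happen here ...
    duration = time.monotonic() - start

    # Metric: only the aggregate count survives; individual requests are not kept.
    telemetry.metrics[f"requests.{endpoint}"] += 1
    # Log: a standalone, human-readable record of this one event.
    telemetry.logs.append(f"handled {endpoint} in {duration:.6f}s")
    # Span: the event with its timing, linked to other work in the same request by trace_id.
    telemetry.spans.append({"trace_id": trace_id, "name": endpoint, "duration_s": duration})


if __name__ == "__main__":
    t = Telemetry()
    trace = uuid.uuid4().hex
    handle_request(t, "/checkout", trace)
    handle_request(t, "/checkout", trace)
    print(t.metrics["requests./checkout"])  # → 2
    print(len(t.logs), len(t.spans))        # → 2 2
```

In this sketch the metric occupies one counter per endpoint no matter how many requests arrive, while the log and span lists grow with every recorded event.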
Ideally, when we troubleshoot a performance problem, we would like to know as much as possible about what the application was doing at the time, which means recording every possible event. The main challenge is the cost of collecting and reporting all that telemetry. The three "pillars" primarily differ in their...