There are probably many ways to build near real-time data mining for traces. In Canopy, the feature extraction functionality is built directly into the tracing backend, whereas in Jaeger it can be done via post-processing add-ons, as we will do in this chapter's code exercise. The major required components, shown in Figure 12.1, are:
Tracing backend, or tracing infrastructure in general, collects tracing data from the microservices of the distributed application
Trace completion trigger makes a judgement call that all spans of the trace have been received and the trace is ready for processing
Feature extractor performs the actual calculations on each trace
Aggregator (optional) combines features from individual traces into an even smaller dataset
Storage records the results of the calculations or aggregations
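A minimal in-memory sketch of how these components could be wired together is shown below. It assumes the trace completion trigger uses an inactivity timeout: a trace is treated as complete once no new spans have arrived for a fixed quiet period, which is only a heuristic, since a late span can still arrive afterwards. The `Span`, `TraceCompletionTrigger`, `extract_features`, and `Aggregator` names, the chosen features, and the sample spans are all hypothetical and are not part of Jaeger's or Canopy's APIs. A plain Python list stands in for the storage component.

```python
from collections import defaultdict
from dataclasses import dataclass
from typing import Dict, List, Optional, Tuple


@dataclass
class Span:
    trace_id: str
    span_id: str
    parent_id: Optional[str]  # None marks the root span of the trace
    service: str
    operation: str
    duration_ms: float


class TraceCompletionTrigger:
    """Hypothetical trigger: buffers spans per trace and declares a trace
    complete once no new span has arrived for `inactivity_timeout_s` seconds.
    This is a heuristic; a late span may still show up after emission."""

    def __init__(self, inactivity_timeout_s: float) -> None:
        self.timeout = inactivity_timeout_s
        self.buffers: Dict[str, List[Span]] = defaultdict(list)
        self.last_seen: Dict[str, float] = {}

    def add(self, span: Span, now: float) -> None:
        self.buffers[span.trace_id].append(span)
        self.last_seen[span.trace_id] = now

    def pop_completed(self, now: float) -> List[List[Span]]:
        # Collect IDs first so we do not mutate the dict while iterating it.
        done = [tid for tid, seen in self.last_seen.items() if now - seen >= self.timeout]
        for tid in done:
            del self.last_seen[tid]
        return [self.buffers.pop(tid) for tid in done]


def extract_features(spans: List[Span]) -> Dict[str, object]:
    """Hypothetical feature extractor computing a few illustrative features."""
    root = next((s for s in spans if s.parent_id is None), None)
    return {
        "trace_id": spans[0].trace_id,
        "root_service": root.service if root else "unknown",
        "root_operation": root.operation if root else "unknown",
        "span_count": len(spans),
        # Without a root span, fall back to the longest span as a latency proxy.
        "latency_ms": root.duration_ms if root else max(s.duration_ms for s in spans),
        "services": sorted({s.service for s in spans}),
    }


class Aggregator:
    """Hypothetical aggregator: rolls per-trace features up into counters
    keyed by (root service, root operation)."""

    def __init__(self) -> None:
        self.stats: Dict[Tuple[str, str], Dict[str, float]] = defaultdict(
            lambda: {"traces": 0, "total_latency_ms": 0.0}
        )

    def add(self, features: Dict[str, object]) -> None:
        entry = self.stats[(features["root_service"], features["root_operation"])]
        entry["traces"] += 1
        entry["total_latency_ms"] += features["latency_ms"]


# Wiring the components together with made-up sample spans.
trigger = TraceCompletionTrigger(inactivity_timeout_s=5.0)
aggregator = Aggregator()
stored_features: List[Dict[str, object]] = []  # stand-in for the storage component

trigger.add(Span("t1", "a", None, "frontend", "GET /dispatch", 120.0), now=0.0)
trigger.add(Span("t1", "b", "a", "driver", "FindNearest", 40.0), now=1.0)
trigger.add(Span("t2", "c", None, "frontend", "GET /dispatch", 80.0), now=3.0)

for now in (6.0, 8.0):  # t1 has been quiet for 5s at 6.0; t2 at 8.0
    for trace in trigger.pop_completed(now):
        features = extract_features(trace)
        stored_features.append(features)
        aggregator.add(features)
```

The timeout trades latency for accuracy: a short quiet period emits traces sooner but risks processing them before all spans have arrived, while a long one holds more spans in memory. Here the aggregator keeps running totals, so storage only needs to hold one small record per (service, operation) pair rather than one per trace.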
In the following sections, I will go into detail about the responsibilities of each...