Feature extraction exercise
In this code example, we will build an Apache Flink job called SpanCountJob that performs basic feature extraction from traces. Apache Flink is a distributed, real-time stream-processing framework that is well suited to processing traces as they are being collected by the tracing backend. Other streaming frameworks, such as Apache Spark or Apache Storm, can be used in a similar way. All of these frameworks integrate well with message queue infrastructure; we will be using Apache Kafka for that purpose.
Since version 1.8, the Jaeger backend supports Kafka as an intermediate transport for spans received by the collectors. The jaeger-ingester component reads the spans from a Kafka stream and writes them to the storage backend, in our case, Elasticsearch. Figure 12.2 shows the overall architecture of the exercise. With this Jaeger deployment mode, traces are fed into Elasticsearch, where they can be viewed individually in the Jaeger UI, while the same span stream is also processed by Apache Flink...
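This deployment can be sketched as two processes sharing a Kafka topic. The following shell fragment is a rough illustration only: the exact flag names have changed across Jaeger versions, and the broker address, topic name, and Elasticsearch URL are placeholder assumptions, so consult each binary's `--help` output for your version.

```shell
# Collector writes received spans to Kafka instead of directly to storage.
# (Flag names are version-dependent; treat these as illustrative.)
SPAN_STORAGE_TYPE=kafka ./jaeger-collector \
  --kafka.producer.brokers=localhost:9092 \
  --kafka.producer.topic=jaeger-spans

# Ingester consumes the same topic and writes spans to Elasticsearch,
# making them viewable in the Jaeger UI.
SPAN_STORAGE_TYPE=elasticsearch ./jaeger-ingester \
  --kafka.consumer.brokers=localhost:9092 \
  --kafka.consumer.topic=jaeger-spans \
  --es.server-urls=http://localhost:9200
```

The Flink job then attaches as an additional, independent consumer of the same Kafka topic, so feature extraction does not interfere with the ingestion path.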