Ad hoc analysis
In October 2018, members of Facebook's tracing team gave a presentation at the Distributed Tracing – NYC meetup [2], where they described a new direction they were taking with their tracing system, Canopy. Although it was not built on open source technologies such as Apache Flink, Canopy's feature extraction framework was conceptually similar to the approach presented in this chapter.
The API for building new feature extractors was open to all Facebook engineers, but it had a steep learning curve and required fairly deep familiarity with the overall tracing infrastructure and its data models. More importantly, new feature extractors had to be deployed to production as part of Canopy itself, which meant the Canopy team still had to be closely involved in reviewing the code and deploying each new analysis algorithm. Finally, feature extraction was designed primarily for live data, not historical data. All of this was creating enough procedural friction...