Ben Hamner, a data scientist at Kaggle, referred to common machine learning gotchas as ML gremlins.
Note
You can watch Ben's original talk at: https://www.youtube.com/watch?v=tleeC-KlsKA.
I like the metaphor because it makes me think of evil characters rather than vague, abstract concepts. In addition to the original gremlins presented by Ben, I want to add several of my own and present a taxonomy of gremlins (see the following diagram). I use this metaphor throughout the chapter to keep the discussion of issues and problems from getting boring as we look at how to identify and neutralize these pests:
Figure 13.3: The simplified taxonomy of machine learning problems
Dealing with data is hard; that's why we call it data science and data mining! Many different things can go wrong at different stages. Ben mentions data insufficiency, data leakage, non-stationary distributions, poor data sampling and splitting, data quality, and poorly anonymized data. Let's add...