Quantifying the classification models


To make sure that we're on the same page, let's start by looking at an example of an ROC curve and its AUC score. The scikit-learn documentation provides example code that builds an ROC curve and calculates the AUC, which you can find at http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html.

The ROC curve in that example was built by running a classifier over the famous iris dataset. It shows us the true positive rate (y-axis) we can get if we allow a given false positive rate (x-axis). For example, if we could live with a 50 percent false positive rate, we would expect to see somewhere around a 90 percent true positive rate. Also, notice that the AUC is 80 percent. Keeping in mind that a perfect classifier would score 100 percent, this seems pretty great. The dashed diagonal line in the chart represents a completely random (read: non-predictive) model. An ideal model would be pulled into the upper left-hand corner of the chart as much as possible. You can see that this model sits somewhere between the two, which is pretty good. Whether or not that is acceptable depends on the problem being solved. How so?
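If you'd like to reproduce something like that curve yourself, here is a minimal sketch along the lines of the scikit-learn example, assuming a recent version of scikit-learn; the choice of classifier and train/test split is incidental:

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC
from sklearn.metrics import roc_curve, auc

# Reduce iris to a binary problem (class 2 versus the rest) and add
# noisy features to make it harder, as the documentation example does.
iris = load_iris()
rng = np.random.RandomState(0)
X = np.hstack([iris.data, rng.randn(iris.data.shape[0], 200)])
y = (iris.target == 2).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

# Fit a classifier and score the held-out half of the data.
classifier = SVC(kernel='linear', random_state=0)
scores = classifier.fit(X_train, y_train).decision_function(X_test)

# roc_curve gives the (fpr, tpr) points; auc integrates under them.
fpr, tpr, thresholds = roc_curve(y_test, scores)
print("AUC: %0.2f" % auc(fpr, tpr))
```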

Well, what if our classifier is attempting to identify customers who would respond well to an advertisement? Every customer we show the ad to who doesn't respond well to it has some chance of never doing business with us again. Let's say (though it's quite extreme) that this cost is so high that we need to eliminate all false positives. Judging from the previous curve, that would mean identifying only 10-15 percent of the true positives that exist. In this example, even that small slice of correctly targeted customers still makes us more money than targeting no one, so the model works quite well for our situation.
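Rather than eyeballing that number off the chart, we could read it directly from the arrays that roc_curve returns. The helper below is a hypothetical sketch (tpr_at_fpr_budget is my own name, not a scikit-learn function), shown with toy ROC points standing in for real roc_curve output:

```python
import numpy as np

def tpr_at_fpr_budget(fpr, tpr, budget):
    """Best true positive rate while keeping FPR at or below budget."""
    allowed = fpr <= budget
    return tpr[allowed].max() if allowed.any() else 0.0

# Toy (fpr, tpr) points standing in for roc_curve() output.
fpr = np.array([0.0, 0.1, 0.5, 1.0])
tpr = np.array([0.12, 0.55, 0.9, 1.0])
print(tpr_at_fpr_budget(fpr, tpr, 0.0))  # 0.12: TPR with zero false positives
```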

Imagine there's a one in 10,000 chance that if we incorrectly show a specific ad to someone, they'll sue us, and it will cost us $25,000 on average; that works out to an expected cost of $2.50 for every false positive. Now, what does a good model look like? Here's a chart that I've created from the same ROC data as before, but with this new set of cost parameters factored in:

The maximum profit occurs right around a 1.9 percent false positive rate. As you can see, there is a huge drop-off after that, even though this classifier works pretty well. For the purposes of this chapter, we can worry about writing the code for this sort of thing as we progress; for now, it's fine to just have this gain chart. We'll get into guiding our process with these kinds of results in future chapters.
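When we do get to that code, the idea is simply to sweep the operating points of the ROC curve and score each one by expected profit. Here is a minimal sketch; the audience size and revenue per responding customer are made-up numbers (only the $2.50 expected lawsuit cost comes from the scenario above), and a toy concave curve stands in for real ROC data:

```python
import numpy as np

# Illustrative assumptions, not numbers from the book.
n_positives = 1000      # customers who would respond well to the ad
n_negatives = 20000     # customers who would not
revenue_per_hit = 15.0  # profit per correctly targeted customer
cost_per_fp = 25000.0 / 10000.0  # expected lawsuit cost per false positive

def expected_profit(tpr, fpr):
    # Revenue from the true positives we catch, minus the expected
    # lawsuit cost of the false positives we let through.
    return (tpr * n_positives * revenue_per_hit
            - fpr * n_negatives * cost_per_fp)

# Sweep the (fpr, tpr) operating points. In practice, use the arrays
# returned by sklearn.metrics.roc_curve instead of this toy curve.
fpr = np.linspace(0.0, 1.0, 101)
tpr = np.sqrt(fpr)
profits = expected_profit(tpr, fpr)
best = profits.argmax()
print("best false positive rate: %0.3f" % fpr[best])
print("expected profit: $%0.2f" % profits[best])
```

Under these made-up economics, the sweep lands on a best false positive rate of about 2 percent, which is the same kind of sharp optimum the gain chart shows.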