Test Driven Machine Learning

By: Justin Bozonier
Overview of this book

Machine learning is the process of teaching machines to recognize patterns in data, using those patterns to predict future outcomes, and offering choices that would appeal to individuals based on their past preferences. Machine learning applies to much of what you do every day, so you can't take forever to deliver your first iteration of software. Learning to build machine learning algorithms within a controlled test framework will speed up your time to deliver, quantify quality expectations with your clients, and enable rapid iteration and collaboration. This book will show you how to quantifiably test machine learning algorithms. Its foundational approach starts every example algorithm with the simplest thing that could possibly work; even seasoned veterans will find simpler ways to begin a machine learning algorithm. You will learn how to iterate on these algorithms to enable rapid delivery and improve performance expectations. The book begins with an introduction to test-driving machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression. You will discover how to test different approaches to naive Bayes and compare them quantitatively, and how to apply OOP (object-oriented programming) patterns to test-driven code, leveraging scikit-learn. Finally, you will walk through the development of an algorithm that maximizes the expected profit of a marketing campaign by combining one of the classifiers covered with the book's multiple regression example.
Table of Contents (11 chapters)
2. Perceptively Testing a Perceptron
Index

Quantifying the classification models

To make sure that we're on the same page, let's start by looking at an example of an ROC curve and the AUC score. The scikit-learn documentation includes example code to build an ROC curve and calculate AUC, which you can find at http://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html.

[Figure: ROC curve for a classifier run over the iris dataset]

This ROC curve was built by running a classifier over the famous iris dataset. It shows the true positive rate (y-axis) we can achieve for a given false positive rate (x-axis). For example, if we could accept a 50 percent false positive rate, we would expect somewhere around a 90 percent true positive rate. Also notice that the AUC is 80 percent. Keeping in mind that a perfect classifier would score 100 percent, this seems pretty great. The dashed line in the chart represents a terrible, completely random (read: non-predictive) model. An ideal model would be pulled as far toward the upper left-hand corner of the chart as possible. You can see in this chart that the model sits somewhere between the two, which is pretty good. Whether that is acceptable depends on the problem being solved. How so?
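As a minimal sketch of how such a curve is produced, here is one way to compute ROC points and the AUC score with scikit-learn. The choice of logistic regression, the class-2-versus-rest binarization, and the train/test split are assumptions for illustration; the documentation example linked above uses its own setup.

```python
# Sketch: compute ROC curve points and AUC for a binary problem
# derived from the iris dataset (class 2 versus the rest).
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import roc_curve, roc_auc_score
from sklearn.model_selection import train_test_split

X, y = load_iris(return_X_y=True)
y = (y == 2).astype(int)  # binarize so the ROC curve is well-defined

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.5, random_state=0)

model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
scores = model.predict_proba(X_test)[:, 1]  # probability of the positive class

# Each (fpr, tpr) pair is one point on the ROC curve; AUC summarizes it.
fpr, tpr, thresholds = roc_curve(y_test, scores)
auc = roc_auc_score(y_test, scores)
print(round(auc, 2))
```

Plotting `fpr` against `tpr` reproduces a chart like the one above; a random model would trace the diagonal with an AUC of 0.5.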

Well, what if our classifier is attempting to identify the customers who would respond well to an advertisement? Every customer we show it to who doesn't respond well has some chance of never doing business with us again. Let's say (though it's quite extreme) that this cost is so high that we need to eliminate all false positives. Judging from our previous curve, we would then identify only 10-15 percent of the true positives that exist. In this example, even that small slice of correctly targeted customers brings in more money, so the classifier still works well for our situation.

Imagine there's a one-in-10,000 chance that if we incorrectly show a specific ad to someone, they'll sue us, costing us on average $25,000. Now, what does a good model look like? Here's a chart that I've created from the same ROC data as before, but with this new set of parameters:

[Figure: expected profit versus false positive rate, built from the same ROC data]

The maximum profit occurs right around a 1.9 percent false positive rate. As you can see, there is a huge drop-off after that, even though this classifier works pretty well. For the purposes of this chapter, we can worry about writing the code for such a chart as we progress. For now, it's fine to just have this gain chart. We'll get into guiding our process with these kinds of results in future chapters.
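To see where such a gain chart comes from, here is a sketch that turns ROC points into an expected-profit curve. The expected cost per false positive ($25,000 at a one-in-10,000 chance, i.e. $2.50) comes from the scenario above; the ROC points, population sizes, and revenue per true positive are hypothetical numbers chosen purely for illustration.

```python
# Sketch: convert ROC operating points into expected profit, then find
# the false positive rate that maximizes it.
import numpy as np

# Hypothetical ROC curve points (false positive rate, true positive rate).
fpr = np.array([0.0, 0.01, 0.019, 0.05, 0.2, 0.5, 1.0])
tpr = np.array([0.0, 0.55, 0.80, 0.85, 0.90, 0.95, 1.0])

n_positives = 1000      # customers who would respond well (hypothetical)
n_negatives = 99000     # customers who would not (hypothetical)
revenue_per_tp = 10.0   # profit per correctly targeted customer (hypothetical)
cost_per_fp = 25000.0 / 10000  # expected lawsuit cost per false positive: $2.50

# Expected profit at each operating point on the curve.
profit = tpr * n_positives * revenue_per_tp - fpr * n_negatives * cost_per_fp
best = np.argmax(profit)
print(f"Best operating point: {fpr[best]:.1%} false positive rate")
```

With these made-up numbers the profit peaks at the 1.9 percent false positive rate and falls off sharply afterwards, because each additional false positive carries a real expected cost even when the classifier's raw ROC curve looks healthy.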
