Test Driven Machine Learning

By Justin Bozonier
Overview of this book

Machine learning is the process of teaching machines to remember data patterns, to use those patterns to predict future outcomes, and to offer choices that would appeal to individuals based on their past preferences. Machine learning is applicable to a lot of what you do every day. As a result, you can't take forever to deliver your first iteration of software. Learning to build machine learning algorithms within a controlled test framework will speed up your time to deliver, quantify quality expectations with your clients, and enable rapid iteration and collaboration. This book will show you how to quantifiably test machine learning algorithms. The very different, foundational approach of this book starts every example algorithm with the simplest thing that could possibly work. With this approach, seasoned veterans will find simpler ways to begin a machine learning algorithm. You will learn how to iterate on these algorithms to enable rapid delivery and improve performance expectations. The book begins with an introduction to test-driving machine learning and quantifying model quality. From there, you will test a neural network, predict values with regression, and build upon regression techniques with logistic regression. You will discover how to test different approaches to Naive Bayes and compare them quantitatively, along with how to apply OOP (object-oriented programming) and OOP patterns to test-driven code, leveraging scikit-learn. Finally, you will walk through the development of an algorithm that maximizes the expected value of profit for a marketing campaign by combining one of the classifiers covered with the multiple regression example in the book.

Different approaches to validating the improved models

Model quality validation, of course, depends upon the kinds of models that you're building and their purpose. There are a few general types of machine learning problems that I've covered in this book, and each has different ways of validating model quality.

Classification overview

We'll get to the specifics in just a moment, but let's review the high-level terms. One method for quantifying the quality of a supervised classifier is the ROC curve. It can be reduced to a single number by finding the total area under the curve (AUC), by finding the location of its inflection point, or simply by setting a limit on the percentage of the data that must be classified correctly.
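As a minimal sketch of how we might pin an AUC floor down in a test using scikit-learn (the labels and scores here are stand-ins for what a real validation run would provide, and the 0.85 floor is an illustrative assumption):

```python
from sklearn.metrics import roc_auc_score

def test_classifier_meets_auc_floor():
    # Stand-in labels and predicted probabilities; in a real test
    # these would come from scoring the model on a validation set.
    y_true = [0, 0, 1, 1, 0, 1]
    y_scores = [0.1, 0.4, 0.35, 0.8, 0.2, 0.9]

    auc = roc_auc_score(y_true, y_scores)

    # Fail the build if model quality regresses below the agreed limit.
    assert auc >= 0.85, "AUC fell below the quality floor: %.3f" % auc
```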

Another common technique is the confusion matrix. Limits can be set on certain cells of the matrix to help drive testing. The matrix can also be used as a diagnostic tool to help identify issues as they come up.
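Setting a limit on a cell of the matrix can be as simple as the following sketch, again with stand-in data, and an illustrative 20% cap on false negatives:

```python
from sklearn.metrics import confusion_matrix

def test_false_negative_rate_stays_within_limit():
    # Stand-in labels and predictions from a validation run.
    y_true = [1, 1, 1, 1, 0, 0, 0, 0, 1, 1]
    y_pred = [1, 1, 1, 0, 0, 0, 1, 0, 1, 1]

    # For binary labels the matrix is laid out [[TN, FP], [FN, TP]].
    matrix = confusion_matrix(y_true, y_pred)
    false_negatives = matrix[1][0]
    actual_positives = matrix[1].sum()

    # Drive testing by capping the false negative rate at 20%.
    assert false_negatives / float(actual_positives) <= 0.20
```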

We will typically use k-fold cross-validation. Cross-validation is a technique where we take our sample dataset and divide it into several separate datasets. We can then develop against one of these datasets, use another to validate that our model isn't overfitted, and hold a third back for a final check that the first two steps went well. All of these separate datasets work to make sure that we develop a generally applicable model, and not one that merely predicts our training data but falls apart in production.
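A rough sketch of that three-way split follows; the 60/20/20 proportions are an assumption for illustration, not a prescription:

```python
import numpy as np

def split_three_ways(data, seed=42):
    # Shuffle the dataset, then carve it into development,
    # validation, and final-check portions (60/20/20).
    rng = np.random.RandomState(seed)
    shuffled = rng.permutation(data)
    first_cut = int(len(shuffled) * 0.6)
    second_cut = int(len(shuffled) * 0.8)
    develop = shuffled[:first_cut]
    validate = shuffled[first_cut:second_cut]
    final_check = shuffled[second_cut:]
    return develop, validate, final_check
```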

Regression

Linear regression quality is typically quantified with a combination of an adjusted R² value and a check that the residuals of the model don't fit a pattern. How do we check for this in an automated test?

Adjusted R² values are provided by most statistical tools. It's a quick measure of how much of the variation in the data is explained by your model. Checking model assumptions is more difficult: it is much easier to see patterns visually than via discrete, specific tests.
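If your tool of choice doesn't report it, adjusted R² is easy to derive from the plain R²; this sketch implements the standard formula:

```python
def adjusted_r_squared(r_squared, n_observations, n_predictors):
    # Penalize R^2 for each extra predictor:
    # adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)
    return 1 - (1 - r_squared) * (n_observations - 1) / \
        float(n_observations - n_predictors - 1)
```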

So this part is hard, but there are other tests, perhaps even more important ones, that are easier: cross-validation. By selecting strong test datasets with a litany of misbehavior, we can compare R² statistics from development, to testing, to ready-for-production. If a serious drop occurs at any point, we can circle back.
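One way to automate that comparison is a guard like the following sketch; the tolerance of 0.1 is an illustrative assumption, and the two R² values would come from scoring the model against each dataset:

```python
def assert_no_serious_r2_drop(development_r2, testing_r2, tolerance=0.1):
    # Fail when out-of-sample R^2 falls too far below the
    # development R^2, a sign the model is overfitted.
    drop = development_r2 - testing_r2
    assert drop <= tolerance, "R^2 dropped by %.3f" % drop
```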

Clustering

Clustering is the way in which we create our classification model. From there, we can test it by cross-validating against our data. This can be especially useful in clustering algorithms, such as k-means, where the feedback can help us tune the number of clusters we use to minimize the within-cluster variation. As we move from one cross-validation dataset to another, it's important to remember not to carry over our training data from the previous tests, lest we bias our results.
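As a sketch of how that tuning feedback might be gathered with scikit-learn's KMeans (the random demo data is a stand-in for a real dataset):

```python
import numpy as np
from sklearn.cluster import KMeans

def within_cluster_variation_by_k(data, max_k=10):
    # Fit k-means for each candidate k and report the total
    # within-cluster variation (inertia) so we can tune k.
    variations = {}
    for k in range(1, max_k + 1):
        model = KMeans(n_clusters=k, random_state=42).fit(data)
        variations[k] = model.inertia_
    return variations

# Variation should fall as k grows; look for the "elbow" where
# adding another cluster stops buying much improvement.
demo_data = np.random.RandomState(0).rand(200, 2)
print(within_cluster_variation_by_k(demo_data, max_k=6))
```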
