Test Driven Machine Learning

At this point, you may be wondering how TDD can be applied to machine learning, and whether we can use it on regression problems, classification problems, or both. Every machine learning algorithm comes with some way to quantify the quality of what you're doing. In linear regression, it's the adjusted R² value; in classification problems, it's the ROC curve (and the area beneath it), a confusion matrix, and more. All of these are testable quantities. Of course, none of them has a built-in way of saying that the algorithm is good enough.
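To make "testable quantity" concrete, here is a minimal sketch (not the book's own code) that computes R² and adjusted R² in plain Python, using the standard formula adjusted R² = 1 - (1 - R²)(n - 1)/(n - p - 1), where n is the number of observations and p the number of features. The function names and the toy data are hypothetical. The result is just a number we can assert on in a test; what value counts as "good enough" is still up to us.

```python
def r_squared(y_true, y_pred):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((y - y_hat) ** 2 for y, y_hat in zip(y_true, y_pred))
    ss_tot = sum((y - mean_y) ** 2 for y in y_true)
    return 1.0 - ss_res / ss_tot


def adjusted_r_squared(y_true, y_pred, n_features):
    """Adjusted R^2: penalizes plain R^2 for the number of predictors used."""
    n = len(y_true)
    if n - n_features - 1 <= 0:
        raise ValueError("need more observations than n_features + 1")
    r2 = r_squared(y_true, y_pred)
    return 1.0 - (1.0 - r2) * (n - 1) / (n - n_features - 1)


# Hypothetical toy data: observed targets and a one-feature model's predictions.
y_true = [1.0, 2.0, 3.0, 4.0, 5.0]
y_pred = [1.1, 1.9, 3.2, 3.8, 5.0]
print(round(adjusted_r_squared(y_true, y_pred, n_features=1), 4))  # prints 0.9867
```

A test can assert that this score clears some threshold, but nothing in the metric itself tells us where that threshold should be.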

We can get around this by starting work on every problem with a completely naïve and ignorant algorithm. The scores it produces essentially represent plain old random chance. Once we have built an algorithm that beats these random-chance scores, we start iterating, each time trying to beat the best score achieved so far. Benchmarking algorithms is an entire field in its own right, and one that can be explored in much more depth.
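A minimal sketch of the idea, assuming a binary classification problem scored by area under the ROC curve: the naïve baseline ignores its input and guesses at random, so its AUC should land near 0.5, and a test then asserts that a candidate model beats it. The model functions, the synthetic data generator, and the seed values are all hypothetical illustrations; AUC is computed with the standard pairwise definition (the chance that a random positive case outscores a random negative one).

```python
import random


def roc_auc(labels, scores):
    """AUC as the probability that a random positive case is scored above a
    random negative case (ties count as half). O(n_pos * n_neg): fine for
    small test fixtures, too slow for large datasets."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    for p in pos:
        for q in neg:
            if p > q:
                wins += 1.0
            elif p == q:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def random_guess_model(features, rng):
    """Naive baseline: ignores the features entirely and guesses at random."""
    return [rng.random() for _ in features]


def feature_as_score_model(features):
    """Hypothetical candidate model: uses the single feature as its score."""
    return list(features)


def make_synthetic_data(n=200, noise=0.5, seed=0):
    """Synthetic fixture: alternating 0/1 labels, feature = label + Gaussian noise."""
    rng = random.Random(seed)
    labels = [i % 2 for i in range(n)]
    features = [y + rng.gauss(0.0, noise) for y in labels]
    return features, labels


def score_models(seed=42):
    """Return (baseline AUC, candidate AUC) on the synthetic fixture."""
    features, labels = make_synthetic_data()
    baseline_auc = roc_auc(labels, random_guess_model(features, random.Random(seed)))
    candidate_auc = roc_auc(labels, feature_as_score_model(features))
    return baseline_auc, candidate_auc


def test_candidate_beats_random_baseline():
    baseline_auc, candidate_auc = score_models()
    # The random-chance score is the first bar to clear; once a model clears
    # it, that model's score becomes the bar for the next iteration.
    assert candidate_auc > baseline_auc, (candidate_auc, baseline_auc)


if __name__ == "__main__":
    test_candidate_beats_random_baseline()
    print("baseline AUC = %.3f, candidate AUC = %.3f" % score_models())
```

Seeding the random number generators keeps the baseline score reproducible, so the test gives the same verdict on every run.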

In this book, we will implement a naïve algorithm to get a random chance score, and we will build up a small test suite that we can then use to pit this model against another. This will allow us to have a conversation with our machine learning models in the same manner as we had with Python earlier.

For a professional machine learning developer, the ideal metric to test is quite likely a profitability model that compares risk (monetary exposure) to expected value (profit). This helps us keep a balanced view of how much error, and what kind of error, we can tolerate. In machine learning, we will never have a perfect model, and we could search for the rest of our lives for "the best" one. By working our financial assumptions into the model, we improve our ability to decide between competing models. We will touch on this topic throughout the book, so it's worth keeping in mind.
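One way such a profitability check might look, as a sketch only: weight each cell of a binary confusion matrix by a money value, average over cases to get expected profit, and separately total the money put at risk. The `Payoffs` class, the payoff amounts, the fixed per-prediction stake, and both prediction lists are hypothetical assumptions chosen for illustration, not figures from the book.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Payoffs:
    """Hypothetical money gained (+) or lost (-) per prediction outcome."""
    true_positive: float
    false_positive: float
    false_negative: float
    true_negative: float


def confusion_counts(labels, predictions):
    """Return (tp, fp, fn, tn) for binary 0/1 labels and predictions."""
    pairs = list(zip(labels, predictions))
    tp = sum(1 for y, p in pairs if y == 1 and p == 1)
    fp = sum(1 for y, p in pairs if y == 0 and p == 1)
    fn = sum(1 for y, p in pairs if y == 1 and p == 0)
    tn = sum(1 for y, p in pairs if y == 0 and p == 0)
    return tp, fp, fn, tn


def expected_profit_per_case(labels, predictions, payoffs):
    """Average profit per case: each confusion-matrix cell weighted by its payoff."""
    tp, fp, fn, tn = confusion_counts(labels, predictions)
    total = (tp * payoffs.true_positive + fp * payoffs.false_positive
             + fn * payoffs.false_negative + tn * payoffs.true_negative)
    return total / len(labels)


def exposure(predictions, stake):
    """Money at risk, assuming each positive prediction commits one fixed stake."""
    return sum(predictions) * stake


# Hypothetical scenario: acting on a true positive earns 100,
# acting on a false positive loses the 50 stake.
payoffs = Payoffs(true_positive=100.0, false_positive=-50.0,
                  false_negative=0.0, true_negative=0.0)
labels = [1, 0, 1, 0, 0, 1, 0, 0]
model_a = [1, 1, 1, 0, 0, 0, 0, 0]  # more aggressive
model_b = [1, 0, 0, 0, 0, 0, 0, 0]  # more conservative

for name, preds in [("A", model_a), ("B", model_b)]:
    print(name, expected_profit_per_case(labels, preds, payoffs), exposure(preds, stake=50.0))
# prints: A 18.75 150.0
#         B 12.5 50.0
```

Under these assumed numbers, model A earns more per case but puts three times as much money at risk as model B. Neither model is simply "better"; the choice depends on how much exposure we are willing to carry, which is exactly the kind of financial assumption worth encoding in a test.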
