Test Driven Machine Learning


Test-driven development


Kent Beck wrote in his seminal book on the topic that TDD consists of only two specific rules, which are as follows:

  • Don't write a line of new code unless you first have a failing automated test

  • Eliminate duplication

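This, as he notes fairly quickly, leads us to the mantra of TDD: "Red, Green, Refactor." Red means writing a small test that fails, Green means making that test pass as quickly as possible, and Refactor means removing the duplication introduced along the way.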

If this is a bit abstract, let me restate it: TDD is a software development process in which a programmer writes code that specifies the intended behavior before writing any software that implements that behavior. The key value of TDD is that at each step of the way, you have working software as well as an itemized set of specifications.
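As a minimal sketch of one pass through this cycle, consider the following. The `average` function and its test are hypothetical names chosen for illustration; they do not come from Kent Beck's book.

```python
# Red: write the test first. Until `average` is defined, calling this
# test raises NameError, which is the failing test TDD asks for.
def test_average_of_three_numbers():
    assert average([1.0, 2.0, 3.0]) == 2.0


# Green: the simplest implementation that makes the test pass.
# (`average` is a hypothetical example function.)
def average(values):
    return sum(values) / len(values)


# Refactor: with the test passing, tidy the code without changing its
# behavior, rerunning the test after every change.
test_average_of_three_numbers()
```

The test is both the specification and the checkpoint: if a later refactoring breaks `average`, rerunning the test reports it immediately.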

TDD is a software development process that requires the following:

  • The writing of code to detect the intended behavioral change.

  • A rapid iteration cycle that produces working software after each iteration.

  • Clear definitions of what a bug is. If every test passes and unwanted behavior still turns up, that behavior was never specified, so it is not a bug. It is a new feature.

Another point that Kent makes is that ultimately, this technique is meant to reduce fear in the development process. Each test is a checkpoint along the way to your goal. If you stray too far from the path and wind up in trouble, you can simply delete any tests that shouldn't apply, and then work your code back to a state where the rest of your tests pass. There's a lot of trial and error inherent in TDD, but the same applies to machine learning.

As a result, this whole process changes the way we think about design. The software that you design using TDD will also be modular enough to have different components swapped in and out of your pipeline. We will see more of this in the later chapters of this book.

You might believe that just thinking through test cases is equivalent to TDD. If you are like most people, what you write is different from what you might say aloud, and very different from what you think. Writing the intent of our code before we write the code itself applies a pressure to the software design that prevents us from writing "just in case" code. By this I mean the code that we write just because we aren't sure if there will be a problem. Using TDD, we think of a test case, prove that it isn't supported currently, and then fix it. If we can't think of a test case, we don't add code.
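The step of proving that a case is unsupported before fixing it can be sketched as follows. All of the names here (`average_v1`, `average_v2`, `rejects_empty_input`) are hypothetical and exist only for this illustration.

```python
def average_v1(values):
    # Code as it stands before the new test case is considered.
    return sum(values) / len(values)


def average_v2(values):
    # Code after the new test case has justified the extra check.
    if not values:
        raise ValueError("average needs at least one value")
    return sum(values) / len(values)


def rejects_empty_input(average):
    """New test case: an empty input should raise a clear ValueError."""
    try:
        average([])
    except ValueError:
        return True
    except ZeroDivisionError:
        return False
    return False


print(rejects_empty_input(average_v1))  # False: the case is not supported yet
print(rejects_empty_input(average_v2))  # True: the added code makes it pass
```

The guard in `average_v2` is written only after a test has shown that it is needed, which is the opposite of "just in case" code.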

TDD can and does operate at many different levels of the software under development. Tests can be written against functions and methods, entire classes, programs, web services, neural networks, random forests, and whole machine learning pipelines. At each level, the tests are written from the perspective of the prospective client. How does this relate to machine learning? Let's take a step back and reframe what I just said.

In the context of machine learning, tests can be written against functions, methods, classes, mathematical implementations, and entire machine learning algorithms. TDD can even be used to explore techniques and methods in a very directed and focused manner, much like you might use a REPL (an interactive shell where you can try out snippets of code) or an interactive Python (IPython) session.
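To make this concrete, here is a small sketch of a test written against a whole learning algorithm rather than a single function. The perceptron below is a deliberately tiny, pure-Python version written for this illustration; `train_perceptron` and `predict` are assumed names, not the implementation developed later in the book.

```python
def predict(weights, bias, x):
    # Step activation: output 1 when the weighted sum is positive.
    activation = sum(w * xi for w, xi in zip(weights, x)) + bias
    return 1 if activation > 0 else 0


def train_perceptron(samples, labels, epochs=20, learning_rate=1.0):
    # Classic perceptron update rule, starting from zero weights.
    weights = [0.0] * len(samples[0])
    bias = 0.0
    for _ in range(epochs):
        for x, target in zip(samples, labels):
            error = target - predict(weights, bias, x)
            weights = [w + learning_rate * error * xi
                       for w, xi in zip(weights, x)]
            bias += learning_rate * error
    return weights, bias


def test_perceptron_learns_logical_and():
    # The test states the behavior a client of the model cares about:
    # after training, the model reproduces the AND truth table.
    samples = [(0, 0), (0, 1), (1, 0), (1, 1)]
    labels = [0, 0, 0, 1]
    weights, bias = train_perceptron(samples, labels)
    assert [predict(weights, bias, x) for x in samples] == labels


test_perceptron_learns_logical_and()
```

The test says nothing about the learned weights themselves. It only pins down the behavior that matters to whoever uses the model, which leaves the implementation free to change.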