Test Driven Machine Learning


Dealing with randomness


Dealing with randomness in algorithms can be a huge mental block for some people when they try to understand how they might use TDD. TDD is so deterministic, intentional, and controlled that your initial gut reaction to introducing a random process may be to think that it makes TDD impossible. This is a place where TDD actually shines though. Here's how.

Let's pick up where we left off on the simplistic NumberGuesser from earlier. We're going to add a requirement: it should guess randomly from the numbers it has previously observed, while favoring the numbers that have come up most often.

To get there, I first have the NumberGuesser guess whatever the previous number was revealed to be every time I ask for a guess. The test for this looks like the following:

def given_one_datapoint_when_asked_to_guess_test():
  #given
  number_guesser = NumberGuesser()
  previously_chosen_number = 5
  number_guesser.number_was(previously_chosen_number)
  #when
  guessed_number = number_guesser.guess()
  #then
  assert type(guessed_number) is int, 'the answer should be a number'
  assert guessed_number == previously_chosen_number, 'the answer should be the previously chosen number.'

It's a simple test that ultimately just requires us to set a variable value in our class. Predicting on the basis of the most recent input can be valuable in its own right, and it's the simplest prediction that we can start with.

If you run your tests here, you will see them fail. This is what my code looks like after getting this to pass:

class NumberGuesser:
  """Guesses numbers based on the history of your input"""
  def __init__(self):
    self._guessed_number = None
  def number_was(self, guessed_number):
    self._guessed_number = guessed_number
  def guess(self):
    return self._guessed_number

Upon making this test pass, we can review it for any refactoring opportunities. It's still pretty simple, so let's keep going. Next, I will have NumberGuesser randomly choose from all of the numbers that were previously observed, instead of just the most recent one. I will start by making sure that the guessed number is one that I've seen before:

def given_two_datapoints_when_asked_to_guess_test():
  #given
  number_guesser = NumberGuesser()
  previously_chosen_numbers = [1,2,5]
  number_guesser.numbers_were(previously_chosen_numbers)
  #when
  guessed_number = number_guesser.guess()
  #then
  assert guessed_number in previously_chosen_numbers, 'the guess should be one of the previously chosen numbers'

Running this test now will cause a new failure. While thinking about the laziest way of getting this test to work, I realized that I can cheat big time. All I need to do is create my new method, and take the first element in the list:

class NumberGuesser:
  """Guesses numbers based on the history of your input"""
  def __init__(self):
    self._guessed_number = None
  def numbers_were(self, guessed_numbers):
    self._guessed_number = guessed_numbers[0]
  def number_was(self, guessed_number):
    self._guessed_number = guessed_number
  def guess(self):
    return self._guessed_number

For our purposes, laziness is king. Laziness guards us from over-engineered solutions, and forces our test suite to become more robust. It does this by making our problem-solving faster, and by spurring an uncomfortable feeling that prompts us to test more edge cases.

So, now I want to assert that I don't always choose the same number. I don't want to force it to always choose a different number, but there should be some mixture. To test this, I will refactor my test, and add a new assertion, as follows:

def given_multiple_datapoints_when_asked_to_guess_many_times_test():
  #given
  number_guesser = NumberGuesser()
  previously_chosen_numbers = [1,2,5]
  number_guesser.numbers_were(previously_chosen_numbers)
  #when
  guessed_numbers = [number_guesser.guess() for i in range(0,100)]
  #then
  for guessed_number in guessed_numbers:
    assert guessed_number in previously_chosen_numbers, 'every guess should be one of the previously chosen numbers'
  assert len(set(guessed_numbers)) > 1, "It shouldn't always guess the same number."

I get the test failure message "It shouldn't always guess the same number.", which is perfect. This test also causes others to fail, so I will work out the simplest thing that I can do to make everything green again, and I will end up here:

import random
class NumberGuesser:
  """Guesses numbers based on the history of your input"""
  def __init__(self):
    self._guessed_numbers = None
  def numbers_were(self, guessed_numbers):
    self._guessed_numbers = guessed_numbers
  def number_was(self, guessed_number):
    self._guessed_numbers = [guessed_number]
  def guess(self):
    if self._guessed_numbers is None:
      return None
    return random.choice(self._guessed_numbers)

There are probably many ways to get this test to pass. I solved it this way because it was the first thing that came to mind, and it feels like it's leading us in a good direction. What refactoring do we want to do here? Each method is a single line except the guess method. The guess method is still pretty simple, so let's keep going.

Now, I notice that number_was replaces the whole history with a single-element list. If I call it after numbers_were, the guesser will only ever guess that latest number, which is bad. So, I need another test to catch this. Let's write the new test (this should be our fourth):

def given_a_starting_set_of_observations_followed_by_a_one_off_observation_test():
  #given
  number_guesser = NumberGuesser()
  previously_chosen_numbers = [1,2,5]
  number_guesser.numbers_were(previously_chosen_numbers)
  one_off_observation = 0
  number_guesser.number_was(one_off_observation)
  #when
  guessed_numbers = [number_guesser.guess() for i in range(0,100)]
  #then
  for guessed_number in guessed_numbers:
    assert guessed_number in previously_chosen_numbers + [one_off_observation], 'every guess should be one of the previously chosen numbers'
  assert len(set(guessed_numbers)) > 1, "It shouldn't always guess the same number."

This fails on the last assertion, which is perfect. I will make the test pass using the following code:

import random
class NumberGuesser:
  """Guesses numbers based on the history of your input"""
  def __init__(self):
    self._guessed_numbers = []
  def numbers_were(self, guessed_numbers):
    # Copy the input so that number_was never mutates the caller's list.
    self._guessed_numbers = list(guessed_numbers)
  def number_was(self, guessed_number):
    self._guessed_numbers.append(guessed_number)
  def guess(self):
    if not self._guessed_numbers:
      return None
    return random.choice(self._guessed_numbers)

There are other issues here that I can write failing tests for. Notice that if I provide a single observation and then provide a set of observations, numbers_were overwrites the history and the single observation is lost, yet every assertion that I've listed so far would still succeed. So, I write a new test to ensure that NumberGuesser guesses every number at least once. We can code this up in the following way:

def given_a_one_off_observation_followed_by_a_set_of_observations_test():
  #given
  number_guesser = NumberGuesser()
  previously_chosen_numbers = [1,2]
  one_off_observation = 0
  all_observations = previously_chosen_numbers + [one_off_observation]
  number_guesser.number_was(one_off_observation)
  number_guesser.numbers_were(previously_chosen_numbers)
  #when
  guessed_numbers = [number_guesser.guess() for i in range(0,100)]
  #then
  for guessed_number in guessed_numbers:
    assert guessed_number in all_observations, 'every guess should be one of the previously chosen numbers'
  assert len(set(guessed_numbers)) == len(all_observations), "It should eventually guess every number at least once."
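
To see what this test catches, here is a minimal sketch of the behavior it guards against. It assumes the previous version of the class, where numbers_were assigns the list rather than extending it. The name OverwritingNumberGuesser is made up for this illustration and is not part of the book's code.

```python
import random

class OverwritingNumberGuesser:
  """The earlier version of the class, renamed for illustration only."""
  def __init__(self):
    self._guessed_numbers = []
  def numbers_were(self, guessed_numbers):
    # Plain assignment throws away anything recorded before this call.
    self._guessed_numbers = guessed_numbers
  def number_was(self, guessed_number):
    self._guessed_numbers.append(guessed_number)
  def guess(self):
    if not self._guessed_numbers:
      return None
    return random.choice(self._guessed_numbers)

number_guesser = OverwritingNumberGuesser()
number_guesser.number_was(0)         # the one-off observation comes first
number_guesser.numbers_were([1, 2])  # ...and is overwritten here
guessed_numbers = [number_guesser.guess() for _ in range(100)]
one_off_was_forgotten = 0 not in guessed_numbers
print(one_off_was_forgotten)  # True: 0 is no longer in the history
```

Because 0 can never be guessed, the set of guesses has at most two members, and the final assertion in the test fails every time.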

And my final code looks like the following:

import random

class NumberGuesser:
  """Guesses numbers based on the history of your input"""
  def __init__(self):
    self._guessed_numbers = []
  def numbers_were(self, guessed_numbers):
    self._guessed_numbers += guessed_numbers
  def number_was(self, guessed_number):
    self._guessed_numbers.append(guessed_number)
  def guess(self):
    if not self._guessed_numbers:
      return None
    return random.choice(self._guessed_numbers)
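
This version also appears to cover the requirement to favor the most likely numbers. The history keeps duplicates, so random.choice picks a number in proportion to how often it was observed. The following is a small sketch of that effect, not a test from the book. The observations and the number of guesses are arbitrary values chosen for illustration.

```python
import random
from collections import Counter

class NumberGuesser:
  """Guesses numbers based on the history of your input"""
  def __init__(self):
    self._guessed_numbers = []
  def numbers_were(self, guessed_numbers):
    self._guessed_numbers += guessed_numbers
  def number_was(self, guessed_number):
    self._guessed_numbers.append(guessed_number)
  def guess(self):
    if not self._guessed_numbers:
      return None
    return random.choice(self._guessed_numbers)

number_guesser = NumberGuesser()
# Illustrative history: 1 was observed three times, 5 only once.
number_guesser.numbers_were([1, 1, 1, 5])
guess_counts = Counter(number_guesser.guess() for _ in range(10000))
# Counts vary from run to run, but 1 should come up about three times as often as 5.
print(guess_counts.most_common())
```

A test built on this idea would have to assert a relationship between the counts, such as one being larger than the other, rather than exact values.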

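Technically, there is a chance that the last test will fail purely by chance. With three distinct observations and 100 guesses, it fails only if at least one of the numbers is never guessed. By inclusion-exclusion, the probability of that is 3 x (2/3)^100 - 3 x (1/3)^100, which is roughly 7.4 x 10^-18. Basically, the chance is zero.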
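
A short calculation can pin this figure down. The sketch below computes the chance that at least one number is never guessed. It assumes every distinct number is equally likely on each guess, which holds for the last test because each number was observed exactly once. The function name is made up for this illustration, and math.comb requires Python 3.8 or later.

```python
from fractions import Fraction
from math import comb

def probability_some_number_never_guessed(distinct_numbers, guesses):
  """Chance that uniform random guessing misses at least one number."""
  total = Fraction(0)
  # Inclusion-exclusion over how many of the numbers are missed entirely.
  for missing in range(1, distinct_numbers):
    sign = 1 if missing % 2 == 1 else -1
    stay_within_rest = Fraction(distinct_numbers - missing, distinct_numbers) ** guesses
    total += sign * comb(distinct_numbers, missing) * stay_within_rest
  return total

# Three distinct observations and 100 guesses, as in the last test.
failure_probability = probability_some_number_never_guessed(3, 100)
print(float(failure_probability))
```

For the three observations and 100 guesses used in the last test, the result is on the order of 10^-18, small enough that a spurious failure is not a practical concern.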