Test Driven Machine Learning

The problem with straight bootstrapping

With a single observation of data, bootstrapping gives the same answer every time, because every resample can only contain that one value. Ironically, this means that bootstrapping such a small dataset yields zero variance. Here's an example in code:

plt.hist([np.random.choice([1]) for i in range(100)])
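The same point can be checked numerically, without a plot. The following is a minimal sketch, assuming NumPy is installed; the helper name `bootstrap_means` and its parameters are illustrative and do not come from the book's code.

```python
import numpy as np


def bootstrap_means(data, n_resamples=100, seed=0):
    """Resample `data` with replacement and return the mean of each resample.

    Hypothetical helper for illustration; not part of the original text.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    return np.array([
        rng.choice(data, size=data.size, replace=True).mean()
        for _ in range(n_resamples)
    ])


# A one-element dataset: every resample is identical, so the spread is zero.
single = bootstrap_means([1])
print(single.var())

# With several distinct values, the resampled means start to vary.
several = bootstrap_means([1, 2, 3, 4, 5])
print(several.var())
```

Comparing the two variances makes the problem concrete: the bootstrap can only reflect variation that is already present in the observed data.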

The histogram for sampling from a dataset that consists of only one element looks as follows:

As predicted, every value is the same. This doesn't really match our intuition about uncertainty, though. We have only observed a single number, but it could just as easily have been a different one, and plain bootstrapping doesn't capture that. So, how can we fix this? By throwing in a random number, of course! It's not terribly academic, but hopefully the tests will reveal that it fares well. Here's the same scenario with the improved bootstrap:
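The text does not say here how the random number is generated or how it is combined with the data. One possible reading, sketched below, is to append a single random pseudo-observation to the data before each resample. Everything in this sketch is an assumption made for illustration: the function name `noisy_bootstrap_means`, the uniform draw on [0, 1), and the decision to add one extra value per resample.

```python
import numpy as np


def noisy_bootstrap_means(data, n_resamples=100, seed=0):
    """Bootstrap means with one random pseudo-observation added per resample.

    ASSUMPTION: the extra value is drawn uniformly from [0, 1). The source
    does not specify the distribution or how the draw is mixed in.
    """
    rng = np.random.default_rng(seed)
    data = np.asarray(data, dtype=float)
    means = []
    for _ in range(n_resamples):
        # Observed data plus one random draw standing in for "what might have been".
        augmented = np.append(data, rng.random())
        resample = rng.choice(augmented, size=augmented.size, replace=True)
        means.append(resample.mean())
    return np.array(means)


# Same single-observation scenario as before, now with nonzero spread.
means = noisy_bootstrap_means([1])
print(means.var())
```

Under this assumption, a single observation no longer produces a degenerate distribution, and the influence of the injected value shrinks as more real observations arrive, since it is only one element among many in each resample.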

As you can see in this visualization, rather than all of the distribution being focused in a...