Test Driven Machine Learning

Test-Driven Machine Learning

Credits

About the Author

About the Reviewers

www.PacktPub.com

Preface

Introducing Test-Driven Machine Learning

Test-driven development

Behavior-driven development

TDD applied to machine learning

Dealing with randomness

Different approaches to validating the improved models

Quantifying the classification models

Perceptively Testing a Perceptron

Getting started

Exploring the Unknown with Multi-armed Bandits

Understanding a bandit

Testing with simulation

Starting from scratch

Simulating real world situations

A randomized probability matching algorithm

A bootstrapping bandit

The problem with straight bootstrapping

Multi-armed bandit throw down

Predicting Values with Regression

Refresher on advanced regression

Generating our own data

Building the foundations of our model

Cross-validating our model

Generating data

Making Decisions Black and White with Logistic Regression

Generating logistic data

Measuring model accuracy

Generating a more complex example

Test driving our model

You're So Naïve, Bayes

Gaussian classification by hand

Beginning the development

Optimizing by Choosing a New Algorithm

Upgrading the classifier

Applying our classifier

Upgrading to Random Forest

Exploring scikit-learn Test First

Test-driven design

Planning our journey

Getting choosey

Developing testable documentation

Bringing It All Together

Starting at the highest level

What we've accomplished

Index

Test driving our model

To start, we need to create a framework for scoring our model inside a test. It looks like the following:

import pandas
import sklearn.metrics
import statsmodels.formula.api as smf

def logistic_regression_test():
    # read_csv replaces the removed DataFrame.from_csv; index_col=0 matches its default.
    df = pandas.read_csv('./generated_logistic_data.csv', index_col=0)
    # First stab at a model: predict y from variable_d alone.
    generated_model = smf.logit('y ~ variable_d', df)
    generated_fit = generated_model.fit()
    # Score the fitted probabilities with the area under the ROC curve.
    fpr, tpr, _ = sklearn.metrics.roc_curve(df['y'], generated_fit.predict(df))
    auc = sklearn.metrics.auc(fpr, tpr)
    print(generated_fit.summary())
    print("AUC score: {0}".format(auc))
    assert auc > .6, 'AUC should be significantly above random'
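The assertion compares the AUC against `.6`. As a minimal sketch of what that number measures, the function below computes it directly as the probability that a randomly chosen positive example is scored above a randomly chosen negative one, with ties counting as half. `pairwise_auc` is a hypothetical helper written only with the standard library, not a scikit-learn function. On the same inputs it should agree with `sklearn.metrics.auc` applied to the `roc_curve` output, up to floating-point rounding.

```python
from itertools import product


def pairwise_auc(labels, scores):
    """Hypothetical helper: AUC as P(random positive outscores random negative).

    Assumes labels are 0/1. Ties count as half a win. Runs in
    O(n_pos * n_neg) time, which is fine for small illustrations.
    """
    positives = [s for y, s in zip(labels, scores) if y == 1]
    negatives = [s for y, s in zip(labels, scores) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p, n in product(positives, negatives):
        if p > n:
            wins += 1.0
        elif p == n:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


labels = [0, 0, 1, 1]
scores = [0.1, 0.4, 0.35, 0.8]
print(pairwise_auc(labels, scores))  # 0.75: three of the four positive/negative pairs are ordered correctly
```

A model that ranks every positive above every negative scores 1.0, while one whose scores carry no information about the label lands near 0.5, which is why the test asks for a value comfortably above that.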

The preceding code also includes a first stab at a model. Because we generated the data ourselves, we know that variable_d is completely unhelpful for predicting y, but starting with it makes this a more interesting exploration.
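Why should this model fail the test? With a single predictor, the logit model's fitted probabilities are a monotone function of variable_d, so the AUC depends only on how variable_d orders the rows. If the fitted coefficient is negative, the ordering flips and the AUC becomes one minus the value computed here. The sketch below assumes, purely for illustration, that variable_d is standard normal and that y is drawn independently of it; this is not the book's data generator. `rank_auc` is a hypothetical helper that computes the AUC from the Mann-Whitney rank sum for 0/1 labels, which scales better than comparing every pair.

```python
import random


def rank_auc(labels, scores):
    """Hypothetical helper: AUC via the Mann-Whitney rank sum (0/1 labels)."""
    # Assign 1-based ranks, averaging the ranks of tied scores.
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and scores[order[j + 1]] == scores[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = average_rank
        i = j + 1
    n_pos = sum(1 for y in labels if y == 1)
    n_neg = len(labels) - n_pos
    rank_sum = sum(r for r, y in zip(ranks, labels) if y == 1)
    return (rank_sum - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)


random.seed(0)
n = 5000
# Assumed setup: variable_d carries no information about y.
variable_d = [random.gauss(0, 1) for _ in range(n)]
y = [1 if random.random() < 0.5 else 0 for _ in range(n)]

auc = rank_auc(y, variable_d)
print("AUC score: {0:.3f}".format(auc))
print("Passes the test's threshold:", auc > .6)
```

With labels unrelated to the predictor, the AUC hovers around 0.5, well short of the `.6` the test demands, so a failing test is exactly what we should expect.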

When we run the previous code, the test fails, as expected. I have the test set up to give the full statistical summary, as...