Book Image

Learning Data Mining with Python

Book Image

Learning Data Mining with Python

Overview of this book

Table of Contents (20 chapters)
Learning Data Mining with Python
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Sports outcome prediction


We may be able to do better by trying other features. We have a method for testing how accurate our models are. The cross_val_score method allows us to try new features.

There are many possible features we could use, but we will try the following questions:

  • Which team is considered better generally?

  • Which team won their last encounter?

We will also try putting the raw teams into the algorithm to check whether the algorithm can learn a model that checks how different teams play against each other.

Putting it all together

For the first feature, we will create a feature that tells us if the home team is generally better than the visitors. To do this, we will load the standings (also called a ladder in some sports) from the NBA in the previous season. A team will be considered better if it ranked higher in 2013 than the other team.

To obtain the standings data, perform the following steps:

  1. Navigate to http://www.basketball-reference.com/leagues/NBA_2013_standings.html in your...