Learning Data Mining with Python

Chapter 3: Predicting Sports Winners with Decision Trees


More on pandas

http://pandas.pydata.org/pandas-docs/stable/tutorials.html

The pandas library is a great package: most of the data-loading code you would normally write by hand is probably already implemented in pandas. You can learn more about it from the pandas tutorials, linked above.
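As a rough illustration of how much loading work pandas takes over, the sketch below reads a small CSV, parses its date column, and derives a new column in a few lines. The column names and every value in the data are made up for this example, and the in-memory string stands in for a real file on disk.

```python
import io

import pandas as pd

# Hypothetical data: team names and scores are invented for illustration.
raw_csv = io.StringIO(
    "Date,Visitor Team,VisitorPts,Home Team,HomePts\n"
    "2013-10-29,Team A,87,Team B,97\n"
    "2013-10-30,Team C,101,Team D,95\n"
)

# read_csv handles the header row, type inference, and date parsing.
games = pd.read_csv(raw_csv, parse_dates=["Date"])

# Derive a boolean column: did the home team score more than the visitor?
games["HomeWin"] = games["HomePts"] > games["VisitorPts"]

print(games)
```

With a real file, the `io.StringIO` object would simply be replaced by the file's path.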

There is also a great blog post by Chris Moffitt that gives an overview of common Excel tasks and how to do them in pandas: http://pbpython.com/excel-pandas-comp.html

You can also handle large datasets with pandas. For an extensive overview of the process, see the answer from user Jeff (the top answer at the time of writing) to this StackOverflow question: http://stackoverflow.com/questions/14262433/large-data-work-flows-using-pandas
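One common ingredient of large-data workflows is reading a file in chunks so that it never has to fit in memory all at once. The following is a minimal sketch of that single idea, not a summary of the linked answer. It uses the `chunksize` parameter of `read_csv`, and the data and column names are invented for the example.

```python
import io

import pandas as pd

# Hypothetical data standing in for a file too large to load in one go.
raw_csv = io.StringIO(
    "Home Team,HomePts\n"
    "Team A,97\n"
    "Team B,107\n"
    "Team A,101\n"
    "Team C,88\n"
    "Team B,93\n"
)

total_points = {}

# With chunksize set, read_csv yields DataFrames of at most that many rows.
for chunk in pd.read_csv(raw_csv, chunksize=2):
    # Aggregate within the chunk, then fold into the running totals.
    chunk_sums = chunk.groupby("Home Team")["HomePts"].sum()
    for team, points in chunk_sums.items():
        total_points[team] = total_points.get(team, 0) + int(points)

print(total_points)
```

The chunk size of 2 is only there to force several chunks on a tiny input; a real workload would use a much larger value.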

Another great tutorial on pandas is written by Brian Connelly: http://bconnelly.net/2013/10/summarizing-data-in-python-with-pandas/

More complex features

http://www.basketball-reference.com/teams/ORL/2014_roster_status.html

Sports teams change regularly from game to game. What would be an easy win for a team can turn into a difficult game if a couple of its best players are injured. You can get the team rosters from basketball-reference as well. For example, the roster for the 2013-2014 season for the Orlando Magic is available at the above link, and similar data is available for all NBA teams.

Writing code to quantify how much a team's roster changes between games, and using that to add new features, can improve the model significantly. This task will take quite a bit of work, though!
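One possible starting point, offered only as a sketch, is to score the change between two consecutive lineups as the Jaccard distance between the sets of player names: 0.0 means the same players, 1.0 means no players in common. The function name `roster_change`, the choice of Jaccard distance, and the player labels are all assumptions made for this example. Scraping and cleaning the roster data, which is the bulk of the work, is not shown.

```python
def roster_change(previous, current):
    """Return the Jaccard distance between two lineups (0.0 = identical)."""
    previous, current = set(previous), set(current)
    union = previous | current
    if not union:
        # Two empty lineups: treat as no change rather than dividing by zero.
        return 0.0
    return 1.0 - len(previous & current) / len(union)


# Hypothetical lineups for one team over three consecutive games.
lineups = [
    {"P1", "P2", "P3", "P4", "P5"},
    {"P1", "P2", "P3", "P4", "P6"},  # P5 replaced by P6
    {"P1", "P2", "P3", "P4", "P6"},  # unchanged
]

# The first game has no predecessor, so its change score is set to 0.0.
changes = [0.0] + [
    roster_change(prev, curr) for prev, curr in zip(lineups, lineups[1:])
]

print(changes)
```

The resulting list could be attached to the games table as an extra feature column, one value per game for each team. Whether it actually helps the model would need to be checked with cross-validation.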