Mastering Python for Data Science

By: Samir Madhavan

Training and testing a model


Let's take the data and divide it into training and test sets:

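>>> # Note: sklearn.cross_validation was removed in scikit-learn 0.20;
>>> # newer releases provide train_test_split in sklearn.model_selection.
>>> from sklearn import (linear_model, cross_validation,
...                      feature_selection, preprocessing)
>>> import statsmodels.formula.api as sm
>>> from statsmodels.tools.eval_measures import mse
>>> from statsmodels.tools.tools import add_constant
>>> from sklearn.metrics import mean_squared_error

>>> # Copy the data frame's contents into a NumPy array
>>> X = b_data.values.copy()
>>> # Features: every column except the last; target: the last column
>>> X_train, X_valid, y_train, y_valid = cross_validation.train_test_split(
...     X[:, :-1], X[:, -1], train_size=0.80)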

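We first copy the contents of the b_data data frame into an array using values.copy(). We then use the train_test_split function from SciKit's cross_validation module to split the rows at random: 80% go to the training set and the remaining 20% to the test set. Every column except the last is treated as a feature, and the last column is the target. (In scikit-learn 0.18 and later, the same function is available in model_selection; cross_validation was removed in 0.20.)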
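For readers on a recent scikit-learn release, the following is a minimal sketch of the same 80/20 split using model_selection in place of cross_validation. The array here is a synthetic stand-in for b_data.values (an assumption, since the real data is not shown in this section), and the random_state value is chosen only to make the split repeatable.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in for b_data.values (assumption):
# 20 rows, 3 feature columns followed by 1 target column.
rng = np.random.default_rng(0)
X = rng.normal(size=(20, 4))

# Every column except the last is a feature; the last column is the target.
features, target = X[:, :-1], X[:, -1]

# Shuffle the rows and keep 80% for training, 20% for validation.
X_train, X_valid, y_train, y_valid = train_test_split(
    features, target, train_size=0.80, random_state=42)

print(X_train.shape, X_valid.shape)  # (16, 3) (4, 3)
print(y_train.shape, y_valid.shape)  # (16,) (4,)
```

Fixing random_state is optional; without it, each run produces a different random split, which matches the behaviour of the call shown earlier.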

We'll learn how to build linear regression models using the following packages:

  • The statsmodels module

  • The SciKit package

Even...