Let's take the data and divide it into training and test sets:
>>> from sklearn import linear_model, model_selection, feature_selection, preprocessing
>>> import statsmodels.formula.api as sm
>>> from statsmodels.tools.eval_measures import mse
>>> from statsmodels.tools.tools import add_constant
>>> from sklearn.metrics import mean_squared_error
>>> X = b_data.values.copy()
>>> X_train, X_valid, y_train, y_valid = model_selection.train_test_split(
...     X[:, :-1], X[:, -1], train_size=0.80)
We first convert the b_data data frame into a NumPy array with values.copy(), so that later changes to the array do not affect the original data frame. We then pass every column except the last as the features, and the last column as the target, to the train_test_split function from scikit-learn's model_selection module. In scikit-learn releases before 0.18 this function lived in the cross_validation module, which was removed in 0.20. With train_size=0.80, 80% of the rows go to the training set and the remaining 20% are held out for testing.
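Because b_data is not defined in this excerpt, the following minimal sketch substitutes a hypothetical synthetic data frame. It has three feature columns and the target in the last column, and it lets you check what the split produces. The random_state argument is an addition that makes the split reproducible.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for b_data: 100 rows, three features, target last.
rng = np.random.default_rng(0)
b_data = pd.DataFrame(rng.normal(size=(100, 4)), columns=["x1", "x2", "x3", "y"])

# Copy the values so edits to X never touch the original data frame.
X = b_data.values.copy()

# All columns but the last are features; the last column is the target.
X_train, X_valid, y_train, y_valid = train_test_split(
    X[:, :-1], X[:, -1], train_size=0.80, random_state=42
)

print(X_train.shape, X_valid.shape)  # (80, 3) (20, 3)
print(y_train.shape, y_valid.shape)  # (80,) (20,)
```

The row order is shuffled before splitting, so without a fixed random_state each run yields a different 80/20 partition.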
We'll learn how to build linear regression models using the following packages:
The statsmodels module
The scikit-learn (SciKit) package
Even...