How to run linear regression in practice
The accompanying notebook, linear_regression_intro.ipynb, illustrates a simple and then a multiple linear regression, the latter using both OLS and gradient descent. For the multiple regression, we generate two random input variables, x1 and x2, that range from -50 to +50, and an outcome variable that is calculated as a linear combination of the inputs plus random Gaussian noise, which meets the normality assumption (GMT assumption 6):
$$y = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \epsilon, \quad \epsilon \sim \mathcal{N}(0, \sigma^2)$$
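The simulation and both estimators can be sketched in NumPy as follows. This is a minimal sketch under stated assumptions: the sample size (500), the true coefficients (50, 1, 3) and the noise scale (25) are hypothetical values chosen for illustration, not the notebook's. The gradient-descent step size of 1/L, where L is the largest eigenvalue of the mean-squared-error Hessian, is one common stable choice among several.

```python
import numpy as np

# Hypothetical settings: sample size, coefficients, and noise scale are
# illustrative assumptions, not the notebook's actual values.
rng = np.random.default_rng(seed=42)
n_obs = 500
beta_true = np.array([50.0, 1.0, 3.0])  # intercept, x1, x2

# Two inputs drawn uniformly between -50 and 50, plus Gaussian noise
x = rng.uniform(-50, 50, size=(n_obs, 2))
noise = rng.normal(loc=0.0, scale=25.0, size=n_obs)
X = np.column_stack([np.ones(n_obs), x])  # prepend an intercept column
y = X @ beta_true + noise

# OLS: closed-form least-squares solution
beta_ols, *_ = np.linalg.lstsq(X, y, rcond=None)


def gradient_descent(X, y, n_iter=30_000):
    """Minimize the mean squared error with full-batch gradient descent."""
    n = len(y)
    # Step size 1/L, where L is the largest eigenvalue of the MSE Hessian
    hessian = 2.0 / n * X.T @ X
    step = 1.0 / np.linalg.eigvalsh(hessian).max()
    beta = np.zeros(X.shape[1])
    for _ in range(n_iter):
        gradient = 2.0 / n * X.T @ (X @ beta - y)
        beta -= step * gradient
    return beta


beta_gd = gradient_descent(X, y)
print("true:", beta_true)
print("OLS: ", beta_ols.round(3))
print("GD:  ", beta_gd.round(3))
```

Both estimates land close to the true coefficients, and gradient descent converges to the OLS solution because the squared-error loss is convex. The intercept converges slowest, since its input column has a much smaller scale than x1 and x2, which is why standardizing the inputs is a common practice.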
OLS with statsmodels
We use statsmodels to estimate a multiple regression model that accurately reflects the data-generating process, as follows:
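import statsmodels.api as sm

# X holds the inputs x1 and x2, y the simulated outcome (both created earlier in the notebook)
X_ols = sm.add_constant(X)        # prepend a column of ones for the intercept
model = sm.OLS(y, X_ols).fit()    # estimate the coefficients by ordinary least squares
model.summary()                   # in a notebook, this renders the results tables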
This yields the following OLS Regression Results summary:
Figure 7.2: OLS Regression Results summary
The upper part of the summary displays the dataset characteristics—namely, the estimation method and the number of observations and parameters...