Wrapping Up
Congratulations! You just built a classification model in a spreadsheet. Two of them, actually. Maybe even two and a half. And if you took me up on my median regression challenge, then you're a beast.
Let's recap some of the things we covered:
- Feature selection and assembling training data, including creating dummy variables out of categorical predictors
- Training a linear regression model by minimizing the sum of squared errors
- Calculating R-squared, showing a model is statistically significant using an F test, and showing model coefficients are individually significant using a t test
- Evaluating model performance on a holdout set at various classification cutoff values by calculating precision, specificity, false positive rate, and recall
- Graphing a ROC curve
- Adding a logistic link function to a general linear model and reoptimizing
- Maximizing likelihood in a logistic regression
- Comparing models with the ROC curve
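If you ever want to sanity-check the holdout-set metrics from the recap outside of a spreadsheet, here is a minimal Python sketch. The function name, the toy data, and the rule that a score at or above the cutoff counts as a positive prediction are all assumptions made for illustration, not something taken from the workbook.

```python
def metrics_at_cutoff(actuals, scores, cutoff):
    """Return precision, recall, specificity, and false positive rate.

    Assumes actuals are 0/1 labels and a score >= cutoff predicts 1.
    """
    tp = fp = tn = fn = 0
    for actual, score in zip(actuals, scores):
        predicted = 1 if score >= cutoff else 0
        if predicted == 1 and actual == 1:
            tp += 1
        elif predicted == 1 and actual == 0:
            fp += 1
        elif predicted == 0 and actual == 0:
            tn += 1
        else:
            fn += 1

    def ratio(numerator, denominator):
        # Avoid dividing by zero when a cutoff leaves a category empty.
        return numerator / denominator if denominator else float("nan")

    return {
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),        # true positive rate
        "specificity": ratio(tn, tn + fp),   # true negative rate
        "fpr": ratio(fp, fp + tn),           # 1 - specificity
    }


# Hypothetical holdout set: true labels and model scores.
actuals = [1, 1, 0, 0, 1, 0]
scores = [0.9, 0.4, 0.6, 0.2, 0.7, 0.1]

m = metrics_at_cutoff(actuals, scores, cutoff=0.5)

# Sweeping the cutoff gives the (false positive rate, recall) pairs
# that make up a ROC curve.
roc_points = [
    (r["fpr"], r["recall"])
    for r in (metrics_at_cutoff(actuals, scores, c) for c in (0.0, 0.25, 0.5, 0.75, 1.0))
]
```

Plotting `roc_points` for two models on the same axes is one way to compare them, in the same spirit as the ROC comparison above.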
And while I'll be the first to admit that the...