Chapter 2
Data Cleaning and Advanced Machine Learning
Section 5
K-Fold Cross-Validation
Thus far, we have trained models on a subset of the data and then assessed performance on the unseen portion, called the test set. This is good practice because a model's performance on its training data is not a good indicator of its effectiveness as a predictor. It is very easy to increase accuracy on a training dataset by overfitting a model, which can result in poorer performance on unseen data.

This video covers:
- Assessing Models with K-Fold Cross-Validation and Validation Curves
- K-Fold Cross-Validation
- K-Fold Cross-Validation Algorithm
- Stratified K-Fold
- Validation Curves
- Demo on Using K-Fold Cross-Validation and Validation Curves in Python with Scikit-learn
- Dimensionality Reduction Techniques
- Principal Component Analysis (PCA)
- Key Insights of PCA
- Demo on Training a Predictive Model for the Employee Retention Problem
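The gap between training accuracy and accuracy on unseen data, described in the opening paragraph, can be illustrated with a minimal sketch. It is not the demo from the video: the synthetic dataset, the choice of a decision tree, and all parameter values (500 samples, 5 folds, fixed random seeds) are assumptions made for illustration. The sketch compares training accuracy, held-out test accuracy, and stratified 5-fold cross-validation scores using scikit-learn.

```python
from sklearn.datasets import make_classification
from sklearn.model_selection import (
    StratifiedKFold,
    cross_val_score,
    train_test_split,
)
from sklearn.tree import DecisionTreeClassifier

# Synthetic data stands in for a real dataset (assumption for illustration).
# flip_y adds label noise so that a perfect fit cannot generalize perfectly.
X, y = make_classification(
    n_samples=500, n_features=20, n_informative=5, flip_y=0.1, random_state=0
)

# Hold out 30% of the rows as the unseen test set.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0
)

# A tree with no depth limit can memorize the training rows (overfitting).
model = DecisionTreeClassifier(random_state=0)
model.fit(X_train, y_train)
train_acc = model.score(X_train, y_train)
test_acc = model.score(X_test, y_test)

# Stratified k-fold: split the training data into 5 folds that preserve the
# class balance, train on 4 folds, score on the remaining one, and rotate.
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
cv_scores = cross_val_score(
    DecisionTreeClassifier(random_state=0), X_train, y_train, cv=cv
)

print(f"training accuracy: {train_acc:.3f}")
print(f"test accuracy:     {test_acc:.3f}")
print(f"5-fold CV accuracy: {cv_scores.mean():.3f} +/- {cv_scores.std():.3f}")
```

In a run of this sketch, the training accuracy should be noticeably higher than both the test accuracy and the mean cross-validation accuracy. The cross-validation scores also give a spread across folds, which a single train/test split does not.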