EXERCISES
CLARIFYING THE CONCEPTS
- What do we mean by high dimensionality in data science?
- Why do we need dimension reduction methods?
- What does principal components analysis replace the original set of m predictors with?
- Which principal component accounts for the most variability?
- Which of the other principal components is correlated with the first principal component?
- Why do we use rotation?
- Explain the eigenvalue criterion.
- What is the proportion of variance explained criterion?
- True or false: It is not necessary to perform validation of the principal components.
- When we use the principal components as predictors in a regression model, what value do the VIFs take? What does this indicate?
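Readers who want to check their answers to the questions above numerically may find the following minimal sketch useful. It is not taken from the text: it assumes NumPy is available, uses synthetic data rather than the clothing store data sets, and every variable name (`Z`, `scores`, `prop_var`, and so on) is a hypothetical choice made for illustration. The VIFs are computed as the diagonal of the inverse correlation matrix, which is one standard way to obtain them.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data (illustrative only): 200 records, 4 correlated predictors
# built from 2 latent factors plus noise.
n, m = 200, 4
latent = rng.normal(size=(n, 2))
X = latent @ rng.normal(size=(2, m)) + 0.3 * rng.normal(size=(n, m))

# Standardize each predictor to mean 0 and standard deviation 1.
Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

# Eigendecomposition of the predictor correlation matrix.
R = np.corrcoef(Z, rowvar=False)
eigenvalues, eigenvectors = np.linalg.eigh(R)

# eigh returns eigenvalues in ascending order; sort them descending so the
# first component is the one with the largest eigenvalue.
order = np.argsort(eigenvalues)[::-1]
eigenvalues = eigenvalues[order]
eigenvectors = eigenvectors[:, order]

# Component scores: the standardized data projected onto the eigenvectors.
scores = Z @ eigenvectors

# Proportion of variance explained by each component (eigenvalues sum to m).
prop_var = eigenvalues / eigenvalues.sum()

# Eigenvalue criterion: flag components whose eigenvalue exceeds 1.
keep_by_eigenvalue = eigenvalues > 1.0

# Correlations among the component scores.
score_corr = np.corrcoef(scores, rowvar=False)

# VIFs as the diagonal of the inverse correlation matrix, for the original
# predictors and for the component scores.
vif_original = np.diag(np.linalg.inv(R))
vif_components = np.diag(np.linalg.inv(score_corr))

print("eigenvalues:            ", np.round(eigenvalues, 3))
print("proportion of variance: ", np.round(prop_var, 3))
print("kept by eigenvalue rule:", keep_by_eigenvalue)
print("VIFs, original:         ", np.round(vif_original, 3))
print("VIFs, components:       ", np.round(vif_components, 3))
```

Comparing the two VIF lines, and inspecting `score_corr`, gives a numerical check on the questions about correlation between components and about VIFs in a regression on the components.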
WORKING WITH THE DATA
For the following exercises, work with the clothing_store_PCA_training and clothing_store_PCA_test data sets. Use either Python or R to solve each problem.
- Standardize or normalize the predictors.
- Construct the correlation matrix for the predictor variables Purchase Visits, Days...