Redundant features are features that are highly correlated with one another, so they carry much the same information about the output variable. We can detect them by computing the correlation coefficients between pairs of features, and then remove them.
In this exercise, we will find pairs of redundant features and, for each pair, select one feature to remove.
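To make the idea of redundancy concrete before working with real data, here is a minimal sketch in base R. The data frame `toy` and its columns `x1`, `x2`, and `x3` are hypothetical and generated only for illustration; they are not part of the GermanCredit dataset.

```r
# Synthetic data (an assumption for illustration only):
# x2 is nearly a copy of x1, while x3 is independent noise.
set.seed(42)
x1 <- rnorm(200)
x2 <- x1 + rnorm(200, sd = 0.1)
x3 <- rnorm(200)
toy <- data.frame(x1, x2, x3)

# Pairwise Pearson correlation coefficients between the features.
toyCor <- cor(toy)
print(round(toyCor, 2))
```

In the printed matrix, the entry for `x1` and `x2` should be close to 1, which marks them as a redundant pair, while the entries involving `x3` should be close to 0.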
- Attach the caret package:
# loading the library
library(caret)
- Load the GermanCredit dataset:
# loading the German Credit data
data(GermanCredit)
- Create a correlation matrix:
# calculating the correlation matrix
correlationMatrix <- cor(GermanCredit[,1:9])
- Print the correlation matrix:
# printing the correlation matrix
print(correlationMatrix)
The output is as follows:
Figure 3.12: The correlation matrix
- To find attributes that are highly correlated, set the cutoff to 0.5:
# finding the attributes that are highly correlated
filterCorrelation <- findCorrelation(correlationMatrix, cutoff...
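As a hedged illustration of how the result of `findCorrelation()` might be used, the sketch below applies it to a small synthetic data frame rather than to GermanCredit. The names `toy`, `toyCor`, `dropIdx`, and `reduced` are hypothetical. The sketch assumes the caret package is installed and relies on `findCorrelation()` returning the indices of the columns it suggests removing.

```r
library(caret)

# Synthetic data (an assumption for illustration only).
set.seed(42)
a <- rnorm(200)
toy <- data.frame(
  a = a,
  b = a + rnorm(200, sd = 0.1),  # nearly duplicates a
  c = rnorm(200)                 # unrelated to a and b
)

toyCor <- cor(toy)

# findCorrelation() returns the column indices it suggests removing.
dropIdx <- findCorrelation(toyCor, cutoff = 0.5)

# Guard against an empty result: toy[, -integer(0)] would select no columns.
reduced <- if (length(dropIdx) > 0) toy[, -dropIdx, drop = FALSE] else toy
print(names(reduced))
```

The guard matters because negative indexing with an empty vector selects no columns at all, so a dataset with no correlations above the cutoff would otherwise lose every feature.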