When performing regression or classification, some models perform better if highly correlated attributes are removed. The caret package provides the findCorrelation function, which can be used to find attributes that are highly correlated to each other. In this recipe, we will demonstrate how to find highly correlated features using the caret package.
In this recipe, we will continue to use the telecom churn dataset as the input data source to find highly correlated features.
Perform the following steps to find highly correlated attributes:
- Remove the features that are not numeric:
> new_train = trainset[, !names(trainset) %in% c("churn", "international_plan", "voice_mail_plan")]
- Then, compute the pairwise correlations between the remaining attributes:
> cor_mat = cor(new_train)
- Next, we use findCorrelation to search for highly correlated attributes with a cutoff equal to 0...
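The steps above can be sketched on a small synthetic data frame. This is only an illustration under stated assumptions: the `toy` data, its column names, and the cutoff of 0.9 are hypothetical and are not taken from the telecom churn recipe. The base R part lists the variable pairs whose absolute correlation exceeds the cutoff; the call to `findCorrelation` runs only if the caret package is installed, and it returns the column indices that caret suggests removing.

```r
# Synthetic numeric data (hypothetical; not the telecom churn dataset)
set.seed(42)
n <- 200
a <- rnorm(n)
toy <- data.frame(
  a = a,
  b = a + rnorm(n, sd = 0.05),  # nearly a copy of a
  c = rnorm(n)                  # generated independently of a and b
)

# Pairwise correlation matrix of the numeric columns
cor_mat <- cor(toy)

# Illustrative cutoff chosen for this toy data only
cutoff <- 0.9

# Base R check: variable pairs whose absolute correlation exceeds the cutoff
# (upper.tri keeps each pair once and skips the diagonal)
upper <- which(abs(cor_mat) > cutoff & upper.tri(cor_mat), arr.ind = TRUE)
high_pairs <- data.frame(
  var1 = rownames(cor_mat)[upper[, "row"]],
  var2 = colnames(cor_mat)[upper[, "col"]],
  r    = cor_mat[upper]
)
print(high_pairs)

# If caret is available, findCorrelation suggests column indices to drop
if (requireNamespace("caret", quietly = TRUE)) {
  drop_idx <- caret::findCorrelation(cor_mat, cutoff = cutoff)
  print(colnames(cor_mat)[drop_idx])
}
```

Listing the pairs explicitly is a useful sanity check before dropping columns, because `findCorrelation` reports only the indices to remove and not which attribute each one was correlated with.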