In this recipe we will remove redundant variables by building a correlation matrix that identifies highly correlated variables.
This recipe uses the datafile nasadata.txt and the stream file recipe_variableselection_correlations.str.
You will need a copy of Microsoft Excel to visualize the correlation matrix.
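The idea behind the recipe can be sketched outside Modeler. This is a minimal plain-Python illustration, not Modeler's own implementation. It computes pairwise Pearson correlations among continuous inputs and flags pairs whose absolute correlation meets a threshold. It also writes the matrix as CSV text, which Excel can open for visualization. The column names (`x1`, `x2`, `x3`), the sample values and the 0.9 threshold are hypothetical assumptions chosen for illustration. They are not taken from nasadata.txt.

```python
import csv
import io
import math
from itertools import combinations


def pearson(x, y):
    """Pearson correlation of two equal-length numeric lists (NaN if either is constant)."""
    n = len(x)
    mx = sum(x) / n
    my = sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)


def correlation_matrix(columns):
    """Full symmetric matrix as a nested dict: matrix[a][b] = r(a, b)."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


def highly_correlated_pairs(matrix, threshold=0.9):
    """Return (a, b, r) for each distinct pair with |r| >= threshold (hypothetical cutoff)."""
    pairs = []
    for a, b in combinations(list(matrix), 2):
        r = matrix[a][b]
        if not math.isnan(r) and abs(r) >= threshold:
            pairs.append((a, b, r))
    return pairs


def matrix_to_csv(matrix):
    """Render the matrix as CSV text (header row plus one row per variable) for Excel."""
    names = list(matrix)
    buf = io.StringIO()
    writer = csv.writer(buf)
    writer.writerow([""] + names)
    for a in names:
        writer.writerow([a] + [f"{matrix[a][b]:.3f}" for b in names])
    return buf.getvalue()


# Hypothetical example data: x2 is a linear function of x1, so the pair is redundant.
columns = {
    "x1": [1, 2, 3, 4, 5],
    "x2": [3, 5, 7, 9, 11],
    "x3": [2, 1, 4, 3, 5],
}
matrix = correlation_matrix(columns)
pairs = highly_correlated_pairs(matrix, threshold=0.9)
csv_text = matrix_to_csv(matrix)
print(pairs)  # prints [('x1', 'x2', 1.0)]
```

In this toy data, `x3` correlates at 0.8 with both `x1` and `x2`, so it stays below the assumed cutoff. Only the `x1`/`x2` pair is flagged, and one member of that pair could be dropped as redundant.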
To remove redundant variables using correlation matrices:
1. Open the stream recipe_variableselection_correlations.str by navigating to File | Open Stream.
2. Make sure the datafile points to the correct path to the file nasadata.txt.
3. Open the Type node named Correlation Types. Notice that there are several variables of type continuous whose direction values have been set to Input, and a single continuous variable has its direction set to Target. The variable set to Target can be any variable that won't be an input to the model. If you don't have a good candidate, you can create a random variable and set that...