In this section, I'll discuss how I created the dataset used for this chapter and provide insight into the features and the class labels we'll endeavor to predict. The data is available on GitHub at https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R/blob/master/Data/sim_df.csv:
- Let's get our libraries and data loaded:
> install.packages("glmnet")
> install.packages("caret")
> install.packages("classifierplots")
> install.packages("DataExplorer")
> install.packages("InformationValue")
> install.packages("Metrics")
> install.packages("ROCR")
> install.packages("tidyverse")
> library(magrittr) # pipe operator %>%; installed as a tidyverse dependency
> options(scipen = 999) # print numbers in fixed rather than scientific notation
> sim_df <- readr::read_csv('sim_df.csv') # assumes sim_df.csv is in the working directory
The data frame contains 10,000 observations of 17 variables: 16 input features and 1 response. I created this dataset using the twoClassSim() function from the caret package. The full code, including the random seeds, is available in the online code, allowing you to make changes...
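To show roughly how twoClassSim() produces data of this shape, here is a minimal sketch. The seed (1066) and the choice of `linearVars = 11` are hypothetical, picked only so the output has 16 predictors (2 correlated `TwoFactor` columns, 11 `Linear` columns, and 3 `Nonlinear` columns) plus the `Class` response. These are not my actual settings, which are in the online code.

```r
# Hypothetical sketch: simulate a two-class dataset with caret::twoClassSim().
# The seed and linearVars value are illustrative assumptions, not the chapter's settings.
library(caret)

set.seed(1066)  # hypothetical seed so this sketch is reproducible
sim_example <- twoClassSim(
  n = 10000,       # 10,000 observations, matching the size of sim_df
  linearVars = 11  # assumption: 2 TwoFactor + 11 Linear + 3 Nonlinear = 16 predictors
)

dim(sim_example)          # number of rows and columns (16 predictors plus Class)
table(sim_example$Class)  # class balance of the simulated response
```

Adjusting arguments such as `linearVars`, `noiseVars`, or `corrVars` changes how many and what kind of predictors are generated, which is one way to experiment with harder or easier versions of the problem.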