As mentioned previously, we'll be predicting customer satisfaction. The data is based on a former online competition. I've taken the training portion of the data and cleaned it up for our use.
This is an excellent dataset for a classification problem for many reasons. Like so much customer data, it's very messy, especially before I removed a bunch of useless features (there were something like four dozen zero-variance features). As discussed in the prior two chapters, I addressed missing values, linear dependencies, and highly correlated pairs. I also found the feature names lengthy and useless, so I coded them V1 through V142. The resulting data deals with what's usually a difficult...