Chapter 3, Bagging, to Chapter 8, Ensemble Diagnostics, were devoted to learning different types of ensembling methods, and the discussion was largely based on the classification problem. If the regressand/output of the supervised learning problem is a numeric variable, then we have a regression problem, which is addressed in this chapter. The housing price problem is used for demonstration throughout the chapter, with the dataset taken from a Kaggle competition: https://www.kaggle.com/c/house-prices-advanced-regression-techniques/. The data consists of numerous variables, including as many as 79 independent variables, with the price of the house as the output/dependent variable. The dataset needs some pre-processing: some variables have missing dates, some variables have many levels, a few of which occur only very rarely, and some variables have missing data in more than 20% of the observations.
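Two of the pre-processing problems just listed can be made concrete with a short sketch: flagging variables that are missing in more than 20% of the observations, and lumping rarely occurring levels of a categorical variable into a single level. This is a minimal illustration under stated assumptions, not the chapter's own pre-processing: records are plain Python dictionaries with `None` marking a missing value, the helper names (`missing_fraction`, `columns_to_drop`, `lump_rare_levels`) are hypothetical, and the toy records only borrow column names from the Kaggle data, their values are made up.

```python
from collections import Counter
from typing import Any, Dict, List, Optional

Row = Dict[str, Any]  # one observation; None marks a missing value (assumption)


def missing_fraction(rows: List[Row], col: str) -> float:
    """Fraction of observations in which `col` is missing (None or absent)."""
    return sum(1 for r in rows if r.get(col) is None) / len(rows)


def columns_to_drop(rows: List[Row], threshold: float = 0.2) -> List[str]:
    """Columns missing in more than `threshold` of the observations, sorted by name."""
    cols = sorted({c for r in rows for c in r})
    return [c for c in cols if missing_fraction(rows, c) > threshold]


def lump_rare_levels(values: List[Optional[str]], min_count: int,
                     other: str = "Other") -> List[Optional[str]]:
    """Replace levels seen fewer than `min_count` times with `other`; keep None as missing."""
    counts = Counter(v for v in values if v is not None)
    return [v if v is None or counts[v] >= min_count else other for v in values]


# Hypothetical toy records; values are invented for illustration only.
rows = [
    {"Alley": None, "MSZoning": "RL", "SalePrice": 200000},
    {"Alley": None, "MSZoning": "RL", "SalePrice": 180000},
    {"Alley": "Grvl", "MSZoning": "RM", "SalePrice": 220000},
    {"Alley": None, "MSZoning": "RL", "SalePrice": 140000},
    {"Alley": None, "MSZoning": "C (all)", "SalePrice": 250000},
]

if __name__ == "__main__":
    print(columns_to_drop(rows))  # Alley is missing in 4 of 5 rows
    zoning = [r["MSZoning"] for r in rows]
    print(lump_rare_levels(zoning, min_count=2))
```

On the toy records, `Alley` is missing in 80% of the rows and is flagged for removal, while the single occurrences of `RM` and `C (all)` are merged into `Other`. The 20% cut-off and the `min_count` value are tuning choices; removing a variable discards whatever information it carries, so imputation may be preferable when the missingness itself is informative.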
The pre-processing techniques will...