Case Study – Horse Colic Classification
To illustrate the different steps and methodologies described in Chapter 1, Machine Learning Review, from data analysis to model evaluation, a representative dataset that has real-world characteristics is essential.
We have chosen the Horse Colic dataset from the UCI Repository, available at the following link: https://archive.ics.uci.edu/ml/datasets/Horse+Colic
The dataset has 23 features, a mix of continuous and nominal (categorical) attributes. Many of its features and instances have missing values; the large proportion of missing data (30%) is in fact a notable characteristic of this dataset, and it makes the treatment of how to replace missing values and use the resulting data in modeling more practical. The presence of self-predictors also makes working with this dataset instructive from a practical standpoint.
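To make the missing-value issue concrete, the following is a minimal sketch of one common way to measure and replace missing values: the median for continuous attributes and the most frequent value for nominal ones. It is not necessarily the approach taken later in this case study. The rows are hypothetical toy values rather than records from the dataset, the column meanings are illustrative, and the "?" missing-value marker is an assumption about how the file encodes missing entries.

```python
from statistics import median, mode

MISSING = "?"  # assumed marker for a missing entry

# Hypothetical toy rows (NOT real records): two continuous columns, one nominal column.
rows = [
    ["38.5", "66", "3"],
    ["?", "88", "?"],
    ["38.3", "?", "3"],
    ["39.1", "72", "2"],
]
continuous_cols = {0, 1}  # every other column is treated as nominal


def missing_fraction(rows):
    """Return the fraction of all cells that are missing."""
    total = sum(len(r) for r in rows)
    missing = sum(cell == MISSING for r in rows for cell in r)
    return missing / total


def impute(rows, continuous_cols):
    """Fill missing cells with the column median (continuous) or mode (nominal)."""
    fill = []
    for j in range(len(rows[0])):
        observed = [r[j] for r in rows if r[j] != MISSING]
        if j in continuous_cols:
            fill.append(str(median(float(v) for v in observed)))
        else:
            fill.append(mode(observed))
    return [
        [fill[j] if cell == MISSING else cell for j, cell in enumerate(r)]
        for r in rows
    ]


imputed = impute(rows, continuous_cols)
print(missing_fraction(rows), missing_fraction(imputed))
```

Median and mode are chosen here only because they are simple and robust to outliers; with a large share of the data missing, the choice of replacement strategy can noticeably affect the model, so it is worth comparing alternatives.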
The goal of the exercise is to apply the techniques...