In this section, we will take the following approach to machine learning:
- Partition the dataset into a training set and a testing set
- Generate a model of the data
- Test how well our model predicts
Machine learning starts with a dataset, which we break up into a training section and a testing section. We use the training data to come up with a model. We can then evaluate that model against the testing dataset, which played no part in building it.
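The three steps can be sketched in a few lines of R. This is a minimal illustration, not the analysis used in this section: the `toy` data frame, the 75/25 split, the linear model, and the RMSE measure are all assumptions chosen to keep the example self-contained.

```r
# Hypothetical toy data: 200 observations with a known linear relationship
set.seed(42)
n <- 200
toy <- data.frame(x = runif(n, 0, 10))
toy$y <- 3 + 2 * toy$x + rnorm(n, sd = 1)

# 1. Partition: 75% of the rows for training, the rest for testing
train_idx <- sample(seq_len(n), size = floor(0.75 * n))
train <- toy[train_idx, ]
test  <- toy[-train_idx, ]

# 2. Generate a model from the training data only
fit <- lm(y ~ x, data = train)

# 3. Test the model on rows it has not seen
pred <- predict(fit, newdata = test)
rmse <- sqrt(mean((test$y - pred)^2))  # root mean squared prediction error
rmse
```

A low error on the testing rows suggests the model generalizes beyond the rows it was fitted on; a low error on the training rows alone would not tell us that.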
For a dataset to be usable for this approach, we need at least a few hundred observations. I am using the housing data from http://uci.edu. Let's load the dataset with the following command:
housing <- read.table("http://archive.ics.uci.edu/ml/machine-learning-databases/housing/housing.data")
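The file has no header row, so `read.table` falls back to default column names. The sketch below shows that behavior on two made-up, whitespace-separated rows (the values are hypothetical and are not taken from the housing file), using the `text` argument in place of a URL:

```r
# Two made-up rows standing in for a headerless, whitespace-separated file
sample_text <- "0.10 12.5 3.2 0
0.25  0.0 8.1 1"

# header defaults to FALSE, so the first row is read as data
demo <- read.table(text = sample_text)

names(demo)  # default names: "V1" "V2" "V3" "V4"
dim(demo)    # number of rows, then number of columns
```

Running `dim(housing)` and `names(housing)` after the load is a quick way to confirm the number of observations and to see the same default names on the real data.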
The site documents the names of the variables as follows:
Variable | Description |
CRIM | Per capita crime rate |
ZN | Proportion of residential land zoned for large lots |
INDUS | Proportion of non-retail business acres in town |
CHAS | Tract bounds the Charles River (Boolean) |
NOX | Nitric oxides concentration |
RM... |