We will now look at three main groups of classification (learning) models: Support Vector Machine (SVM), Nearest Neighbor, and Random Forest. SVM simply divides the space in N regions, separated by a boundary. The boundary can be allowed to have different shapes, for example, there is a linear boundary or quadratic boundary. The Nearest Neighbor classification identifies the k-nearest neighbors and classifies the current data point depending on what class the k-nearest neighbors belong to. The Random Forest classifier is a decision tree learning method, which, in simple terms, creates rules from the given training data to be able to classify new data. A set of if-statements in a row is what gives it the name decision tree.
The data that we are going to use comes from the the UCI Machine Learning repository (Lichman, M. (2013)- http://archive.ics.uci.edu/ml . (Irvine, CA: University of California, School of Information and Computer Science). The dataset contains several...