First example: the k-nearest neighbors algorithm
The k-nearest neighbors algorithm is a simple machine learning algorithm used for supervised classification. The main components of this algorithm are:
- A training dataset: This dataset is formed by instances described by one or more attributes, plus a special attribute that determines the label (class) of each instance.
- A distance metric: This metric is used to determine the distance (or similarity) between the instances of the training dataset and the new instances you want to classify.
- A test dataset: This dataset is used to measure the behavior of the algorithm, that is, how accurately it labels instances whose true labels are already known.
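The algorithm does not prescribe a particular distance metric. A common choice for numeric attributes is the Euclidean distance, used in the minimal sketch below purely as an assumption; `euclidean_distance` is a hypothetical helper name, and non-numeric attributes would first need to be encoded as numbers.

```python
import math
from typing import Sequence


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    """Euclidean distance between two instances with numeric attributes.

    Assumed metric for illustration; other metrics can be swapped in.
    """
    if len(a) != len(b):
        raise ValueError("instances must have the same number of attributes")
    # Square root of the sum of squared per-attribute differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


print(euclidean_distance([0.0, 0.0], [3.0, 4.0]))  # 5.0
```

Smaller values mean more similar instances, which is what the neighbor search relies on.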
To classify a new instance, the algorithm calculates the distance between that instance and every instance of the training dataset. Then, it takes the k nearest instances and looks at their labels. The label shared by most of those k instances (a majority vote) is assigned to the input instance.
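The classification step can be sketched as a brute-force, single-threaded routine. This is only an illustration under stated assumptions: it uses Euclidean distance, represents each instance as a hypothetical `(attributes, label)` tuple, and runs on a tiny made-up dataset rather than the Bank Marketing data. The names `knn_classify` and `accuracy` are hypothetical, and `accuracy` shows one simple way a test dataset can measure the algorithm's behavior.

```python
import math
from collections import Counter
from typing import List, Sequence, Tuple

# Hypothetical representation of an instance: (attribute vector, label).
Instance = Tuple[Sequence[float], str]


def euclidean_distance(a: Sequence[float], b: Sequence[float]) -> float:
    # Assumed metric: square root of the sum of squared differences.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_classify(train: List[Instance], query: Sequence[float], k: int) -> str:
    if not 1 <= k <= len(train):
        raise ValueError("k must be between 1 and the size of the training dataset")
    # 1. Distance from the query to every training instance.
    distances = [(euclidean_distance(attrs, query), label) for attrs, label in train]
    # 2. Keep the k nearest instances.
    distances.sort(key=lambda pair: pair[0])
    nearest_labels = [label for _, label in distances[:k]]
    # 3. Majority vote; on a tie, Counter.most_common returns the label
    #    encountered first, i.e. the one appearing earliest in the sorted list.
    return Counter(nearest_labels).most_common(1)[0][0]


def accuracy(train: List[Instance], test: List[Instance], k: int) -> float:
    # Fraction of test instances whose predicted label matches the true label.
    correct = sum(1 for attrs, label in test if knn_classify(train, attrs, k) == label)
    return correct / len(test)


# Made-up toy data for illustration only.
train = [
    ([1.0, 1.0], "no"),
    ([1.5, 2.0], "no"),
    ([5.0, 5.0], "yes"),
    ([6.0, 5.5], "yes"),
    ([5.5, 6.0], "yes"),
]
test = [([1.2, 1.4], "no"), ([5.8, 5.2], "yes")]

print(knn_classify(train, [5.2, 5.1], k=3))  # yes
print(accuracy(train, test, k=3))  # 1.0
```

Every query is compared against the whole training dataset, so the cost grows with its size; each of those distance computations is independent of the others.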
In this chapter, we are going to work with the Bank Marketing dataset of the UCI Machine Learning Repository...