In Chapter 8, Probability Distributions, Covariance, and Correlation, we examined statistical distributions, covariance, and correlation. In the previous chapter, you learned about regression. Here, we will focus on classification using Naïve Bayes and k-Nearest Neighbors (k-NN). Both algorithms address the same problem, which can be stated as follows:
We have data in which the values of the class (the attribute we want to predict) are known. We call this training data.
We have data in which the class values are not known. Alternatively, the values are known but we pretend not to know them so that we can test whether our classifier works, in which case we call this testing data.
We want to predict unknown class values using information from data where the class is known.
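The setup above can be illustrated with a minimal sketch, assuming Python and a handful of made-up records. The feature names, the values, the labels, and the helper `predict_knn` are all hypothetical and chosen only for illustration. The sketch uses a bare-bones k-NN vote (one of the two algorithms named above) to predict the held-back class values of the testing data from the training data.

```python
import math
from collections import Counter

# Training data: the class value (second element) is known.
# Features are hypothetical: (hours of exercise per week, cigarettes per day).
# All numbers are invented for illustration only.
training_data = [
    ((5.0, 0.0), "healthy"),
    ((4.0, 1.0), "healthy"),
    ((6.0, 0.0), "healthy"),
    ((0.5, 20.0), "disease"),
    ((1.0, 15.0), "disease"),
    ((0.0, 25.0), "disease"),
]

# Testing data: the class values are known to us, but we hide them
# from the classifier and use them only to check its predictions.
testing_data = [
    ((5.5, 2.0), "healthy"),
    ((0.5, 18.0), "disease"),
]


def predict_knn(training, features, k=3):
    """Predict a class by majority vote among the k closest training records."""
    # Sort training records by Euclidean distance to the query point.
    by_distance = sorted(training, key=lambda row: math.dist(row[0], features))
    # Count the class labels of the k nearest records and return the most common.
    votes = Counter(label for _, label in by_distance[:k])
    return votes.most_common(1)[0][0]


# Predict each testing record using only its features.
predictions = [predict_knn(training_data, x) for x, _ in testing_data]

# Compare predictions with the hidden class values.
correct = sum(p == y for p, (_, y) in zip(predictions, testing_data))
accuracy = correct / len(testing_data)

print(predictions)
print(accuracy)
```

Keeping the testing labels out of the prediction step is the point of the split: the classifier sees only the features, and the known labels serve solely to measure how often it is right.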
For instance, imagine we have collected data about the health habits of individuals. For half of these individuals, we know whether or not they developed a disease in, say, the following year. For the other half of our sample...