Remember that, in Chapter 4, Cluster Analysis, we discovered that distance matrices are used by k-means to cluster data into a user-specified number of groups of homogeneous cases. k-NN uses distances to select, for each observation to classify, the user-defined number of closest observations (its neighbors). In k-NN, any attribute can be categorical or numeric, including the target. As we discuss classification in this chapter, I will limit the description to categorical targets (also called class attributes).
The classification of a given observation is made by a majority vote among its neighbors, that is, the observation is assigned the most frequent class among its k closest observations. This means that the classification of an observation depends on the chosen number of neighbors. Let's have a look at this. The following figure represents the membership of gray-outlined circles in two class values: the plain gray-lined and the dotted gray-lined. Notice there is one filled gray circle as...
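The voting scheme just described can be sketched in a few lines of code. The following is a minimal illustration, not the book's implementation: it uses Euclidean distance, a hypothetical `knn_classify` helper, and made-up toy data with two classes standing in for the plain-lined and dotted-lined groups of the figure.

```python
from collections import Counter
import math

def knn_classify(train_points, train_labels, query, k):
    """Classify `query` by majority vote among its k nearest training points."""
    # Euclidean distance from the query to every training observation
    dists = sorted(
        (math.dist(query, p), label)
        for p, label in zip(train_points, train_labels)
    )
    # Majority vote among the k closest observations
    votes = Counter(label for _, label in dists[:k])
    return votes.most_common(1)[0][0]

# Toy two-class data (hypothetical): the "plain" class clusters near (0, 0)
# and the "dotted" class clusters near (3, 3)
points = [(0, 0), (0, 1), (1, 0), (3, 3), (3, 4), (4, 3)]
labels = ["plain", "plain", "plain", "dotted", "dotted", "dotted"]

print(knn_classify(points, labels, (0.5, 0.5), k=3))  # -> plain
print(knn_classify(points, labels, (3.5, 3.5), k=3))  # -> dotted
```

Note that changing k can change the outcome for points near the class boundary, which is exactly the dependence on the number of neighbors mentioned above; choosing an odd k for two-class problems is a common way to avoid tied votes.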