Let's take the example from the first chapter regarding house ownership:
Age | Annual income in USD | House ownership status |
23 | 50,000 | Non-owner |
37 | 34,000 | Non-owner |
48 | 40,000 | Owner |
52 | 30,000 | Non-owner |
28 | 95,000 | Owner |
25 | 78,000 | Non-owner |
35 | 13,0000 | Owner |
32 | 10,5000 | Owner |
20 | 10,0000 | Non-owner |
40 | 60,000 | Owner |
50 | 80,000 | Peter |
We would like to predict whether Peter is a house owner using clustering.
Just as in the first chapter, we will have to scale the data, since the income axis is significantly greater and thus would diminish the impact of the age axis, which actually has a good predictive power in this kind of problem. This is because it is expected that older people have had more time to settle down, save money, and buy a house, as compared to younger people.
We apply the same rescaling from Chapter 1, Classification Using K Nearest Neighbors, and obtain the following table:
Age | Scaled age | Annual income in USD | Scaled annual income | House ownership status |
23 | 0.09375 | 50000 | 0.2 | non-owner... |