Let us return to the house ownership example from the first chapter.
Age | Annual income in USD | House ownership status |
23 | 50000 | non-owner |
37 | 34000 | non-owner |
48 | 40000 | owner |
52 | 30000 | non-owner |
28 | 95000 | owner |
25 | 78000 | non-owner |
35 | 130000 | owner |
32 | 105000 | owner |
20 | 100000 | non-owner |
40 | 60000 | owner |
50 | 80000 | Peter |
We would like to use clustering to predict whether Peter is a house owner.
Analysis:
Just as in the first chapter, we have to scale the data, since the values on the income axis are orders of magnitude greater than those on the age axis and would otherwise diminish the impact of age, which actually has good predictive power in this kind of problem. This is because older people are expected to have had more time to settle down, save money, and buy a house than younger ones.
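To make the effect of scaling concrete, here is a minimal sketch in Python. It assumes a min-max rescaling that maps each axis linearly onto the interval [0, 1]; the exact scheme is not restated in this section, so treat that choice, and the helper name min_max_rescale, as illustrative assumptions rather than the definitive method.

```python
# Data from the table above: (age, annual income in USD).
# The last entry is Peter, whose ownership status is unknown.
people = [
    (23, 50000), (37, 34000), (48, 40000), (52, 30000), (28, 95000),
    (25, 78000), (35, 130000), (32, 105000), (20, 100000), (40, 60000),
    (50, 80000),
]


def min_max_rescale(values):
    """Map values linearly onto [0, 1] (assumed rescaling scheme)."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]


# Rescale each axis independently so that neither dominates distances.
ages = min_max_rescale([age for age, _ in people])
incomes = min_max_rescale([income for _, income in people])
scaled = list(zip(ages, incomes))

peter_scaled = scaled[-1]
print(peter_scaled)  # (0.9375, 0.5)
```

After rescaling, both coordinates lie in the same range, so a distance-based method such as clustering weighs age and income comparably instead of being driven almost entirely by income.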
We apply the same rescaling from the Chapter...