We take the data from the gender classification problem in Chapter 2, Naive Bayes, Analysis point 6:
| Height in cm | Weight in kg | Hair length | Gender |
|---|---|---|---|
| 180 | 75 | Short | Male |
| 174 | 71 | Short | Male |
| 184 | 83 | Short | Male |
| 168 | 63 | Short | Male |
| 178 | 70 | Long | Male |
| 170 | 59 | Long | Female |
| 164 | 53 | Short | Female |
| 155 | 46 | Long | Female |
| 162 | 52 | Long | Female |
| 166 | 55 | Long | Female |
| 172 | 60 | Long | ? |
To simplify matters, we remove the Hair length column. We also remove the Gender column, since we would like to cluster the people in the table based on their height and weight alone. Using clustering, we would like to find out whether the 11th person in the table is more likely to be a man or a woman:
| Height in cm | Weight in kg |
|---|---|
| 180 | 75 |
| 174 | 71 |
| 184... |
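The column removal and a first clustering attempt can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions: it uses k-means with two clusters on the raw, unscaled heights and weights, and it picks the initial centroids with a simple deterministic rule (the first person and the person farthest from them). The names `people`, `points`, `sq_dist`, `mean`, and `two_means` are hypothetical helpers introduced here, not taken from the text, and a different initialization or feature scaling may group the people differently.

```python
# Rows: (height_cm, weight_kg, hair_length, gender); "?" marks the unknown 11th person.
people = [
    (180, 75, "Short", "Male"),
    (174, 71, "Short", "Male"),
    (184, 83, "Short", "Male"),
    (168, 63, "Short", "Male"),
    (178, 70, "Long", "Male"),
    (170, 59, "Long", "Female"),
    (164, 53, "Short", "Female"),
    (155, 46, "Long", "Female"),
    (162, 52, "Long", "Female"),
    (166, 55, "Long", "Female"),
    (172, 60, "Long", "?"),
]

# Drop the Hair length and Gender columns, keeping only (height, weight).
points = [(h, w) for h, w, _hair, _gender in people]


def sq_dist(p, q):
    """Squared Euclidean distance between two 2D points."""
    return (p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2


def mean(cluster):
    """Centroid (component-wise mean) of a non-empty list of 2D points."""
    n = len(cluster)
    return (sum(p[0] for p in cluster) / n, sum(p[1] for p in cluster) / n)


def two_means(points, max_iter=100):
    """k-means with k = 2; returns (labels, centroids)."""
    # Assumed initialization: the first point and the point farthest from it.
    first = points[0]
    farthest = max(points, key=lambda p: sq_dist(p, first))
    centroids = [first, farthest]
    labels = None
    for _ in range(max_iter):
        # Assign every point to its nearest centroid.
        new_labels = [
            min((0, 1), key=lambda k: sq_dist(p, centroids[k])) for p in points
        ]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Recompute each centroid; keep the old one if a cluster is empty.
        for k in (0, 1):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                centroids[k] = mean(members)
    return labels, centroids


labels, centroids = two_means(points)
for (height, weight, _hair, gender), label in zip(people, labels):
    print(height, weight, gender, "-> cluster", label)
```

To read the result, compare the cluster label of the 11th row with the labels of the rows whose gender is known: the majority gender in that cluster is the suggestion the clustering makes for the unknown person.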