Here we go! After reviewing the necessary preliminaries, we can finally start to learn from data; in this case, we want to label data that we observe in real life.
We have the following elements:
- A set of N-dimensional elements of numeric type
- A predetermined number of groups (this is tricky because we have to make an educated guess)
- One representative point for each group (called a centroid)
The main objective of this method is to split the dataset into the chosen number of clusters, each of which is represented by its centroid.
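To make these elements concrete, here is a minimal sketch in plain Python. It is only an illustration, not a full clustering implementation: the sample points, the starting centroids, and the helper names `nearest_centroid` and `mean_point` are all assumptions made up for this example. It labels each point with its closest centroid (using Euclidean distance) and then recomputes each centroid as the mean of the points assigned to it.

```python
import math


def nearest_centroid(point, centroids):
    """Return the index of the centroid closest to point (Euclidean distance)."""
    distances = [math.dist(point, c) for c in centroids]
    return distances.index(min(distances))


def mean_point(points):
    """Return the coordinate-wise mean of a non-empty list of points."""
    n = len(points)
    return tuple(sum(coords) / n for coords in zip(*points))


# Hypothetical 2-D dataset with two visually separated groups.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0), (8.0, 8.0), (9.0, 9.5), (8.5, 9.0)]

# Educated guess: two groups, with rough starting centroids (assumed values).
centroids = [(0.0, 0.0), (10.0, 10.0)]

# Label every point with the index of its nearest centroid.
labels = [nearest_centroid(p, centroids) for p in points]

# Move each centroid to the mean of the points that were assigned to it.
new_centroids = []
for k in range(len(centroids)):
    members = [p for p, label in zip(points, labels) if label == k]
    # Keep the old centroid if no point was assigned to this group.
    new_centroids.append(mean_point(members) if members else centroids[k])

print(labels)
print(new_centroids)
```

Repeating these two steps until the labels stop changing is the usual way such a procedure is iterated; the sketch performs a single pass to keep the idea visible.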
The word centroid comes from mathematics, and has carried over into calculus and physics. Here we find a classical representation of the analytical calculation of a triangle's centroid: