Implementing data clustering
In this section, we're going to take a look at how to approach data clustering. First, let's define the term. Data clustering refers to how we partition data into groups, or clusters. Clusters are meaningful when they deepen our understanding of the domain. Clustering has many applications. In medicine, it can help identify how a group of patients responds to treatment. In market research, it is used to group consumers so that each group can be approached based on its particular characteristics.
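To make the idea of partitioning data into groups concrete, here is a minimal sketch using k-means, one common clustering algorithm. The text has not named a specific algorithm at this point, so the choice of k-means, the helper name kmeans, and the six toy points are all assumptions made purely for illustration.

```python
import math
import random


def kmeans(points, k, n_iter=20, seed=0):
    """Partition `points` into k clusters with a basic k-means loop.

    Illustrative helper only (hypothetical name, not from the text).
    """
    rng = random.Random(seed)
    # Start from k distinct data points chosen at random as initial centers.
    centers = rng.sample(points, k)

    def nearest(p):
        # Index of the center closest to point p (Euclidean distance).
        return min(range(k), key=lambda j: math.dist(p, centers[j]))

    for _ in range(n_iter):
        labels = [nearest(p) for p in points]
        new_centers = []
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:
                # Move the center to the mean of the points assigned to it.
                new_centers.append(
                    tuple(sum(c) / len(members) for c in zip(*members))
                )
            else:
                # Keep the old center if no point was assigned to it.
                new_centers.append(centers[j])
        if new_centers == centers:
            break  # Centers stopped moving, so the partition is stable.
        centers = new_centers

    # Final assignment against the final centers.
    labels = [nearest(p) for p in points]
    return labels, centers


# Hypothetical toy data: two visibly separate groups of 2D points.
points = [
    (1.0, 1.1), (0.9, 1.0), (1.2, 0.8),
    (5.0, 5.2), (5.1, 4.9), (4.8, 5.0),
]
labels, centers = kmeans(points, k=2)
print(labels)   # one cluster index per point
print(centers)  # one center per cluster
```

The first three points should share one label and the last three the other. Which group is numbered 0 depends on the random starting centers, so the label values themselves carry no meaning beyond group membership.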
For the purpose of this discussion, we are going to look at synthetic clusters rather than applied clusters. In Chapter 16, Advanced Applied Computational Thinking Problems, you'll see some examples of clusters in context. A synthetic cluster is made from a synthetic dataset, that is, a dataset we generate using an algorithm rather than collect from real-world observations. Let's take a look at the...