In this section, we jump into our first example. A common use case for data mining is to improve sales, by asking a customer who is buying a product if he/she would like another similar product as well. You can perform this analysis through affinity analysis, which is the study of when things exist together, namely. correlate to each other.
To repeat the now-infamous phrase taught in statistics classes, correlation is not causation. This phrase means that the results from affinity analysis cannot give a cause. In our next example, we perform affinity analysis on product purchases. The results indicate that the products are purchased together, but not that buying one product causes the purchase of the other. The distinction is important, critically so when determining how to use the results to affect a business process, for instance.
Affinity analysis is a type of data mining that gives similarity between samples (objects). This could be the similarity between the following:
- Users on a website, to provide varied services or targeted advertising
- Items to sell to those users, to provide recommended movies or products
- Human genes, to find people that share the same ancestors
We can measure affinity in several ways. For instance, we can record how frequently two products are purchased together. We can also record the accuracy of the statement when a person buys object 1 and when they buy object 2. Other ways to measure affinity include computing the similarity between samples, which we will cover in later chapters.