One of the interesting tasks in unsupervised learning is the profiling or clustering of information, in this chapter, customers and products. Given one dataset, one wants to find groups of records that share similar characteristics. Examples are customers that buy the same products or products that are usually bought together. This task results in a number of benefits for business owners because they are provided the information on which groups of customers and products they have, whereby they are enabled to address them more accurately.
As seen in Chapter 6, Classifying Disease Diagnosis transactional databases can contain both numerical and categorical data. Whenever we face a categorical unscaled variable, we need to split it into the number of values the variable may take, using the CategoricalDataSet
class. For example, let's suppose we have the following transaction list of customer purchases:
Transaction ID |
Customer ID |
Products |
Discount |
Total |
---|---|---|---|---|
1399 |
56 | ... |