Pivot tables are simple and easy to use. We will take a large dataset, such as the KDD Cup dataset, and group certain values by certain keys.
For example, suppose we have a dataset of people and their favorite fruits, and we want to know how many people name apple as their favorite. We group the people (the values) by fruit (the key) and count the people in each group. This is the basic idea behind a pivot table.
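The fruit example can be sketched in a few lines of plain Python before moving to Spark. The names and the sample data below are hypothetical, chosen only to illustrate grouping values by a key; this is not the approach used on the KDD data.

```python
from collections import Counter

# Hypothetical sample data: (person, favorite fruit) pairs.
people = [
    ("Ana", "apple"),
    ("Ben", "banana"),
    ("Cho", "apple"),
    ("Dev", "cherry"),
    ("Eli", "apple"),
]

# Use the fruit as the key and count how many people fall under each key.
fruit_counts = Counter(fruit for _, fruit in people)

print(fruit_counts["apple"])
```

Here the fruit plays the role of the key and each person is a value counted under it, which is the same grouping a pivot table performs.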
We can use the map function to move the KDD dataset into a key-value pair paradigm. Using a lambda function, we map each row to a pair whose key is feature 41 and whose value is the row itself, and we store the result in kv:
kv = csv.map(lambda x: (x[41], x))
kv.take(1)
We use feature 41 as the key, and the value is the whole data point, x. We can then use the take function to fetch one of these transformed rows and see how it looks...
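To see the shape of the result without a Spark session, the same keying step can be imitated on an ordinary Python list. This is only a sketch: the rows below are made-up stand-ins with 42 fields each, the values placed at index 41 are assumed for illustration, and `rows`, `kv_local`, and `first` are hypothetical names rather than part of the original code.

```python
# Hypothetical stand-in for the parsed rows: 42 fields each,
# with an assumed label-like value at index 41.
rows = [
    ["0"] * 41 + ["normal."],
    ["0"] * 41 + ["smurf."],
    ["1"] * 41 + ["normal."],
]

# Same lambda as in the Spark call: the key is field 41, the value is the whole row.
kv_local = list(map(lambda x: (x[41], x), rows))

# Local analogue of take(1): a list holding the first transformed pair.
first = kv_local[:1]

key, value = first[0]
print(key)         # the key taken from index 41
print(len(value))  # the value is still the full row
```

Each element is a two-item tuple, so later per-key operations can work on the first item while the full row stays available as the second.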