Using Spark SQL for creating pivot tables
Pivot tables provide alternate views of your data and are commonly used during data exploration. In the following examples, we demonstrate pivoting using Spark DataFrames.
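For the sketches that follow, assume a SparkSession named spark and a DataFrame df loaded from a bank marketing dataset; the file path bank.csv and the column names used below (job, marital, housing, duration, campaign, month, and deposit) are assumptions for illustration, so adjust them to match your copy of the data. The later examples reuse this spark and df:

import org.apache.spark.sql.SparkSession
import org.apache.spark.sql.functions._

val spark = SparkSession.builder().appName("PivotExamples").getOrCreate()

// Assumed input: a ';'-delimited bank marketing CSV with columns such as
// job, marital, housing, duration, campaign, month, and deposit.
val df = spark.read
  .option("header", "true")
  .option("inferSchema", "true")
  .option("delimiter", ";")
  .csv("bank.csv")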
The following example pivots on whether a housing loan was taken and computes the counts by marital status:
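A minimal sketch of this pivot, under the column-name assumptions above:

// One row per marital status; one count column per housing-loan value.
df.groupBy("marital")
  .pivot("housing")
  .count()
  .show()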
In the next example, we create a DataFrame with appropriate column names for the total and average number of calls:
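One way to express this, assuming the campaign column records the number of calls per client and that we group by job category (the grouping key is an assumption here):

// Total and average number of calls per job category, with descriptive
// column names assigned via alias().
val callCounts = df.groupBy("job")
  .agg(sum("campaign").alias("TotalCalls"),
       avg("campaign").alias("AverageCalls"))
callCounts.show()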
In the following example, we create a DataFrame with appropriate names for the total and average duration of calls for each job category:
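A sketch using the assumed duration column (call duration in seconds):

// Total and average call duration per job category.
val callDurations = df.groupBy("job")
  .agg(sum("duration").alias("TotalDuration"),
       avg("duration").alias("AverageDuration"))
callDurations.show()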
In the following example, we use pivoting to compute the average call duration for each job category, while also specifying a subset of marital status values:
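Passing an explicit list of values to pivot() restricts the generated columns to that subset; the married and single values chosen here are illustrative:

// Average call duration per job category, with one column per listed
// marital-status value.
df.groupBy("job")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
  .show()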
The following example is the same as the preceding one, except that we also break down the average call duration values by the housing loan field in this case:
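Adding the housing column to the grouping keys achieves this breakdown:

// Average call duration per (job, housing) pair, pivoted on the same
// subset of marital-status values as before.
df.groupBy("job", "housing")
  .pivot("marital", Seq("married", "single"))
  .agg(avg("duration"))
  .show()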
Next, we show how you can create a DataFrame of a pivot table of deposits subscribed by month, save it to disk, and read it back into an RDD:
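One possible sketch, assuming the deposit column holds yes/no subscription values; the CSV output format and the path deposits_by_month are assumptions chosen for illustration:

// Counts of subscribed vs. not-subscribed deposits per month.
val depositsByMonth = df.groupBy("month")
  .pivot("deposit", Seq("yes", "no"))
  .count()

// Write the pivot table to disk as CSV files.
depositsByMonth.write
  .mode("overwrite")
  .option("header", "true")
  .csv("deposits_by_month")

// Read the saved part files back in as an RDD[String].
val depositsRDD = spark.sparkContext.textFile("deposits_by_month")
depositsRDD.take(5).foreach(println)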
Further, we use the RDD from the preceding step to...