The pivot() function allows you to translate rows into columns while performing an aggregation on some of the columns. In effect, you are rotating the axes of the table about a pivot point: the distinct values of one column become new column headers.
Here is a simple example to show how this works. It is one of those features that, once you see it in action, reveals how many places you could apply it.
In our example, we have raw price points for stocks, and we want to pivot that table to produce average prices per stock per year.
The code in our example is:
from pyspark import SparkContext
from pyspark.sql import SparkSession
from pyspark.sql import functions as func

sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

# load the raw stock price data
pivotDF = spark.read.format("csv") \
    .option("header", "true") \
    .load("pivot.csv")
pivotDF.show()
pivotDF.createOrReplaceTempView("pivot")

# pivot the data by year to get average prices per stock per year
pivotDF \
    .groupBy...