In this section, we will look at transformations that should be avoided. We will focus on one transformation in particular.
We will start by understanding the groupBy API. Then, we will investigate how data is partitioned when using groupBy, and finally we will look at what a skewed partition is and why we should avoid skewed partitions.
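Before turning to the Spark code, it may help to see why grouping by key can produce uneven partitions. The following is a minimal plain-Scala sketch with no Spark dependency. It assumes a UserTransaction case class with userId and amount fields, and it imitates hash partitioning by taking the non-negative hash of the key modulo the number of partitions. The object name SkewSketch, the helpers partitionFor and partitionSizes, and the sample data are hypothetical and exist only for illustration:

```scala
// Assumed shape of the model class: a user ID and an amount.
final case class UserTransaction(userId: String, amount: Int)

// Hypothetical helper object; it only imitates the idea of hash partitioning.
object SkewSketch {

  // Non-negative hash of the key modulo the partition count.
  def partitionFor(key: String, numPartitions: Int): Int = {
    val raw = key.hashCode % numPartitions
    if (raw < 0) raw + numPartitions else raw
  }

  // Number of records that land in each partition when partitioning by userId.
  def partitionSizes(txs: Seq[UserTransaction], numPartitions: Int): Map[Int, Int] =
    txs
      .groupBy(t => partitionFor(t.userId, numPartitions))
      .map { case (partition, records) => partition -> records.size }

  def main(args: Array[String]): Unit = {
    // Illustrative data: user "A" dominates the input.
    val txs =
      Seq.fill(8)(UserTransaction("A", 100)) ++
        Seq(UserTransaction("B", 4), UserTransaction("C", 10))

    partitionSizes(txs, numPartitions = 4).toSeq
      .sortBy(_._1)
      .foreach { case (p, n) => println(s"partition $p -> $n records") }
  }
}
```

Because every record with the same key is sent to the same partition, the partition that receives user "A" holds most of the data, while the others stay nearly empty. That imbalance is what we mean by a skewed partition.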
Here, we create a list of transactions. UserTransaction is another model class, with two fields: userId and amount. The following code block builds the input RDD from a list of five transactions:
test("should trigger computations using actions") {
  //given
  val input = spark.makeRDD(
    List(
      UserTransaction(userId = "A", amount = 1001),
      UserTransaction(userId = "A", amount = 100),
      UserTransaction(userId = "A", amount = 102),
      UserTransaction...