Understanding GraphFrame internals
In the following sections, we briefly examine two aspects of GraphFrame internals: the execution plan and partitioning.
Viewing GraphFrame physical execution plan
Because GraphFrames are built on Spark SQL DataFrames, we can view the physical plan to understand how the graph operations are executed, as shown:
scala> g.edges.filter("salerank < 100").explain()
We will explore this in more detail in Chapter 11, Tuning Spark SQL Components for Performance.
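As a rough illustration, passing true to explain() prints the parsed, analyzed, and optimized logical plans in addition to the physical plan. The sketch below is self-contained only under stated assumptions: the toy vertices and edges (including their values) are hypothetical stand-ins rather than the dataset used in this chapter, and it assumes the GraphFrames package is on the classpath.

```scala
import org.apache.spark.sql.SparkSession
import org.graphframes.GraphFrame

// In spark-shell a session already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("explain-sketch").getOrCreate()
import spark.implicits._

// Hypothetical toy data: GraphFrames expects an "id" column on vertices
// and "src"/"dst" columns on edges.
val vertices = Seq((1L, "Book"), (2L, "Music"), (3L, "Book")).toDF("id", "group")
val edges = Seq((1L, 2L, 50), (2L, 3L, 500)).toDF("src", "dst", "salerank")
val g = GraphFrame(vertices, edges)

// extended = true prints the logical plans as well as the physical plan.
g.edges.filter("salerank < 100").explain(true)
```

Because g.edges is an ordinary DataFrame, the output has the same shape as the plan for any other Spark SQL query.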
Understanding partitioning in GraphFrames
Spark splits data into partitions and executes computations on those partitions in parallel. You can adjust the level of partitioning to improve the efficiency of Spark computations.
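As a minimal sketch of what adjusting the level of partitioning looks like in practice, the snippet below inspects the partition count of a DataFrame and then changes it. The DataFrame and the partition counts (8 and 2) are arbitrary, hypothetical choices for illustration, not tuning recommendations.

```scala
import org.apache.spark.sql.SparkSession

// In spark-shell a session already exists; getOrCreate() simply reuses it.
val spark = SparkSession.builder().master("local[*]").appName("partition-sketch").getOrCreate()

// Hypothetical stand-in for a vertices or edges DataFrame.
val df = spark.range(0, 1000).toDF("id")

// The default partition count depends on the environment.
println(s"initial partitions: ${df.rdd.getNumPartitions}")

// repartition(n) performs a full shuffle into n partitions.
val wider = df.repartition(8)
println(s"after repartition(8): ${wider.rdd.getNumPartitions}")

// coalesce(n) reduces the partition count without a full shuffle.
val narrower = wider.coalesce(2)
println(s"after coalesce(2): ${narrower.rdd.getNumPartitions}")
```

The same calls apply to g.vertices and g.edges, since both are plain DataFrames.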
In the following example, we examine the results of repartitioning a GraphFrame. We can partition a GraphFrame based on the column values of its vertices DataFrame; in this case, we use the values in the group column to partition by group, or product type. Here, we will present the results of repartitioning...