Processing graphs containing multiple types of relationships
For the next few examples, we use an augmented edges DataFrame that contains a relationship column. Each edge is assigned one of two relationship types, based on the number of similar purchases and the number of categories recorded for the product at the edge's source.
To build it, we join the nodes and edges DataFrames on the source ID, compute the relationship column, and then drop the node-related columns to obtain the final edges DataFrame (with the relationship column suitably populated):
scala> val joinDF = nodesDF.join(edgesDF).where(nodesDF("id") === edgesDF("src")).withColumn("relationship", when(($"similars" > 4) and ($"categories" <= 3), "highSimilars").otherwise("alsoPurchased"))
scala> val edgesDFR = joinDF.select("src", "dst", "relationship")
scala> val gDFR = GraphFrame(nodesDF, edgesDFR)
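To make the labeling rule easier to see in isolation, here is a minimal sketch of the same logic on plain Scala collections, without Spark. The case classes Node, Edge, and LabeledEdge and the object RelationshipRule are hypothetical names introduced only for this illustration; the thresholds (more than 4 similars, at most 3 categories) are taken from the when/otherwise expression above. The sketch assumes node IDs are unique.

```scala
// Hypothetical stand-ins for rows of nodesDF and edgesDF (not part of the original code).
final case class Node(id: Long, similars: Int, categories: Int)
final case class Edge(src: Long, dst: Long)
final case class LabeledEdge(src: Long, dst: Long, relationship: String)

object RelationshipRule {
  // Same condition as the when(...).otherwise(...) expression in the Spark code.
  def label(node: Node): String =
    if (node.similars > 4 && node.categories <= 3) "highSimilars" else "alsoPurchased"

  // Mimics the inner join on id === src: an edge whose source has no
  // matching node is dropped, and every remaining edge is labeled
  // from the attributes of its source node.
  def labelEdges(nodes: Seq[Node], edges: Seq[Edge]): Seq[LabeledEdge] = {
    val byId = nodes.map(n => n.id -> n).toMap
    edges.flatMap { e =>
      byId.get(e.src).map(n => LabeledEdge(e.src, e.dst, label(n)))
    }
  }
}
```

Note that the label depends only on the source product, so every edge leaving the same node receives the same relationship value.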
Next, we count the number of records for each type of relationship and list a few edges along with the relationship values:
scala> gDFR.edges...
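The command above is truncated in the source, so the following is only a sketch of one way such counts could be obtained, not necessarily the original command. It assumes a local SparkSession and a small, made-up edges DataFrame standing in for gDFR.edges; the object name RelationshipCounts, the helper countByRelationship, and the toy rows are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}

object RelationshipCounts {
  // Number of edges per relationship type, collected to the driver as a Map.
  def countByRelationship(edges: DataFrame): Map[String, Long] =
    edges.groupBy("relationship").count()
      .collect()
      .map(row => row.getString(0) -> row.getLong(1))
      .toMap

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .master("local[*]")
      .appName("relationship-counts")
      .getOrCreate()
    import spark.implicits._

    // Hypothetical toy edges; in the text these rows would come from gDFR.edges.
    val edges = Seq(
      (1L, 2L, "highSimilars"),
      (1L, 3L, "highSimilars"),
      (2L, 3L, "alsoPurchased")
    ).toDF("src", "dst", "relationship")

    // Count of records for each relationship type.
    edges.groupBy("relationship").count().show()

    // A few edges together with their relationship values.
    edges.show(5)

    spark.stop()
  }
}
```

Because the edges of a GraphFrame are an ordinary DataFrame, the same groupBy and show calls should apply to gDFR.edges directly in the shell.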