Let's understand how GraphFrames works by taking a look at the architecture.
Note
GraphFrames can make use of DataFrames, Catalyst, and Tungsten because its engine is based on relational queries.
This concept is illustrated in the following image:
So how can a graph query translate into a relational one?
Imagine that we have already found the vertices A, B, and C. Now we are searching for the edge from C to D. This query is illustrated in the following image:
This is fairly straightforward: we scan the edge table and search for entries where the Src (source) field is C. Once we have found that the Dst (destination) field points to D, and assuming that we are also interested in the properties of node D, we join the vertex table in order to obtain those properties.
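To make the translation concrete, here is a minimal sketch of the same lookup written as a plain relational query. It does not use the GraphFrames API; it uses Python's built-in sqlite3 module as a stand-in for a relational engine. The table layout, the toy graph (A, B, C, D), the column names `label` and `relationship`, and the helper function names are all assumptions made for illustration.

```python
import sqlite3

# Hypothetical toy graph: A -> B -> C -> D (illustrative data only)
VERTICES = [("A", "alpha"), ("B", "beta"), ("C", "gamma"), ("D", "delta")]
EDGES = [("A", "B", "follows"), ("B", "C", "follows"), ("C", "D", "follows")]


def build_graph_tables():
    """Create an in-memory database holding a vertex table and an edge table."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE vertices (id TEXT PRIMARY KEY, label TEXT)")
    conn.execute("CREATE TABLE edges (src TEXT, dst TEXT, relationship TEXT)")
    conn.executemany("INSERT INTO vertices VALUES (?, ?)", VERTICES)
    conn.executemany("INSERT INTO edges VALUES (?, ?, ?)", EDGES)
    return conn


def outgoing_neighbors(conn, src_id):
    """Filter the edge table on src, then join the vertex table on dst
    to pick up the properties of the destination vertex."""
    query = """
        SELECT e.src, e.dst, e.relationship, v.label
        FROM edges AS e
        JOIN vertices AS v ON v.id = e.dst
        WHERE e.src = ?
        ORDER BY e.dst
    """
    return conn.execute(query, (src_id,)).fetchall()


conn = build_graph_tables()
rows = outgoing_neighbors(conn, "C")
print(rows)  # [('C', 'D', 'follows', 'delta')]
```

The `WHERE` clause corresponds to the scan for entries whose source is C, and the `JOIN` corresponds to fetching the properties of D from the vertex table.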
The following image illustrates the complete query and the resulting join operations: