Introducing large-scale graph applications
Analysis of graphs built from large datasets is becoming increasingly important in areas such as social networks, communication networks, citation networks, web graphs, transport networks, and product co-purchasing networks. Typically, graphs are created from source data in a tabular or relational format, and then applications, such as search and graph algorithms, are run on them to derive key insights.
GraphFrames provide a declarative API that can be used for both interactive queries and standalone programs on large-scale graphs. Because GraphFrames are implemented on top of Spark SQL, they enable parallel processing and optimization across the computation.
The main programming abstraction in the GraphFrames API is a GraphFrame. Conceptually, it consists of two DataFrames representing the vertices and edges of the graph. The vertices and edges may have multiple attributes, which can also be used in queries. For example, in a social network, the...
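The two-table model described above can be sketched without Spark. The following is a plain-Python stand-in, not the GraphFrames API: lists of dictionaries play the role of the vertex and edge DataFrames. The column names `id`, `src`, and `dst` follow the GraphFrames convention. The sample rows, the `relationship` and `age` attributes, and the helper functions `friends_older_than` and `in_degrees` are hypothetical and exist only for illustration.

```python
# Plain-Python stand-in for the two-table graph model (not the GraphFrames API).
# All rows, attribute names, and helper functions here are made up for illustration.

# Vertex table: one row per vertex, keyed by "id", plus arbitrary attributes.
vertices = [
    {"id": "a", "name": "Alice", "age": 34},
    {"id": "b", "name": "Bob", "age": 36},
    {"id": "c", "name": "Carol", "age": 29},
]

# Edge table: one row per directed edge ("src" -> "dst"), with its own attribute.
edges = [
    {"src": "a", "dst": "b", "relationship": "friend"},
    {"src": "b", "dst": "c", "relationship": "follow"},
    {"src": "c", "dst": "a", "relationship": "friend"},
]


def friends_older_than(vertices, edges, min_age):
    """Return (source name, destination name) for each 'friend' edge
    whose source vertex is older than min_age."""
    by_id = {v["id"]: v for v in vertices}  # join key: vertex id
    return [
        (by_id[e["src"]]["name"], by_id[e["dst"]]["name"])
        for e in edges
        if e["relationship"] == "friend" and by_id[e["src"]]["age"] > min_age
    ]


def in_degrees(vertices, edges):
    """Count incoming edges per vertex id."""
    counts = {v["id"]: 0 for v in vertices}
    for e in edges:
        counts[e["dst"]] += 1
    return counts


if __name__ == "__main__":
    print(friends_older_than(vertices, edges, 30))
    print(in_degrees(vertices, edges))
```

The first query filters on an edge attribute and a vertex attribute at the same time, which is the kind of combined condition that attributes on both tables make possible. In a real GraphFrame the same data would live in two Spark DataFrames, so the filtering and joining would be planned and run in parallel by Spark SQL rather than in a Python loop.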