Book Image

Spark Cookbook

By : Rishi Yadav
Book Image

Spark Cookbook

By: Rishi Yadav

Overview of this book

Table of Contents (19 chapters)
Spark Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Performing neighborhood aggregation


GraphX does most of the computation by isolating each vertex and its neighbors. It makes it easier to process the massive graph data on distributed systems. This makes the neighborhood operations very important. GraphX has a mechanism to do it at each neighborhood level in the form of the aggregateMessages method. It does it in two steps:

  1. In the first step (first function of the method), messages are send to the destination vertex or source vertex (similar to the Map function in MapReduce).

  2. In the second step (second function of the method), aggregation is done on these messages (similar to the Reduce function in MapReduce).

Getting ready

Let's build a small dataset of the followers:

Follower

Followee

John

Barack

Pat

Barack

Gary

Barack

Chris

Mitt

Rob

Mitt

Our goal is to find out how many followers each node has. Let's load this data in the form of two files: nodes.csv and edges.csv.

The following is the content of nodes.csv:

1,Barack
2,John
3,Pat...