Book Image

Apache Spark Graph Processing

Book Image

Apache Spark Graph Processing

Overview of this book

Table of Contents (16 chapters)
Apache Spark Graph Processing
Credits
Foreword
About the Author
About the Reviewer
www.PacktPub.com
Preface
Index

Community detection through label propagation


In the following section, we are going to implement a community detection algorithm using the Pregel interface. Label Propagation Algorithm (LPA) is a simple and fast method for detecting communities within graphs. By construction, the communities obtained by the label propagation process require each node to have at least as many neighbors within its community as it has with each of the other communities.

Let's quickly describe how the LPA works. First, each node is initially given its vertex ID as its label. At the subsequent iterations, each node determines its community, based on the labels of its neighbors. Specifically, the node chooses to join the community to which the maximum number of its neighbors belong to. If there is a tie, one of the majority labels is picked randomly. As we propagate the labels in this way across the graph, most labels will disappear, whereas the remaining ones define the communities. Ideally, this iterative algorithm...