Apache Spark Graph Processing

Chapter 7. Learning Graph Structures

In this chapter, we will show you how to learn interesting structures from graphs in Spark. Learning relationships from data starts with selecting the problem of interest. The most common learning problems are regression, classification, ranking, and clustering. In this book, we will focus on clustering, and in particular on applying clustering to graph data to detect communities within graphs.

Here is our roadmap for this chapter. First, we will introduce the concepts of spectral clustering. Then, we will study a specific method that allows us to cluster graphs in Spark. Finally, we will apply these techniques to music and song playlist datasets. This application will also bring together the tools and techniques covered in the previous chapters.