7. Learning Graph Structures | Apache Spark Graph Processing


Apache Spark Graph Processing

By: Rindra Ramamonjison


Overview of this book

Apache Spark is an open-source cluster-computing engine that is becoming the next standard for processing big data. Many practical computing problems concern large graphs, such as the Web graph and various social networks. The scale of these graphs (in some cases billions of vertices and trillions of edges) makes them challenging to process efficiently. The Apache Spark GraphX API combines the advantages of data-parallel and graph-parallel systems by efficiently expressing graph computation within the Spark data-parallel framework. This book teaches you graph programming in Apache Spark and explains the entire process of graph data analysis. You will work through the creation of graphs, their uses, and their exploration and analysis, and finally the conversion of graph elements into graph structures.

The book begins with an introduction to the Spark system, its libraries, and the Scala Build Tool. Using a hands-on approach, it quickly teaches you how to install Spark and use it interactively on the command line and in a standalone Scala program. It then presents the methods for building Spark graphs using illustrative network datasets. Next, it walks you through exploring, visualizing, and analyzing different network characteristics. You will also learn how to transform raw datasets into a usable form, and how to use powerful operations that transform graph elements and graph structures. The book further shows how to create efficient custom graph operations tailored to specific needs. The later chapters cover more advanced topics such as clustering graphs, implementing graph-parallel iterative algorithms, and methods for learning from graph data.

Preface

Distinctive features

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support


1. Getting Started with Spark and GraphX

Downloading and installing Spark 1.4.1

Experimenting with the Spark shell

Getting started with GraphX

Summary

2. Building and Exploring Graphs

Network datasets

Graph builders

Building graphs

Computing the degrees of the network nodes

Summary

3. Graph Analysis and Visualization

Network datasets

The graph visualization

The analysis of network connectedness

The network centrality and PageRank

Scala Build Tool revisited

Summary

4. Transforming and Shaping Up Graphs to Your Needs

Transforming the vertex and edge attributes

Modifying graph structures

Joining graph datasets

Data operations on VertexRDD and EdgeRDD

Summary

5. Creating Custom Graph Aggregation Operators

NCAA College Basketball datasets

The aggregateMessages operator

Joining average stats into a graph

Performance optimization

The MapReduceTriplets operator

Summary

6. Iterative Graph-Parallel Processing with Pregel

The Pregel computational model

The Pregel API in GraphX

Community detection through label propagation

The Pregel implementation of PageRank

Summary

7. Learning Graph Structures

Community clustering in graphs

Applications – music fan community detection

Summary

A. References

Chapter 2, Building and Exploring Graphs

Chapter 3, Graph Analysis and Visualization

Chapter 7, Learning Graph Structures

Index
