Book Image

Apache Spark 2 for Beginners

By : Rajanarayanan Thottuvaikkatumana
Book Image

Apache Spark 2 for Beginners

By: Rajanarayanan Thottuvaikkatumana

Overview of this book

<p>Spark is one of the most widely-used large-scale data processing engines and runs extremely fast. It is a framework that has tools that are equally useful for application developers as well as data scientists.</p> <p>This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. Then the Spark programming model is introduced through real-world examples followed by Spark SQL programming with DataFrames. An introduction to SparkR is covered next. Later, we cover the charting and plotting features of Python in conjunction with Spark data processing. After that, we take a look at Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines all the skills you learned from the preceding chapters to develop a real-world Spark application.</p> <p>By the end of this book, you will have all the knowledge you need to develop efficient large-scale applications using Apache Spark.</p>
Table of Contents (15 chapters)
Apache Spark 2 for Beginners
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Connected component algorithm


In a graph, finding a subgraph consisting of connected vertices is a very common requirement with tremendous applications. In any graph, two vertices are that connected to each other by paths consisting of one or more edges, and are not connected to any other vertex in the same graph, are called a connected component. For example, in a graph, G, vertex V1 is connected to V2 by an edge and V2 is connected to V3 by another edge. In the same graph, G, vertex V4 is connected to V5 by another edge. In this case V1 and V3 are connected, V4 and V5 are connected and V1 and V5 are not connected. In graph G, there are two connected components. The Spark GraphX library has an implementation of the connected components algorithm.

In a social networking application, if the connections between the users are modeled as a graph, finding whether a given user is connected to another user is achieved by checking whether there is a connected component with these two vertices. In...