Book Image

Apache Spark 2 for Beginners

By : Rajanarayanan Thottuvaikkatumana
Book Image

Apache Spark 2 for Beginners

By: Rajanarayanan Thottuvaikkatumana

Overview of this book

<p>Spark is one of the most widely-used large-scale data processing engines and runs extremely fast. It is a framework that has tools that are equally useful for application developers as well as data scientists.</p> <p>This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. Then the Spark programming model is introduced through real-world examples followed by Spark SQL programming with DataFrames. An introduction to SparkR is covered next. Later, we cover the charting and plotting features of Python in conjunction with Spark data processing. After that, we take a look at Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines all the skills you learned from the preceding chapters to develop a real-world Spark application.</p> <p>By the end of this book, you will have all the knowledge you need to develop efficient large-scale applications using Apache Spark.</p>
Table of Contents (15 chapters)
Apache Spark 2 for Beginners
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Understanding GraphFrames queries


The Spark GraphX library is the RDD-based graph processing library, but GraphFrames is a Spark DataFrame-based graph processing library that is available as an external package. Spark GraphX supports many graph processing algorithms, but GraphFrames supports not only graph processing algorithms, but also graph queries. The major difference between graph processing algorithms and graph queries is that graph processing algorithms are used to process the data hidden in a graph data structure, while graph queries are used to search for patterns in the data hidden in a graph data structure. In GraphFrame parlance, graph queries are also known as motif finding. This has tremendous applications in genetics and other biological sciences that deal with sequence motifs.

From a use case perspective, take the use case of users following each other in a social media application. Users have relationships between them. In the previous sections, these relationships were...