Apache Spark 2 for Beginners

By: Rajanarayanan Thottuvaikkatumana

Overview of this book

Spark is one of the most widely used large-scale data processing engines and is known for its speed. Its tools are equally useful to application developers and data scientists.

This book starts with the fundamentals of Spark 2 and covers the core data processing framework and API, installation, and application development setup. It then introduces the Spark programming model through real-world examples, followed by Spark SQL programming with DataFrames. An introduction to SparkR comes next. Later chapters cover the charting and plotting features of Python in conjunction with Spark data processing, and then Spark's stream processing, machine learning, and graph processing libraries. The last chapter combines the skills from the preceding chapters to develop a real-world Spark application.

By the end of this book, you will have the knowledge you need to develop efficient large-scale applications using Apache Spark.
Table of Contents (15 chapters)
Apache Spark 2 for Beginners
Credits
About the Author
About the Reviewer
www.PacktPub.com
Preface

Summary


Spark provides a powerful core data processing framework. The Spark machine learning library builds on the core features of Spark and on Spark libraries such as Spark SQL, in addition to offering its own rich set of machine learning algorithms. This chapter covered some common prediction and classification use cases, implemented in Scala and Python with the Spark machine learning library in a few lines of code. The wine quality prediction, wine classification, spam filter, and synonym finder use cases all have the potential to be developed into full-blown real-world applications. Spark 2.0 brings flexibility to model and pipeline creation, and to their use across programs written in different languages, by enabling model and pipeline persistence.

Pair-wise relationships are very common in real-world use cases. Backed by a strong mathematical theoretical base, computer scientists have developed many data structures and the...