Apache Spark 2.x for Java Developers

By: Sourav Gulati, Sumit Kumar

Overview of this book

Apache Spark is the buzzword in the big data industry right now, especially with the increasing need for real-time streaming and data processing. While Spark is built on Scala, the Spark Java API exposes all the Spark features available in the Scala version for Java developers. This book will show you how you can implement various functionalities of the Apache Spark framework in Java, without stepping out of your comfort zone. The book starts with an introduction to the Apache Spark 2.x ecosystem, followed by explaining how to install and configure Spark, and refreshes the Java concepts that will be useful to you when consuming Apache Spark's APIs. You will explore RDD and its associated common Action and Transformation Java APIs, set up a production-like clustered environment, and work with Spark SQL. Moving on, you will perform near-real-time processing with Spark streaming, Machine Learning analytics with Spark MLlib, and graph processing with GraphX, all using various Java packages. By the end of the book, you will have a solid foundation in implementing components in the Spark framework in Java to build fast, real-time applications.
Table of Contents (19 chapters)
Title Page
Credits
Foreword
About the Authors
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

Foreword

Sumit Kumar and Sourav Gulati are technology evangelists with deep experience in envisioning and implementing solutions to complex problems involving large, high-velocity data. Every time I have brought them a complex problem statement, they have provided an innovative and scalable solution.

I have over 17 years of experience in the IT industry, specializing in envisioning, architecting, and implementing enterprise solutions across a variety of business domains, such as hospitality, healthcare, risk management, and insurance.

I have known Sumit and Sourav for 5 years as developers/architects who have worked closely with me implementing various complex big data solutions. From their college days, they were inclined toward exploring/implementing distributed systems. As if implementing solutions around big data systems were not enough, they also started sharing their knowledge and experience with the big data community. They have actively contributed to various blogs and tech talks, and under no circumstances do they pass up an opportunity to help their fellow technologists.

Knowing Sumit and Sourav, I am not surprised that they have authored a book on Spark, and I am writing the foreword for their book, Apache Spark 2.x for Java Developers.

Their passion for technology has again resulted in the terrific book you now have in your hands.

This book is the product of Sumit's and Sourav's deep knowledge and extensive implementation experience in Spark for solving real problems that deal with large, fast and diverse data.

Several books on distributed systems exist, but Sumit's and Sourav's book closes a substantial gap between theory and practice. Their book offers comprehensive, detailed, and innovative techniques for leveraging Spark and its extensions/API for implementing big data solutions. This book is a precious resource for practitioners envisioning big data solutions for enterprises, as well as for undergraduate and graduate students keen to master Spark and its extensions using its Java API.

This book starts with an introduction to Spark and then covers the overall architecture and concepts such as RDD, transformation, and partitioning. It also discusses in detail various Spark extensions, such as Spark Streaming, MLlib, Spark SQL, and GraphX.

Each chapter is dedicated to a topic and includes an illustrative case study that covers state-of-the-art Java-based tools and software. Each chapter is self-contained, providing great flexibility of usage. The accompanying website provides the source code and data. This is truly a gem for both students and big data architects/developers, who can experiment first-hand with the methods just learned, or can deepen their understanding of the methods by applying them to real-world scenarios.

As I was reading the various chapters of the book, I was reminded of the passion and enthusiasm Sumit and Sourav have for distributed frameworks. They have communicated the concepts described in the book with clarity and with the same passion. I am positive that you, as a reader, will feel the same. I will certainly keep this book as a personal resource for the solutions I implement, and strongly recommend it to my fellow architects.

Sumit Gupta

Director of Engineering, Big Data, Sapient Global Markets