Sumit Kumar and Sourav Gulati are technology evangelists with deep experience in envisioning and implementing solutions, as well as complex problems dealing with large and high-velocity data. Every time I talk to them about any complex problem statement, they have provided an innovative and scalable solution.
I have over 17 years of experience in the IT industry, specializing in envisioning, architecting and implementing various enterprise solutions revolving around a variety of business domains, such as hospitality, healthcare, risk management, and insurance.
I have known Sumit and Sourav for 5 years as developers/architects who have worked closely with me implementing various complex big data solutions. From their college days, they were inclined toward exploring/implementing distributed systems. As if implementing solutions around big data systems were not enough, they also started sharing their knowledge and experience with the big data community. They have actively contributed to various blogs and tech talks, and in no circumstances do they pass up on any opportunity to help their fellow technologists.
Knowing Sumit and Sourav, I am not surprised that they have started authoring a book on Spark and I am writing foreword for their book - Apache Spark 2.x for Java Developers.
Their passion for technology has again resulted in the terrific book you now have in your hands.
This book is the product of Sumit's and Sourav's deep knowledge and extensive implementation experience in Spark for solving real problems that deal with large, fast and diverse data.
Several books on distributed systems exist, but Sumit's and Sourav's book closes a substantial gap between theory and practice. Their book offers comprehensive, detailed, and innovative techniques for leveraging Spark and its extensions/API for implementing big data solutions. This book is a precious resource for practitioners envisioning big data solutions for enterprises, as well as for undergraduate and graduate students keen to master the Spark and its extensions using its Java API.
This book starts with an introduction to Spark and then covers the overall architecture and concepts such as RDD, transformation, and partitioning. It also discuss in detail various Spark extensions, such as Spark Streaming, MLlib, Spark SQL, and GraphX.
Each chapter is dedicated to a topic and includes an illustrative case study that covers state-of-the-art Java-based tools and software. Each chapter is self-contained, providing great flexibility of usage. The accompanying website provides the source code and data. This is truly a gem for both students and big data architects/developers, who can experiment first-hand the methods just learned, or can deepen their understanding of the methods by applying them to real-world scenarios.
As I was reading the various chapters of the book, I was reminded of the passion and enthusiasm of Sumit and Sourav have for distributed frameworks. They have communicated the concepts described in the book with clarity and with the same passion. I am positive that you, as reader, will feel the same. I will certainly keep this book as a personal resource for the solutions I implement, and strongly recommend it to my fellow architects.
Sumit Gupta
Director of Engineering, Big Data, Sapient Global Markets