Mastering Apache Spark 2.x - Second Edition

Overview of this book

Apache Spark is an in-memory, cluster-based Big Data processing system that provides a wide range of functionality, including graph processing, machine learning, and stream processing. This book will take your knowledge of Apache Spark to the next level by teaching you how to expand Spark's functionality and build your data flows and machine/deep learning programs on top of the platform. The book starts with a quick overview of the Apache Spark ecosystem and introduces you to the new features and capabilities in Apache Spark 2.x. You will then work with the different modules in Apache Spark: interactive querying with Spark SQL, using DataFrames and Datasets effectively, streaming analytics with Spark Streaming, and performing machine learning and deep learning on Spark using MLlib and external tools such as H2O and Deeplearning4j. The book also contains chapters on efficient graph processing, memory management, and using Apache Spark in the cloud. By the end of this book, you will have the information you need to master Apache Spark and use it efficiently for Big Data processing and analytics.
Table of Contents (21 chapters)
Title Page
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface
10. Deep Learning on Apache Spark with DeepLearning4j and H2O

About the Reviewer

Md. Rezaul Karim is a research scientist at the Fraunhofer Institute for Applied Information Technology FIT, Germany. He is also a PhD candidate at RWTH Aachen University, Aachen, Germany. He holds BSc and MSc degrees in computer science. Before joining Fraunhofer FIT, he worked as a researcher at the Insight Centre for Data Analytics, Ireland. Prior to that, he worked as a lead engineer with Samsung Electronics' distributed R&D institutes in Korea, India, Vietnam, Turkey, and Bangladesh. Previously, he worked as a research assistant in the Database Lab at Kyung Hee University, Korea. He also worked as an R&D engineer with BMTech21 Worldwide, Korea. Prior to that, he worked as a software engineer with i2SoftTechnology, Dhaka, Bangladesh.

He has more than 8 years of experience in research and development, with a solid knowledge of algorithms and data structures in C/C++, Java, Scala, R, and Python. His work focuses on big data technologies (such as Spark, Kafka, DC/OS, Docker, Mesos, Zeppelin, Hadoop, and MapReduce) and deep learning technologies such as TensorFlow, DeepLearning4j, and H2O Sparkling Water. His research interests include machine learning, deep learning, semantic web/linked data, big data, and bioinformatics. He is the author of the following books with Packt Publishing:

  • Large-Scale Machine Learning with Spark
  • Deep Learning with TensorFlow
  • Scala and Spark for Big Data Analytics

I am very grateful to my parents, who have always encouraged me to pursue knowledge. I also want to thank my wife, Saroar, son, Shadman, elder brother, Mamtaz, elder sister, Josna, and friends, who have always been encouraging and have listened to me.