Spark for Data Science

By: Srinivas Duvvuri, Bikramaditya Singhal

Overview of this book

This is the era of Big Data. The term ‘Big Data’ implies big innovation and enables a competitive advantage for businesses. Apache Spark was designed to perform Big Data analytics at scale, so it is equipped with the necessary algorithms and supports multiple programming languages. Whether you are a technologist, a data scientist, or a beginner to Big Data analytics, this book will provide you with the skills necessary to perform statistical data analysis, data visualization, and predictive modeling, and to build scalable data products or solutions using Python, Scala, and R. With ample case studies and real-world examples, Spark for Data Science will help you ensure the successful execution of your data science projects.
Table of Contents (18 chapters)
Spark for Data Science
Credits
Foreword
About the Authors
About the Reviewers
www.PacktPub.com
Preface

The big data trends


Big data processing has been an integral part of the IT industry, especially over the past decade. Apache Hadoop and similar endeavors focus on building the infrastructure to store and process massive amounts of data. After more than 10 years, the Hadoop platform is considered mature and is almost synonymous with big data processing. Apache Spark, a general computing engine that works well with, but is not limited to, the Hadoop ecosystem, was quite successful in 2015.

Building data science applications requires knowledge of the big data landscape and of the software products that are available out of the box. We need to carefully choose the building blocks that fit our requirements. There are several options with overlapping functionality, and picking the right tools is easier said than done. The success of the application depends heavily on assembling the right mix of technologies and processes. The good news is that there are several open source options...