Spark for Data Science

By: Srinivas Duvvuri, Bikramaditya Singhal

Overview of this book

This is the era of Big Data. The term ‘Big Data’ implies big innovation and enables a competitive advantage for businesses. Apache Spark was designed to perform Big Data analytics at scale, so it comes equipped with the necessary algorithms and supports multiple programming languages. Whether you are a technologist, a data scientist, or a beginner in Big Data analytics, this book will provide you with the skills needed to perform statistical data analysis, data visualization, and predictive modeling, and to build scalable data products or solutions using Python, Scala, and R. With ample case studies and real-world examples, Spark for Data Science will help you ensure the successful execution of your data science projects.

Developing the hypothesis


A hypothesis is your best guess about what the outcome will be. You form your initial hypothesis based on the question, conversations with stakeholders, and a first look at the data. You may form one or more hypotheses for a given problem. This initial hypothesis serves as a roadmap that guides you through the exploratory analysis. Developing a hypothesis is important because it lets you confirm or reject a statement statistically, rather than judging it only by looking at the data as a matrix or through visuals. Perceptions formed by simply looking at the data may be incorrect, and at times deceptive.

Keep in mind that your final result may or may not support the hypothesis. For the case study considered in this lesson, we arrive at the following initial hypotheses:

  • Award winners are mostly white

  • Most of the award winners are from the USA

  • Best actors and actresses tend to be younger than best directors
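
To make the idea of statistically confirming or rejecting a statement concrete, here is a minimal sketch of how the second hypothesis ("Most of the award winners are from the USA") could be checked with a one-sided exact binomial test. It is only an illustration, not the method used in this book: the helper name binomial_p_value, the 50% threshold for "most", and the toy list of countries are all assumptions made for this example, and the sample is made up rather than taken from the case-study data.

```python
from math import comb


def binomial_p_value(successes: int, trials: int, p0: float = 0.5) -> float:
    """One-sided exact binomial test.

    Returns P(X >= successes) for X ~ Binomial(trials, p0), i.e. how likely it
    is to see at least this many successes if the true share were only p0.
    """
    return sum(
        comb(trials, k) * p0**k * (1 - p0) ** (trials - k)
        for k in range(successes, trials + 1)
    )


# Hypothetical toy sample (NOT the case-study data): birth countries of 20 winners.
countries = ["USA"] * 15 + ["UK"] * 3 + ["France"] * 2

usa_count = sum(1 for country in countries if country == "USA")

# Null hypothesis: the share of USA winners is 50%.
# Alternative: the share is greater than 50% ("most winners are from the USA").
p_value = binomial_p_value(usa_count, len(countries), p0=0.5)

print(f"USA winners: {usa_count}/{len(countries)}, p-value = {p_value:.4f}")
print("Reject the null at the 5% level" if p_value < 0.05 else "Cannot reject the null")
```

A small p-value means the observed share would be unlikely if USA winners made up only half of the population, which supports the hypothesis. A large p-value means the data do not give enough evidence either way. The same pattern applies to the other hypotheses, with a test suited to the quantity being compared.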

Now that we have formalized...