Spark Cookbook

By: Rishi Yadav


Doing hypothesis testing


Hypothesis testing is a way of judging how likely it is that a given hypothesis is true. Let's say sample data suggests that females tend to vote more for the Democratic Party. This may or may not be true for the larger population. What if this pattern is there in the sample data just by chance?

Another way to look at the goal of hypothesis testing is to answer this question: If a sample has a pattern in it, how likely is it that the pattern arose purely by chance?

How do we do it? There is a saying that the best way to prove something is to try to disprove it.

The hypothesis to disprove is called the null hypothesis. The hypothesis testing in this recipe works with categorical data. Let's look at the example of a Gallup poll of party affiliations.

| Party            | Male | Female |
|------------------|------|--------|
| Democratic Party | 32   | 41     |
| Republican Party | 28   | 25     |
| Independent      | 34   | 26     |
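To make the idea concrete, the counts in the table above can be checked by hand. The following is a minimal sketch in plain Scala (no Spark required) that assumes Pearson's chi-squared test of independence, a common choice for this kind of table. The null hypothesis assumed here is that party affiliation is independent of gender. All variable names are illustrative, and the closed-form p-value in the sketch is valid only because this table has exactly 2 degrees of freedom.

```scala
// Observed counts from the poll table: rows are parties, columns are (male, female).
val observed: Array[Array[Double]] = Array(
  Array(32.0, 41.0), // Democratic Party
  Array(28.0, 25.0), // Republican Party
  Array(34.0, 26.0)  // Independent
)

val rowTotals: Array[Double] = observed.map(_.sum)
val colTotals: Array[Double] = observed.transpose.map(_.sum)
val grandTotal: Double = rowTotals.sum

// Expected count for each cell if party and gender were independent:
// rowTotal * colTotal / grandTotal.
val expected: Array[Array[Double]] =
  rowTotals.map(r => colTotals.map(c => r * c / grandTotal))

// Pearson's statistic: sum over all cells of (observed - expected)^2 / expected.
val chiSquared: Double = (for {
  i <- observed.indices
  j <- observed(i).indices
} yield math.pow(observed(i)(j) - expected(i)(j), 2) / expected(i)(j)).sum

// Degrees of freedom for an r x c table: (r - 1) * (c - 1).
val degreesOfFreedom: Int = (observed.length - 1) * (observed.head.length - 1)

// Assumption: this shortcut holds only for exactly 2 degrees of freedom,
// where the chi-squared upper-tail probability is exp(-x / 2).
val pValue: Double = math.exp(-chiSquared / 2)

println(f"chi-squared = $chiSquared%.4f, df = $degreesOfFreedom, p-value = $pValue%.4f")
```

Under these assumptions the statistic comes out at roughly 2.3 with a p-value of about 0.31. That is well above the conventional 0.05 threshold, so this sample alone would not justify rejecting the null hypothesis.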

How to do it…

  1. Start the Spark shell:

    $ spark-shell
    
  2. Import the relevant classes:

    scala> import org.apache.spark.mllib.stat.Statistics
    scala>...