Mastering Python for Data Science

By: Samir Madhavan
Chapter 7: Estimating the Likelihood of Events

The chi-square test of independence


The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent of each other.

Let's take the following example to see whether the genre of book that people prefer depends on their gender:

                    Flavour
Gender    Romance   Suspense   Biography   Total
Men       100       120        60          280
Women     350       200        90          640
Total     450       320        150         920

The chi-square test of independence can be performed using the chi2_contingency function in the stats module of the SciPy package:

>>> import numpy as np
>>> from scipy import stats
>>> men_women = np.array([[100, 120, 60],[350, 200, 90]])
>>> stats.chi2_contingency(men_women)
(28.362103174603167, 6.9382117170577439e-07, 2, array([[ 136.95652174,   97.39130435,   45.65217391],
       [ 313.04347826,  222.60869565,  104.34782609]]))

The first value is the chi-square statistic, which is approximately 28.36.
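To see where this number comes from, the statistic can be reproduced by hand from the row and column totals of the table. The following is a minimal sketch rather than the book's own code. The variable names (observed, expected, and so on) are illustrative assumptions, and it uses the standard Pearson formula without any continuity correction, which matches the output above for this 2 x 3 table.

```python
import numpy as np

# Observed counts from the table: rows are Men, Women;
# columns are Romance, Suspense, Biography.
observed = np.array([[100, 120, 60],
                     [350, 200, 90]])

row_totals = observed.sum(axis=1)   # totals per gender
col_totals = observed.sum(axis=0)   # totals per genre
grand_total = observed.sum()        # all readers

# Expected count for each cell if gender and genre were independent:
# (row total * column total) / grand total.
expected = np.outer(row_totals, col_totals) / grand_total

# Pearson chi-square statistic: sum of (observed - expected)^2 / expected.
chi2 = ((observed - expected) ** 2 / expected).sum()

# Degrees of freedom: (rows - 1) * (columns - 1).
dof = (observed.shape[0] - 1) * (observed.shape[1] - 1)

print(expected)
print(chi2, dof)
```

The expected array should agree with the fourth value returned by chi2_contingency, and dof with the third.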

The second value is the p-value, which is very small, and means that there is an association between the gender of people and the genre of the book they...
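The remaining two values in the output are the degrees of freedom (2) and the table of expected frequencies under independence. As a hedged sketch of how the result might be used in practice, the four values can be unpacked and the p-value compared with a significance level. The threshold alpha = 0.05 is an assumption chosen for illustration and does not come from the text.

```python
import numpy as np
from scipy import stats

# Same contingency table as above: rows are Men, Women;
# columns are Romance, Suspense, Biography.
men_women = np.array([[100, 120, 60],
                      [350, 200, 90]])

# Unpack the statistic, p-value, degrees of freedom, and expected counts.
chi2, p_value, dof, expected = stats.chi2_contingency(men_women)

alpha = 0.05  # assumed significance level, not taken from the book

# A p-value below alpha means the data are unlikely under independence.
reject_independence = p_value < alpha

if reject_independence:
    print("Gender and genre appear to be associated (p = %.2e)" % p_value)
else:
    print("No evidence of an association (p = %.2e)" % p_value)
```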