The chi-square test of independence is a statistical test used to determine whether two categorical variables are independent of each other.
Let's take the following example to see whether the genre of book that people prefer depends on their gender:
Gender | Romance | Suspense | Biography | Total |
---|---|---|---|---|
Men | 100 | 120 | 60 | 280 |
Women | 350 | 200 | 90 | 640 |
Total | 450 | 320 | 150 | 920 |
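The test compares these observed counts with the counts that would be expected if gender and genre were unrelated. As a minimal sketch (the variable names here are illustrative and not from the original text), the expected counts can be derived from the row and column totals with NumPy:

```python
import numpy as np

# Observed counts (rows: men, women; columns: romance, suspense, biography).
observed = np.array([[100, 120, 60],
                     [350, 200, 90]])

row_totals = observed.sum(axis=1)   # totals per gender
col_totals = observed.sum(axis=0)   # totals per genre
grand_total = observed.sum()        # all readers

# Expected count for each cell if gender and genre were independent:
# (row total * column total) / grand total.
expected = np.outer(row_totals, col_totals) / grand_total
print(expected.round(2))
```

The further the observed counts sit from these expected counts, the stronger the evidence against independence.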
The chi-square test of independence can be performed using the chi2_contingency
function in SciPy's stats module:
    >>> import numpy as np
    >>> from scipy import stats
    >>> men_women = np.array([[100, 120, 60], [350, 200, 90]])
    >>> stats.chi2_contingency(men_women)
    (28.362103174603167, 6.9382117170577439e-07, 2, array([[ 136.95652174,   97.39130435,   45.65217391],
           [ 313.04347826,  222.60869565,  104.34782609]]))
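In a script, it is often more convenient to unpack the four returned values by name. The sketch below assumes a significance level of 0.05, which is a common convention and not something stated in the text. Newer SciPy releases may display the result as a named object instead of a plain tuple, but unpacking it this way should still work:

```python
import numpy as np
from scipy import stats

men_women = np.array([[100, 120, 60],
                      [350, 200, 90]])

# Unpack the statistic, p-value, degrees of freedom and expected frequencies.
chi2, p_value, dof, expected = stats.chi2_contingency(men_women)

alpha = 0.05  # assumed significance level (not from the original text)
print(f"chi2 = {chi2:.3f}, dof = {dof}, p = {p_value:.3g}")

if p_value < alpha:
    print("Reject independence: gender and genre appear to be associated.")
else:
    print("No evidence against independence at this significance level.")
```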
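The first value is the chi-square statistic, about 28.36. The third value, 2, is the degrees of freedom, and the array holds the frequencies expected under independence.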
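For reference, the standard Pearson chi-square statistic can be written as follows. The symbols $O_{ij}$, $E_{ij}$, $R_i$, $C_j$ and $N$ are notation introduced here for illustration (observed count, expected count, row total, column total and grand total); they do not appear in the original text:

```latex
\chi^2 = \sum_{i=1}^{r} \sum_{j=1}^{c} \frac{(O_{ij} - E_{ij})^2}{E_{ij}},
\qquad
E_{ij} = \frac{R_i \, C_j}{N},
\qquad
\text{dof} = (r - 1)(c - 1)
```

For the men/romance cell, $E_{11} = 280 \times 450 / 920 \approx 136.96$, so its contribution is $(100 - 136.96)^2 / 136.96 \approx 9.97$. Summing the six cell contributions gives approximately 28.36, with $(2 - 1)(3 - 1) = 2$ degrees of freedom.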
The second value is the p-value, which is very small (about 6.9e-07), and means that there is an association between the gender of people and the genre of the book they...