Clojure for Data Science

By: Henry Garner

Significance testing proportions


Let's return to the question of whether the measured difference between the male and female fatality rates could be due to chance alone. As in Chapter 2, Inference, our z-statistic is simply the difference in proportions divided by the pooled standard error:
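Written out, with SE standing for the pooled standard error (a label introduced here for illustration), the statistic described in words above is:

```latex
z = \frac{p_1 - p_2}{SE}
```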

In the preceding formula, p1 denotes the proportion of women who survived, that is, 339/466 = 0.73, and p2 denotes the proportion of men who survived, that is, 161/843 = 0.19.

To calculate the z-statistic, we need the pooled standard error for the two proportions. Our proportions measure the survival rates of females and males respectively, so the pooled standard error is simply the standard error calculated from the females and males combined, that is, from the overall survival rate, as follows:
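As a rough sketch of this step, the pooled proportion follows directly from the counts quoted earlier. The standard error below uses the conventional pooled form for a two-proportion z-test, which is an assumption and may differ from the formula originally shown; the symbols \hat{p}, n_1 and n_2 are introduced here for illustration only.

```latex
\hat{p} = \frac{339 + 161}{466 + 843} = \frac{500}{1309} \approx 0.38

SE = \sqrt{\hat{p}\,(1 - \hat{p})\left(\frac{1}{n_1} + \frac{1}{n_2}\right)}
   = \sqrt{0.38 \times 0.62 \times \left(\frac{1}{466} + \frac{1}{843}\right)}
   \approx 0.028
```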

Substituting the values into the equation for the z-statistic:
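A worked version of the substitution, again assuming the conventional pooled two-proportion standard error of roughly 0.028 (an assumption, so the figure may differ from the one originally printed), would look like this:

```latex
z = \frac{p_1 - p_2}{SE} \approx \frac{0.727 - 0.191}{0.0280} \approx 19.1
```

Under this assumption, a z-statistic of this size lies far out in the tail of the normal distribution.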

Using a z-score means we'll use the normal distribution to look up the p-value:

(defn ex-4-11 []
  (let [dataset     (load-data "titanic.tsv")
        proportions (fatalities-by-sex...
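
As a self-contained sketch of the calculation described above, the following uses only the counts quoted in the text and plain Clojure. The function names `pooled-se`, `z-stat` and `upper-tail` are hypothetical and are not the book's helpers. The pooled standard error uses the conventional two-proportion form, which is an assumption, and the normal tail probability comes from the Abramowitz-Stegun polynomial approximation to erfc rather than from a statistics library.

```clojure
;; Sketch only: counts are taken from the text; helper names and the
;; pooled standard-error formula are assumptions, not the book's code.

(defn pooled-se
  "Standard error of p1 - p2 under the null hypothesis, using the
  conventional pooled two-proportion formula (an assumption)."
  [x1 n1 x2 n2]
  (let [p (/ (double (+ x1 x2)) (+ n1 n2))]   ; pooled proportion, 500/1309
    (Math/sqrt (* p (- 1.0 p) (+ (/ 1.0 n1) (/ 1.0 n2))))))

(defn z-stat
  "Difference in proportions divided by the pooled standard error."
  [x1 n1 x2 n2]
  (/ (- (/ (double x1) n1) (/ (double x2) n2))
     (pooled-se x1 n1 x2 n2)))

(defn upper-tail
  "Approximate P(Z > z) for z >= 0, using the Abramowitz-Stegun 7.1.26
  approximation to erfc (absolute error on the order of 1e-7)."
  [z]
  (let [x    (/ z (Math/sqrt 2.0))
        t    (/ 1.0 (+ 1.0 (* 0.3275911 x)))
        poly (* t (+ 0.254829592
                     (* t (+ -0.284496736
                             (* t (+ 1.421413741
                                     (* t (+ -1.453152027
                                             (* t 1.061405429)))))))))]
    (* 0.5 poly (Math/exp (- (* x x))))))

;; 339 of 466 women and 161 of 843 men survived.
(def z (z-stat 339 466 161 843))
(def p-value (upper-tail z))
```

Here `p-value` is a one-sided tail probability; doubling it gives a two-sided value. With a z-statistic this large, either figure is vanishingly small, and subtracting a normal CDF value from 1 in floating point would simply print as zero.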