Clojure for Data Science

By: Henry Garner

Chi-squared multiple significance testing


Not all categorical variables are dichotomous, that is, limited to two categories such as male and female, or survived and perished. Although we expect a categorical variable to have a finite number of categories, there is no hard upper limit on how many categories a particular attribute can have.

We could use other categorical variables to separate the passengers on the Titanic, such as the class in which they were traveling. There were three class levels on the Titanic, and the frequency-table function we constructed at the beginning of this chapter can already handle a variable with more than two categories.

(defn ex-4-12 []
  (->> (load-data "titanic.tsv")
       (frequency-table :count [:survived :pclass])))

This code generates the following frequency table:

| :pclass | :survived | :count |
|---------+-----------+--------|
|   third |         y |    181 |
|   third |         n |    528 |
|  second |         y |    119 |
|  second |         n |    158 |
|   first |         n |    123 |
|  ...
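
To make the grouping idea concrete, here is a minimal sketch in plain Clojure of counting rows for every combination of two categorical columns. It is not the frequency-table function constructed earlier in the chapter; the names simple-frequency-table and sample-rows are hypothetical, and the rows are made-up illustrations rather than the Titanic data. The sketch assumes each row is a map keyed by column name.

```clojure
;; Hypothetical sample rows for illustration only; NOT the Titanic data.
(def sample-rows
  [{:pclass "first"  :survived "y" :name "a"}
   {:pclass "first"  :survived "n" :name "b"}
   {:pclass "third"  :survived "n" :name "c"}
   {:pclass "third"  :survived "n" :name "d"}
   {:pclass "second" :survived "y" :name "e"}])

(defn simple-frequency-table
  "Counts the rows for each distinct combination of group-columns and
  returns one map per combination, with the count stored under sum-column.
  A sketch under the assumption that rows are maps keyed by column name."
  [sum-column group-columns rows]
  (->> rows
       ;; Keep only the grouping columns, so rows that agree on them
       ;; become equal maps.
       (map #(select-keys % group-columns))
       ;; Count how often each distinct combination occurs.
       frequencies
       ;; Attach each count to its combination of category values.
       (map (fn [[group n]] (assoc group sum-column n)))))

;; Returns one map per observed combination; order is not guaranteed.
(simple-frequency-table :count [:survived :pclass] sample-rows)
```

Because the grouping key is just the set of selected column values, nothing in this approach depends on a column having exactly two categories: a three-level column such as :pclass simply yields more combinations.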