Clojure for Data Science

By: Henry Garner

Inspecting the data


We encountered categorical variables in the previous chapter as the dichotomous variable "sex" in the athlete dataset. That dataset also contained many other categorical variables including "sport", "event", and "country".

Let's take a look at the Titanic dataset (using the clojure.java.io library to access the file resource and the incanter.io library to read it in):

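(require '[clojure.java.io :as io]
         '[incanter.core :as i]
         '[incanter.io :as iio])

;; Reads a tab-delimited classpath resource with a header row.
(defn load-data [file]
  (-> (io/resource file)
      (str)
      (iio/read-dataset :delim \tab :header true)))

;; io/resource expects a string name; "titanic.tsv" is an assumed filename.
(defn ex-4-1 []
  (i/view (load-data "titanic.tsv")))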

The preceding code generates the following table:

The Titanic dataset includes categorical variables too, for example :sex, :pclass (the passenger class), and :embarked (a letter signifying the port of boarding). These are all string values, taking categories such as female, first, and C, but categories don't always have to be strings. Columns such as :ticket, :boat, and :body can be thought of as containing categorical variables too. Despite...
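One quick way to get a feel for a categorical column is to count how often each category occurs. The following is a minimal sketch in plain Clojure, not the book's own code: the name category-counts and the rows in sample-rows are hypothetical, invented purely for illustration, and are not drawn from the real Titanic data.

```clojure
;; Hypothetical rows, made up for illustration only.
(def sample-rows
  [{:sex "female" :pclass "first"  :embarked "S"}
   {:sex "male"   :pclass "third"  :embarked "C"}
   {:sex "male"   :pclass "third"  :embarked "S"}
   {:sex "female" :pclass "second" :embarked "S"}])

(defn category-counts
  "Returns a map from each distinct value of column k to the number
  of rows in which it appears."
  [k rows]
  (frequencies (map k rows)))

(category-counts :sex sample-rows)
;; => {"female" 2, "male" 2}

(category-counts :embarked sample-rows)
;; => {"S" 3, "C" 1}
```

A column with only a handful of distinct values, each occurring many times, behaves like a typical categorical variable; a column where nearly every value is unique would show up here as a long map of counts of 1.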