Book Image

Clojure Data Analysis Cookbook

By : Eric Rochester
Book Image

Clojure Data Analysis Cookbook

By: Eric Rochester

Overview of this book

<p>Data is everywhere and it's increasingly important to be able to gain insights that we can act on. Using Clojure for data analysis and collection, this book will show you how to gain fresh insights and perspectives from your data with an essential collection of practical, structured recipes.<br /><br />"The Clojure Data Analysis Cookbook" presents recipes for every stage of the data analysis process. Whether scraping data off a web page, performing data mining, or creating graphs for the web, this book has something for the task at hand.<br /><br />You'll learn how to acquire data, clean it up, and transform it into useful graphs which can then be analyzed and published to the Internet. Coverage includes advanced topics like processing data concurrently, applying powerful statistical techniques like Bayesian modelling, and even data mining algorithms such as K-means clustering, neural networks, and association rules.</p>
Table of Contents (18 chapters)
Clojure Data Analysis Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Reading data from JDBC databases


Reading data from a relational database is only slightly more complicated than reading from Excel. And much of the extra complication is involved in connecting to the database.

Fortunately, there's a Clojure-contributed package that sits on top of JDBC and makes working with databases much easier. In this example, we'll load a table from an SQLite database (http://www.sqlite.org/).

Getting ready

First, list the dependencies in our Leiningen project.clj file. We also need to include the database driver library. For this example that's org.xerial/sqlite-jdbc.

  :dependencies [[org.clojure/clojure "1.4.0"]
                 [incanter/incanter-core "1.4.1"]
                 [org.clojure/java.jdbc "0.2.3"]
                 [org.xerial/sqlite-jdbc "3.7.2"]]

Then load the modules into our REPL interpreter or script file:

(use '[clojure.java.jdbc :exclude (resultset-seq)]
     'incanter.core)

Finally, get the database connection information. I have my data in a SQLite database file named data/small-sample.sqlite. You can download this from http://www.ericrochester.com/clj-data-analysis/data/small-sample.sqlite.

How to do it…

Loading the data is not complicated, but we'll make it even easier with a wrapper function.

  1. Create a function that takes a database connection map and a table name and returns a dataset created from that table:

    (defn load-table-data
      "This loads the data from a database table."
      [db table-name]
      (let [sql (str "SELECT * FROM "
                     table-name ";")]
        (with-connection db
          (with-query-results rs [sql]
            (to-dataset (doall rs))))))
  2. Next, define a database map with the connection parameters suitable for our database:

    (def db {:subprotocol "sqlite"
             :subname "data/small-sample.sqlite"
             :classname "org.sqlite.JDBC"})
  3. Finally, call load-table-data with db and a table name as a symbol or string:

    user=> (load-table-data db 'people)
    [:relation :surname :given_name]
    ["father" "Addams" "Gomez"]
    ["mother" "Addams" "Morticia"]
    ["brother" "Addams" "Pugsley"]
    …

How it works…

The load-table-data function sets up a database connection using clojure.java.jdbc/with-connection. It creates a SQL query that queries all the fields of the table passed in. It then retrieves the results using clojure.java.jdbc/with-query-results. Each result row is a sequence of maps of column names to values. This sequence is wrapped in a dataset by incanter.core/to-dataset.

See also

Connecting to different database systems using JDBC isn't necessarily a difficult task, but it's very dependent on what database we wish to connect to. Oracle has a tutorial for working with JDBC at http://docs.oracle.com/javase/tutorial/jdbc/basics/, and the documentation for the clojure.java.jdbc library has some good information also (http://clojure.github.com/java.jdbc/). If you're trying to find out what the connection string looks like for a database system, there are lists online. This one, http://www.java2s.com/Tutorial/Java/0340__Database/AListofJDBCDriversconnectionstringdrivername.htm, includes the major drivers.