Reading data from a relational database is only slightly more complicated than reading from Excel. And much of the extra complication is involved in connecting to the database.
Fortunately, there's a Clojure-contributed package that sits on top of JDBC and makes working with databases much easier. In this example, we'll load a table from an SQLite database (http://www.sqlite.org/).
First, list the dependencies in our Leiningen project.clj
file. We also need to include the database driver library. For this example that's org.xerial/sqlite-jdbc
.
:dependencies [[org.clojure/clojure "1.4.0"] [incanter/incanter-core "1.4.1"] [org.clojure/java.jdbc "0.2.3"] [org.xerial/sqlite-jdbc "3.7.2"]]
Then load the modules into our REPL interpreter or script file:
(use '[clojure.java.jdbc :exclude (resultset-seq)] 'incanter.core)
Finally, get the database connection information. I have my data in a SQLite database file named data/small-sample.sqlite
. You can download this from http://www.ericrochester.com/clj-data-analysis/data/small-sample.sqlite.
Loading the data is not complicated, but we'll make it even easier with a wrapper function.
Create a function that takes a database connection map and a table name and returns a dataset created from that table:
(defn load-table-data "This loads the data from a database table." [db table-name] (let [sql (str "SELECT * FROM " table-name ";")] (with-connection db (with-query-results rs [sql] (to-dataset (doall rs))))))
Next, define a database map with the connection parameters suitable for our database:
(def db {:subprotocol "sqlite" :subname "data/small-sample.sqlite" :classname "org.sqlite.JDBC"})
Finally, call
load-table-data
withdb
and a table name as a symbol or string:user=> (load-table-data db 'people) [:relation :surname :given_name] ["father" "Addams" "Gomez"] ["mother" "Addams" "Morticia"] ["brother" "Addams" "Pugsley"] …
The load-table-data
function sets up a database connection using clojure.java.jdbc/with-connection
. It creates a SQL
query that queries all the fields of the table passed in. It then retrieves the results using clojure.java.jdbc/with-query-results
. Each result row is a sequence of maps of column names to values. This sequence is wrapped in a dataset by incanter.core/to-dataset
.
Connecting to different database systems using JDBC isn't necessarily a difficult task, but it's very dependent on what database we wish to connect to. Oracle has a tutorial for working with JDBC at http://docs.oracle.com/javase/tutorial/jdbc/basics/, and the documentation for the clojure.java.jdbc
library has some good information also (http://clojure.github.com/java.jdbc/). If you're trying to find out what the connection string looks like for a database system, there are lists online. This one, http://www.java2s.com/Tutorial/Java/0340__Database/AListofJDBCDriversconnectionstringdrivername.htm, includes the major drivers.