Book Image

Clojure Data Analysis Cookbook - Second Edition

By : Eric Richard Rochester
Book Image

Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Overview of this book

Table of Contents (19 chapters)
Clojure Data Analysis Cookbook Second Edition
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Finding sentences


Words (tokens) aren't the only structures that we're interested in, however. Another interesting and useful grammatical structure is the sentence. In this recipe, we'll use a process similar to the one we used in the previous recipe, Tokenizing text, in order to create a function that will pull sentences from a string in the same way that tokenize pulled tokens from a string in the last recipe.

Getting ready

We'll need to include clojure-opennlp in our project.clj file:

(defproject com.ericrochester/text-data "0.1.0-SNAPSHOT"
  :dependencies [[org.clojure/clojure "1.6.0"]
                 [clojure-opennlp "0.3.2"]])

We will also need to require it into the current namespace:

(require '[opennlp.nlp :as nlp])

Finally, we'll download a model for a statistical sentence splitter. I downloaded en-sent.bin from http://opennlp.sourceforge.net/models-1.5/. I then saved it into models/en-sent.bin.

How to do it…

As in the Tokenizing text recipe, we will start by loading the sentence identification...