Clojure Data Analysis Cookbook - Second Edition

By: Eric Richard Rochester

Chapter 10. Working with Unstructured and Textual Data

In this chapter, we will cover the following recipes:

  • Tokenizing text

  • Finding sentences

  • Focusing on content words with stoplists

  • Getting document frequencies

  • Scaling document frequencies by document size

  • Scaling document frequencies with TF-IDF

  • Finding people, places, and things with Named Entity Recognition

  • Mapping documents to a sparse vector space representation

  • Performing topic modeling with MALLET

  • Performing naïve Bayesian classification with MALLET