In this chapter, we will cover the following recipes:
Tokenizing text
Finding sentences
Focusing on content words with stoplists
Getting document frequencies
Scaling document frequencies by document size
Scaling document frequencies with TF-IDF
Finding people, places, and things with Named Entity Recognition
Mapping documents to a sparse vector space representation
Performing topic modeling with MALLET
Performing naïve Bayesian classification with MALLET