
Mastering Clojure Data Analysis

By: Eric Richard Rochester


Understanding topic modeling


A topic model is a statistical model of the topics in a document. The underlying assumption is that a document's vocabulary reflects its mix of topics: if 10 percent of a document talks about the military and 40 percent of it talks about the economy (and 50 percent talks about other things), then there should be roughly four times as many words about the economy as about the military.
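To make the arithmetic behind this assumption concrete, here is a minimal sketch in Clojure. The topic percentages are the ones from the example above; the document length of 1,000 words and the names `topic-percentages`, `expected-word-counts`, and `economy-to-military` are hypothetical, chosen only for illustration. This is not how a topic model is fitted; it only shows what the proportions would imply about word counts.

```clojure
;; Topic mix from the example in the text, expressed as whole percentages.
(def topic-percentages
  {:military 10
   :economy  40
   :other    50})

(defn expected-word-counts
  "Scales each topic's percentage by the document length to get the
  number of words we would expect that topic to contribute."
  [percentages doc-length]
  (into {}
        (map (fn [[topic pct]] [topic (/ (* pct doc-length) 100)]))
        percentages))

;; Assumption: a document of 1,000 words.
(def counts (expected-word-counts topic-percentages 1000))
;; counts => {:military 100, :economy 400, :other 500}

;; Ratio of expected economy words to expected military words.
(def economy-to-military
  (/ (:economy counts) (:military counts)))
;; economy-to-military => 4
```

In a real document the observed counts would only approximate these figures, which is why the text says "roughly" four times as many.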

An early form of topic modeling was described by Christos Papadimitriou and others in their 1998 paper, Latent Semantic Indexing: A probabilistic analysis (http://www.cs.berkeley.edu/~christos/ir.ps). This was refined by Thomas Hofmann in 1999 with Probabilistic Latent Semantic Indexing (http://www.cs.brown.edu/~th/papers/Hofmann-SIGIR99.pdf).

In 2003, David Blei, Andrew Ng, and Michael I. Jordan published their paper, Latent Dirichlet Allocation (http://jmlr.csail.mit.edu/papers/v3/blei03a.html). Currently, this is the most common type of topic modeling. It's simple, easy to get started with, and widely available. Most work in the field since then...