-
Book Overview & Buying
-
Table Of Contents
The Unsupervised Learning Workshop
By :
In 2003, David Blei, Andrew Ng, and Michael Jordan published their article on the topic modeling algorithm known as Latent Dirichlet Allocation (LDA). LDA is a generative probabilistic model. This means that the modeling process starts with the text and works backward through the process that is assumed to have generated it in order to identify the parameters of interest. In this case, it is the topics that generated the data that are of interest. The process discussed here is the most basic form of LDA, but for learning, it is also the most comprehensible.
There are M documents available for topic modeling within the corpus. Each document can be considered as the sequence of N words, i.e., a sequence (w1,w2… wN).
For each document in the corpus, the assumed generative process is:
is the distribution of topics.