In the previous chapter, we grouped text documents using clustering. Clustering is a very useful tool, but it is not always the best one, because it assigns each text to exactly one cluster. This book is about machine learning and Python. Should it be grouped with other Python-related works or with machine learning-related works? A physical bookstore needs a single place to shelve the book. An Internet store, however, can acknowledge that the book is about both machine learning and Python and list it in both sections. This does not mean that the book will be listed in every section, of course. We will not list this book with the baking books.
In this chapter, we will learn methods that do not cluster documents into completely separate groups but allow each document to refer to several topics. These topics will be identified automatically from a collection of text documents. These documents may be whole books or shorter...