Mastering Predictive Analytics with R

By: Rui Miguel Forte

Modeling the topics of online news stories


To see how topic models perform on real data, we will look at two data sets containing articles published by BBC News between 2004 and 2005. The first data set, which we will refer to as the BBC data set, contains 2,225 articles that have been grouped into five topics: business, entertainment, politics, sports, and technology.

The second data set, which we will call the BBCSports data set, contains 737 articles, all of them about sports. These are also grouped into five categories, this time according to the sport being described: athletics, cricket, football, rugby, and tennis. Our objective will be to see whether we can build a topic model for each of these two data sets that groups together articles from the same known category.
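
One simple way to check whether a topic model groups together articles from the same category is to cross-tabulate the topic assigned to each article against its known category. The following is only a minimal sketch of that idea in base R. The vectors known_category and assigned_topic are hypothetical toy values made up for illustration, not output from a model fitted to the BBC data, and the purity summary is one possible measure rather than the one used in this chapter.

```r
# Hypothetical toy data: six articles with a known category and the
# topic number that some topic model assigned to each of them.
known_category <- c("business", "business", "sports",
                    "sports", "politics", "politics")
assigned_topic <- c(1, 1, 2, 2, 3, 1)

# Cross-tabulate assigned topics (rows) against known categories (columns).
confusion <- table(topic = assigned_topic, category = known_category)
print(confusion)

# Purity-style summary: the share of articles that belong to the
# most common category within their assigned topic.
purity <- sum(apply(confusion, 1, max)) / sum(confusion)
print(purity)
```

A table in which each row has most of its counts in a single column suggests that the topics line up well with the known categories.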

Note

Both BBC data sets were presented in a paper by D. Greene and P. Cunningham, titled Producing Accurate Interpretable Clusters from High-Dimensional Data and published in the proceedings...