In this chapter, we will build a system that takes a live feed of news articles and groups them by topic. Running the system repeatedly over several weeks (or longer) would show how trends change over that time.
Our system will start with the popular link aggregation website reddit (https://www.reddit.com), which stores lists of links to other websites, along with a comments section for discussing each link. Links on reddit are divided into categories called subreddits. There are subreddits devoted to particular TV shows, funny images, and many other things. We are interested in the subreddits for news. We will use the /r/worldnews subreddit in this chapter, but the code should work with any other text-based subreddit.
Our goal is to download popular stories and then cluster them to reveal any major themes or concepts that occur. This will give us insight into the popular focus, without having...