In most of the earlier chapters, we performed data mining knowing what we were looking for. Having target classes let the algorithm learn, during the training phase, how the features relate to those targets, and tune its internal parameters to predict them as accurately as possible. This type of learning, where we have targets to train against, is called supervised learning. In this chapter, we'll consider what we can do without those targets. This is unsupervised learning, and it is much more of an exploratory task: rather than building a model to classify new samples, the goal is to explore the data and find insights.
In this chapter, we will cluster news articles to find trends and patterns in the data. We'll also look at how to use a link aggregation website to find a variety of news stories, and then extract their content from the different websites that host them.
The key concepts covered in this chapter include:
- Using the reddit API to collect interesting news stories...