
Python: Real-World Data Science

By: Fabrizio Romano, Dusty Phillips, Phuong Vo.T.H, Martin Czygan, Robert Layton, Sebastian Raschka

Overview of this book

The Python: Real-World Data Science course will take you on a journey to become an efficient data science practitioner by thoroughly understanding the key concepts of Python. This learning path is divided into four modules. Each module is a mini course in its own right, and as you complete each one, you'll gain key skills and be ready for the material in the next module. The course begins by getting your Python fundamentals nailed down. Once you are familiar with Python's core concepts, it's time to dive into the field of data science. In the second module, you'll learn how to perform data analysis using Python in a practical, example-driven way. The third module teaches you how to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis and moving on to more complex data types, including text, images, and graphs. Machine learning and predictive analytics have become the most important approaches for uncovering the gold mines hidden in data. In the final module, we'll discuss the necessary details of machine learning concepts, offering intuitive yet informative explanations of how machine learning algorithms work, how to use them, and, most importantly, how to avoid the common pitfalls.
Table of Contents (12 chapters)
  1. Table of Contents
  2. Python: Real-World Data Science
  3. Meet Your Course Guide
  4. What's so cool about Data Science?
  5. Course Structure
  6. Course Journey
  7. The Course Roadmap and Timeline
  12. Index

Chapter 10. Clustering News Articles

In most of the previous chapters, we performed data mining knowing what we were looking for. Our use of target classes allowed us to learn how our variables model those targets during the training phase. This type of learning, where we have targets to train against, is called supervised learning. In this chapter, we consider what we can do without those targets. This is unsupervised learning, which is a much more exploratory task. Rather than classifying with our model, the goal in unsupervised learning is to explore the data to find insights.
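To make the contrast concrete, here is a minimal sketch of one unsupervised method, k-means clustering, written in plain Python. It is not this chapter's code: the function name `kmeans`, its parameters, and the toy 2-D points are illustrative assumptions. Note that the algorithm receives only the points and never any target labels; it finds a grouping by alternately assigning each point to its nearest centroid and moving each centroid to the mean of the points assigned to it.

```python
import random


def kmeans(points, k, n_iter=100, seed=0):
    """Cluster 2-D points into k groups without using any target labels (illustrative sketch)."""
    rng = random.Random(seed)
    # Start with k distinct points chosen at random as the initial centroids
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(n_iter):
        # Assignment step: attach each point to its nearest centroid (squared Euclidean distance)
        labels = [
            min(range(k), key=lambda c: (p[0] - centroids[c][0]) ** 2
                                        + (p[1] - centroids[c][1]) ** 2)
            for p in points
        ]
        # Update step: move each centroid to the mean of its assigned points
        new_centroids = []
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                new_centroids.append((sum(x for x, _ in members) / len(members),
                                      sum(y for _, y in members) / len(members)))
            else:
                # An empty cluster keeps its previous centroid
                new_centroids.append(centroids[c])
        if new_centroids == centroids:
            break  # converged: the centroids, and hence the labels, no longer change
        centroids = new_centroids
    return labels, centroids


# Hypothetical toy data: two well-separated groups of points, with no labels given
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1), (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centroids = kmeans(points, k=2)
print(labels)
```

On this data the first three points should share one label and the last three the other. Which group is called 0 and which is called 1 depends on the random initialization, which illustrates that cluster labels, unlike supervised targets, carry no meaning of their own.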

In this chapter, we look at clustering news articles to find trends and patterns in the data. We look at how to extract data from different websites, using a link aggregation website to find a variety of news stories.
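Before articles can be clustered, each one has to be turned into numbers. As a hedged illustration, and not necessarily the representation this chapter settles on, the sketch below builds a simple word-count vector for each article and compares two articles with cosine similarity: articles on the same topic share words and score higher. The helper names `bag_of_words` and `cosine_similarity` and the sample headlines are hypothetical.

```python
import math
import re
from collections import Counter


def bag_of_words(text):
    """Lower-case the text and count each word (a simple term-frequency vector)."""
    return Counter(re.findall(r"[a-z']+", text.lower()))


def cosine_similarity(a, b):
    """Cosine of the angle between two sparse count vectors stored as Counters."""
    # Looking up a missing word in a Counter returns 0, so only shared words contribute
    dot = sum(count * b[word] for word, count in a.items())
    norm_a = math.sqrt(sum(c * c for c in a.values()))
    norm_b = math.sqrt(sum(c * c for c in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical headlines standing in for downloaded news articles
articles = [
    "Stock markets fall as interest rates rise",
    "Interest rates rise again, stock markets react",
    "Local team wins the football championship",
]
vectors = [bag_of_words(a) for a in articles]
print(cosine_similarity(vectors[0], vectors[1]))  # 5 shared words out of 7 each: 5/7, about 0.714
print(cosine_similarity(vectors[0], vectors[2]))  # 0.0: no words in common
```

A clustering algorithm working on such vectors would tend to group the first two headlines together and leave the sports story apart. Real pipelines usually also down-weight very common words, which this sketch does not do.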

The key concepts covered in this chapter include:

  • Obtaining text from arbitrary websites
  • Using the reddit API to collect interesting news stories
  • Cluster analysis for unsupervised...
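Obtaining text from arbitrary websites mostly comes down to stripping away the markup. The sketch below uses only Python's standard-library `html.parser` to collect the visible text of a page while ignoring the contents of `<script>` and `<style>` tags. The chapter itself may well use a different parser; the class name `TextExtractor`, the helper `extract_text`, and the sample page are assumptions for illustration, and downloading pages or querying the reddit API is deliberately left out.

```python
from html.parser import HTMLParser


class TextExtractor(HTMLParser):
    """Collect visible text from an HTML page, skipping script and style contents."""

    SKIP_TAGS = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # greater than 0 while inside a tag whose text we ignore
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser passes tag names in lower case
        if tag in self.SKIP_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_text(html):
    """Return the visible text of an HTML string, joined with single spaces."""
    parser = TextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(parser.chunks)


# Hypothetical page standing in for a downloaded news article
page = ("<html><head><title>News</title><style>p {color: red}</style></head>"
        "<body><p>Markets rally</p><script>var x = 1;</script></body></html>")
print(extract_text(page))  # News Markets rally
```

Keeping the title text is a design choice here; a real extractor for news articles would also need to drop navigation menus, adverts, and comment sections, which simple tag filtering cannot distinguish from the story itself.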