The links that we get from reddit go to arbitrary websites run by many different organizations. To make matters harder, those pages were designed to be read by a human, not by a computer program. This causes problems when we try to extract the actual content (the story) from those pages, because modern websites have a lot going on in the background: JavaScript libraries are called, style sheets are applied, advertisements are loaded using AJAX, extra content is added to sidebars, and various other things are done that make the modern webpage a complex document. These features make the modern Web what it is, but they also make it difficult to extract good information automatically.
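To make the problem concrete, here is a minimal sketch that pulls text out of a small, made-up HTML page using only Python's standard library. The page, the `SKIPPED_TAGS` set, and the names `VisibleTextExtractor` and `extract_visible_text` are all assumptions introduced for illustration; this is not necessarily the approach or the library used elsewhere in this chapter.

```python
from html.parser import HTMLParser

# Tags whose contents a human reader never sees (an illustrative choice).
SKIPPED_TAGS = {"script", "style", "noscript"}


class VisibleTextExtractor(HTMLParser):
    """Collect text nodes, ignoring anything nested inside SKIPPED_TAGS."""

    def __init__(self):
        super().__init__()
        self._skip_depth = 0  # > 0 while we are inside a skipped tag
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        text = data.strip()
        if text and self._skip_depth == 0:
            self.chunks.append(text)


def extract_visible_text(html):
    """Return the non-script, non-style text of an HTML string."""
    parser = VisibleTextExtractor()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.chunks)


# A hypothetical page: story text mixed with styling, scripts, and a sidebar.
sample_page = """
<html><head><title>Example story</title>
<style>body { color: black; }</style>
<script>var tracker = "ads";</script></head>
<body><h1>Headline</h1><p>The actual story text.</p>
<div class="sidebar">Related links</div></body></html>
"""

print(extract_visible_text(sample_page))
```

Dropping the script and style blocks is the easy part. The sidebar text still comes through alongside the story, because nothing in the markup reliably tells a program which text is the article and which is page furniture. That is the difficulty described above.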
Learning Data Mining with Python