Learning Data Mining with Python

Learning Data Mining with Python - Second Edition

By: Robert Layton


Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. It covers a large number of Python libraries and tools, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types including text, images, and graphs. You will also explore object detection using deep neural networks, one of the large and difficult areas of machine learning. With restructured examples and code samples updated for the latest version of Python, each chapter introduces you to new algorithms and techniques. By the end of the book, you will have insight into using Python for data mining and an understanding of both the algorithms and their implementations.

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface


Getting Started with Data Mining

Introducing data mining

Using Python and the Jupyter Notebook

A simple affinity analysis example

Product recommendations

A simple classification example

What is classification?

Summary

Classifying with scikit-learn Estimators

scikit-learn estimators

Preprocessing

Pipelines

Summary

Predicting Sports Winners with Decision Trees

Loading the dataset

Decision trees

Sports outcome prediction

Random forests

Summary

Recommending Movies Using Affinity Analysis

Affinity analysis

Dealing with the movie recommendation problem

Understanding the Apriori algorithm and its implementation

Summary

Features and scikit-learn Transformers

Feature extraction

Feature selection

Feature creation

Principal Component Analysis

Creating your own transformer

Unit testing

Putting it all together

Summary

Social Media Insight using Naive Bayes

Disambiguation

Downloading data from a social network

Text transformers

Naive Bayes

Applying of Naive Bayes

Getting useful features from models

Summary

Follow Recommendations Using Graph Mining

Loading the dataset

Getting follower information from Twitter

Creating a graph

Finding subgraphs

Summary

Beating CAPTCHAs with Neural Networks

Artificial neural networks

Creating the dataset

Training and classifying

Predicting words

Summary

Authorship Attribution

Attributing documents to authors

Getting the data

Using function words

Support Vector Machines

Character n-grams

The Enron dataset

Putting it all together

Evaluation

Summary

Clustering News Articles

Trending topic discovery

Extracting text from arbitrary websites

Grouping news articles

The k-means algorithm

Clustering ensembles

Online learning

Summary

Object Detection in Images using Deep Neural Networks

Object classification

Application scenario

Deep neural networks

An Introduction to TensorFlow

Using Keras

GPU optimization

Application

Summary

Working with Big Data

Big data

MapReduce

Applying MapReduce

Naive Bayes prediction

Extracting the blog posts

Training Naive Bayes

Putting it all together

Training on Amazon's EMR infrastructure

Summary

Next Steps...

Getting Started with Data Mining

Classifying with scikit-learn Estimators

Predicting Sports Winners with Decision Trees

Recommending Movies Using Affinity Analysis

Extracting Features with Transformers

Social Media Insight Using Naive Bayes

Discovering Accounts to Follow Using Graph Mining

Beating CAPTCHAs with Neural Networks

Authorship Attribution

Clustering News Articles

Classifying Objects in Images Using Deep Learning

Working with Big Data

More resources


Summary

In this chapter, we looked at graphs from social networks and how to do cluster analysis on them. We also looked at saving and loading models from scikit-learn by using the classification model we created in Chapter 6, Social Media Insight Using Naive Bayes.

We created a graph of friends from the social network Twitter. We then examined how similar two users were, based on their friends. Users with more friends in common were considered more similar, although we normalized this by considering the overall number of friends they have. This is a commonly used way to infer knowledge (such as age or general topic of discussion) based on similar users. We can use this logic to recommend users to others: if they follow user X and user Y is similar to user X, they will probably like user Y. This is, in many ways, similar to our transaction-led similarity of previous chapters.
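The friend-based similarity described above can be sketched in a few lines of plain Python. This is a minimal illustration, not the chapter's actual code. It assumes the normalization is the Jaccard coefficient (friends in common divided by the total number of distinct friends of both users), which is one common choice. The function names, the `threshold` parameter, and the sample data are all hypothetical.

```python
def jaccard_similarity(friends_a, friends_b):
    """Friends in common, normalized by the distinct friends of both users.

    Assumption: Jaccard is used as the normalization; other choices exist.
    """
    friends_a, friends_b = set(friends_a), set(friends_b)
    union = friends_a | friends_b
    if not union:
        # Two users with no friends at all: treat them as not similar.
        return 0.0
    return len(friends_a & friends_b) / len(union)


def recommend(user, friends, threshold=0.5):
    """Suggest accounts that are similar to accounts `user` already follows.

    `friends` maps each user to the set of accounts they follow.
    `threshold` is a hypothetical cut-off for "similar enough".
    """
    following = friends[user]
    suggestions = set()
    for followed in following:
        if followed not in friends:
            continue  # no friend list collected for this account
        for candidate, candidate_friends in friends.items():
            if candidate in (user, followed) or candidate in following:
                continue
            score = jaccard_similarity(friends[followed], candidate_friends)
            if score >= threshold:
                suggestions.add(candidate)
    return sorted(suggestions)


# Hypothetical sample data: alice follows x; y has mostly the same friends as x.
friends = {
    "alice": {"x"},
    "x": {"a", "b", "c"},
    "y": {"a", "b", "c", "d"},
    "z": {"q"},
}

print(jaccard_similarity(friends["x"], friends["y"]))  # 3 shared / 4 distinct
print(recommend("alice", friends))
```

Here `y` shares three of four distinct friends with `x`, so it clears the threshold and is suggested to `alice`, while `z` shares none and is not.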

The aim of this analysis was to recommend users, and our use of cluster analysis allowed us to find clusters of similar...
