Learning Data Mining with Python

Learning Data Mining with Python - Second Edition

By : Robert Layton

Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. It covers many of the libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types including text, images, and graphs. You will also explore object detection using deep neural networks, currently one of the large, difficult areas of machine learning. With restructured examples and code samples updated for the latest version of Python, each chapter introduces you to new algorithms and techniques. By the end of the book, you will have insight into using Python for data mining and an understanding of both the algorithms and their implementations.

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

Getting Started with Data Mining

Introducing data mining

Using Python and the Jupyter Notebook

A simple affinity analysis example

Product recommendations

A simple classification example

What is classification?

Summary

Classifying with scikit-learn Estimators

scikit-learn estimators

Preprocessing

Pipelines

Summary

Predicting Sports Winners with Decision Trees

Loading the dataset

Decision trees

Sports outcome prediction

Random forests

Summary

Recommending Movies Using Affinity Analysis

Affinity analysis

Dealing with the movie recommendation problem

Understanding the Apriori algorithm and its implementation

Summary

Features and scikit-learn Transformers

Feature extraction

Feature selection

Feature creation

Principal Component Analysis

Creating your own transformer

Unit testing

Putting it all together

Summary

Social Media Insight using Naive Bayes

Disambiguation

Downloading data from a social network

Text transformers

Naive Bayes

Applying Naive Bayes

Getting useful features from models

Summary

Follow Recommendations Using Graph Mining

Loading the dataset

Getting follower information from Twitter

Creating a graph

Finding subgraphs

Summary

Beating CAPTCHAs with Neural Networks

Artificial neural networks

Creating the dataset

Training and classifying

Predicting words

Summary

Authorship Attribution

Attributing documents to authors

Getting the data

Using function words

Support Vector Machines

Character n-grams

The Enron dataset

Putting it all together

Evaluation

Summary

Clustering News Articles

Trending topic discovery

Extracting text from arbitrary websites

Grouping news articles

The k-means algorithm

Clustering ensembles

Online learning

Summary

Object Detection in Images using Deep Neural Networks

Object classification

Application scenario

Deep neural networks

An Introduction to TensorFlow

Using Keras

GPU optimization

Application

Summary

Working with Big Data

Big data

MapReduce

Applying MapReduce

Naive Bayes prediction

Extracting the blog posts

Training Naive Bayes

Putting it all together

Training on Amazon's EMR infrastructure

Summary

Next Steps...

Getting Started with Data Mining

Classifying with scikit-learn Estimators

Predicting Sports Winners with Decision Trees

Recommending Movies Using Affinity Analysis

Extracting Features with Transformers

Social Media Insight Using Naive Bayes

Discovering Accounts to Follow Using Graph Mining

Beating CAPTCHAs with Neural Networks

Authorship Attribution

Clustering News Articles

Classifying Objects in Images Using Deep Learning

Working with Big Data

More resources

Summary

In this chapter, we looked at features and transformers and how they can be used in the data mining pipeline. We discussed what makes a good feature and how to algorithmically choose good features from a standard set. However, creating good features is more art than science and often requires domain knowledge and experience.
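As a reminder of what algorithmic feature selection can look like, here is a minimal sketch using scikit-learn's SelectKBest with the f_classif scoring function. The dataset is synthetic and invented purely for illustration (three noise columns plus one column that tracks the class), and the variable names are assumptions rather than code from the chapter.

```python
import numpy as np
from sklearn.feature_selection import SelectKBest, f_classif

# Synthetic data (an assumption for illustration): one informative
# column placed at index 1, surrounded by three pure-noise columns.
rng = np.random.RandomState(14)
n_samples = 200
y = rng.randint(0, 2, size=n_samples)
informative = y + rng.normal(scale=0.1, size=n_samples)
noise = rng.normal(size=(n_samples, 3))
X = np.column_stack([noise[:, 0], informative, noise[:, 1], noise[:, 2]])

# Keep only the single feature with the highest ANOVA F-score.
selector = SelectKBest(score_func=f_classif, k=1)
X_best = selector.fit_transform(X, y)

# Column indices of the features that were kept.
selected = selector.get_support(indices=True)
print(selected)
```

A univariate score like this only ranks each feature on its own, which is one reason domain knowledge still matters when creating features.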

We then created our own transformer using an interface that allows us to use it in scikit-learn's helper functions. We will be creating more transformers in later chapters so that we can perform effective testing using existing functions.
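The transformer interface mentioned above can be sketched roughly as follows. This is a hypothetical example, not necessarily the transformer built in the chapter: the class name MeanThresholdTransformer and the toy data are assumptions. It implements fit and transform, which is what lets scikit-learn helpers such as Pipeline use it.

```python
import numpy as np
from sklearn.base import BaseEstimator, TransformerMixin
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier


class MeanThresholdTransformer(BaseEstimator, TransformerMixin):
    """Hypothetical transformer: 1 where a value exceeds its column mean, else 0."""

    def fit(self, X, y=None):
        # Learn the per-column means from the training data.
        X = np.asarray(X, dtype=float)
        self.mean_ = X.mean(axis=0)
        return self

    def transform(self, X):
        # Compare each value against the mean learned in fit().
        X = np.asarray(X, dtype=float)
        return (X > self.mean_).astype(int)


# Toy data, invented for illustration.
X = np.array([[0.0, 10.0], [2.0, 20.0], [4.0, 60.0]])
y = np.array([0, 0, 1])

# fit_transform is provided by TransformerMixin.
binary = MeanThresholdTransformer().fit_transform(X)

# Because the class follows the transformer interface, it can be
# chained with an estimator in a Pipeline.
pipeline = Pipeline([
    ("threshold", MeanThresholdTransformer()),
    ("tree", DecisionTreeClassifier(random_state=14)),
])
pipeline.fit(X, y)
predictions = pipeline.predict(X)
```

Inheriting from BaseEstimator also provides get_params and set_params, which helpers that clone estimators rely on.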

To take the lessons learned in this chapter further, I recommend signing up for the online data mining competition website Kaggle.com and trying some of the competitions. Their recommended starting place is the Titanic dataset, which lets you practice the feature creation aspects of this chapter. Many of the features are not numerical, so you must convert them to numerical features before applying a data mining algorithm...
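One common way to convert non-numerical features is one-hot encoding. The sketch below uses scikit-learn's OneHotEncoder on a handful of made-up rows with two categorical columns (a sex value and a port code). These rows are invented for illustration and are not taken from the actual Titanic dataset.

```python
import numpy as np
from sklearn.preprocessing import OneHotEncoder

# Made-up rows (an assumption for illustration): [sex, port code].
passengers = np.array([
    ["male", "S"],
    ["female", "C"],
    ["female", "S"],
    ["male", "Q"],
])

# Each distinct category becomes its own 0/1 column.
encoder = OneHotEncoder()
X_numeric = encoder.fit_transform(passengers).toarray()

# Categories are sorted per column: female, male, then C, Q, S.
print(encoder.categories_)
print(X_numeric)
```

The resulting array has one column per category (five here), which can then be passed to an estimator that expects numerical input.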
