Learning Data Mining with Python

Learning Data Mining with Python - Second Edition

By: Robert Layton

Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. It covers many of the libraries and tools available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands-on experience with complex data types including text, images, and graphs. You will also explore object detection using deep neural networks, one of the large and difficult areas of machine learning right now. With restructured examples and code samples updated for the latest version of Python, each chapter introduces you to new algorithms and techniques. By the end of the book, you will have insight into using Python for data mining and an understanding of both the algorithms and their implementations.

Title Page

Credits

About the Author

About the Reviewer

www.PacktPub.com

Customer Feedback

Preface

Getting Started with Data Mining

Introducing data mining

Using Python and the Jupyter Notebook

A simple affinity analysis example

Product recommendations

A simple classification example

What is classification?

Summary

Classifying with scikit-learn Estimators

scikit-learn estimators

Preprocessing

Pipelines

Summary

Predicting Sports Winners with Decision Trees

Loading the dataset

Decision trees

Sports outcome prediction

Random forests

Summary

Recommending Movies Using Affinity Analysis

Affinity analysis

Dealing with the movie recommendation problem

Understanding the Apriori algorithm and its implementation

Summary

Features and scikit-learn Transformers

Feature extraction

Feature selection

Feature creation

Principal Component Analysis

Creating your own transformer

Unit testing

Putting it all together

Summary

Social Media Insight using Naive Bayes

Disambiguation

Downloading data from a social network

Text transformers

Naive Bayes

Applying Naive Bayes

Getting useful features from models

Summary

Follow Recommendations Using Graph Mining

Loading the dataset

Getting follower information from Twitter

Creating a graph

Finding subgraphs

Summary

Beating CAPTCHAs with Neural Networks

Artificial neural networks

Creating the dataset

Training and classifying

Predicting words

Summary

Authorship Attribution

Attributing documents to authors

Getting the data

Using function words

Support Vector Machines

Character n-grams

The Enron dataset

Putting it all together

Evaluation

Summary

Clustering News Articles

Trending topic discovery

Extracting text from arbitrary websites

Grouping news articles

The k-means algorithm

Clustering ensembles

Online learning

Summary

Object Detection in Images using Deep Neural Networks

Object classification

Application scenario

Deep neural networks

An Introduction to TensorFlow

Using Keras

GPU optimization

Application

Summary

Working with Big Data

Big data

MapReduce

Applying MapReduce

Naive Bayes prediction

Extracting the blog posts

Training Naive Bayes

Putting it all together

Training on Amazon's EMR infrastructure

Summary

Next Steps...

Getting Started with Data Mining

Classifying with scikit-learn Estimators

Predicting Sports Winners with Decision Trees

Recommending Movies Using Affinity Analysis

Extracting Features with Transformers

Social Media Insight Using Naive Bayes

Discovering Accounts to Follow Using Graph Mining

Beating CAPTCHAs with Neural Networks

Authorship Attribution

Clustering News Articles

Classifying Objects in Images Using Deep Learning

Working with Big Data

More resources

Classifying with scikit-learn Estimators

A naïve implementation of the nearest neighbor algorithm is quite slow because it checks all pairs of points to find those that are close together. Better implementations exist, and some of them are available in scikit-learn.
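To make the cost of the naïve approach concrete, here is a minimal sketch of an all-pairs search in plain Python. The function name `brute_force_nearest` and the sample points are illustrative assumptions, not code from the book. The two nested loops mean the work grows with the square of the number of points.

```python
from math import dist  # Euclidean distance, available in Python 3.8+


def brute_force_nearest(points):
    """Return, for each point, the index of its closest other point.

    Hypothetical helper for illustration: it compares every pair of
    points, so the running time is quadratic in len(points).
    """
    nearest = []
    for i, p in enumerate(points):
        best_index, best_distance = None, float("inf")
        for j, q in enumerate(points):
            if i == j:
                continue  # a point is not its own neighbor
            d = dist(p, q)
            if d < best_distance:
                best_index, best_distance = j, d
        nearest.append(best_index)
    return nearest


# Two tight pairs of points, far apart from each other
points = [(0.0, 0.0), (0.1, 0.0), (5.0, 5.0), (5.0, 5.2)]
nearest = brute_force_nearest(points)
print(nearest)  # [1, 0, 3, 2]
```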

Scalability with the nearest neighbor

URL: https://github.com/jnothman/scikit-learn/tree/pr2532

For instance, a kd-tree can be created that speeds up the algorithm (and this is already included in scikit-learn).
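As a rough sketch of how this looks in practice, scikit-learn's `KNeighborsClassifier` accepts an `algorithm` parameter that selects the search structure. The synthetic data, the seed, and the variable names below are assumptions made for illustration only. Both settings perform an exact search, so their predictions should agree; only the lookup strategy differs.

```python
import numpy as np
from sklearn.neighbors import KNeighborsClassifier

# Synthetic data for illustration only: 200 points in 3 dimensions,
# labeled by whether the first feature is above 0.5
rng = np.random.RandomState(14)
X = rng.rand(200, 3)
y = (X[:, 0] > 0.5).astype(int)

# Same model, two search strategies: exhaustive comparison versus a kd-tree
brute = KNeighborsClassifier(n_neighbors=5, algorithm="brute").fit(X, y)
kd = KNeighborsClassifier(n_neighbors=5, algorithm="kd_tree").fit(X, y)

X_new = rng.rand(20, 3)
kd_predictions = kd.predict(X_new)

# Fraction of new points on which the two strategies give the same label
agreement = np.mean(brute.predict(X_new) == kd_predictions)
print(agreement)
```

Leaving `algorithm` at its default of `"auto"` lets scikit-learn choose a structure based on the training data, which is usually a reasonable starting point.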

Another way to speed up this search is to use Locality-Sensitive Hashing (LSH). This is a proposed improvement for scikit-learn, and hasn't made it into the package at the time of writing. The preceding link points to a development branch of scikit-learn that will allow you to test out LSH on a dataset. Read through the documentation attached to this branch for details on doing this.
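The idea behind LSH can be sketched without that branch. The example below is a minimal illustration using random hyperplanes, which is one common LSH family that groups points by direction (cosine similarity); it is not the API of the linked branch, and every function name here is hypothetical. Points that fall on the same side of every hyperplane share a bucket, so a query only needs to be compared against the points in its own bucket rather than the whole dataset.

```python
import random
from collections import defaultdict


def make_hyperplanes(n_planes, n_dims, seed=14):
    """Draw random hyperplanes through the origin (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(n_dims)] for _ in range(n_planes)]


def signature(point, hyperplanes):
    """Hash a point to a tuple of bits: its side of each hyperplane."""
    return tuple(
        int(sum(w * x for w, x in zip(plane, point)) >= 0)
        for plane in hyperplanes
    )


def build_index(points, hyperplanes):
    """Group point indices into buckets keyed by their signature."""
    buckets = defaultdict(list)
    for i, point in enumerate(points):
        buckets[signature(point, hyperplanes)].append(i)
    return buckets


def candidates(query, buckets, hyperplanes):
    """Return indices of points sharing the query's bucket (approximate)."""
    return buckets.get(signature(query, hyperplanes), [])


points = [[1.0, 2.0], [-3.0, 0.5], [0.2, -4.0], [2.5, 5.1]]
planes = make_hyperplanes(n_planes=4, n_dims=2)
index = build_index(points, planes)

# An indexed point always lands in its own bucket
found = candidates(points[0], index, planes)
```

The search is approximate: a true nearest neighbor can land in a different bucket and be missed. Practical implementations typically use several independent hash tables to reduce that risk.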

To install it, clone the repository and follow the instructions to install the Bleeding Edge code available at http://scikit-learn.org...
