
Pipelines


As experiments grow, so does the complexity of the operations. We may split up our dataset, binarize features, perform feature-based scaling, perform sample-based scaling, and apply many more operations.

Keeping track of these operations can get quite confusing and can make it difficult to replicate the results. Common problems include forgetting a step, applying a transformation incorrectly, or adding a transformation that wasn't needed.

Another issue is the ordering of the code. In the previous section, we created our X_transformed dataset and then created a new estimator for the cross-validation. If we had multiple steps, we would need to track each of these changes to the dataset in code.
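For context, the manual workflow being described looks roughly like the following sketch. The specific transformer, classifier, and dataset here (MinMaxScaler, KNeighborsClassifier, and a built-in scikit-learn dataset) are stand-ins for illustration, not necessarily the exact code from the previous section:

```python
# A minimal sketch of the manual, non-pipeline workflow (assumed names).
from sklearn.datasets import load_breast_cancer
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Stand-in dataset; substitute the dataset used in the earlier sections.
X, y = load_breast_cancer(return_X_y=True)

# Step 1: transform the raw features by hand.
X_transformed = MinMaxScaler().fit_transform(X)

# Step 2: create a new estimator and cross-validate on the transformed data.
estimator = KNeighborsClassifier()
scores = cross_val_score(estimator, X_transformed, y, scoring='accuracy')
```

With only one transformation this is manageable, but every extra step adds another intermediate dataset to keep in sync with the rest of the code.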

Pipelines are a construct that addresses these problems (and others, which we will see in the next chapter). A pipeline stores the steps of your data mining workflow: it can take in your raw data, perform all the necessary transformations, and then create a prediction. This allows us to use pipelines in functions...
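As a minimal sketch of this idea using scikit-learn's Pipeline class (the step names, the MinMaxScaler/KNeighborsClassifier choices, and the stand-in dataset are assumptions for illustration, not necessarily the book's exact example):

```python
from sklearn.datasets import load_breast_cancer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler
from sklearn.neighbors import KNeighborsClassifier
from sklearn.model_selection import cross_val_score

# Stand-in dataset; substitute the dataset from the earlier sections.
X, y = load_breast_cancer(return_X_y=True)

# Each step is a (name, estimator) pair. Every step except the last must be
# a transformer; the last step is the predictor.
pipeline = Pipeline([
    ('scale', MinMaxScaler()),
    ('predict', KNeighborsClassifier()),
])

# The pipeline behaves like a single estimator, so it can be passed straight
# to cross_val_score; the scaling is re-fit inside each training fold.
scores = cross_val_score(pipeline, X, y, scoring='accuracy')
print("Mean accuracy: {:.3f}".format(scores.mean()))
```

Because the whole workflow is wrapped in one object, there is no separate X_transformed variable to keep track of, and the order of the steps is recorded explicitly in the pipeline itself.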