Learning Data Mining with Python

By: Robert Layton

Overview of this book

If you are a programmer who wants to get started with data mining, then this book is for you.

Preface

What this book covers

What you need for this book

Who this book is for

Conventions

Reader feedback

Customer support

1. Getting Started with Data Mining

Introducing data mining

Using Python and the IPython Notebook

A simple affinity analysis example

A simple classification example

What is classification?

Summary

2. Classifying with scikit-learn Estimators

scikit-learn estimators

Preprocessing using pipelines

Pipelines

Summary

3. Predicting Sports Winners with Decision Trees

Loading the dataset

Decision trees

Sports outcome prediction

Random forests

Summary

4. Recommending Movies Using Affinity Analysis

Affinity analysis

The movie recommendation problem

The Apriori implementation

Extracting association rules

Summary

5. Extracting Features with Transformers

Feature extraction

Feature selection

Feature creation

Creating your own transformer

Summary

6. Social Media Insight Using Naive Bayes

Disambiguation

Text transformers

Naive Bayes

Application

Summary

7. Discovering Accounts to Follow Using Graph Mining

Loading the dataset

Finding subgraphs

Summary

8. Beating CAPTCHAs with Neural Networks

Artificial neural networks

Creating the dataset

Training and classifying

Improving accuracy using a dictionary

Summary

9. Authorship Attribution

Attributing documents to authors

Function words

Support vector machines

Character n-grams

Using the Enron dataset

Summary

10. Clustering News Articles

Obtaining news articles

Extracting text from arbitrary websites

Grouping news articles

Clustering ensembles

Online learning

Summary

11. Classifying Objects in Images Using Deep Learning

Object classification

Application scenario and goals

Deep neural networks

GPU optimization

Setting up the environment

Application

Summary

12. Working with Big Data

Big data

Application scenario and goals

MapReduce

Application

Summary

A. Next Steps…

Chapter 1 – Getting Started with Data Mining

Chapter 2 – Classifying with scikit-learn Estimators

Chapter 3 – Predicting Sports Winners with Decision Trees

Chapter 4 – Recommending Movies Using Affinity Analysis

Chapter 5 – Extracting Features with Transformers

Chapter 6 – Social Media Insight Using Naive Bayes

Chapter 7 – Discovering Accounts to Follow Using Graph Mining

Chapter 8 – Beating CAPTCHAs with Neural Networks

Chapter 9 – Authorship Attribution

Chapter 10 – Clustering News Articles

Chapter 11 – Classifying Objects in Images Using Deep Learning

Chapter 12 – Working with Big Data

More resources

Index

Chapter 1. Getting Started with Data Mining

We are collecting information at a scale never before seen in human history, and we are placing more importance on using this information in everyday life. We expect our computers to translate web pages into other languages, predict the weather, suggest books we would like, and diagnose our health issues. These expectations will grow, both in the number of applications and in how effective we expect them to be. Data mining is a methodology for training computers to make decisions with data, and it forms the backbone of many of today's high-tech systems.

The Python language is growing quickly in popularity, and for good reason. It gives the programmer a lot of flexibility, it has a large number of modules for different tasks, and Python code is usually more readable and concise than code in other languages. There is a large and active community of researchers, practitioners, and beginners using Python for data mining.

In this chapter, we will introduce data mining with Python. We will cover the following topics:

What data mining is and where it can be used
Setting up a Python-based environment for data mining
An example of affinity analysis: recommending products based on purchasing habits
An example of a classic classification problem: predicting a plant's species from its measurements
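
To give a feel for the affinity analysis topic listed above, here is a minimal sketch, not taken from the book, that computes the support and confidence of a single rule ("if a person buys bread, they also buy milk") over a tiny hand-made list of transactions. The transactions, the item names, and the function name rule_stats are all hypothetical and exist only for illustration.

```python
# Hypothetical toy data: each set holds the items bought in one transaction.
transactions = [
    {"bread", "milk"},
    {"bread", "cheese"},
    {"bread", "milk", "apples"},
    {"milk", "bananas"},
    {"bread", "milk", "cheese"},
]


def rule_stats(transactions, premise, conclusion):
    """Return (support, confidence) for the rule premise -> conclusion.

    support: number of transactions that contain both items.
    confidence: fraction of the transactions containing the premise
    that also contain the conclusion (0.0 if the premise never occurs).
    """
    premise_count = sum(1 for t in transactions if premise in t)
    both_count = sum(
        1 for t in transactions if premise in t and conclusion in t
    )
    confidence = both_count / premise_count if premise_count else 0.0
    return both_count, confidence


support, confidence = rule_stats(transactions, "bread", "milk")
print(f"support={support}, confidence={confidence:.2f}")
```

On this toy data, bread appears in four transactions and three of those also contain milk, so the rule has a support of 3 and a confidence of 0.75. Real purchasing data would be far larger, and the rules worth keeping would typically be those that clear chosen thresholds on both measures.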
