Mastering Data Mining with Python - Find patterns hidden in your data

By: Megan Squire

Overview of this book

Data mining is an integral part of the data science pipeline. It is the foundation of any successful data-driven strategy; without it, you'll never be able to uncover truly transformative insights. Since data is vital to just about every modern organization, it is worth taking the next step to unlock even greater value and more meaningful understanding. If you already know the fundamentals of data mining with Python, you are now ready to experiment with more interesting, advanced data analytics techniques using Python's easy-to-use interface and extensive range of libraries. In this book, you'll go deeper into many often overlooked areas of data mining, including association rule mining, entity matching, network mining, sentiment analysis, named entity recognition, text summarization, topic modeling, and anomaly detection. For each data mining technique, we'll review the state of the art and current best practices before comparing a wide variety of strategies for solving each problem. We will then implement example solutions using real-world data from the domain of software engineering, and we will spend time learning how to understand and interpret the results we get. By the end of this book, you will have solid experience implementing some of the most interesting and relevant data mining techniques available today, and you will have achieved a greater fluency in the important field of Python data analytics.

Building and evaluating NER systems

Based on our discussion so far in this chapter, we know that building an NER system will start with the following steps:

1. Separate our document into sentences.
2. Separate our sentences into tokens.
3. Tag each token with a part of speech.
4. Identify named entities from this tagged token set.
5. Identify the class of each named entity.
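
To make these five steps concrete, here is a minimal sketch that walks through them using only the Python standard library. It is a toy illustration and not the approach NLTK takes: the regular expressions, the `COMMON_WORDS`, `ORG_SUFFIXES`, and `KNOWN_PEOPLE` lookup tables, the simplified tag names, and all of the function names are assumptions invented for this example. In NLTK, the corresponding steps are roughly handled by `nltk.sent_tokenize()`, `nltk.word_tokenize()`, `nltk.pos_tag()`, and `nltk.ne_chunk()`, which rely on trained models rather than hand-written rules.

```python
import re

# Hypothetical lookup tables, invented for this sketch only.
COMMON_WORDS = {"the", "a", "an", "we", "it", "this", "in", "on"}
ORG_SUFFIXES = {"Foundation", "Project", "Inc"}
KNOWN_PEOPLE = {"Linus Torvalds"}


def split_sentences(text):
    """Step 1: split after ., ! or ? followed by whitespace (naive)."""
    return re.split(r"(?<=[.!?])\s+", text.strip())


def tokenize(sentence):
    """Step 2: runs of word characters, or single punctuation marks."""
    return re.findall(r"\w+|[^\w\s]", sentence)


def tag_tokens(tokens):
    """Step 3: a crude stand-in for part-of-speech tagging.

    Capitalized tokens are tagged NNP (proper noun), unless they are a
    common word that is only capitalized because it starts the sentence.
    """
    tagged = []
    for i, tok in enumerate(tokens):
        if not tok[0].isalnum():
            tag = "PUNCT"
        elif tok[0].isupper() and not (i == 0 and tok.lower() in COMMON_WORDS):
            tag = "NNP"
        else:
            tag = "OTHER"
        tagged.append((tok, tag))
    return tagged


def find_entities(tagged):
    """Step 4: join consecutive NNP tokens into one candidate entity."""
    entities, current = [], []
    for token, tag in tagged:
        if tag == "NNP":
            current.append(token)
        elif current:
            entities.append(" ".join(current))
            current = []
    if current:
        entities.append(" ".join(current))
    return entities


def classify_entity(name):
    """Step 5: assign a class using the toy lookup tables."""
    if name.split()[-1] in ORG_SUFFIXES:
        return "ORGANIZATION"
    if name in KNOWN_PEOPLE:
        return "PERSON"
    return "UNKNOWN"


def recognize_entities(text):
    """Run all five steps and return (entity, class) pairs in order."""
    results = []
    for sentence in split_sentences(text):
        tagged = tag_tokens(tokenize(sentence))
        for entity in find_entities(tagged):
            results.append((entity, classify_entity(entity)))
    return results


SAMPLE = ("Linus Torvalds created Linux. "
          "The Apache Software Foundation hosts many projects.")

if __name__ == "__main__":
    for entity, label in recognize_entities(SAMPLE):
        print(entity, "->", label)
```

On the sample text, this sketch labels Linus Torvalds as a PERSON and Apache Software Foundation as an ORGANIZATION, but it can only mark Linux as UNKNOWN because nothing in the hand-made tables covers it. Gaps like this one are exactly why hand-written rules tend to give way to the trained approach described next.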

To help us find tokens correctly at step 2, separate the real named entities from the impostors at step 4, and place each entity into the correct class at step 5, it is common to leverage a machine learning approach, similar to what NLTK and its sentiment mining functions did for us in Chapter 5, Sentiment Analysis in Text. Relying on a large set of pre-classified examples will help us work out some of the more complicated issues we introduced earlier for recognizing named entities, for example, choosing the correct boundary in multi-word noun phrases, recognizing novel approaches to capitalization, or knowing what kind...
