Book Image

Mastering Data Mining with Python - Find patterns hidden in your data

By : Megan Squire
Book Image

Mastering Data Mining with Python - Find patterns hidden in your data

By: Megan Squire

Overview of this book

Data mining is an integral part of the data science pipeline. It is the foundation of any successful data-driven strategy – without it, you'll never be able to uncover truly transformative insights. Since data is vital to just about every modern organization, it is worth taking the next step to unlock even greater value and more meaningful understanding. If you already know the fundamentals of data mining with Python, you are now ready to experiment with more interesting, advanced data analytics techniques using Python's easy-to-use interface and extensive range of libraries. In this book, you'll go deeper into many often overlooked areas of data mining, including association rule mining, entity matching, network mining, sentiment analysis, named entity recognition, text summarization, topic modeling, and anomaly detection. For each data mining technique, we'll review the state-of-the-art and current best practices before comparing a wide variety of strategies for solving each problem. We will then implement example solutions using real-world data from the domain of software engineering, and we will spend time learning how to understand and interpret the results we get. By the end of this book, you will have solid experience implementing some of the most interesting and relevant data mining techniques available today, and you will have achieved a greater fluency in the important field of Python data analytics.
Table of Contents (16 chapters)
Mastering Data Mining with Python – Find patterns hidden in your data
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Representing graph data


The theoretical aspects of networks are important, but in order to be able to apply these ideas to a real-world problem, we have to first transform our data into a format that a network analysis program can understand. In this section, we will discover the common formats for representing data in a network-friendly way.

Adjacency matrix

An adjacency matrix is a convenient way to represent graph data. To construct an adjacency matrix for an undirected, unweighted graph, we can create a grid that has all the nodes listed across the top as columns, and also down the side of the grid as rows. Then we use a 1 or 0 to indicate whether there is a link between those two nodes. Consider the unweighted, undirected graph shown in Figure 12:

Figure 12. A simple unweighted, undirected graph

The adjacency matrix for the Figure 12 graph can be written like this:

  A B C D E F
A 0 1 1 1 0 0
B 1 0 1 1 0 0
C 1 1 0 0 1 1
D 1 1 0 0 0 0
E 0 0 1 0 0 0
F 0 0 1 0 0 0

A few things jump out right...