Book Image

Learning Data Mining with Python - Second Edition

By : Robert Layton
Book Image

Learning Data Mining with Python - Second Edition

By: Robert Layton

Overview of this book

This book teaches you to design and develop data mining applications using a variety of datasets, starting with basic classification and affinity analysis. This book covers a large number of libraries available in Python, including the Jupyter Notebook, pandas, scikit-learn, and NLTK. You will gain hands on experience with complex data types including text, images, and graphs. You will also discover object detection using Deep Neural Networks, which is one of the big, difficult areas of machine learning right now. With restructured examples and code samples updated for the latest edition of Python, each chapter of this book introduces you to new algorithms and techniques. By the end of the book, you will have great insights into using Python for data mining and understanding of the algorithms as well as implementations.
Table of Contents (20 chapters)
Title Page
Credits
About the Author
About the Reviewer
www.PacktPub.com
Customer Feedback
Preface

A simple affinity analysis example


In this section, we jump into our first example. A common use case for data mining is to improve sales, by asking a customer who is buying a product if he/she would like another similar product as well. You can perform this analysis through affinity analysis, which is the study of when things exist together, namely. correlate to each other.

To repeat the now-infamous phrase taught in statistics classes, correlation is not causation. This phrase means that the results from affinity analysis cannot give a cause. In our next example, we perform affinity analysis on product purchases. The results indicate that the products are purchased together, but not that buying one product causes the purchase of the other. The distinction is important, critically so when determining how to use the results to affect a business process, for instance.

What is affinity analysis?

Affinity analysis is a type of data mining that gives similarity between samples (objects). This could be the similarity between the following:

  • Users on a website, to provide varied services or targeted advertising
  • Items to sell to those users, to provide recommended movies or products
  • Human genes, to find people that share the same ancestors

We can measure affinity in several ways. For instance, we can record how frequently two products are purchased together. We can also record the accuracy of the statement when a person buys object 1 and when they buy object 2. Other ways to measure affinity include computing the similarity between samples, which we will cover in later chapters.