
Applied Unsupervised Learning with Python

By : Benjamin Johnston, Aaron Jones, Christopher Kruger

Overview of this book

Unsupervised learning is a useful and practical solution in situations where labeled data is not available. Applied Unsupervised Learning with Python guides you through best practices for using unsupervised learning techniques in tandem with Python libraries to extract meaningful information from unstructured data. The book begins by explaining how basic clustering works to find similar data points in a set. Once you are well-versed with the k-means algorithm and how it operates, you’ll learn what dimensionality reduction is and where to apply it. As you progress, you’ll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. Finally, you will be able to put your knowledge to work through interesting activities such as performing a Market Basket Analysis and identifying relationships between different products. By the end of this book, you will have the skills you need to confidently build your own models using Python.

Apriori Algorithm


The Apriori algorithm is a data mining method for identifying and quantifying frequent item sets in transaction data, and it is the foundational component of association rule learning. Extending the results of the Apriori algorithm to association rule learning will be discussed in the next section. Frequency is quantified here as support, the proportion of transactions that contain a given item set. The threshold an item set must reach to qualify as frequent is an input to the model and is therefore adjustable: the value passed to the model is the minimum support acceptable for the analysis being done. The model then identifies all item sets whose support is greater than or equal to that minimum support. Note that the minimum support cannot be tuned via a grid search, because the Apriori algorithm has no evaluation metric to optimize. Instead, the minimum support is set based on the data, the use case, and domain expertise.
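To make the role of the minimum support threshold concrete, here is a minimal pure-Python sketch of the level-wise Apriori search, assuming each transaction is given as a set of item labels. The helper names `support` and `apriori`, and the toy grocery transactions, are hypothetical and for illustration only; this is not the book's own code, and in practice a library implementation would typically be used. The sketch first scores single items, then repeatedly joins frequent item sets into larger candidates, prunes any candidate that has an infrequent subset (an item set can never be more frequent than any of its subsets), and keeps every candidate whose support is at least `min_support`.

```python
from itertools import combinations


def support(itemset, transactions):
    """Fraction of transactions that contain every item in `itemset`."""
    hits = sum(1 for t in transactions if itemset <= t)
    return hits / len(transactions)


def apriori(transactions, min_support):
    """Return {frozenset: support} for every item set with support >= min_support.

    Hypothetical helper: a plain level-wise Apriori sketch, not a library API.
    """
    transactions = [frozenset(t) for t in transactions]
    # Level 1: every distinct item is a candidate 1-item set
    items = {item for t in transactions for item in t}
    candidates = {frozenset([i]) for i in items}
    frequent = {}
    k = 1
    while candidates:
        # Keep only the candidates that meet the minimum support threshold
        level = {}
        for c in candidates:
            s = support(c, transactions)
            if s >= min_support:
                level[c] = s
        frequent.update(level)
        k += 1
        # Join step: merge pairs of frequent (k-1)-item sets into k-item sets
        candidates = set()
        for a, b in combinations(list(level), 2):
            union = a | b
            if len(union) == k:
                # Prune step: every (k-1)-subset must itself be frequent
                if all(frozenset(sub) in level for sub in combinations(union, k - 1)):
                    candidates.add(union)
    return frequent


# Hypothetical toy transaction data
transactions = [
    {"bread", "milk"},
    {"bread", "diapers", "beer", "eggs"},
    {"milk", "diapers", "beer", "cola"},
    {"bread", "milk", "diapers", "beer"},
    {"bread", "milk", "diapers", "cola"},
]

result = apriori(transactions, min_support=0.6)
# Print the frequent item sets, highest support first
for itemset, s in sorted(result.items(), key=lambda kv: (-kv[1], sorted(kv[0]))):
    print(sorted(itemset), round(s, 2))
```

With `min_support=0.6` (at least 3 of the 5 toy transactions), this keeps four single items (bread, milk, and diapers at 0.8; beer at 0.6) and four pairs at 0.6. The only 3-item candidate that survives pruning, {bread, milk, diapers}, appears in just 2 transactions (support 0.4) and is dropped. Raising or lowering `min_support` shrinks or grows this output, which is why the threshold is chosen from the use case rather than by optimizing a metric.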

The main idea behind the Apriori...