Applied Unsupervised Learning with Python

By: Benjamin Johnston, Aaron Jones, Christopher Kruger

Overview of this book

Unsupervised learning is a useful and practical solution in situations where labeled data is not available. Applied Unsupervised Learning with Python guides you in learning the best practices for using unsupervised learning techniques in tandem with Python libraries and extracting meaningful information from unstructured data. The book begins by explaining how basic clustering works to find similar data points in a set. Once you are well versed in the k-means algorithm and how it operates, you’ll learn what dimensionality reduction is and where to apply it. As you progress, you’ll learn various neural network techniques and how they can improve your model. While studying the applications of unsupervised learning, you will also learn how to mine topics that are trending on Twitter and Facebook and build a news recommendation engine for users. Finally, you will be able to put your knowledge to work through interesting activities such as performing a Market Basket Analysis and identifying relationships between different products. By the end of this book, you will have the skills you need to confidently build your own models using Python.

Hotspot Analysis


Hotspots are areas with a higher concentration of data points, such as particular neighborhoods where the crime rate is abnormally high or swaths of the country that are impacted by an above-average number of tornadoes. Hotspot analysis is the process of finding these hotspots, should any exist, in a population using sampled data. It is generally done using kernel density estimation, which turns the sampled point locations into a smooth density surface whose peaks mark candidate hotspots.
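To make the idea concrete, here is a minimal sketch of kernel density estimation applied to point locations. It is not the book's own implementation: the `gaussian_kde_grid` helper, the synthetic `events` data, the fixed `bandwidth` of 0.5, and the 50 x 50 grid are all assumptions chosen for illustration, and only NumPy is used so that the kernel arithmetic stays visible.

```python
import numpy as np


def gaussian_kde_grid(points, bandwidth, grid_size=50):
    """Evaluate a 2D Gaussian kernel density estimate on a regular grid.

    points    : array of shape (n, 2) holding x/y locations.
    bandwidth : kernel standard deviation, in the same units as the
                coordinates (hand-picked here, not tuned).
    """
    points = np.asarray(points, dtype=float)

    # Regular grid spanning the extent of the data.
    xs = np.linspace(points[:, 0].min(), points[:, 0].max(), grid_size)
    ys = np.linspace(points[:, 1].min(), points[:, 1].max(), grid_size)
    gx, gy = np.meshgrid(xs, ys)  # rows follow y, columns follow x
    grid = np.column_stack([gx.ravel(), gy.ravel()])

    # Squared distance from every grid cell to every data point.
    diff = grid[:, None, :] - points[None, :, :]
    sq_dist = (diff ** 2).sum(axis=2)

    # Average of isotropic Gaussian kernels centred on the data points.
    norm = 2.0 * np.pi * bandwidth ** 2
    kernels = np.exp(-sq_dist / (2.0 * bandwidth ** 2))
    density = kernels.sum(axis=1) / (len(points) * norm)
    return xs, ys, density.reshape(grid_size, grid_size)


# Synthetic data (an assumption): uniform background plus one dense cluster.
rng = np.random.default_rng(0)
background = rng.uniform(0.0, 10.0, size=(200, 2))
cluster = rng.normal(loc=[7.0, 3.0], scale=0.3, size=(100, 2))
events = np.vstack([background, cluster])

xs, ys, density = gaussian_kde_grid(events, bandwidth=0.5)

# The grid cell with the highest estimated density is the candidate hotspot.
row, col = np.unravel_index(np.argmax(density), density.shape)
hotspot = (xs[col], ys[row])
print("Densest grid cell near:", hotspot)
```

In this sketch the densest cell should fall close to the planted cluster centre. In practice the bandwidth strongly affects the result: a value that is too small produces many spurious peaks, while one that is too large smooths real hotspots away, so it is usually selected by cross-validation or a rule of thumb rather than fixed by hand.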

Hotspot analysis can be described in four high-level steps:

  1. Collect the data: The data should include the locations of the objects or events. As we have briefly mentioned, the amount of data needed to run the analysis and achieve actionable results is relatively flexible. Ideally, the sample dataset is representative of the population.

  2. Identify the base map: The next step is to identify which base map would best suit the analytical and presentational needs of the project. On this base map, the results of the model will be overlaid, so that the locations of the...