Mastering Machine Learning with Spark 2.x

By: Michal Malohlava, Alex Tellez, Max Pumperla

Overview of this book

The purpose of machine learning is to build systems that learn from data. Being able to understand trends and patterns in complex data is critical to success; it is one of the key strategies for unlocking growth in today's challenging marketplace. With the meteoric rise of machine learning, developers are now keen to find out how they can make their Spark applications smarter. This book shows you how to transform data into actionable knowledge. It begins by defining machine learning primitives using the MLlib and H2O libraries. You will learn how to use binary classification to detect the Higgs boson particle in the huge amount of data produced by the CERN particle collider, and how to classify daily health activities using ensemble methods for multi-class classification. Next, you will solve a typical regression problem involving flight delay predictions and write sophisticated Spark pipelines. You will analyze Twitter data with the help of the doc2vec algorithm and K-means clustering. Finally, you will build different pattern mining models using MLlib, perform complex manipulation of DataFrames using Spark and Spark SQL, and deploy your app in a Spark Streaming environment.
Table of Contents (9 chapters)
Ensemble Methods for Multi-Class Classification

Frequent pattern mining

When presented with a new data set, a natural sequence of questions is:
  • What kind of data do we look at; that is, what structure does it have?
  • Which observations in the data can be found frequently; that is, which patterns or rules can we identify within the data?
  • How do we assess what is frequent; that is, what are good measures of relevance, and how do we test for them?
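For the third question, two commonly used measures are support (the fraction of transactions that contain a given itemset) and the confidence of a rule A → C (the support of A ∪ C divided by the support of A). The following is a minimal plain-Python sketch of these measures, not the book's Spark implementation: the toy shopping carts and the helper names `support`, `confidence` and `frequent_itemsets` are hypothetical, and the itemset search is a brute-force enumeration chosen for readability rather than a scalable mining algorithm.

```python
from itertools import combinations
from typing import Dict, FrozenSet, Iterable, List

# Hypothetical toy shopping carts (illustrative only, not data from the book).
transactions: List[FrozenSet[str]] = [
    frozenset({"bread", "milk"}),
    frozenset({"bread", "butter", "milk"}),
    frozenset({"beer", "bread"}),
    frozenset({"butter", "milk"}),
]


def support(itemset: Iterable[str], transactions: List[FrozenSet[str]]) -> float:
    """Fraction of transactions that contain every item in `itemset`."""
    items = frozenset(itemset)
    hits = sum(1 for t in transactions if items <= t)
    return hits / len(transactions)


def confidence(antecedent: Iterable[str], consequent: Iterable[str],
               transactions: List[FrozenSet[str]]) -> float:
    """Confidence of the rule A -> C, i.e. support(A | C) / support(A)."""
    a = frozenset(antecedent)
    s_a = support(a, transactions)
    if s_a == 0:
        return 0.0  # the antecedent never occurs, so the rule is never triggered
    return support(a | frozenset(consequent), transactions) / s_a


def frequent_itemsets(transactions: List[FrozenSet[str]],
                      min_support: float) -> Dict[FrozenSet[str], float]:
    """Brute-force enumeration of every itemset with support >= min_support.

    The number of candidates grows exponentially with the number of distinct
    items, so this is only suitable for tiny examples like the one above.
    """
    items = sorted(set().union(*transactions))
    result: Dict[FrozenSet[str], float] = {}
    for k in range(1, len(items) + 1):
        for combo in combinations(items, k):
            s = support(combo, transactions)
            if s >= min_support:
                result[frozenset(combo)] = s
    return result
```

With `min_support=0.5`, these four carts yield five frequent itemsets: {bread}, {milk}, {butter}, {bread, milk} and {butter, milk}. The rule {butter} → {milk} has confidence 1.0 because every cart containing butter also contains milk, whereas {milk} → {butter} only reaches 2/3.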

At a very high level, frequent pattern mining addresses precisely these questions. While it's very easy to dive headfirst into more advanced machine learning techniques, these pattern mining algorithms can be quite informative and help build an intuition about the data.

To introduce some of the key notions of frequent pattern mining, let's first consider a prototypical example of such cases, namely shopping carts. The study of customers being interested in and buying...