Machine Learning with Apache Spark Quick Start Guide

By: Jillur Quddus

Overview of this book

Every person and every organization in the world manages data, whether they realize it or not. Data describes the world around us and can be used for almost any purpose, from analyzing consumer habits to fighting disease and serious organized crime. Ultimately, we manage data in order to derive value from it, and many organizations around the world have traditionally invested in technology to help process their data faster and more efficiently. But we now live in an interconnected world driven by mass data creation and consumption, where data is no longer rows and columns restricted to a spreadsheet, but an organic and evolving asset in its own right. With this realization come major challenges for organizations: how do we manage the sheer volume of data being created every second (think not only spreadsheets and databases, but also social media posts, images, videos, music, blogs, and so on)? And once we can manage all of this data, how do we derive real value from it? The focus of Machine Learning with Apache Spark is to help us answer these questions in a hands-on manner. We introduce the latest scalable technologies to help us manage and process big data. We then introduce advanced analytical algorithms applied to real-world use cases in order to uncover patterns, derive actionable insights, and learn from this big data.

Machine learning pipelines in Apache Spark

To end this chapter, we will look at how Apache Spark can be used to implement the algorithms discussed previously, by examining how its machine learning library, MLlib, works under the hood. MLlib provides a suite of tools designed to make machine learning accessible, scalable, and easy to deploy.

Note that as of Spark 2.0, the MLlib RDD-based API is in maintenance mode. The examples in this book will use the DataFrame-based API, which is now the primary API for MLlib. For more information, please visit https://spark.apache.org/docs/latest/ml-guide.html.

At a high level, the typical implementation of machine learning models can be thought of as an ordered pipeline of algorithms, as follows:

Feature extraction, transformation, and selection
Train a predictive model based on these feature vectors and labels
Make...
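
The ordered-pipeline idea can be sketched in a few lines of plain Python. This is a conceptual illustration only and does not use Spark or MLlib: every class and function name below (WordCountFeaturizer, CentroidClassifier, CentroidModel, fit_pipeline, predict) is hypothetical, as are the toy data and the nearest-centroid classifier. Because the list above is cut off, the final stage shown here, applying the fitted model to new data, is an assumption about how such a pipeline is typically completed.

```python
# Conceptual sketch only: plain Python, NOT the Spark MLlib API.
# All names and data below are hypothetical and chosen for illustration.


class WordCountFeaturizer:
    """Feature extraction: turn raw text into a vector of term counts."""

    def __init__(self, vocabulary):
        self.vocabulary = vocabulary

    def transform(self, rows):
        featurized = []
        for row in rows:
            words = row["text"].lower().split()
            features = [words.count(term) for term in self.vocabulary]
            featurized.append({**row, "features": features})
        return featurized


class CentroidModel:
    """Fitted model: predicts the label whose centroid is nearest."""

    def __init__(self, centroids):
        self.centroids = centroids

    def transform(self, rows):
        def squared_distance(a, b):
            return sum((x - y) ** 2 for x, y in zip(a, b))

        predicted = []
        for row in rows:
            label = min(
                self.centroids,
                key=lambda lbl: squared_distance(row["features"], self.centroids[lbl]),
            )
            predicted.append({**row, "prediction": label})
        return predicted


class CentroidClassifier:
    """Training: learn one mean feature vector (centroid) per label."""

    def fit(self, rows):
        sums, counts = {}, {}
        for row in rows:
            total = sums.setdefault(row["label"], [0.0] * len(row["features"]))
            for i, value in enumerate(row["features"]):
                total[i] += value
            counts[row["label"]] = counts.get(row["label"], 0) + 1
        centroids = {
            label: [value / counts[label] for value in total]
            for label, total in sums.items()
        }
        return CentroidModel(centroids)


def fit_pipeline(stages, training_rows):
    """Run the stages in order, fitting any stage that has a fit() method."""
    fitted, rows = [], training_rows
    for stage in stages:
        if hasattr(stage, "fit"):
            stage = stage.fit(rows)  # a trainable stage becomes a fitted model
        fitted.append(stage)
        rows = stage.transform(rows)  # output of one stage feeds the next
    return fitted


def predict(fitted_stages, rows):
    """Apply every fitted stage, in the same order, to new data."""
    for stage in fitted_stages:
        rows = stage.transform(rows)
    return rows


training = [
    {"text": "great film loved it", "label": 1},
    {"text": "loved the great cast", "label": 1},
    {"text": "awful plot hated it", "label": 0},
    {"text": "hated the awful ending", "label": 0},
]
stages = [WordCountFeaturizer(["great", "loved", "awful", "hated"]), CentroidClassifier()]
fitted = fit_pipeline(stages, training)
results = predict(fitted, [{"text": "a great story"}, {"text": "awful acting"}])
print([row["prediction"] for row in results])  # prints [1, 0]
```

The point of the sketch is the ordering: each stage consumes the output of the previous one, and the same fitted stages are reused unchanged at prediction time. MLlib's DataFrame-based API offers a comparable fit/transform pattern through its own Pipeline class, which the guide linked above documents.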
