Large Scale Machine Learning with Spark

By : Md. Rezaul Karim, Md. Mahedi Kaysar

Overview of this book

Data processing, implementing related algorithms, tuning, scaling up, and finally deploying are crucial steps in optimising any application. Spark can handle large-scale batch and streaming data, decides when to cache data in memory, and can process it up to 100 times faster than Hadoop-based MapReduce. This means predictive analytics can be applied to both streaming and batch data to develop complete machine learning (ML) applications much more quickly, making Spark an ideal candidate for large data-intensive applications. This book focuses on design engineering and scalable solutions using ML with Spark. First, you will learn how to install Spark with all the new features from the Spark 2.0 release. Moving on, you'll explore important concepts such as advanced feature engineering with RDDs and Datasets. After studying how to develop and deploy applications, you will see how to use external libraries with Spark. In summary, you will be able to develop complete and personalised ML applications, from data collection, model building, tuning, and scaling up to deploying on a cluster or the cloud.

Credits

About the Authors

About the Reviewer

www.Packtpub.com

Preface

Introduction to Data Analytics with Spark

Spark overview

New computing paradigm with Spark

Spark ecosystem

Spark machine learning libraries

Installing and getting started with Spark

Packaging your application with dependencies

Running a sample machine learning application

References

Summary

Machine Learning Best Practices

What is machine learning?

Machine learning tasks

Practical machine learning problems

Most widely used machine learning problems

Large scale machine learning APIs in Spark

Practical machine learning best practices

Choosing the right algorithm for your application

Summary

Understanding the Problem by Understanding the Data

Analyzing and preparing your data

Resilient Distributed Dataset basics

Dataset basics

Dataset from string and typed class

Spark and data scientists workflow

Deeper into Spark

Summary

Extracting Knowledge through Feature Engineering

The state of the art of feature engineering

Best practices in feature engineering

Feature engineering with Spark

Advanced feature engineering

Summary

Supervised and Unsupervised Learning by Examples

Machine learning classes

Supervised learning with Spark - an example

Unsupervised learning

Recommender system

Advanced learning and generalizations

Summary

Building Scalable Machine Learning Pipelines

Spark machine learning pipeline APIs

Cancer-diagnosis pipeline with Spark

Cancer-prognosis pipeline with Spark

Market basket analysis with Spark Core

OCR pipeline with Spark

Topic modeling using Spark MLlib and ML

Credit risk analysis pipeline with Spark

Scaling the ML pipelines

Tips and performance considerations

Summary

Tuning Machine Learning Models

Details about machine learning model tuning

Typical challenges in model tuning

Evaluating machine learning models

Validation and evaluation techniques

Parameter tuning for machine learning models

Hypothesis testing

Machine learning model selection

Summary

Adapting Your Machine Learning Models

Adapting machine learning models

The generalization of ML models

Adapting through incremental algorithms

Adapting through reusing ML models

Machine learning in dynamic environments

Summary

Advanced Machine Learning with Streaming and Graph Data

Developing real-time ML pipelines

Time series and social network analysis

Movie recommendation using Spark

Developing a real-time ML pipeline from streaming

ML pipeline on graph data and semi-supervised graph-based learning

Summary

Configuring and Working with External Libraries

Third-party ML libraries with Spark

Using external libraries with Spark Core

Time series analysis using the Cloudera Spark-TS package

Configuring SparkR with RStudio

Configuring Hadoop run-time on Windows

Summary

Machine learning tasks

Machine learning tasks, or machine learning processes, are typically classified into three broad categories, depending on the nature of the learning feedback available to the learning system: supervised learning, unsupervised learning, and reinforcement learning. These three kinds of machine learning tasks are shown in Figure 3 and are discussed in this section:

Figure 3: Machine learning tasks.

Supervised learning

A supervised learning application makes predictions based on a set of labeled examples; the goal is to learn general rules that map inputs to outputs in a way that matches the real world. For example, a dataset for spam filtering usually contains spam messages as well as non-spam messages. Because each message is labeled, we know which messages in the training set are spam and which are not. We can therefore use this information to train a model that classifies new, unseen messages. Figure 4 shows a schematic diagram of supervised learning.
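The spam-filtering idea above can be sketched in a few lines. This is only an illustration in plain Python, not Spark code and not the book's own implementation: the tiny training set, the label names `spam` and `ham`, and the helper functions `train` and `predict` are all assumptions made for the example, and a naive Bayes word-count model is used simply because it is short to write.

```python
import math
from collections import Counter

# Hypothetical labeled training set: (message, label) pairs.
TRAIN = [
    ("win a free prize now", "spam"),
    ("free money claim your prize", "spam"),
    ("meeting agenda for monday", "ham"),
    ("lunch with the project team on monday", "ham"),
]


def train(examples):
    """Learn from labeled examples by counting words and messages per label."""
    word_counts = {}          # label -> Counter of word frequencies
    label_counts = Counter()  # label -> number of training messages
    for text, label in examples:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(text.lower().split())
    vocab = {w for counts in word_counts.values() for w in counts}
    return word_counts, label_counts, vocab


def predict(model, text):
    """Return the label with the highest naive Bayes log-score for a new message."""
    word_counts, label_counts, vocab = model
    total = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in label_counts.items():
        score = math.log(n / total)  # log prior of the label
        denom = sum(word_counts[label].values()) + len(vocab)
        for word in text.lower().split():
            if word in vocab:  # words never seen in training are ignored
                # Add-one (Laplace) smoothing avoids log(0) for unseen pairs.
                score += math.log((word_counts[label][word] + 1) / denom)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


model = train(TRAIN)
print(predict(model, "claim your free prize"))
print(predict(model, "agenda for the team meeting"))
```

The two unseen messages are classified from the word statistics learned on the labeled set, which is the input-to-output mapping described above. A real application would use far more data and a library implementation.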

In other...
