Apache Spark for Machine Learning
Data ingestion is the process of importing and loading data into a system such as a database, a data warehouse, or a data lake. It can be performed manually or automatically using a variety of tools and techniques, and it is the first step in data analysis and machine learning because it prepares raw data for further processing and use.
Apache Spark is a powerful distributed data processing engine that can read from a wide variety of data sources. Its ability to integrate with many different storage systems is one reason for its popularity in big data processing and analytics. Here are some of the key data sources from which Apache Spark can ingest data.
Let’s explore an example of data ingestion from Hadoop Distributed File System (HDFS). Here is a sample code snippet to read data from HDFS:
from pyspark.sql import SparkSession

# Build or reuse a SparkSession for the application
spark = SparkSession.builder \
    .appName("HDFS Read Example") \
    .getOrCreate()

# Read a CSV file from HDFS; the URI below is a placeholder for your NameNode and file path
df = spark.read.csv("hdfs://namenode:9000/data/input.csv", header=True, inferSchema=True)
df.show(5)