Mastering Big Data Analytics with PySpark [Video]

By: Danny Meijer

Overview of this course

PySpark helps you perform data analysis at scale; it enables you to build more scalable analyses and pipelines. This course starts by introducing you to PySpark's potential for performing effective analyses of large datasets. You'll learn how to interact with Spark from Python and connect Jupyter to Spark to provide rich data visualizations. After that, you'll delve into the various Spark components and Spark's architecture. You'll learn to work with Apache Spark and perform ML tasks more smoothly than before. You'll gather and query data using Spark SQL and overcome the challenges involved in reading it. You'll use the DataFrame API to work with Spark MLlib and learn about the Pipeline API. Finally, the course provides tips and tricks for deploying your code and tuning performance. By the end of this course, you will not only be able to perform efficient data analytics but will also have learned to use PySpark to analyze large datasets at scale in your organization. All related code files are available in a GitHub repository at: https://github.com/PacktPublishing/Mastering-Big-Data-Analytics-with-PySpark

Python and Spark: A Match Made in Heaven

Course Overview

Python versus Spark

Preparing for the Course

Connecting Jupyter to Spark

Working with PySpark

Getting to Know Spark

The Power of Spark

The Power of Spark MLlib

Spark DataFrames

Spark Data Operations

Preparing Data Using Spark SQL

Loading Data from CSV Files

Fixing Issues in Our Data – Part One

Fixing Issues in Our Data – Part Two

Grouping, Joining, and Aggregating – Part One

Grouping, Joining, and Aggregating – Part Two

Machine Learning with Spark MLlib

Machine Learning with Spark

Building a Recommendation System with Spark MLlib – Part One

Building a Recommendation System with Spark MLlib – Part Two

Building a Recommendation System with Spark MLlib – Part Three

Finalizing our Recommendation System

What We Have Learned So Far

Classification and Regression

Machine Learning with Spark

Machine Learning Pipelines

Running a Logistic Regression Pipeline

Parameters, Features, and Persistence

Frequent Pattern Mining and Statistics

Analyzing Big Data

Natural Language Processing with Spark

Identifying Our Data

Data Preparation and Exploration

Creating Our Raw Training Data

Processing Natural Language in Spark

Data Preparation and Regular Expressions

Data Cleaning and Transformation

Training a Sentiment Analysis Model – Part One

Training a Sentiment Analysis Model – Part Two

Machine Learning in Real-Time

Fetching Data from Twitter

Spark Structured Streaming

Managing and Converting Streams

Assembling Our Streaming ML Solution

A Structured Approach to ML Streaming

The Power of PySpark

Running Spark in Production

Running Spark at Scale

Tips, Tricks, and Take-Aways

Chapter 5

Classification and Regression

Section 1

Machine Learning with Spark

In the last section, you used Spark's machine learning library, specifically its recommendation functionality. There is, however, much more to learn about MLlib. Here, we set out to discover which aspects of MLlib are important but not explicitly covered, or not easy to find, in the official documentation.
