Taming Big Data with Apache Spark and Python - Hands On! [Video]

By: Frank Kane

Overview of this book

<p>“Big data” analysis is a hot and highly valuable skill – and this course will teach you the hottest technology in big data: Apache Spark. The course has been updated to Spark 3. Employers including Amazon, eBay, NASA JPL, and Yahoo all use Spark to quickly extract meaning from massive data sets across a fault-tolerant Hadoop cluster. You'll learn those same techniques, using your own Windows system right at home. It's easier than you might think.</p><p>Learn and master the art of framing data analysis problems as Spark problems through over 15 hands-on examples, and then scale them up to run on cloud computing services. You'll be learning from a former engineer and senior manager at Amazon and IMDb.</p><p></p><p>• Learn the concepts of Spark's Resilient Distributed Datasets (RDDs)</p><p>• Develop and run Spark jobs quickly using Python</p><p>• Translate complex analysis problems into iterative or multi-stage Spark scripts</p><p>• Scale up to larger data sets using Amazon's Elastic MapReduce service</p><p>• Understand how Hadoop YARN distributes Spark across computing clusters</p><p>• Learn about other Spark technologies, like Spark SQL, Spark Streaming, and GraphX</p><p></p><p>By the end of this course, you'll be running code that analyzes gigabytes' worth of information – in the cloud – in a matter of minutes.</p><p></p><p>All the code and supporting files for this course are available at https://github.com/PacktPublishing/Taming-Big-Data-with-Apache-Spark-and-Python---Hands-On-..</p>
Table of Contents (7 chapters)
Chapter 7
You Made It! Where to Go from Here.
Chapter 3
Advanced Examples of Spark Programs
Section 9
Running the Similar Movies Script Using Spark's Cluster
We'll review the code for finding similar movies in Spark with the MovieLens ratings data, run it on every available core of your desktop computer, and review the results.