Streaming Big Data with Spark Streaming, Scala, and Spark 3! [Video]

By: Frank Kane

Overview of this book

In this course, you will learn the basics of the Scala programming language and how Apache Spark operates on a cluster. You will set up discretized streams with Spark Streaming and transform them as data is received; analyze streaming data over sliding windows of time; maintain stateful information across streams of data; connect Spark Streaming to highly scalable data sources, including Kafka, Flume, and Kinesis; write streams of data in real time to NoSQL databases such as Cassandra; run SQL queries on streamed data in real time; and train machine learning models on streaming data, using them to make predictions that keep improving over time. You will also package, deploy, and run self-contained Spark Streaming code on a real Hadoop cluster using Amazon Elastic MapReduce. This course is very hands-on, filled with achievable activities and exercises to reinforce your learning. By the end of this course, you will be confident creating Spark Streaming scripts in Scala and prepared to tackle massive streams of data in a whole new way. You will be surprised at how easy Spark Streaming makes it! All the code and supporting files for this course are available at
Table of Contents (9 chapters)
You Made It!
Chapter 8
Spark Streaming in Production
Section 1
[Activity] Packaging and Running Spark Code in Production
Your production applications won't be run from within the Scala IDE; you will need to run them from a command line, and potentially on a cluster. The spark-submit command is used for this. We will show you how to package up your application and run it using spark-submit from a command prompt.
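To make the shape of a spark-submit invocation concrete, here is a minimal sketch in plain Scala that assembles the command line and prints it rather than executing it, so it runs without a Spark installation. The `--class` and `--master` options and the `local[*]` master URL are standard spark-submit usage. Everything else is an assumption for illustration: `SubmitPlan`, `SubmitPlanDemo`, the main class name, the JAR path, and the input argument are hypothetical and not taken from the course materials.

```scala
// Hypothetical description of one spark-submit invocation. The values used
// in SubmitPlanDemo are placeholders, not taken from the course materials.
final case class SubmitPlan(
    mainClass: String,               // fully qualified name of the object with main()
    master: String,                  // cluster manager, e.g. "local[*]" or "yarn"
    appJar: String,                  // path to the packaged application JAR
    appArgs: Seq[String] = Seq.empty // arguments passed through to the application
) {
  // spark-submit takes its own options first, then the JAR, then the
  // application's arguments.
  def toCommand: Seq[String] =
    Seq("spark-submit", "--class", mainClass, "--master", master, appJar) ++ appArgs
}

object SubmitPlanDemo {
  def main(args: Array[String]): Unit = {
    val plan = SubmitPlan(
      mainClass = "com.example.streaming.Main", // assumed class name
      master = "local[*]",                      // run locally on all cores
      appJar = "target/my-streaming-app.jar",   // assumed JAR location
      appArgs = Seq("input.txt")                // assumed application argument
    )
    // Print the command instead of running it, so no Spark install is needed.
    println(plan.toCommand.mkString(" "))
  }
}
```

Running `SubmitPlanDemo` prints `spark-submit --class com.example.streaming.Main --master local[*] target/my-streaming-app.jar input.txt`. When pasting a command like this into a shell, the master URL may need quoting (for example `"local[*]"`) so the shell does not treat the brackets as a glob pattern. On a cluster, the `master` value would be replaced with whatever the cluster manager requires, such as `yarn`.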