In this section, we will cover the high-level architecture of Spark Streaming and discuss its important components, such as Discretized Streams, microbatching, and more. At the end, we will also write our first Spark Streaming job for consuming and processing data in near real-time.
Spark Streaming is a powerful extension provided by Spark for consuming and processing events produced by various data sources in near real-time. It extends the Spark core architecture with a microbatching model: live/streaming data is received and collected from various data sources and then divided into a series of deterministic microbatches. The size of each microbatch is governed by the batch duration provided by the user. To understand this better, let's take the example of an application receiving live/streaming data at 20 events per second, where the batch duration provided...
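To make the microbatching idea concrete, here is a minimal plain-Python sketch (not Spark code) of how a stream of timestamped events can be sliced into deterministic microbatches by a batch duration. The function name `microbatches` and the event format are illustrative assumptions, not part of the Spark API:

```python
from itertools import groupby

def microbatches(events, batch_duration):
    """Group (timestamp, payload) events into deterministic microbatches.

    Each event lands in the batch whose index is timestamp // batch_duration,
    mirroring how a streaming engine slices live data by batch duration.
    Note: helper for illustration only; Spark performs this internally.
    """
    batch_of = lambda e: e[0] // batch_duration
    keyed = sorted(events, key=batch_of)
    return [
        (batch_index, [payload for _, payload in group])
        for batch_index, group in groupby(keyed, key=batch_of)
    ]

# Example: 20 events per second for 2 seconds, batch duration of 1 second.
events = [(i * 0.05, f"event-{i}") for i in range(40)]
batches = microbatches(events, batch_duration=1.0)
# Two microbatches are produced, each holding 20 events.
```

With a batch duration of 1 second and an input rate of 20 events per second, every microbatch deterministically contains 20 events; shrinking the batch duration yields smaller, more frequent batches.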