So far in this book, we have discussed batch analytics with Spark using the core libraries and Spark SQL. In this chapter, we will learn about another vertical of Spark: processing near real-time streams of data. Spark comes with a library known as Spark Streaming, which provides the capability to process data in near real time. This extension makes Spark a truly general-purpose system.
This chapter will explain the internals of Spark Streaming, show with examples how to read streams of data into Spark from various data sources, and introduce the newer extension of stream processing in Spark known as Structured Streaming.