Anyone who has studied calculus knows that life is not a discrete process but a continuous one; it does not come in small packages but as a continuously flowing stream.
As discussed in the first chapter, the fresher the information, the greater its value. Many modern machine learning applications need to compute their results in real time.
Spark Streaming is the Spark module for managing data flows. Much of Spark is built around the concept of the RDD (Resilient Distributed Dataset). Spark Streaming adds the concept of DStreams, or Discretized Streams. A DStream is a time-ordered sequence of data. It is important to emphasize that, internally, a DStream is a sequence of RDDs, one per batch interval, hence the name discretized.
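To make the idea of discretization concrete, here is a minimal sketch in plain Python. It does not use the Spark API: the function name `discretize`, the `(timestamp, value)` event format, and the `batch_interval` parameter are assumptions made only for illustration, and each plain list stands in for one RDD.

```python
from collections import defaultdict


def discretize(events, batch_interval):
    """Group (timestamp, value) events into consecutive fixed-width batches.

    Toy model only: each returned list plays the role of one RDD in a DStream.
    """
    buckets = defaultdict(list)
    for timestamp, value in events:
        # Every event lands in the interval that contains its timestamp.
        buckets[int(timestamp // batch_interval)].append(value)
    if not buckets:
        return []
    first, last = min(buckets), max(buckets)
    # Emit one batch per interval in time order, keeping empty intervals.
    return [buckets.get(i, []) for i in range(first, last + 1)]


# Hypothetical events: (seconds since start, payload)
events = [(0.2, "a"), (0.9, "b"), (1.1, "c"), (3.5, "d")]
batches = discretize(events, batch_interval=1.0)
print(batches)  # [['a', 'b'], ['c'], [], ['d']]
```

The continuous flow of events becomes a finite sequence of small collections, and each of those can then be processed with ordinary batch logic.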
Just as RDDs offer two types of operations (transformations and actions), DStreams also offer two types of operations:
- Transformations (whose result is another DStream)
- Output operations aimed at writing information to external systems
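The difference between the two kinds of operations can be sketched with a toy class in plain Python. This is not the Spark API: `ToyDStream` and `foreach_batch` are hypothetical names, and the `written` list merely stands in for an external system. In Spark itself, `map` and `filter` are examples of transformations and `foreachRDD` is an example of an output operation.

```python
class ToyDStream:
    """Toy stand-in for a DStream: a list of batches, each a plain list."""

    def __init__(self, batches):
        self.batches = [list(batch) for batch in batches]

    # Transformations: return a new ToyDStream and leave this one untouched.
    def map(self, fn):
        return ToyDStream([[fn(x) for x in batch] for batch in self.batches])

    def filter(self, predicate):
        return ToyDStream(
            [[x for x in batch if predicate(x)] for batch in self.batches]
        )

    # Output operation: hands every batch to a side-effecting sink.
    def foreach_batch(self, sink):
        for index, batch in enumerate(self.batches):
            sink(index, batch)


lines = ToyDStream([["spark", "streaming"], [], ["dstream", "rdd", "batch"]])

# Chained transformations: each call yields another (toy) DStream.
long_upper = lines.filter(lambda w: len(w) > 4).map(str.upper)

written = []  # stands in for an external system such as a database or file
long_upper.foreach_batch(lambda i, batch: written.append((i, batch)))
print(written)
```

Transformations can be chained freely because each one yields another stream, whereas the output operation ends the chain by pushing data out of the stream.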
DStreams support many of the operations available on RDDs, plus newer time-related...