Time Series Analysis with Spark
In this first section, we will cover the methods of getting time series data from sources and persisting the dataset to storage.
Ingestion is the process of retrieving data from a source system for further processing and analysis. It can run as a batch that ingests a large amount of data, either once on demand or on a schedule at regular intervals, such as every night. Alternatively, if the source system makes data available continually and the data is needed as it arrives, the ingestion can use structured streaming instead.
Note
We can technically code the ingestion process as structured streaming and configure it to run at triggered intervals. This gives the flexibility to adjust to changing business requirements on data freshness without having to redevelop the ingestion process.
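As a minimal sketch of the idea in this note, the snippet below writes the ingestion once as a structured streaming job and lets a single setting decide whether it runs as a batch-like job or continuously. The function names `trigger_options` and `start_ingestion`, the CSV source, the Parquet target, and all paths are hypothetical choices for illustration. It assumes PySpark 3.3 or later, where the `availableNow` trigger is available.

```python
from typing import Dict, Optional, Union


def trigger_options(interval: Optional[str] = None) -> Dict[str, Union[str, bool]]:
    """Translate a data freshness requirement into DataStreamWriter.trigger() arguments.

    Hypothetical helper: None means "run like a batch", a string such as
    "5 minutes" means "keep running and start a micro-batch at that interval".
    """
    if interval is None:
        # Process all data available at start, then stop (assumes Spark 3.3+).
        return {"availableNow": True}
    # Stay running and trigger a micro-batch at each interval.
    return {"processingTime": interval}


def start_ingestion(spark, source_path, target_path, checkpoint_path, schema, interval=None):
    """Start a streaming ingestion from CSV files to Parquet (illustrative only).

    `spark` is an existing SparkSession; `schema` is a StructType or DDL string,
    because file-based streaming sources need an explicit schema by default.
    """
    stream = (
        spark.readStream
        .schema(schema)
        .option("header", "true")
        .csv(source_path)
    )
    return (
        stream.writeStream
        .format("parquet")
        # The checkpoint records which input has already been ingested.
        .option("checkpointLocation", checkpoint_path)
        .trigger(**trigger_options(interval))
        .start(target_path)
    )
```

With this layout, a nightly scheduled job could call `start_ingestion(...)` with `interval=None`, and a later requirement for fresher data could be met by passing, for example, `interval="5 minutes"` without rewriting the ingestion logic.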
In this chapter, we will focus on batch ingestion, the most common method today. We will also briefly discuss...