Data ingestion is the process of moving data from one or more sources into a target storage layer for processing. It is the first step in building any data pipeline. In both Extract, Transform, and Load (ETL) and Extract, Load, and Transform (ELT), the first step is extracting data from the source system. In big data processing, the ingestion process is categorized into multiple types. We will look at some design considerations to keep in mind when implementing these design patterns.
Batch ingestion is the process of extracting data from the source system at longer intervals, for example, configuring the ingestion process to run daily at 5 a.m. The source for batch ingestion is generally persistent storage, such as database systems or persistent filesystems, where the data is already available. The following diagram shows the design considerations...
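A scheduled batch run typically extracts only the rows added since the previous run, using a watermark such as the latest timestamp seen. The sketch below illustrates this pattern; the `orders` table, its column names, and the CSV landing format are illustrative assumptions, not details from the text.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def batch_ingest(db_path, output_dir, last_watermark):
    """One batch run: extract rows newer than the watermark, land them as CSV."""
    conn = sqlite3.connect(db_path)
    rows = conn.execute(
        "SELECT id, name, created_at FROM orders "
        "WHERE created_at > ? ORDER BY created_at",
        (last_watermark,),
    ).fetchall()
    conn.close()

    out_file = Path(output_dir) / "orders_batch.csv"
    with open(out_file, "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["id", "name", "created_at"])  # header row
        writer.writerows(rows)

    # Advance the watermark so the next scheduled run skips these rows.
    new_watermark = rows[-1][2] if rows else last_watermark
    return out_file, new_watermark


# Demo: seed a hypothetical source database, then run one batch extraction.
workdir = tempfile.mkdtemp()
db = str(Path(workdir) / "source.db")
conn = sqlite3.connect(db)
conn.execute("CREATE TABLE orders (id INTEGER, name TEXT, created_at TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(1, "a", "2024-01-01"), (2, "b", "2024-01-02"), (3, "c", "2024-01-03")],
)
conn.commit()
conn.close()

out_file, watermark = batch_ingest(db, workdir, "2024-01-01")
print(watermark)  # only rows after the previous watermark were extracted
```

In production the schedule itself (the daily 5 a.m. trigger) would come from an orchestrator such as cron or a workflow scheduler, and the watermark would be persisted between runs rather than returned in memory.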