Classic ETL (short for Extract, Transform, Load) applications are common in the big data world. They typically extract data from one or more external sources, such as message queues, databases, and file systems; process the data to perform common operations, such as filtering, transforming, and enriching; and finally load the results into data sinks, such as relational databases, files, or NoSQL datastores, where analytics queries can be run against them.
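The three stages can be sketched in plain Java, without any framework. This is a minimal, hypothetical illustration of the pattern only: the in-memory lists stand in for real sources and sinks such as message queues or databases, and in an actual Apex application each stage would be an operator connected by streams.

```java
import java.util.ArrayList;
import java.util.List;

// A framework-free sketch of the extract -> transform -> load pattern.
public class EtlSketch {

    public static List<String> run() {
        // Extract: read raw "name,age" records from a simulated source.
        List<String> source = List.of("alice,30", "bob,-1", "carol,45");

        // Transform: filter out invalid records and enrich the rest.
        List<String> sink = new ArrayList<>();
        for (String record : source) {
            String[] fields = record.split(",");
            int age = Integer.parseInt(fields[1]);
            if (age < 0) {
                continue; // filter: drop malformed records
            }
            // Enrich: tag each record with a derived category.
            String category = age >= 40 ? "senior" : "adult";
            // Load: write the result to the simulated sink.
            sink.add(fields[0] + "," + age + "," + category);
        }
        return sink;
    }

    public static void main(String[] args) {
        System.out.println(run());
    }
}
```

Running this prints `[alice,30,adult, carol,45,senior]`: one record was dropped by the filter, and the rest were enriched before being loaded.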
In this chapter, we will examine a sample ETL application in detail and illustrate how easy it is to construct such a pipeline using Apex and its library of operators, along with its built-in support for Apache Calcite, the SQL query planning and optimization engine. Specifically, we will cover these topics:
- The operators that constitute the application pipeline
- Building the application and running the integration test
- Configuring the application
- Testing the application
- Understanding the logs
- Classes...