Once you get data into the cluster, the next step in a typical project is to get data ready for future consumption. This typically involves data cleaning, data quality, and aggregation; for example, checking phone number format, valid date of birth, and aggregate sales by region.
In our mini project case, we are in step two of the data pipeline to the Data Lake.