Developing a data aggregation pipeline
Before we start developing the aggregation pipeline, we need to deploy the following Azure resources:
In the following section, we will create the aggregation pipeline highlighted in the preceding diagram.
Preparing the Azure resources
Follow these steps to start the Azure resource deployment process:
- We will start by creating a new file system (container) in Azure Data Lake Storage for the gold layer.
As mentioned earlier, storage account names in Azure are globally unique. Throughout this exercise, we will use traininglakehouse as the storage account name; replace it with the name of the account that you created:
STORAGEACCOUNTNAME="traininglakehouse"
GOLDLAYER="gold"
az storage fs create -n $GOLDLAYER --account-name $STORAGEACCOUNTNAME --only-show-errors
This results in the following output:
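Because the storage account name has to be edited by hand, it can help to check both names locally before calling the Azure CLI. The following is a minimal sketch, not part of the original exercise: the helper names (is_valid_storage_account_name, is_valid_fs_name, create_layer) are hypothetical, and the checks only cover Azure's basic naming rules (storage accounts: 3-24 lowercase letters and digits; file systems: 3-63 lowercase letters, digits, and single hyphens). They do not check whether the account exists or whether the name is available.

```shell
# Hypothetical pre-flight helpers (assumptions, not from the original text).

# Storage account names: 3-24 characters, lowercase letters and digits only.
is_valid_storage_account_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

# File system (container) names: 3-63 characters, lowercase letters, digits,
# and hyphens; must start and end with a letter or digit; no double hyphens.
is_valid_fs_name() {
  printf '%s\n' "$1" | LC_ALL=C grep -Eq '^[a-z0-9][a-z0-9-]{1,61}[a-z0-9]$' || return 1
  case "$1" in
    *--*) return 1 ;;
  esac
  return 0
}

# Validate both names, then run the same create command used in the step above.
# $1 = storage account name, $2 = file system (layer) name.
create_layer() {
  is_valid_storage_account_name "$1" || { echo "invalid storage account name: $1" >&2; return 1; }
  is_valid_fs_name "$2" || { echo "invalid file system name: $2" >&2; return 1; }
  az storage fs create -n "$2" --account-name "$1" --only-show-errors
}
```

With these functions loaded in the shell, running create_layer "$STORAGEACCOUNTNAME" "$GOLDLAYER" performs the same creation step, but stops early with a message if either name breaks the basic rules.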