Developing a data curation pipeline
We are ready to start developing the Electroniz curation pipeline. The data will now travel from an unclean state to a cleansed, more usable state. I promised earlier to keep you updated on which area of the architecture diagram is being addressed, so here it is:
In the following section, we will be creating the curation pipeline that is highlighted in the preceding figure.
Preparing Azure resources
We will begin by preparing the required resources, as follows:
- We will start by creating a new namespace in Azure Data Lake Storage for the silver layer. I mentioned earlier that storage account names in Azure are globally unique. Throughout this exercise, we are using the storage account name traininglakehouse. You will need to change it to the name of the account that you created:
  SILVER_NAMESPACE="silver" STORAGEACCOUNTNAME...
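
Because the account name has to be edited by hand, it can help to check it before any resource is created. The following is a minimal sketch, not the command from the original text, which is truncated above. It assumes the Azure CLI command az storage fs create is used to create the namespace, and the helper is_valid_account_name and the variable CREATE_CMD are hypothetical names introduced here for illustration. The script only prints the command so that it can be reviewed before it is run:

```shell
# Values from the text; replace traininglakehouse with your own account name.
SILVER_NAMESPACE="silver"
STORAGEACCOUNTNAME="traininglakehouse"

# Hypothetical helper (not from the original text).
# Azure storage account names must be 3-24 characters long and may
# contain only lowercase letters and digits.
is_valid_account_name() {
  printf '%s' "$1" | LC_ALL=C grep -Eq '^[a-z0-9]{3,24}$'
}

if is_valid_account_name "$STORAGEACCOUNTNAME"; then
  # Assumption: az storage fs create is the command used to create the
  # namespace. The command is printed, not executed, so it can be reviewed.
  CREATE_CMD="az storage fs create --name $SILVER_NAMESPACE --account-name $STORAGEACCOUNTNAME"
  echo "$CREATE_CMD"
else
  echo "Invalid storage account name: $STORAGEACCOUNTNAME" >&2
fi
```

Once the printed command looks right, it can be pasted into a shell where you are already signed in to Azure. Depending on how you authenticate, additional options such as an account key or an authentication mode may be needed.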