Until now, we have been able to take data stored in either an S3 bucket or Azure Blob storage, transform it using PySpark or SQL, and then persist the transformed data into a table. The question now is: how do we integrate these steps into a complete ETL pipeline? One option is to use Azure Data Factory (ADF) to run our Azure Databricks notebook as one of the steps in our data architecture.
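As a quick illustration of that read-transform-persist pattern, the following sketch reads a CSV file from Blob storage, selects and casts a few columns, and saves the result as a table. The storage account, container, file, column, and table names are hypothetical placeholders; the spark session object is provided automatically inside a Databricks notebook.

```python
from pyspark.sql import functions as F

# Read the raw CSV from Azure Blob storage into a DataFrame
# (storage account, container, and file name are placeholders)
raw_df = spark.read.csv(
    "wasbs://data@mystorageaccount.blob.core.windows.net/voting_turnout.csv",
    header=True,
    inferSchema=True,
)

# Transform: keep only the columns we need and cast them to proper types
turnout_df = raw_df.select(
    "state",
    F.col("total_votes").cast("long"),
    F.col("turnout_pct").cast("double"),
)

# Persist the transformed data as a managed table
turnout_df.write.mode("overwrite").saveAsTable("voting_turnout")
```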
In the next example, we will use ADF to trigger our notebook, passing it the name of the file that contains the data we want to process, and use this to update our voting turnout table (a sketch of such a parameterized notebook follows below). For this, you will require the following:
- The Voting_Turnout_US_2020 dataset loaded into a Spark DataFrame

ADF is the Azure cloud platform for the integration of serverless data transformation and aggregation processes. It can integrate...
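To make the hand-off concrete, here is a minimal sketch of what the triggered notebook could look like. The widget name file_name, the storage path, and the target table are hypothetical placeholders; dbutils.widgets is the standard Databricks mechanism for receiving notebook parameters, and ADF's Notebook activity supplies values for these widgets through its baseParameters setting.

```python
# Declare a text widget so ADF can pass the file name in at run time;
# the second argument is the default used when running interactively.
dbutils.widgets.text("file_name", "")
file_name = dbutils.widgets.get("file_name")

# Load the file named by the parameter from Blob storage
# (storage account and container are placeholders)
new_data = spark.read.csv(
    f"wasbs://data@mystorageaccount.blob.core.windows.net/{file_name}",
    header=True,
    inferSchema=True,
)

# Replace the voting turnout table with the freshly processed data
new_data.write.mode("overwrite").saveAsTable("voting_turnout")
```

When ADF runs the pipeline, the value supplied in baseParameters overrides the widget's default, so the same notebook can be tested interactively in the workspace and then scheduled from ADF without changes.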