Performing data transformation using activities such as join, sort, and filter
A common scenario in data engineering pipelines is combining two or more files based on a column, filtering by column, sorting the results, and storing them for querying. We will perform the following actions to achieve this:
- Read two CSV files
- Use a join transformation to combine the two files based on a column
- Use a filter transformation to filter the rows based on a condition
- Sort the filtered data based on a column value and store the result in Parquet format
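The recipe builds these steps visually in Synapse, but the underlying logic can be sketched in plain Python to make the order of operations concrete. This is only an illustrative sketch, not the recipe's implementation: the column names (`tid`, `customer`, `amount`), the sample rows, and the helper names (`read_csv`, `inner_join`, `transform`) are hypothetical and do not reflect the actual schema of the downloaded files. The final write to Parquet is omitted because it needs a third-party library such as pyarrow.

```python
import csv
import io

# Hypothetical sample data standing in for the two CSV files; the real
# column names and values in the recipe's files may differ.
T1_CSV = """tid,customer
1,alice
2,bob
3,carol
4,dave
"""

T2_CSV = """tid,amount
1,250.0
2,40.0
3,980.5
5,10.0
"""


def read_csv(text):
    """Parse CSV text into a list of dict rows."""
    return list(csv.DictReader(io.StringIO(text)))


def inner_join(left, right, key):
    """Inner join two row lists on a shared key column."""
    index = {}
    for row in right:
        index.setdefault(row[key], []).append(row)
    # Keep only left rows that have at least one match on the right.
    return [{**l, **r} for l in left for r in index.get(l[key], [])]


def transform(t1_text, t2_text, min_amount):
    """Join on tid, keep rows with amount >= min_amount, sort by amount descending."""
    joined = inner_join(read_csv(t1_text), read_csv(t2_text), "tid")
    filtered = [r for r in joined if float(r["amount"]) >= min_amount]
    return sorted(filtered, key=lambda r: float(r["amount"]), reverse=True)


result = transform(T1_CSV, T2_CSV, min_amount=100.0)
for row in result:
    print(row)
```

Rows with no match in the other file (here `tid` 4 and 5) are dropped by the inner join before the filter and sort are applied.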
Getting ready
Create a Synapse Analytics workspace, as explained in the Provisioning an Azure Synapse Analytics workspace recipe of Chapter 8, Processing Data Using Azure Synapse Analytics.
- Download the files transaction_table-t1.zip and transaction_table-t2.zip from https://github.com/PacktPublishing/Azure-Data-Engineering-Cookbook-2nd-edition/upload/main/chapter9.
- Unzip them.
- In...