Data Engineering with Alteryx

By: Paul Houghton

Overview of this book

Alteryx is a GUI-based development platform for data analytics applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still letting you make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take a data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.

Table of Contents (18 chapters)

Part 1: Introduction
Part 2: Functional Steps in DataOps
Part 3: Governance of DataOps

Chapter 5: Data Processing and Transformations

Now that we have our initial raw dataset, we can start transforming the data into its final state. Processing and transformation form the core of the data pipeline, and the output often needs to be separated into multiple subsets for different applications.

The core data processing is the simplest part of this process. We started on it in Chapter 4, Sourcing the Data, where we began creating the pipeline by taking the raw data, cleansing the titles and information headers, and setting the data types. This gives us only an initial dataset to work with, not a final dataset for use. When we look at the column headers, we see that the columns are made up of three different datasets. Additionally, the records span multiple time periods: annual, quarterly, and monthly.
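
To make the mixed time periods concrete, here is a minimal Python sketch of separating records by period granularity. It is not the Alteryx workflow built in this book, only an illustration of the idea. The field name `period`, the label formats (`2020`, `2020 Q1`, `2020 JAN`), and the sample values are all assumptions made for the example; the real dataset may label its periods differently.

```python
import re
from collections import defaultdict

# Assumed label formats (hypothetical): "2020", "2020 Q1", "2020 JAN".
MONTHS = "JAN|FEB|MAR|APR|MAY|JUN|JUL|AUG|SEP|OCT|NOV|DEC"
PERIOD_PATTERNS = {
    "annual": re.compile(r"\d{4}"),
    "quarterly": re.compile(r"\d{4} Q[1-4]"),
    "monthly": re.compile(r"\d{4} (?:" + MONTHS + ")"),
}


def classify_period(label):
    """Return 'annual', 'quarterly', 'monthly', or 'unknown' for a period label."""
    cleaned = label.strip().upper()
    for name, pattern in PERIOD_PATTERNS.items():
        # fullmatch requires the whole label to fit the pattern,
        # so "2020 Q1" is not mistaken for an annual record.
        if pattern.fullmatch(cleaned):
            return name
    return "unknown"


def split_by_period(records, period_field="period"):
    """Group records (dicts) into one list per period granularity."""
    subsets = defaultdict(list)
    for record in records:
        subsets[classify_period(record[period_field])].append(record)
    return dict(subsets)


# Made-up placeholder records, for illustration only.
SAMPLE_RECORDS = [
    {"period": "2020", "value": 1.0},
    {"period": "2020 Q1", "value": 2.0},
    {"period": "2020 JAN", "value": 3.0},
    {"period": "2020 Q2", "value": 4.0},
]

if __name__ == "__main__":
    for granularity, rows in split_by_period(SAMPLE_RECORDS).items():
        print(granularity, len(rows))
```

Labels that match none of the assumed formats are collected under `unknown` rather than dropped, so unexpected period formats stay visible for review.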

Our next step will be to improve the dataset to provide a more relevant...