Data Engineering with Alteryx

By: Paul Houghton

Overview of this book

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
Table of Contents (18 chapters)
Part 1: Introduction
Part 2: Functional Steps in DataOps
Part 3: Governance of DataOps

General steps for deploying DataOps in your environment

Now that we know what the DataOps principles are and which parts of Alteryx build the DataOps pipeline, how would you implement DataOps in your company? The best way to demonstrate this is to introduce an example process that we will use throughout the rest of this book.

The data pipeline example we will be following is this:

As a company, we want to enrich our marketing efforts by integrating regularly updated public datasets. We need to identify the source of these datasets (and make sure we have the legal authority to use them) and transform them to match our company areas. Then, we have to make them available both to the data science team for machine learning and to our operational teams across the organization.

This problem statement helps identify the process we need to implement, which is shown in Figure 3.3:

Figure 3.3 – An example process for starting a DataOps...