Data Engineering with Alteryx

By: Paul Houghton

Overview of this book

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this Alteryx book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
Table of Contents (18 chapters)
Part 1: Introduction
Part 2: Functional Steps in DataOps
Part 3: Governance of DataOps

The functional steps in DataOps

In Part 2, The Functional Steps in DataOps, we saw that a DataOps project takes five overarching steps:

  1. Sourcing the data
  2. Data processing and transformations
  3. Destination management
  4. Value extraction
  5. Beginning advanced analytics

We detailed each step in a separate chapter. We also split the value extraction step into two chapters: one covering the primary value extraction details, and one extending our analytic capabilities with the advanced spatial and machine learning features in Alteryx.

Sourcing the data

Chapter 4, Sourcing the Data, began our data pipeline construction by connecting to three different data source location types:

  • Internal organization data sources
  • Downloading public files
  • Using public APIs for data extraction
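
For readers who think in code, the three location types above can be sketched in plain Python. This is only an illustrative sketch using the standard library, not the Alteryx Designer workflow the book builds; the function names (`read_internal_csv`, `download_public_file`, `fetch_api_json`) and any paths or URLs passed to them are hypothetical.

```python
import csv
import json
import urllib.parse
import urllib.request
from pathlib import Path


def read_internal_csv(path):
    """Internal source: read a CSV file from a local or shared drive.

    Returns one dict per row, keyed by the header row. All values are strings.
    """
    with open(path, newline="", encoding="utf-8") as handle:
        return list(csv.DictReader(handle))


def download_public_file(url, destination):
    """Public file: fetch the resource at `url` and save its bytes to `destination`."""
    destination = Path(destination)
    with urllib.request.urlopen(url) as response:
        destination.write_bytes(response.read())
    return destination


def fetch_api_json(base_url, params=None):
    """Public API: send a GET request and decode the JSON response body.

    `params` is an optional dict that is URL-encoded into the query string.
    """
    url = base_url
    if params:
        url = f"{base_url}?{urllib.parse.urlencode(params)}"
    with urllib.request.urlopen(url) as response:
        return json.loads(response.read().decode("utf-8"))
```

Each function returns plain Python objects (a list of row dicts, a file path, or decoded JSON), so a later transformation step can treat all three sources the same way. Real sources would also need authentication, retries, and error handling, which this sketch leaves out.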

These three data locations provide the basis for most of the data sources you find as a data engineer. The internal data sources are any files or databases...