Data Engineering with Alteryx

By: Paul Houghton

Overview of this book

Alteryx is a GUI-based development platform for data analytics applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.

Table of Contents (18 chapters)

Part 1: Introduction
Part 2: Functional Steps in DataOps
Part 3: Governance of DataOps

Chapter 4: Sourcing the Data

The first step in creating a new data pipeline is sourcing the raw dataset. Scoping and defining the dataset are crucial parts of the entire data pipeline project, but the framework for extracting that information is already well established in general project management and the underlying Agile framework. Therefore, in this chapter, we will begin at the point where the initial requirements have been defined and understood.

We will focus on methods for accessing data from internal sources, freely available public sources, and application programming interfaces (APIs) that have security applied.

We will also discuss some methods for validating the data sources you connect to and for confirming that the raw data structure has not changed. If the data source has changed, we will use automated methods to assess those changes.
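
To make the idea of detecting a changed raw data structure concrete, here is a minimal sketch in Python. It is not the Alteryx workflow this chapter builds; it only illustrates the general check, assuming the source is a CSV file with a header row. The baseline column list, the sample data, and the function names `read_schema` and `compare_schema` are all hypothetical.

```python
import csv
import io


def read_schema(csv_text):
    """Return the ordered list of column names from the CSV header row."""
    reader = csv.reader(io.StringIO(csv_text))
    return next(reader)


def compare_schema(expected, actual):
    """Report columns added, removed, or reordered relative to the baseline."""
    added = [c for c in actual if c not in expected]
    removed = [c for c in expected if c not in actual]
    # Compare the order of only those columns present in both lists.
    shared_expected = [c for c in expected if c in actual]
    shared_actual = [c for c in actual if c in expected]
    return {
        "added": added,
        "removed": removed,
        "reordered": shared_expected != shared_actual,
    }


# Hypothetical baseline recorded when the pipeline was first built.
EXPECTED_COLUMNS = ["order_id", "customer_id", "amount"]

# Hypothetical new extract: one extra column and a changed column order.
sample = "order_id,amount,customer_id,currency\n1,9.99,42,GBP\n"

report = compare_schema(EXPECTED_COLUMNS, read_schema(sample))
print(report)
```

A report with empty `added` and `removed` lists and `reordered` set to `False` means the structure matches the baseline. Anything else could be routed to an alert or used to stop the pipeline before bad data is loaded.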

In this chapter, we will cover the following topics:

  • How to connect to different internal data sources...