Data Engineering with Alteryx

By: Paul Houghton
Overview of this book

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
Table of Contents (18 chapters)
Part 1: Introduction
Part 2: Functional Steps in DataOps
Part 3: Governance of DataOps

What this book covers

Chapter 1, Getting Started with Alteryx, introduces the Alteryx software suite and why you should use it as part of your data engineering processes.

Chapter 2, Data Engineering with Alteryx, focuses on the specific application of Alteryx in a data engineering context. We explore the benefits of Alteryx for a data engineer and how to get started with Alteryx products.

Chapter 3, DataOps and Its Benefits, describes the DataOps process and why it is a good framework for data projects. It explores the principles for creating a good data product and how DataOps can help build high-performing data teams. We also explore how DataOps fits with the Alteryx products and how to apply its principles when developing an Alteryx workflow.

Chapter 4, Sourcing the Data, explores the methods for extracting data with Alteryx. We look at the methods for connecting to local files and SQL databases in addition to the methods for extracting cloud-based data with application programming interfaces.

Chapter 5, Data Processing and Transformations, takes an example dataset from the previous chapter and describes common transformations required to process a raw dataset into an analytic resource for an organization.

Chapter 6, Destination Management, builds on the connection processes learned in Chapter 4, Sourcing the Data, and focuses on how to persist the dataset for future use. It examines the benefits of each saving method and how each can be used for different applications.

Chapter 7, Extracting Value, introduces the methods for extracting insights and information from a dataset. We explore the methods for exploratory data analysis in Alteryx so that we can understand our dataset and gain organizational value from our data resources.

Chapter 8, Beginning Advanced Analytics, extends the skills learned in Chapter 7, Extracting Value, into the areas of spatial analytics and machine learning. We explore how to extract the geographic insights in our dataset using spatial tools. We also explore how to build a machine learning project in Alteryx using the predictive tools and the Intelligence Suite add-on.

Chapter 9, Testing Workflows and Outputs, describes how to use the message tool and the test tool to integrate testing processes and validation into our data pipeline. These checks improve the robustness of our dataset and provide early warning systems for data drift or data structure changes.

Chapter 10, Monitoring DataOps and Managing Changes, describes how to apply continuous integration principles to an Alteryx pipeline. This enables version and change management processes and builds confidence in dataset quality.

Chapter 11, Securing and Managing Access, introduces the best practices for managing an Alteryx Server environment. We will learn how to manage access to workflows published to Alteryx Server and how to manage the infrastructure Alteryx Server is deployed on.

Chapter 12, Making Data Easy to Use and Discoverable with Alteryx, describes how Alteryx Connect can be used as a central data dictionary to help break the information silos in your organization and allow for the reuse of datasets across an organization.

Chapter 13, Conclusion, provides an overview of the data pipeline process we created throughout this book. It recaps the skills you have acquired so that you can confidently apply them in your daily work.