Data Engineering with Alteryx

By: Paul Houghton

Overview of this book

Alteryx is a GUI-based development platform for data analytics applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take a data pipeline from raw data to a robust dataset and publish it to Alteryx Server following a continuous integration process. By the end of this book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
Table of Contents (18 chapters)

  • Part 1: Introduction
  • Part 2: Functional Steps in DataOps
  • Part 3: Governance of DataOps

Exploratory data analysis in Alteryx and surfacing the datasets for BI tools

To extract value from your dataset, you need to perform Exploratory Data Analysis (EDA). EDA is the process of finding patterns and anomalies in your dataset. It is also a method for testing any assumptions or hypotheses you may have about the data. Typically, you will create visualizations to find insights and to confirm those assumptions and hypotheses. You can also create statistical summary tables to get an overview and general understanding of the metrics in your data.

A general process for EDA would cover the following areas:

  • Identifying whether any fields are missing values and summarizing their properties
  • Understanding the distribution of the fields in your dataset
  • Finding any significant relationships between fields in your dataset
  • Searching for any outlier values in your dataset

Each of these steps will explain what your data represents and provide a strong starting position for further...
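
To make the four areas listed above concrete, here is a minimal sketch in plain Python, using only the standard library. It is not Alteryx code and is not taken from the book; the sample records, the field names (units, revenue), and every function name are hypothetical and exist only for illustration. The outlier check uses the common 1.5 × IQR rule, which is one of several reasonable choices.

```python
import math
import statistics

# Hypothetical sample records; None marks a missing value.
rows = [
    {"units": 10, "revenue": 100.0},
    {"units": 12, "revenue": 118.0},
    {"units": 11, "revenue": None},
    {"units": 13, "revenue": 131.0},
    {"units": 9, "revenue": 92.0},
    {"units": 14, "revenue": 139.0},
    {"units": 60, "revenue": 150.0},
]


def column(rows, field):
    """Return the non-missing values of one field."""
    return [r[field] for r in rows if r[field] is not None]


def paired(rows, field_a, field_b):
    """Return two aligned lists, keeping only rows where both fields are present."""
    kept = [r for r in rows if r[field_a] is not None and r[field_b] is not None]
    return [r[field_a] for r in kept], [r[field_b] for r in kept]


def missing_counts(rows):
    """Area 1: count missing values per field."""
    return {f: sum(r[f] is None for r in rows) for f in rows[0]}


def summarize(values):
    """Area 2: a small summary of one field's distribution."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    return {
        "count": len(values),
        "min": min(values),
        "q1": q1,
        "median": statistics.median(values),
        "q3": q3,
        "max": max(values),
        "mean": statistics.mean(values),
    }


def pearson(xs, ys):
    """Area 3: Pearson correlation between two equal-length lists."""
    mx, my = statistics.mean(xs), statistics.mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)


def iqr_outliers(values, k=1.5):
    """Area 4: values outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


if __name__ == "__main__":
    print("missing:", missing_counts(rows))
    print("units summary:", summarize(column(rows, "units")))
    print("units vs revenue r:", pearson(*paired(rows, "units", "revenue")))
    print("units outliers:", iqr_outliers(column(rows, "units")))
```

With this sample data, the row holding 60 units is flagged as an outlier and the revenue field reports one missing value, which is the kind of finding you would then investigate before building further on the dataset.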