Data Engineering with Alteryx

By: Paul Houghton
4.8 (11)

Overview of this book

Alteryx is a GUI-based development platform for data analytic applications. Data Engineering with Alteryx will help you leverage Alteryx’s code-free aspects, which increase development speed, while still enabling you to make the most of the code-based skills you have. This book will teach you the principles of DataOps and how they can be used with the Alteryx software stack. You’ll build data pipelines with Alteryx Designer and incorporate the error handling and data validation needed for reliable datasets. Next, you’ll take the data pipeline from raw data, transform it into a robust dataset, and publish it to Alteryx Server following a continuous integration process. By the end of this Alteryx book, you’ll be able to build systems for validating datasets, monitoring workflow performance, managing access, and promoting the use of your data sources.
Table of Contents (18 chapters)
Part 1: Introduction
Part 2: Functional Steps in DataOps
Part 3: Governance of DataOps

Initial cleansing of datasets

We now have an initial dataset. We can keep it as a raw dataset, but it doesn't yet provide a source that our end users can use to extract value. For example, when we investigate the public consumer price inflation dataset (which we downloaded in the Integrating public data sources with Download tool use section), all the fields are text fields because the source is a CSV text file. In contrast, the Google Places API data is a complete JSON file that is not yet arranged into usable tables. In both situations, applying any statistical process controls (SPCs) is difficult because the data types don't support the appropriate statistical measures.
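
The two problems described above can be illustrated outside Alteryx Designer with a minimal Python sketch. This is not the book's workflow: the sample CSV, its column names (period, rate), the JSON layout, and the helper functions are all hypothetical stand-ins chosen for illustration. It shows that fields parsed from CSV arrive as text and need converting before a statistic can be computed, and that nested JSON needs flattening into rows before it behaves like a table.

```python
import csv
import io
import json
import statistics

# Hypothetical sample standing in for a downloaded CSV file.
# The column names and values are invented for illustration.
RAW_CSV = "period,rate\n1988,3.9\n1989,5.2\n1990,7.0\n"

# Hypothetical nested JSON standing in for an API response.
# The layout is an assumption, not a documented response format.
RAW_JSON = (
    '{"results": [{"name": "Cafe A", '
    '"geometry": {"location": {"lat": 51.5, "lng": -0.1}}}]}'
)


def read_as_text(raw):
    """Parse CSV text; every field comes back as a string."""
    return list(csv.DictReader(io.StringIO(raw)))


def to_typed(rows):
    """Convert the text fields to numeric types so statistics can be applied."""
    return [{"period": int(r["period"]), "rate": float(r["rate"])} for r in rows]


def flatten_places(raw):
    """Flatten each nested JSON record into one flat, table-like row."""
    data = json.loads(raw)
    return [
        {
            "name": place["name"],
            "lat": place["geometry"]["location"]["lat"],
            "lng": place["geometry"]["location"]["lng"],
        }
        for place in data["results"]
    ]


text_rows = read_as_text(RAW_CSV)    # rate values are strings such as "3.9"
typed_rows = to_typed(text_rows)     # rate values are now floats
mean_rate = statistics.mean(row["rate"] for row in typed_rows)
place_rows = flatten_places(RAW_JSON)
```

Calling statistics.mean on the untyped text rows would raise an error, which is the same obstacle the text describes for applying SPCs to the raw data.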

To cleanse our dataset, we will use a generic cleansing process. We will take the concepts needed to cleanse a dataset and apply them to our example raw file.

A simple cleansing process

The preview of our dataset in Figure 4.15 shows four initial problems that we want to address:

  • Titles...