Tableau Prep Cookbook

By: Hendrik Kleine

Overview of this book

Tableau Prep is a tool in the Tableau software suite, created specifically to develop data pipelines. This book will describe, in detail, a variety of scenarios that you can apply in your environment for developing, publishing, and maintaining complex Extract, Transform and Load (ETL) data pipelines. The book starts by showing you how to set up Tableau Prep Builder. You’ll learn how to obtain data from various data sources, including files, databases, and Tableau Extracts. Next, the book demonstrates how to perform data cleaning and data aggregation in Tableau Prep Builder. You’ll also gain an understanding of Tableau Prep Builder and how you can leverage it to create data pipelines that prepare your data for downstream analytics processes, including reporting and dashboard creation in Tableau. As part of a Tableau Prep flow, you’ll also explore how to use R and Python to implement data science components inside a data pipeline. In the final chapter, you’ll apply the knowledge you’ve gained to build two use cases from scratch, including a data flow for a retail store to prepare a robust dataset using multiple disparate sources and a data flow for a call center to perform ad hoc data analysis. By the end of this book, you’ll be able to create, run, and publish Tableau Prep flows and implement solutions to common problems in data pipelines.

Setting up an incremental refresh

Your flow may process a significant amount of data whenever it runs, taking up system resources, impacting database performance, and taking time to run. Much of your input data may be processed repeatedly as you run your flow. For example, your flow may process data from an order system. Running the flow daily might process all data just to capture the most recently placed orders.

In order to make your flow more efficient, reduce the burden on input databases, and minimize flow runtime, Tableau Prep allows you to set up an incremental refresh. In the example described, an incremental refresh would only process orders that have not previously been processed by Tableau Prep. To achieve this, Tableau Prep compares the data in the flow output to the flow input.

In this recipe, we'll configure a flow to achieve this.

Getting ready

To follow along, open up Tableau Prep Builder and, from the home screen, select the Superstore sample flow.

How to do it…

To get started, select the orders (USCA) input step, and then follow these steps:

  1. From the bottom pane, select the Settings tab, then scroll to the bottom to reveal the Incremental Refresh setting and check the Enable incremental refresh box. This will result in an error message, which will disappear as we configure the incremental refresh in the next steps:
    Figure 2.42 – Incremental Refresh settings

    Tableau Prep needs three pieces of information in the input step to configure an incremental refresh.

  2. First, Tableau Prep needs to know which field indicates whether a row in the data is new. In this example, we want to identify new Superstore rows by Order Date. Select this field from the Input field dropdown to reveal the additional settings:
Figure 2.43 – Incremental Refresh field settings

  3. Next, we need to tell Tableau Prep in which output it can find a field to compare the selected input field with, to determine whether a row is new or not. In this case, the fields are named identically, and so Tableau Prep has automatically selected Order Date as the output field in the Superstore Sales output, which is exactly what we want. No further changes are needed; your incremental refresh for this input is now configured. If you have multiple inputs, an incremental refresh must be configured for each input separately.
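
The three settings chosen in this recipe can be pictured as a small configuration record. The Python sketch below is purely illustrative: `IncrementalRefreshConfig` and `missing_settings` are hypothetical names that are not part of Tableau Prep, and the logic only mimics how the error message persists until the input field, the output, and the output field have all been supplied.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class IncrementalRefreshConfig:
    """Hypothetical record of the three settings chosen in the input step."""
    input_field: Optional[str] = None    # field that identifies new rows
    output_name: Optional[str] = None    # output to compare against
    output_field: Optional[str] = None   # field in that output


def missing_settings(config: IncrementalRefreshConfig) -> List[str]:
    """Return the names of the settings that still need a value."""
    return [name for name, value in vars(config).items() if not value]


# The box is checked but nothing is configured yet.
config = IncrementalRefreshConfig()
print(missing_settings(config))  # all three settings are reported

# Choose the input field, then the output and its field.
config.input_field = "Order Date"
config.output_name = "Superstore Sales"
config.output_field = "Order Date"
print(missing_settings(config))  # nothing left to configure
```

With several inputs, one such record would be needed per input, which mirrors the requirement to configure each input separately.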

Important note

Replacing Output with Incremental Data Only: When you select the Create 'Superstore Sales.hyper' output step, notice the Incremental Refresh dropdown in the settings area. There are two options here. By default, Tableau Prep will append data, meaning the newly processed rows are added to the existing output. However, you can change this to Create Table to replace any existing output with new output containing only those newly processed rows.
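
The difference between the two options can be sketched in a few lines of Python. This illustrates the behavior described in the note and is not Tableau Prep's implementation; `write_output` and the mode strings `"append"` and `"create_table"` are hypothetical names.

```python
from typing import Dict, List

Row = Dict[str, str]


def write_output(existing: List[Row], new_rows: List[Row], mode: str) -> List[Row]:
    """Return the output table after an incremental run.

    mode="append"       keeps the existing rows and adds the new ones.
    mode="create_table" replaces the output with the new rows only.
    """
    if mode == "append":
        return existing + new_rows
    if mode == "create_table":
        return list(new_rows)
    raise ValueError(f"unknown mode: {mode!r}")


existing = [{"Order ID": "A-1"}, {"Order ID": "A-2"}]
new_rows = [{"Order ID": "A-3"}]

appended = write_output(existing, new_rows, "append")
replaced = write_output(existing, new_rows, "create_table")
print(len(appended), len(replaced))  # 3 1
```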

How it works…

Tableau Prep processes data incrementally by comparing a single field in the existing output with the corresponding field in the input, so only rows that are not yet in the output are processed. On large inputs, this avoids repeatedly processing data that earlier runs have already handled.
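
A minimal Python sketch of this comparison follows. It assumes that the latest Order Date already present in the output acts as a cutoff and that only input rows after that cutoff are processed. The function name `select_new_rows` and the sample rows are hypothetical, and the real product may treat details such as ties or missing values differently.

```python
from datetime import date
from typing import Any, Dict, List

Row = Dict[str, Any]


def select_new_rows(
    input_rows: List[Row],
    output_rows: List[Row],
    input_field: str,
    output_field: str,
) -> List[Row]:
    """Return input rows whose field value is later than anything in the output."""
    if not output_rows:
        # Nothing has been processed yet, so every input row is new.
        return list(input_rows)
    cutoff = max(row[output_field] for row in output_rows)
    return [row for row in input_rows if row[input_field] > cutoff]


# Rows written by an earlier run (assumed sample data).
output_rows = [
    {"Order ID": "A-1", "Order Date": date(2021, 1, 1)},
    {"Order ID": "A-2", "Order Date": date(2021, 1, 2)},
]
# The input now holds the old rows plus one newly placed order.
input_rows = output_rows + [
    {"Order ID": "A-3", "Order Date": date(2021, 1, 3)},
]

new_rows = select_new_rows(input_rows, output_rows, "Order Date", "Order Date")
print([row["Order ID"] for row in new_rows])  # ['A-3']
```

In this sketch, only one of the three input rows is passed on for processing, which is the saving an incremental refresh is meant to deliver.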