Tableau Prep Cookbook

By: Hendrik Kleine

Overview of this book

Tableau Prep is a tool in the Tableau software suite, created specifically to develop data pipelines. This book will describe, in detail, a variety of scenarios that you can apply in your environment for developing, publishing, and maintaining complex Extract, Transform and Load (ETL) data pipelines. The book starts by showing you how to set up Tableau Prep Builder. You’ll learn how to obtain data from various data sources, including files, databases, and Tableau Extracts. Next, the book demonstrates how to perform data cleaning and data aggregation in Tableau Prep Builder. You’ll also gain an understanding of Tableau Prep Builder and how you can leverage it to create data pipelines that prepare your data for downstream analytics processes, including reporting and dashboard creation in Tableau. As part of a Tableau Prep flow, you’ll also explore how to use R and Python to implement data science components inside a data pipeline. In the final chapter, you’ll apply the knowledge you’ve gained to build two use cases from scratch, including a data flow for a retail store to prepare a robust dataset using multiple disparate sources and a data flow for a call center to perform ad hoc data analysis. By the end of this book, you’ll be able to create, run, and publish Tableau Prep flows and implement solutions to common problems in data pipelines.

Connecting to cloud databases

In this recipe, we'll connect to an Amazon Athena database. Just as with on-premises data connections, Tableau has made it straightforward to connect securely to cloud data sources. You'll find connections for popular cloud providers, including Microsoft, Google, and Amazon. Each data connection dialog is customized to the technology you're connecting to, so you won't see fields that are irrelevant to the selected connection type. This reduces the complexity of setting up cloud connections.

Getting ready

To follow along with this recipe, you need data that can be queried through Amazon Athena and credentials that give you access to it.

Tip

Getting set up on AWS Athena is beyond the scope of this book. However, if you wish to explore this option, the simplest way to get started is to create an account at https://aws.amazon.com/, then upload data to S3, and make it available to Athena by using AWS Glue. To use the same sample data as this recipe, download the Sample files 2.5 folder from the book's GitHub repository.

How to do it…

To get started, ensure you have Tableau Prep Builder open, then follow these steps:

  1. From the home screen, click the Connect to Data button, then search for Athena in the Connect pane. Select Amazon Athena to continue.
  2. In the Connection dialog, enter the details for your AWS Athena instance and click Sign In to continue:

    a) The Server field for Athena needs to be populated with the region information. The format for this is athena.[region].amazonaws.com. For example, athena.us-east-1.amazonaws.com or athena.eu-west-1.amazonaws.com.

    b) The staging directory is where your Athena results are stored in AWS S3 and follows the format s3://[s3 bucket]/[s3 folder]. For example, s3://company/orders.

    c) Finally, you'll need your AWS access key information. For information on how to obtain this, see the AWS documentation at https://docs.aws.amazon.com/general/latest/gr/aws-sec-cred-types.html#access-keys-and-secret-access-keys.

    d) If you haven't done so already, install the Amazon Athena JDBC driver, which Tableau provides on its download page at https://www.tableau.com/support/drivers. Tableau Prep uses this driver in the background to make the connection.

  3. Next, select the appropriate Catalogue from the dropdown. In Athena terminology, this is the data source:
    Figure 2.24 – Select the Athena data source from the Catalogue dropdown

  4. Finally, select the database of your choice and drag the table you need onto the flow canvas. In this example, we've selected a database named opssalesdb and dragged a table named results onto the flow canvas:
Figure 2.25 – Selecting an Athena table

Having followed the steps in this recipe, you can now connect Tableau Prep to cloud databases.
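
If you set up connections for several regions or buckets, it can help to generate the Server and staging directory values from step 2 rather than typing them by hand. The following is a minimal Python sketch of those two formats only. The function names athena_server and staging_directory are hypothetical helpers made up for illustration and are not part of Tableau Prep or any AWS library, and the region check is a loose assumption about what region names look like, not an official validation rule.

```python
import re

# Hypothetical helpers for illustration only. They are not part of
# Tableau Prep or any AWS SDK; they just assemble the two strings that
# the Connection dialog expects in step 2.

# Assumed shape of a region name, e.g. us-east-1 or eu-west-1.
_REGION = re.compile(r"^[a-z]{2}(-[a-z]+)+-\d+$")


def athena_server(region: str) -> str:
    """Build the Server value in the form athena.[region].amazonaws.com."""
    if not _REGION.match(region):
        raise ValueError(f"unexpected region format: {region!r}")
    return f"athena.{region}.amazonaws.com"


def staging_directory(bucket: str, folder: str) -> str:
    """Build the staging directory in the form s3://[s3 bucket]/[s3 folder]."""
    bucket = bucket.strip("/")
    folder = folder.strip("/")
    if not bucket or not folder:
        raise ValueError("both a bucket and a folder are required")
    return f"s3://{bucket}/{folder}"


if __name__ == "__main__":
    # Values taken from the examples in step 2 of this recipe.
    print(athena_server("us-east-1"))
    print(staging_directory("company", "orders"))
```

The resulting strings can be pasted into the Server and staging directory fields of the Connection dialog. The access key details from step 2c are deliberately left out of the sketch, since credentials are better kept out of scripts.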

How it works…

As with on-premises data connections, Tableau Prep provides a simplified user interface on top of the database driver, so you can easily configure the connection. In this recipe, Tableau Prep used the Athena JDBC driver in the background, and configuring it was as easy as configuring any other connection.

There's more…

The following screenshot shows the clear mapping between the Athena web interface and the Tableau Prep UI:

Figure 2.26 – Mapping Athena terminology to Tableau Prep

Let's move on to the next recipe!