Book Image

The Tableau Workshop

By : Sumit Gupta, Sylvester Pinto, Shweta Sankhe-Savale, JC Gillet, Kenneth Michael Cherven
Book Image

The Tableau Workshop

By: Sumit Gupta, Sylvester Pinto, Shweta Sankhe-Savale, JC Gillet, Kenneth Michael Cherven

Overview of this book

Learning Tableau has never been easier, thanks to this practical introduction to storytelling with data. The Tableau Workshop breaks down the analytical process into five steps: data preparation, data exploration, data analysis, interactivity, and distribution of dashboards. Each stage is addressed with a clear walkthrough of the key tools and techniques you'll need, as well as engaging real-world examples, meaningful data, and practical exercises to give you valuable hands-on experience. As you work through the book, you'll learn Tableau step by step, studying how to clean, shape, and combine data, as well as how to choose the most suitable charts for any given scenario. You'll load data from various sources and formats, perform data engineering to create new data that delivers deeper insights, and create interactive dashboards that engage end-users. All concepts are introduced with clear, simple explanations and demonstrated through realistic example scenarios. You'll simulate real-world data science projects with use cases such as traffic violations, urban populations, coffee store sales, and air travel delays. By the end of this Tableau book, you'll have the skills and knowledge to confidently present analytical results and make data-driven decisions.
Table of Contents (12 chapters)
Preface

Data Blending

There might be times when the linking fields vary between the different worksheets. Also, if the data sources are too large, joining them with the conventional joins might be very time consuming. In that case, you can perform a data blend instead of joining the data.

In data blending, you query the data between the two data sources and then combine the result at the aggregation level defined in the worksheet of the primary data source. The primary data source will be the one from which the first dimension or measure is added in the view. Also, the results would be similar to a left join since all the records from the primary data will appear in the worksheet.

Exercise 2.08: Creating a Data Blend Using the Orders and People Tables

In this exercise, you will learn how to create a data blend for the Orders table with the People table. The following steps will help you complete this exercise:

  1. Load the Sample – Superstore dataset in your Tableau instance.
  2. Connect to the Orders table and go to Sheet 1.
Figure 2.66: Adding the Orders table in Tableau

Figure 2.66: Adding the Orders table in Tableau

  1. In a data blend, create the linking at the worksheet level and not at the data source level. Inside the worksheet, you will be able to see the Orders table and its columns. Add a new data source, as follows (see the highlighted option):
Figure 2.67: Adding data option inside a worksheet

Figure 2.67: Adding data option inside a worksheet

  1. This should lead to the same menu that you get for connecting to a data source. Click on Microsoft Excel, navigate to the location of the Sample – Superstore.xls Excel file, and click on Open to open the Connect pane.
Figure 2.68: Adding another data source in Tableau

Figure 2.68: Adding another data source in Tableau

  1. Now, drag the People table to the canvas and go to Sheet 1 like before:
Figure 2.69: Adding the People data to Tableau

Figure 2.69: Adding the People data to Tableau

Now, you will be able to see the two data sources, as follows:

Figure 2.70: Data sources listed inside the worksheet

Figure 2.70: Data sources listed inside the worksheet

  1. Add a relationship between these data sources to use them. To do that, click on Data | Edit Relationships… to open the popup.

    Note

    If you are using a Tableau version later than 2020.1, this may be called Edit Blend Relationships... to differentiate between relationships made directly in the Data Source tab.

Figure 2.71: Edit data properties window

Figure 2.71: Edit data properties window

  1. Based on the field names, the relationship can be set to Automatic by default. To change it, click on Custom and add the relationship. Edit the relationship to Customer Name and Person, as highlighted in the following figure. Select Region and then Edit… before making the selections in the popup. Click OK to add the relationship:
Figure 2.72: Selecting the matching columns between the two data sources

Figure 2.72: Selecting the matching columns between the two data sources

Thus, you have successfully blended the two data sources and can visualize your data in the next exercise.

Exercise 2.09: Visualizing Data Created from a Data Blend

In the previous exercise, you learned how to perform data blending between two data sources. In this exercise, you will create a visualization on the blended data to understand the application of a data blend – again, you will continue using the Orders table and the People table for this purpose. Note that a blend will only be active if you use the fields from these two data sources; otherwise, it will remain inactive.

Perform the following steps to complete this exercise:

  1. On the Orders data, click and drag Customer Name to Rows.

    Note

    Tableau versions later than 2020.1 may give a warning at this step that the field may contain more than 1000 rows. If this is the case, select Add all members to proceed.

Figure 2.73: Adding the primary data source

Figure 2.73: Adding the primary data source

This will now become your primary data source, indicated by the blue tick on the data source.

  1. Repeat the step for the People data source.
Figure 2.74: Adding the secondary data source

Figure 2.74: Adding the secondary data source

This will become your secondary data source, indicated by the orange tick on the data source. Also, notice the red linking icon that is used to link the two data sources.

Figure 2.75: Primary and secondary data source icons

Figure 2.75: Primary and secondary data source icons

  1. When you filter on Person for the four people that you have in the People data, you will see that you have linked these values between these data sources. Click on the Person column dropdown and then Filter…, uncheck the Null value, and click OK to add the filter.
Figure 2.76: Filtering to remove unmatched values

Figure 2.76: Filtering to remove unmatched values

You will get the following output, which shows the customer name matching Person:

Figure 2.77: Data blend output

Figure 2.77: Data blend output

Using data blending, you can display data from various sources at multiple aggregation levels in different sheets. For instance, in one sheet, you can blend the data at the Year aggregation level, while in the other you can blend at the Month level.

This is possible because, in a data blend, the data sources are not joined at the input source. This provides the flexibility to have large data sources and blend only in certain sheets where required. This can help make the dashboard render faster.

Limitations of Data Blending

Data blending does not work with certain aggregation levels, such as MEDIAN and COUNTD (count distinct).

You cannot publish the blended data sources on Tableau Server directly. First, you need to publish the data sources individually on the server and then blend the published data sources in your Tableau Desktop instance. Publishing data sources means uploading your data and directly storing it on Tableau Server.

Another limitation is that the data used from the secondary data source must be at a higher aggregation level compared to the primary data source. If the aggregation level is not correct, an asterisk (*) will appear in the visualizations, indicating a one-to-many join aggregation level. You can swap the data sources to resolve this error.

This concludes the theory sections of this lesson. Next, you will put all you have learned into practice in the following activities.

Activity 2.01: Identifying the Returned Orders

As an analyst, you may encounter a situation where you would like to assess business performance by sales. It is therefore important to understand how many orders are fulfilled and how many are returned. If certain products are being returned frequently, it is a point of investigation as it can have serious consequences on the business.

Usually, order information is kept separate from returns information. Hence, to bring this information together, you need to join the two data sources.

For this activity, you will use the Orders and Returns tables from the Sample - Superstore Excel file. You are already aware of the Orders table.

The Returns table consists of the Order ID and Returned columns. Order ID is the ID that would match with the Orders table. The Returned column indicates Yes for the order ID.

Figure 2.78: Returns sheet columns

Figure 2.78: Returns sheet columns

The objective is to identify the returned orders after combining them with the main Orders table so that you may determine which orders were both fulfilled and returned.

The steps are as follows:

  1. Open the Sample - Superstore dataset in your Tableau instance.
  2. Rename the data source to Activity 1.
  3. Drag the Orders table onto the canvas.
  4. Repeat the same steps for the Returns table.
  5. You need to bring all the Orders and Returns table values into the combined dataset. Can you identify the correct join based on the requirement? Remember that for an order to be returned, it should always be completed first. What can be interpreted if you change the join types to left, right, or full outer in this case?
  6. Identify how many products were returned from the data grid. (An order can have multiple products clubbed in it.)

Final Output Expected:

Figure 2.79: Choose the correct join

Figure 2.79: Choose the correct join

In this activity, you strengthened your knowledge of various joins and their outputs. You also learned how to interpret the results by changing the join types.

Note

The solution to this activity can be found here: https://packt.link/CTCxk.

Activity 2.02: Preparing Data for Visualization

Now that you have joined the data, the next step is to make sure that the data is ready for visualization. This involves performing data transformation activities such as cleaning the data by removing the null values. You may also be required to rename certain columns or add aliases, split the columns, and so on.

In this activity, you will perform some data transformation steps based on the left join output of the previous activity.

This activity will help you to strengthen the concepts of data transformation in Tableau. This is a very important process in any Tableau project. Hence, it becomes crucial that you are well experienced in doing these in Tableau.

The objective of this activity is to transform the data into a cleaned form for visualization. You need to first create an extract for this data source. Then you need to display the data only for the Furniture and Office Supplies categories. Is there a way to do this using the extract properties? You will also clean up the final data by changing any nulls to blanks. Let's also remove repeated terms such as Class from the Ship Mode column.

Once done, your data should be ready for visualization.

Continuing from Activity 2.01, the following steps will help you complete this activity:

  1. Open the Sample - Superstore dataset in your Tableau instance.
  2. Create a data extract for this data.
  3. Add a filter on the data to pull the Furniture and Office Supplies categories. Check the row count.
  4. Transform the data by aliasing a few columns.
  5. Alias the null values from the columns of the Returns table to blanks.
  6. Remove the word Class from the Ship Mode column.

Once completed, you should get the following output:

Final Output Expected:

Figure 2.80: Final output for the activity

Figure 2.80: Final output for the activity

In this activity, you learned how to extract the data. You also added filters for the Category column to just pull the selected categories. Many times, you will work on projects that require the data to be segregated at the beginning, such as regional data. These filters help you to achieve exactly this. You also transformed the data using aliases, making it much cleaner by removing repeated words and nulls.

Note

The solution to this activity can be found here: https://packt.link/CTCxk.