Book Image

Tableau 10 Business Intelligence Cookbook

By : Donabel Santos, Paul Banoub
Book Image

Tableau 10 Business Intelligence Cookbook

By: Donabel Santos, Paul Banoub

Overview of this book

Tableau is a software tool that can speed up data analysis through its rich visualization capabilities, and help uncover insights for better and smarter decision making. This book is for the business, technology, data and analytics professionals who use and analyze data and data-driven approaches to support business operations and strategic initiatives in their organizations. This book provides easy-to-follow recipes to get the reader up and running with Tableau 10, and covers basic to advanced use cases and scenarios. The book starts with building basic charts in Tableau and moves on to building more complex charts by incorporating different Tableau features and interactivity components. There is an entire chapter dedicated to dashboard techniques and best practices. A number of recipes specifically for geospatial visualization, analytics, and data preparation are also covered. By the end of this book, you’ll have gained confidence and competence to analyze and communicate data and insights more efficiently and effectively by creating compelling interactive charts, dashboards, and stories in Tableau.
Table of Contents (17 chapters)
Tableau 10 Business Intelligence Cookbook
Credits
About the Author
Acknowledgements
About the Reviewer
www.PacktPub.com
Preface
Index

Different types of calculated fields


There are different types of calculated fields. They differ in function and behavior. They also differ in where they are executed and in what order.

Regular calculated fields

Let us look some different levels in the fields:

Row level

Simple calculated fields are fields that include expressions and/or formulas and are typically handled by the data source.

Row-level calculated fields are fields that can be calculated at the row level or the most granular level. For example, look at the following formula for concatenating a series of strings—first name, a space, and a last name:

FirstName + " " + LastName

The preceding code can derive the full name using only the information within the row itself. It does not need to refer to the result of a group first before it can concatenate the first and last names.

Group level (or aggregation)

An aggregated calculated field involves some calculation after some records are grouped. For example, the expression below checks the total profits of a group:

SUM(Profit) > 0

The term group seems somewhat vague. However, in Tableau, the group is determined by all other discrete fields present in the view.

In the aforementioned expression, a single record is not sufficient to determine the total profit of the group. All the individual records for the group must be processed to come up with a collective profit amount.

Row level versus aggregations

There is a distinct difference between row-level calculations and aggregations. Row-level calculations are processed first since they occur for each record. Aggregations happen after the row-level calculations are processed.

To illustrate, have a look at the following comparison:

Operation

Formula 1

Formula 2

 

Profit/Sales

SUM(Profit)/SUM(Sales)

1

Each record gets a ratio

All of the profits will be summed up, and all sales will be summed up

2

 

Total of profit will be divided by total of sales

When dragged to the view

Will be aggregated

Default aggregation is SUM, so all the individual ratio values will be added together

This will produce an incorrect ratio for the group

Will not be aggregated, since the aggregation is already embedded in the formula

Will show as AGG

This will produce a correct ratio for the group

Table calculations

Table calculations, as the name implies, are calculations that happen at the table. Unlike regular calculations, table calculations do not happen at the data source. These happen in Tableau locally and are calculated based on what is in the view and how the data is laid out.

When I present the concept of table calculations in my training sessions, I usually start by asking students for a random number. Let's say they gave me the following numbers:

If we were trying to get the running total of the numbers, and we start with the top left number, what will be our first result? That's easy, it's 3.

But what if I asked you what the next number is? It's not so straightforward, is it? The following points will help answer the question:

  1. Which direction is next? Should you go across and add 5, or go down and add 2?

  2. What happens when we get to the next group? Do we keep going? Or do we start again?

This is the pivotal moment of the discussion. Students start to really understand how table calculations work at this point, and it makes it clear that table calculations are affected by the following factors:

  • What is in the view and what is filtered out of the view. If we were to take all the 5s out, the running total numbers would change.

  • How the data is laid out. If we were to swap the rows and columns or move the columns around, the running total numbers would be different.

You can add table calculations from a continuous (green) pill on your view. When you click on the dropdown, you can find some predefined Quick Table Calculations:

If the fields required to do the quick table calculation are not present, some of the options will be disabled. In the aforementioned case, if date fields are missing, any YTD (year to date) or YOY (year over year) calculations will be grayed out.

You can also create a customized table right off the bat by selecting the Add Table Calculations… option. Here is the window that appears when you click on Add Table Calculations..., which looks different from the Tableau V10 version. The window is modeless starting in V10, meaning you don't have to close the window to work on your view. You can still work on your view with this window open:

Some table calculations also allow for secondary table calculations, such as Running Total. Once you select the option for Secondary Table Calculation, a second window opens with additional options:

You can also use table calculation functions in your formulas. If you open the help window in the calculated field editor and choose Table Calculation from the dropdown, you will see a list of table calculation functions supported in Tableau, as shown in the following screenshot:

Level of detail expressions

Granularity or level of detail in Tableau, by default, is based on the discrete values you have in your view. Each discrete value is a by or per value—a slicer for the measures in the view.

For example, here we are getting the overall sum:

Once we add another discrete value, for example Category, to the Columns shelf, we are breaking the SUM(Sales) value down to a subtotal for each Category:

When we add one more discrete pill to the other Marks cards, the number is further broken down. In the following example, each bar color represents the SUM(Sales) for that Category and Segment:

But what if we wanted to have different levels of aggregation in one view, such as show a grand total but also the average per customer? A grand total will be more aggregated, and an average per customer will be less aggregated. Level of detail (LOD) expressions allow us to combine different aggregation levels in the same view.

The main syntax of an LOD expression is as follows:

LOD expressions produce a separate call to the data source and produce a row-level value as far as Tableau is concerned.

Tableau V10 now allows expressions to be used instead of dimension field names. In previous versions, if you needed to derive a dimension based on some logic, you needed to create a new field first and then use it in the LOD expression. In Tableau V10, this step can be eliminated by embedding another expression in place of the dimension name. For example, if you want to fix average sales per the company code, and the company code is the second substring in the Company field, you can simply use the corresponding expression that splits the company code in the LOD expression:

{FIXED TRIM(SPLIT([Company], "-", 2)): AVG(Sales)}

There are three types of LOD expression: FIXED, INCLUDE, and EXCLUDE.

FIXED

A FIXED LOD expression only considers the dimensions or expressions in the dimension declaration in its calculation. It is a fairly static calculation, which is not affected by what is dragged to or off the view. FIXED LOD expressions are not affected by dimension filters, but they respect data source and context filters.

In the following example, the profit values for each of the categories are plotted in individual line graphs. There is also a big, thick line graph that represents the average of the categories. This average is calculated using some FIXED LOD expressions. This means that the average value will not change even if some of the categories are unchecked from the filter. This is shown in the following screenshot:

The formula for the average fixes the sum of profit for category, year, and month first. This is then passed to an outer expression, which then calculates the average of the sums based on a time period (year and month):

INCLUDE

An INCLUDE LOD expression computes values as if the fields in the dimension declaration have been dragged onto the view. INCLUDE LODs are affected by dimension filters.

INCLUDE LODs will be used when you want a lower level of detail aggregation in the view.

In the following graph, although the view shows only the sum of profit for each category, a finer level of aggregation is introduced, which is the average per customer. This is done using an INCLUDE LOD expression, as shown next:

The formula for the Avg per Customer calculation is as follows:

This means that the calculation will create an aggregation based on all the dimensions in the view and the Customer Name, as if the Customer Name was dragged onto the view.

EXCLUDE

An EXCLUDE LOD expression computes values as if the fields in the dimension declaration have been removed from the view. EXCLUDE LODs are affected by dimension filters.

EXCLUDE LODs will be used when you want a higher level of detail aggregation in the view:

The formula for the Exclude State for Region Sales field is as follows:

In the aforementioned example, the view presents a level of detail at the State level represented by the map, but the tooltip shows one aggregation by State and another by Region regardless of State.

By default, if we drag Sales onto the view, it will be SUM(Sales) per Region per State. However, if we want to display both SUM(Sales) per State (less aggregated) and SUM(Sales) per Region (more aggregated), we can use the EXCLUDE LOD expression to ignore the State field when calculating SUM(Sales) per Region.

Significance of LODs

Level of detail (LOD) expressions formed one of the most anticipated features introduced in Tableau v9. LODs add immense flexibility to Tableau and simplify how complex calculations were done in the past versions.

Some of the most common use cases where LODs can be used are listed in Bethany Lyons' article on Top 15 LOD expressions, which can be found at http://bit.ly/top15LOD. A few use cases are as follows:

  • Cohort analysis: Depending on the situation, we may need to keep or fix specific data points in order to perform group analysis. For example, we may need to know when customers first ordered, first joined a program, first got paid and so on, and from there perform additional aggregations.

  • Aggregating an aggregate: LODs can be used to calculate the aggregate of aggregates. For example, we may need to know the average of maximum or minimum values and make the calculation immune to dimension filters.

  • Comparing to others: LODs can be used to compare one data point to others. For example, to compare one vendor to other vendors, or one customer to other vendors, we may be required to have certain fields fixed to that customer or vendor. LOD calcs can help with this.

  • Relative period filtering: LOD expressions allow us to find the maximum date in a data set to facilitate performance analysis through relative filtering, for example if we want to compare YTD values of this year with last year's.

  • Proportional brushing: LOD expressions facilitate proportional brushing. This is a technique that allows us to compare one value to another without filtering the "other" values while showing the two values together, relative to the whole.

Despite being quite powerful and flexible, LODs have limitations, and we need to be aware of them. The limitations are listed at http://bit.ly/lod-overview, and include the following:

  • Avoiding using floating point measures in LOD aggregations

  • Using the linking field from the primary data source in the view before using LOD expressions from the secondary data source

Order of operations

It is important to understand the order of operations in Tableau, especially with the filters. For example, do dimension filters affect LOD expressions? If this is not understood, there will be some unexpected behaviors.

The following is Tableau's updated illustration for order of operations (as of August 2016), as shown in the Tableau official online documentation:

Source – Tableau online documentation

You can find the original at http://bit.ly/tableau-orderofoperations.

Deep dive

What this section has covered is barely the tip of the iceberg, and I encourage you to learn more about calculations in depth. Understanding the types of calculations and when they are appropriate to use can make you really efficient and effective in Tableu.

Note

Here is a great article by Alan Aldridge on choosing the right calculation type:

https://blog.databender.net/2016/03/06/choosing-the-right-calculation-type/