Book Image

Microsoft Tabular Modeling Cookbook

By : Paul te Braak
Book Image

Microsoft Tabular Modeling Cookbook

By: Paul te Braak

Overview of this book

Business Intelligence Semantic Models (BISM) is a technology that is designed to deliver analytical information to users through a variety of mechanisms that include model structure, definition, and design. This book demonstrates how to create BISM models so that information can be presented to users in an intuitive and easy-to-use format. Once the model is defined, we also show you how it can be managed and maintained so that the data in it remains current and secure. Microsoft Tabular Modeling Cookbook is an all-encompassing guide to developing, managing, creating, and using analytical models using the Business Intelligence Semantic Model (BISM). This title covers a range of modeling situations and common data analysis related problems to show you the techniques required to turn data into information using tabular modeling. Microsoft Tabular Modeling Cookbook examines three areas of tabular modeling: model development, model management and maintenance, and reporting. This book is a practical guide on how to develop semantic models and turn business data into information. It covers all phases of the model lifecycle from creation to administration and finally reporting. It also shows you how to create models which are designed to analyze data. All sections of BISM modeling from development to management and finally reporting are covered. The sections on development examine a wide range of techniques and tricks required to build models, including moving data into the model, structuring the model to manipulate the data, and finally the formulas required to answer common business questions; all of these are discussed in this book in detail. Finally, the book examines methods of reporting on the data within the model, including the creation of data-driven workbooks and reports for a powerful end user experience.
Table of Contents (18 chapters)
Microsoft Tabular Modeling Cookbook
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Using tabular relationships to filter data


In addition to table names, column names, and data formats, a semantic model defines how tables within a model relate to each other. This relationship is important because it defines the output of calculations (which are defined in the model). This recipe shows how to create relationships and the effect that these relationships have on the model.

Getting ready

This recipe assumes that the model in the recipe Managing the appearance of tables and fields has been created.

The reader should recognize that the model is designed to show sales information by product, date, and customer. This type of modeling scenario is commonly referred to as a star schema and is shown in the following diagram. The Sales table is referred to as a fact table (since it stores the data facts that we wish to analyze—sales amount, tax amount, and so on) and the other tables are referred to as dimension (subject) tables because they hold descriptive information.

Extending the model further, the Products table is linked to the Subcategory table, and the Subcategory table is linked to the Category table. This is shown in the following diagram and is sometimes called a snowflake schema, since the dimension tables are not directly connected to the fact table:

An important point to note, is that each dimension table has a unique identifying field, for example, a product can be uniquely identified in the Products table through the product_id field. This is commonly referred to as the primary key for the table.

In contrast, the referring column (product_id in the Sales table) can have many occurrences of the product_id field and is commonly referred to as the foreign key.

How to do it…

Start with the workbook that was developed in the prior recipe.

  1. Drag the Product Name field onto the rows of the pivot table (under the Row Labels column) and the Sales column total_price onto values. Your screen should look like the following screenshot:

  2. Return to the PowerPivot window and select the product_id field and then click on the Create Relationship button (in the Design tab). A new window will open asking you to define the related (lookup) table and column.

  3. Select the Products option from the Related Lookup Table drop-down list and Product ID from the Related Lookup Column drop-down list. The Create button is now enabled. Click on Create.

  4. Return to the pivot table and refresh the model. The values for the Sum of total_price field have updated to reflect the total for each product.

  5. Now, create relationships between the following tables and columns:

    Source table

    Source column

     

    Related table

    Related column

    Sales

    customer_id

     

    Customers

    Customer ID

    Sales

    order_date

     

    Dates

    Day

    Tip

    Downloading the example code

    You can download the example code files for all Packt books you have purchased from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

  6. In the PowerPivot window, click on the Manage Relationships button. A new window will open showing all the relationships that have been built in the model.

    Tip

    Relationships can be created using this window. When the Create button is clicked, the same Create Relationships window opens. However, the Create Relationships window is not populated with the source table and columns.

  7. Click on the Diagram View button in the Home menu to switch to the diagram modeling view. Your screen will now show tables and columns (rather than data) and look like the following screenshot. Note that previously defined relationships appear as lines that connect tables.

    Tip

    You can also switch between the data and diagram views by toggling the two buttons at the bottom-right side of the PowerPivot application status bar.

  8. In the pivot table, replace the Product Name field with the Category field from the category table. The total value ($1,662,869.42) is repeated for all categories indicating that there is no relationship between the Sales table and Category table.

  9. From the Products table, select the Product ID field, and drag it to the product_id field of the Subcateogry table. A new relationship will be created between Products and Subcategory. Use this method to create a relationship between the Products table and the Category table. Refresh the pivot table to ensure that the total value is not duplicated.

  10. In the pivot table, drag the Product ID field from the Products table into the Values pane. The pivot table will now show the price and number of products for each category and will look like the following screenshot:

How it works…

The model has been extended to show two things. Firstly, by defining relationships between tables within the model, we have defined the filtering path for the data. This path is used to restrict rows between tables that have a relationship. Secondly, by adding a calculation (Sum of total_price and Count of Product ID), we have created measures that apply an aggregation function to the model fields. These are special types of measures within PowerPivot and are referred to as implicit measures (because the model implicitly defines a calculation for the field).

Relationships define how one table relates to another. In order to define a relationship, the join must occur on a field that has unique values in one of the tables (this is commonly called a primary key). The table that has the field with unique values is commonly called the related table. This can be seen in the diagram view, as shown in the following screenshot with the direction of the arrows on the relationships. Consider the Products table (which has a unique field product_id) that is related to the Sales table (through the product_id field in that table), but only the Products table needs to have a unique product_id. It is also said that the product_id field relates to many records in the Sales table. This can be seen by the direction of the arrow between Sales and Products, the related table has the arrow pointing towards it.

Relationships are important because they define how data is filtered and calculated when it is presented to the user.

Relationships are the primary mechanisms with the model that are used to filter data and perform calculations. That is, the relationship defines how data is filtered when values are shown to the user. Although this is a new concept, the concept of relationships is important because they have important implications with the way that the model determines what data to show to the user. Consider the pivot table shown in the following screenshot—Subcategory on rows and Sum of total_price, Count of Product ID, and Count of category_id as measures:

Now, consider the relationship defined in the model. This is summarized in the following screenshot:

The rows in the pivot show the subcategory which defines a filter for each row (that is a filter for each subcategory). This filter can then be applied to the Products table, which in turn is applied to the Sales table. It might be better to say that the rows of the Sales table are filtered by the Products table and then those rows are filtered by the Subcategory table. This is why the calculations Sum of total_price and Count of Product ID show the correct values. The filter on rows of the Sales table and rows of the Products table can be applied in the direction of the arrows of the relationships.

However, this is not the case when Subcategory is shown with data from the Category table—a filter will only be applied in the direction that a relationship is created. This is why the calculation Count of category_id shows the same number for each subcategory. With the subcategory on rows, a filter is created which can filter the Products table but this filter cannot then applied in an upstream manner to the Category table.

The application of filters may seem unintuitive at first, especially with a relationship design such as the one among Products, Category, and Subcategory, but in reality the model should be designed so that the filters can be applied in a single direction. There is also, the question of unmatched values between fields used in the relationship and how they are treated by the model. For example, what would happen if we had a product_id field in the Sales table that did not have a match in the Products table? Would this even be allowed in the model? The tabular model handles this situation very elegantly. The model allows this situation (without error), and unmatched values are assigned to a blank placeholder. For example, if there was a product in the Sales table and no product in the Products table, it would be shown as blank when Products, Category, or Subcategory is used in the pivot.

We have also indicated that the model automatically created implicit measures. The term measure is common in business intelligence tools to specify that a calculated value is returned. Often, this can be a simple calculation, for example, the count of rows or the sum of a field. The important thing to remember is that measure is a single value that is returned from the model (when the model is filtered). Usually, measures are defined by a model designer, but they need not be. This is the case with an implicit measure. An implicit measure is defined automatically, depending on the data type of the column that is being used. Numeric columns are automatically summed, whereas text columns are automatically counted.

There's more…

The aggregation function of an implicit measure is initially set by the underlying data type. However, the user can change this within the pivot table by editing the measure in the pivot table. This can be done in the following manner:

  1. Alter the pivot table so that it shows subcategory on rows and total_profit as values. By default, the measure will show Sum of total_profit. Right-click on the measure and select Edit Measure… from the pop-up window.

  2. A new window will open, displaying the aggregation function with the measure. Select Average from the function list and change the measure name to Average Profit. The Measure Settings window should look like the following screenshot:

  3. Also, notice that a formula is used to define the measure, for Average Profit, the formula is =AVERAGE('Sales'[total_profit]). Click on the OK button and note that the pivot table now contains the new measure Average Profit.

Implicit measures that have been created in the model can be seen by exposing the measures in the Advanced tab of the PowerPivot window (the Advanced tab must be activated). This is shown in the following screenshot: