In addition to table names, column names, and data formats, a semantic model defines how tables within a model relate to each other. This relationship is important because it defines the output of calculations (which are defined in the model). This recipe shows how to create relationships and the effect that these relationships have on the model.
This recipe assumes that the model in the recipe Managing the appearance of tables and fields has been created.
The reader should recognize that the model is designed to show sales information by product, date, and customer. This type of modeling scenario is commonly referred to as a star schema and is shown in the following diagram. The
Sales table is referred to as a fact table (since it stores the data facts that we wish to analyze—sales amount, tax amount, and so on) and the other tables are referred to as dimension (subject) tables because they hold descriptive information.
Extending the model further, the
Products table is linked to the
Subcategory table, and the
Subcategory table is linked to the
Category table. This is shown in the following diagram and is sometimes called a snowflake schema, since the dimension tables are not directly connected to the fact table:
An important point to note, is that each dimension table has a unique identifying field, for example, a product can be uniquely identified in the
Products table through the
product_id field. This is commonly referred to as the primary key for the table.
Start with the workbook that was developed in the prior recipe.
Product Namefield onto the rows of the pivot table (under the
Row Labelscolumn) and the Sales column
total_priceonto values. Your screen should look like the following screenshot:
Return to the PowerPivot window and select the
product_idfield and then click on the Create Relationship button (in the Design tab). A new window will open asking you to define the related (lookup) table and column.
Return to the pivot table and refresh the model. The values for the Sum of total_price field have updated to reflect the total for each product.
Now, create relationships between the following tables and columns:
Click on the Diagram View button in the Home menu to switch to the diagram modeling view. Your screen will now show tables and columns (rather than data) and look like the following screenshot. Note that previously defined relationships appear as lines that connect tables.
You can also switch between the data and diagram views by toggling the two buttons at the bottom-right side of the PowerPivot application status bar.
In the pivot table, replace the
Product Namefield with the
Categoryfield from the
categorytable. The total value (
$1,662,869.42) is repeated for all categories indicating that there is no relationship between the
Productstable, select the
Product IDfield, and drag it to the
product_idfield of the
Subcateogrytable. A new relationship will be created between
Subcategory. Use this method to create a relationship between the
Productstable and the
Categorytable. Refresh the pivot table to ensure that the total value is not duplicated.
In the pivot table, drag the
Product IDfield from the
Productstable into the Values pane. The pivot table will now show the price and number of products for each category and will look like the following screenshot:
The model has been extended to show two things. Firstly, by defining relationships between tables within the model, we have defined the filtering path for the data. This path is used to restrict rows between tables that have a relationship. Secondly, by adding a calculation (Sum of total_price and Count of Product ID), we have created measures that apply an aggregation function to the model fields. These are special types of measures within PowerPivot and are referred to as implicit measures (because the model implicitly defines a calculation for the field).
Relationships define how one table relates to another. In order to define a relationship, the join must occur on a field that has unique values in one of the tables (this is commonly called a primary key). The table that has the field with unique values is commonly called the related table. This can be seen in the diagram view, as shown in the following screenshot with the direction of the arrows on the relationships. Consider the
Products table (which has a unique field
product_id) that is related to the
Sales table (through the
product_id field in that table), but only the
Products table needs to have a unique
product_id. It is also said that the
product_id field relates to many records in the
Sales table. This can be seen by the direction of the arrow between
Products, the related table has the arrow pointing towards it.
Relationships are the primary mechanisms with the model that are used to filter data and perform calculations. That is, the relationship defines how data is filtered when values are shown to the user. Although this is a new concept, the concept of relationships is important because they have important implications with the way that the model determines what data to show to the user. Consider the pivot table shown in the following screenshot—
Subcategory on rows and
Sum of total_price,
Count of Product ID, and
Count of category_id as measures:
The rows in the pivot show the subcategory which defines a filter for each row (that is a filter for each subcategory). This filter can then be applied to the
Products table, which in turn is applied to the
Sales table. It might be better to say that the rows of the
Sales table are filtered by the
Products table and then those rows are filtered by the
Subcategory table. This is why the calculations
Sum of total_price and
Count of Product ID show the correct values. The filter on rows of the
Sales table and rows of the
Products table can be applied in the direction of the arrows of the relationships.
However, this is not the case when
Subcategory is shown with data from the
Category table—a filter will only be applied in the direction that a relationship is created. This is why the calculation
Count of category_id shows the same number for each subcategory. With the subcategory on rows, a filter is created which can filter the
Products table but this filter cannot then applied in an upstream manner to the
The application of filters may seem unintuitive at first, especially with a relationship design such as the one among
Subcategory, but in reality the model should be designed so that the filters can be applied in a single direction. There is also, the question of unmatched values between fields used in the relationship and how they are treated by the model. For example, what would happen if we had a
product_id field in the
Sales table that did not have a match in the
Products table? Would this even be allowed in the model? The tabular model handles this situation very elegantly. The model allows this situation (without error), and unmatched values are assigned to a blank placeholder. For example, if there was a product in the
Sales table and no product in the
Products table, it would be shown as blank when
Subcategory is used in the pivot.
We have also indicated that the model automatically created implicit measures. The term measure is common in business intelligence tools to specify that a calculated value is returned. Often, this can be a simple calculation, for example, the count of rows or the sum of a field. The important thing to remember is that measure is a single value that is returned from the model (when the model is filtered). Usually, measures are defined by a model designer, but they need not be. This is the case with an implicit measure. An implicit measure is defined automatically, depending on the data type of the column that is being used. Numeric columns are automatically summed, whereas text columns are automatically counted.
The aggregation function of an implicit measure is initially set by the underlying data type. However, the user can change this within the pivot table by editing the measure in the pivot table. This can be done in the following manner:
Alter the pivot table so that it shows subcategory on rows and
total_profitas values. By default, the measure will show
Sum of total_profit. Right-click on the measure and select Edit Measure… from the pop-up window.
A new window will open, displaying the aggregation function with the measure. Select Average from the function list and change the measure name to
Average Profit. The Measure Settings window should look like the following screenshot:
Also, notice that a formula is used to define the measure, for
Average Profit, the formula is
=AVERAGE('Sales'[total_profit]). Click on the OK button and note that the pivot table now contains the new measure
Implicit measures that have been created in the model can be seen by exposing the measures in the Advanced tab of the PowerPivot window (the Advanced tab must be activated). This is shown in the following screenshot: