
Transforming and Cleansing Data

The transformations applied within Power BI's M queries serve to protect the integrity of the data model and to support enhanced analysis and visualization. The specific transformations to implement vary based on data quality, integration needs, and the goals of the overall solution. However, at a minimum, developers should look to protect the integrity of the model's relationships and to simplify the user experience via denormalization and standardization. Additionally, developers should check with the owners of the data source to determine whether certain required transformations can be implemented in the source, or perhaps made available via SQL view objects, so that Power Query (M) expressions are not necessary.

This recipe demonstrates how to protect a model from duplicate values in the source data, which can prevent proper relationships from being formed in the data model and may even result in query failures. While the scenario used here is simple, it reflects situations you may run into when attempting to merge multiple data sources and eliminate duplicates.

Getting ready

To prepare, start by importing the DimProduct and FactResellerSales tables from the AdventureWorksDW2019 database by doing the following:

  1. Open Power BI Desktop and choose Transform data from the ribbon of the Home tab to open the Power Query Editor.
  2. Create an Import mode data source query called AdWorksDW. This query should be similar to the following:
    let
        Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019")
    in
        Source
    
  3. Isolate this query in a query group called Data Sources.
  4. Right-click AdWorksDW and choose Reference, select the DimProduct table in the data preview area, and rename this query DimProduct. Right-click the EnglishProductName column and select Remove Other Columns.
  5. Repeat the previous step, but this time choose FactResellerSales. Expand the DimProduct column and choose only EnglishProductName. Rename the expanded column EnglishProductName.
  6. Drag the DimProduct and FactResellerSales queries into the Other Queries group and apply the queries to the data model.
  7. In the Model view of Power BI Desktop, attempt to form a relationship between the tables using the EnglishProductName columns from both tables. Note the warning that is displayed.

Figure 2.31: Many-Many relationship cardinality warning

For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.
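For reference, after completing steps 4 and 5, the DimProduct and FactResellerSales queries may resemble the following hand-written sketches. The exact navigation and expand steps generated by the Power Query Editor can differ slightly; in particular, the DimProduct navigation column on FactResellerSales typically expands as a record column (if it appears as a table column in your environment, Table.ExpandTableColumn would be used instead), and the sketch names the expanded column EnglishProductName directly rather than renaming it in a separate step:

let
    Source = AdWorksDW,
    dbo_DimProduct = Source{[Schema="dbo",Item="DimProduct"]}[Data],
    RemoveColumns = Table.SelectColumns(dbo_DimProduct,{"EnglishProductName"})
in
    RemoveColumns

let
    Source = AdWorksDW,
    dbo_FactResellerSales = Source{[Schema="dbo",Item="FactResellerSales"]}[Data],
    // Expand the DimProduct navigation column to bring in the product name
    ExpandedProduct = 
        Table.ExpandRecordColumn(
            dbo_FactResellerSales, "DimProduct",
            {"EnglishProductName"}, {"EnglishProductName"}
        )
in
    ExpandedProduct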

How to Transform and Cleanse Data

We wish to remove duplicates from the EnglishProductName column in our DimProduct query. To implement this recipe, use the following steps:

  1. Remove any leading and trailing empty spaces in the EnglishProductName column with a Text.Trim function.
  2. Create a duplicate column of the EnglishProductName key column with the Table.DuplicateColumn function and name this new column Product Name.
  3. Add an expression to force uppercase on the EnglishProductName column via the Table.TransformColumns function. This new expression must be applied before the duplicate removal expressions are applied.
  4. Add an expression to the DimProduct query with the Table.Distinct function to remove duplicate rows.
  5. Add another Table.Distinct expression to specifically remove duplicate values from the EnglishProductName column.
  6. Drop the capitalized EnglishProductName column via Table.RemoveColumns.

    The final query should resemble the following:

    let
        Source = AdWorksDW,
        dbo_DimProduct = Source{[Schema="dbo",Item="DimProduct"]}[Data],
        RemoveColumns = Table.SelectColumns(dbo_DimProduct,{"EnglishProductName"}),
        TrimText = 
            Table.TransformColumns(
                RemoveColumns,{"EnglishProductName",Text.Trim}
            ),
        DuplicateKey = 
            Table.DuplicateColumn(
                TrimText,"EnglishProductName","Product Name"
            ),
        UpperCase = 
            Table.TransformColumns(
                DuplicateKey,{{"EnglishProductName", Text.Upper}}
            ),
        DistinctProductRows = Table.Distinct(UpperCase),
        DistinctProductNames = 
            Table.Distinct(
                DistinctProductRows, {"EnglishProductName"}
            ),
        RemoveEnglishProductName = 
            Table.RemoveColumns(
                DistinctProductNames,"EnglishProductName"
            )
    in
        RemoveEnglishProductName
    

How it works

In the TrimText expression, the Text.Trim function removes white space from the beginning and end of each value in a column. Different amounts of surrounding empty space make those rows distinct within the query engine, but not necessarily distinct within the data model. Therefore, it is always a good idea to apply Text.Trim first and then remove duplicate rows and values.
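As a minimal, standalone illustration of why trimming comes first (using a made-up inline table rather than the DimProduct source), the following sketch shows that Table.Distinct treats values that differ only by surrounding spaces as distinct until Text.Trim is applied:

let
    // Two rows that differ only by leading/trailing spaces
    Products = #table(
        {"EnglishProductName"},
        {{"Fender Set - Mountain"}, {"  Fender Set - Mountain  "}}
    ),
    // 2 - the padded value still counts as a distinct row
    RowsBefore = Table.RowCount(Table.Distinct(Products)),
    Trimmed = Table.TransformColumns(Products, {"EnglishProductName", Text.Trim}),
    // 1 - the duplicates collapse once the spaces are removed
    RowsAfter = Table.RowCount(Table.Distinct(Trimmed))
in
    [RowsBeforeTrim = RowsBefore, RowsAfterTrim = RowsAfter]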

In the next expression, DuplicateKey, the Table.DuplicateColumn function is used to duplicate the column where we will be removing duplicate values. We give this new column the name that we desire for our final column. This is done because we will need to transform the values in the column we are removing duplicates from, in order to account for mixed cases such as "Fender Set" and "Fender set". Thus, we wish to preserve the original values and casing by using this duplicate column.

In order to eliminate mixed casing issues, the UpperCase expression changes all values in the EnglishProductName column to uppercase using the Table.TransformColumns function, and specifying Text.Upper. The M engine considers mixed casing values unique, but the data model engine does not.
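A quick way to see this difference is to compare two strings directly in M: the default comparison is case-sensitive (ordinal), while a case-insensitive comparer, analogous to the data model engine's behavior, treats the two values as equal. This is only an illustrative sketch:

let
    // M's default text equality is case-sensitive: "Fender Set" <> "Fender set"
    DefaultComparison = "Fender Set" = "Fender set",
    // Comparer.OrdinalIgnoreCase returns 0 when two values are equal ignoring case
    IgnoreCaseComparison = Comparer.OrdinalIgnoreCase("Fender Set", "Fender set") = 0
in
    [CaseSensitiveEqual = DefaultComparison, CaseInsensitiveEqual = IgnoreCaseComparison]

Note that Table.Distinct also accepts an equation criteria that pairs a column with a comparer, for example Table.Distinct(TrimText, {"EnglishProductName", Comparer.OrdinalIgnoreCase}), which can serve as an alternative to the uppercase-copy-and-drop approach used in this recipe.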

The next two expressions, DistinctProductRows and DistinctProductNames, demonstrate two different ways of using the Table.Distinct function. The first, DistinctProductRows, removes a row only when every column value is identical to another row. The second, DistinctProductNames, considers only the values in the specified EnglishProductName column when determining whether a row is a duplicate.
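The difference between the two forms can be seen with a small made-up table, as in the sketch below, where one row is an exact duplicate of another and a third row shares only the product name:

let
    // Rows 1 and 2 are identical across all columns; row 3 shares only the product name
    Sample = #table(
        {"EnglishProductName", "Color"},
        {
            {"FENDER SET - MOUNTAIN", "Black"},
            {"FENDER SET - MOUNTAIN", "Black"},
            {"FENDER SET - MOUNTAIN", "Silver"}
        }
    ),
    // Full-row distinct removes only the exact duplicate: 3 rows -> 2 rows
    DistinctRows = Table.Distinct(Sample),
    // Column-level distinct keeps the first row per product name: 3 rows -> 1 row
    DistinctNames = Table.Distinct(Sample, {"EnglishProductName"})
in
    [FullRowDistinct = Table.RowCount(DistinctRows), ColumnDistinct = Table.RowCount(DistinctNames)]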

At this point, the query is resilient to duplicate rows and values, mixed casing, and leading or trailing spaces. However, the EnglishProductName column now contains only uppercase values. Since we preserved the original values and casing in the Product Name column, we can simply drop the EnglishProductName column using the Table.RemoveColumns function.

We can now form a one-to-many relationship between our DimProduct and FactResellerSales tables.

Figure 2.32: Simple one-to-many relationship model

There's more...

To support troubleshooting, create a query that accesses the same source table and retrieves the EnglishProductName values that appear on more than one row.

let
    Source = AdWorksDW,
    dbo_DimProduct = Source{[Schema="dbo",Item="DimProduct"]}[Data],
    RemoveColumns = Table.SelectColumns(dbo_DimProduct,{"EnglishProductName"}),
    TrimText = 
        Table.TransformColumns(
            RemoveColumns,{"EnglishProductName",Text.Trim}
        ),
    UpperCase = 
        Table.TransformColumns(
            TrimText,{{"EnglishProductName", Text.Upper}}
        ),
    GroupedRows = 
        Table.Group(
            UpperCase, {"EnglishProductName"},
            {{"Rows", each Table.RowCount(_), Int64.Type}}
        ),
    Duplicates = Table.SelectRows(GroupedRows, each [Rows] > 1)
in
    Duplicates

The EnglishProductName column is selected, trimmed, converted to uppercase, grouped, and then filtered to return only the product names that occur on more than one row. Disable the loading of this query, as it exists only for troubleshooting purposes.

See also