Book Image

Microsoft Power BI Cookbook - Second Edition

By : Greg Deckler, Brett Powell
Book Image

Microsoft Power BI Cookbook - Second Edition

By: Greg Deckler, Brett Powell

Overview of this book

The complete everyday reference guide to Power BI, written by an internationally recognized Power BI expert duo, is back with a new and updated edition. Packed with revised practical recipes, Microsoft Power BI Cookbook, Second Edition, helps you navigate Power BI tools and advanced features. It also demonstrates the use of end-to-end solutions that integrate those features to get the most out of Power BI. With the help of the recipes in this book, you’ll gain advanced design and development insight, practical tips, and guidance on enhancing existing Power BI projects. The updated recipes will equip you with everything you need to know to implement evergreen frameworks that will stay relevant as Power BI updates. You’ll familiarize yourself with Power BI development tools and services by going deep into the data connectivity, transformation, modeling, visualization, and analytical capabilities of Power BI. By the end of this book, you’ll make the most of Power BI’s functional programming languages of DAX and M and deliver powerful solutions to common business intelligence challenges.
Table of Contents (16 chapters)
14
Other Book You May Enjoy
15
Index

Combining and Merging Queries

The full power of Power BI's querying capabilities is in the integration of distinct queries representing different data sources via its merge and append transformations. Retrieval processes that consolidate files from multiple network locations or integrate data from multiple data sources can be developed efficiently and securely. Additionally, the same join types and data transformation patterns SQL and ETL developers are familiar with can be achieved with the M language. This recipe provides examples of combining sources into a single query and leveraging the table join functions of M to support common transformation scenarios.

Getting ready

To follow along with this recipe, you can use the Merge Queries and Append Queries icons on the Home tab of the Power Query Editor to generate the join expressions used in this recipe. However, as joining queries is fundamental to the retrieval process, it is recommended to learn how to use the Table.Join, Table.NestedJoin, and Table.Combine functions.

To get ready for this recipe, import the DimCustomer and FactCallCenter tables from the AdventureWorksDW2019 database by doing the following:

  1. Open Power BI Desktop and choose Transform data from the ribbon of the Home tab to open the Power Query Editor.
  2. Create an Import mode data source query called AdWorksDW. This query should be similar to the following:
    let
        Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019")
    in
        Source
    
  3. Isolate this query in a query group called Data Sources.
  4. Right-click AdWorksDW and choose Reference, select the DimEmployee table in the data preview area, and rename this query DimEmployee.
  5. Repeat step 4 but choose the FactInternetSales table and name this query Sales2011. Filter the OrderDate column to be between January 1, 2011 and December 31, 2011.
    Let
        Source = AdWorksDW,
        dbo_FactInternetSales = Source{[Schema="dbo",Item="FactInternetSales"]}[Data],
        #"Filtered Rows" = 
            Table.SelectRows(
                dbo_FactInternetSales, each 
                    [OrderDate] >= #datetime(2011, 1, 1, 0, 0, 0) and 
                    [OrderDate] <= #datetime(2011, 12, 31, 0, 0, 0))
    in
        #"Filtered Rows"
    
  6. Right-click the Sales2011 query and choose Duplicate. Name this query Sales2012. Edit the Table.SelectRows expression to filter dates between January 1, 2012 and December 31, 2012.
  7. Repeat step 6, naming this new query Sales2013 and filtering for dates between January 1, 2013 and December 31, 2013.

For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.

How to Combine Queries

The goal of this example is to produce an integrated table based on three "Sales" queries. While in this example the three queries come from the same data source, it is important to realize that the three queries could point to completely different data sources, such as three text files or even a text file, an Excel spreadsheet, and a SQL database. What is important is that the three queries have the same column names and number of columns. To implement this recipe, perform the following steps:

  1. Move the Sales2011, Sales2012, and Sales2013 queries to a new query group called Staging Queries.
  2. Disable the load for the Sales2011, Sales2012, and Sales2013 queries.
  3. Create a new Blank query in the Other Queries group and call this query FactInternetSales2011to2012.
  4. Open the FactInternetSales2011to2012 query in Advanced Editor.
  5. Use the Table.Combine (or Append Queries feature in the ribbon) function to return a single table based on the rows of the Sales2011, Sales2012, and Sales2013 queries.
    Let
        Source = Table.Combine( {Sales2011, Sales2012, Sales2013} )
    in
        Source
    
  6. Move the DimEmployees query to the Staging Queries group and disable loading.
  7. Duplicate the DimEmployees query and call this new query Managers.
  8. Disable loading for the Managers query.
  9. Open the Managers query in the Advanced Editor.
  10. Add a Manager Name column using the Table.AddColumn function.
        ManagerName = 
            Table.AddColumn(
                dbo_DimEmployee,
                "Manager Name", each [FirstName] & " " & [LastName]
            )
    
  11. Select only the EmployeeKey, Manager Name, and Title columns using the Table.SelectColumns function.
        SelectCols = 
            Table.SelectColumns(
                ManagerName, 
                { "EmployeeKey", "Manager Name", "Title" }
            )
    
  12. Create a new Blank query in the Other Queries group and call this query Employees.
  13. Open the Employees query in Advanced Editor.
  14. Join the Managers query and DimEmployee query using the Table.NestedJoin function or the Merge Queries feature in the ribbon.
    Let
        Source = 
            Table.NestedJoin(
                DimEmployee, "ParentEmployeeKey", Managers, "EmployeeKey",
                "ManagerColumn", JoinKind.LeftOuter
            )
    in
        Source
    
  15. Use the Table.ExpandTableColumn function to add the Manager Name and Manager Title columns.
        ManagerColumns = 
            Table.ExpandTableColumn(
                Source, "ManagerColumn",
                { "Manager Name", "Title" },
                { "Manager Name", "Manager Title" }
            )
    
  16. Add an Employee Name column.
        EmployeeName = 
            Table.AddColumn(
                ManagerColumns,
                "Employee Name", each [FirstName] & " " & [LastName]
            )
    
  17. Rename the Title column to Employee Title.
        RenameTitle = 
            Table.RenameColumns(
                EmployeeName,
                { "Title", "Employee Title" }
            )
    
  18. Select the EmployeeKey, Employee Name, Employee Title, Manager Name, and Manager Title columns.
        SelectCols =
            Table.SelectColumns(
                RenameTitle,
                { 
                    "EmployeeKey", "Employee Name", "Employee Title",
                    "Manager Name", "Manager Title"
                }
            )
    

How it works

For the FactInternetSales2011to2013 query, only a single expression is required using the Table.Combine function. No other expressions are necessary in this example given that the staging queries have identical column names and the same number of columns. The Table.Combine function performs an append operation and does not remove duplicate rows similar to a SQL UNION statement.

Any columns which are unique to one of the input tables in a Table.Combine function will be added to the result set with null values for the rows from the other tables. Depending on the scenario, the developer could apply the Table.Distinct function to avoid any duplicate rows from reaching the data model.

The Employees query references the DimEmployees query as the left table in a Table.NestedJoin function, and is joined to the Managers query via a left outer join. The left join is required to retain all employee rows in this scenario, as the DimEmployees table includes one employee that does not have a parent employee key, the Chief Executive Officer.

The join is performed on the ParentEmployeeKey column in the DimEmployees query to the EmployeeKey column in the Managers query. After this step, all of the rows from the matching Managers table are stored within the row as a Table object in the column ManagerColumn. When expanding the ManagerColumn column using the Table.ExpandTableColumn function, the Manager Name column can retain the same name, but the Title column is renamed to Manager Title in order to avoid conflicting with the Title column in the DimEmployees query.

In implementing the table joins, you can choose to use the Table.Join and Table.NestedJoin functions. All six join types—inner, left outer, right outer, full outer, left anti, and right anti—are supported by both functions. The Table.NestedJoin function enters the results of the join (the right or second table) into a new column of table values and will use local resources to execute the join operation, unless the Table.ExpandTableColumn function is used to replace this new column with columns from the right table. A left outer join type is the default if the JoinKind parameter is not specified. For performance reasons, Table.NestedJoin should not be used without a Table.ExpandTableColumn function removing the column of tables.

Conversely, the Table.Join function automatically expands the left table with the columns from the right table input (a flat join) and defaults to an inner join if the JoinKind parameter is not specified. The Table.Join function gets folded to the source without any additional functions but requires that there are no matching column names between the joined tables for a JoinKind other than inner join. For inner joins, the matching column names from both tables must be specified in the join key parameters. A Table.SelectColumns function is required to exclude any columns from the right table added with the join.

Whether implemented via Table.NestedJoin or Table.Join, developers should look to use inner joins if the source tables have referential integrity, such as with foreign key constraints and whether this meets requirements. For joins against larger tables, developers should confirm that query folding is occurring and can evaluate the different query plans generated by alternative retrieval designs in terms of performance.

Note that the two rows for Rob Walters are due to a Slowly Changing Dimension (SCD) Type 2 process applied in the source database. For more information on SCDs, refer to this Wikipedia article: https://bit.ly/3yIQeI5.

There's more...

Rather than creating separate lookup/join staging queries, it is possible to consolidate these expressions into a single let…in M expression. For example, the following single query returns the exact same results as the Sales2011, Sales2012, Sales2013, and FactInternetSales2011to2013 queries:

let
    Source = AdWorksDW,
    Sales = Source{[Sche"a=""bo",It"m="FactInternetSa"es"]}[Data],
    Sales2011Rows = 
        Table.SelectRows(
            Sales, each 
                [OrderDate] >= #datetime(2011, 1, 1, 0, 0, 0) and 
                [OrderDate] <= #datetime(2011, 12, 31, 0, 0, 0)
        ),
    Sales2012Rows = 
        Table.SelectRows(
            Sales, each 
                [OrderDate] >= #datetime(2012, 1, 1, 0, 0, 0) and 
                [OrderDate] <= #datetime(2012, 12, 31, 0, 0, 0)
        ),
    Sales2013Rows = 
        Table.SelectRows(
            Sales, each 
                [OrderDate] >= #datetime(2013, 1, 1, 0, 0, 0) and 
                [OrderDate] <= #datetime(2013, 12, 31, 0, 0, 0)
        ),
    Append = Table.Combine( {Sales2011Rows, Sales2012Rows, Sales2013Rows} )
in
    Append

Inline query approaches are helpful in limiting the volume of queries, but you lose the management benefits provided by group folders and the Query Dependencies view. The graphical support makes it easier to explain and quickly troubleshoot a data retrieval process over a single but complex M expression. Staging queries are recommended for all but the most trivial projects and retrieval processes. Staging queries should generally never be loaded to the data model, as staging tables could both confuse the user and would require the data model to use additional resources to process and store the additional data.

Similarly, merge queries can also be combined into a single query. The following table breaks down the six different join types that can be specified in both the Table.NestedJoin and Table.Join functions. Both the Parameter and Parameter Value can be used, though the recipes in this book use Parameter as this makes the expressions easier to follow.

Join type

Parameter

Parameter value

Inner

JoinKind.Inner

0

Left Outer

JoinKind.LeftOuter

1

Right Outer

JoinKind.RightOuter

2

Full Outer

JoinKind.FullOuter

3

Left Anti

JoinKind.LeftAnti

4

Right Anti

JoinKind.RightAnti

5

Table 2.3: Power Query (M) join types, parameters, and parameter values

One final note is that, for data source files with the same structure stored in the same network directory folder, Power BI offers the Combine Binaries transformation, which can be used with text, CSV, Excel, JSON, and other file formats. This feature can be used when creating a Folder query. The Combine Binaries feature automatically creates an example query and a function linked to this query, such that any required modification to the source files can be applied to all files, and the source location of the files can be easily revised.

See also