Book Image

Microsoft Power BI Cookbook - Second Edition

By : Greg Deckler, Brett Powell
Book Image

Microsoft Power BI Cookbook - Second Edition

By: Greg Deckler, Brett Powell

Overview of this book

The complete everyday reference guide to Power BI, written by an internationally recognized Power BI expert duo, is back with a new and updated edition. Packed with revised practical recipes, Microsoft Power BI Cookbook, Second Edition, helps you navigate Power BI tools and advanced features. It also demonstrates the use of end-to-end solutions that integrate those features to get the most out of Power BI. With the help of the recipes in this book, you’ll gain advanced design and development insight, practical tips, and guidance on enhancing existing Power BI projects. The updated recipes will equip you with everything you need to know to implement evergreen frameworks that will stay relevant as Power BI updates. You’ll familiarize yourself with Power BI development tools and services by going deep into the data connectivity, transformation, modeling, visualization, and analytical capabilities of Power BI. By the end of this book, you’ll make the most of Power BI’s functional programming languages of DAX and M and deliver powerful solutions to common business intelligence challenges.
Table of Contents (16 chapters)
14
Other Book You May Enjoy
15
Index

Selecting and Renaming Columns

The columns selected in data retrieval queries impact the performance and scalability of both import and DirectQuery data models. For Import models, the resources required by the refresh process and the size of the compressed data model are directly impacted by column selection. Specifically, the cardinality of columns drives their individual memory footprint and memory per column. This correlates closely to query duration when these columns are referenced in measures and report visuals. For DirectQuery models, the performance of report queries is directly affected. Regardless of the model type, the way in which this selection is implemented also impacts the robustness of the retrieval process. Additionally, the names assigned to columns (or accepted from the source) directly impact the Q&A or natural language query experience.

This recipe identifies columns to include or exclude in a data retrieval process and demonstrates how to select those columns as well as the impact of those choices on the data model. In addition, examples are provided for applying user-friendly names and other considerations for choosing to retrieve or eliminate columns of data for retrieval.

Getting ready

To get ready for this recipe, import the DimCustomer table from the AdventureWorksDW2019 database by doing the following:

  1. Open Power BI Desktop and choose Transform data from the ribbon of the Home tab to open the Power Query Editor.
  2. Create an Import mode data source query called AdWorksDW. This query should be similar to the following:
    let
        Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019")
    in
        Source
    
  3. Isolate this query in a query group called Data Sources.
  4. Right-click AdWorksDW and choose Reference.
  5. Select the DimCustomer table in the data preview area and rename this query DimCustomer.

For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.

How to Select and Rename Columns

To implement this recipe, use the following steps in Advanced Editor:

  1. Create a name column from the first and last names via the Table.AddColumn function.
        CustomerNameAdd = 
            Table.AddColumn(
                dbo_DimCustomer, "Customer Name",
                each [FirstName] & " " & [LastName],
                type text
            )
    
  2. Use the Table.SelectColumns function to select 10 of the 30 available columns now available in the DimCustomer table.
        SelectCustCols = 
            Table.SelectColumns(CustomerNameAdd,
                {
                    "CustomerKey", "Customer Name", "Annual Income", 
                    "Customer Gender", "Customer Education", "MaritalStatus", 
                    "Customer Phone Number", "CommuteDistance", "AddressLine1", 
                    "TotalChildren"
                }, MissingField.UseNull
            )
    

    Note that some of the column names specified do not actually exist. This is on purpose and will be fixed in the next step. But note that instead of generating an error, null values are displayed for those columns.

Figure 30: Non-existent columns return null instead of error

Use the Table.RenameColumns function to apply intuitive names for users and benefit the Q&A engine for natural language queries. Insert this statement above your SelectCustCols statement and adjust as appropriate. The full query should now be similar to the following:

let
    Source = AdWorksDW,
    dbo_DimCustomer = Source{[Schema="dbo",Item="DimCustomer"]}[Data],
    CustomerNameAdd = 
        Table.AddColumn(
            dbo_DimCustomer, "Customer Name",
            each [FirstName] & " " & [LastName],
            type text
        ),
    #"Renamed Columns" = 
        Table.RenameColumns(CustomerNameAdd,
            {
                {"YearlyIncome", "Annual Income"}, 
                {"Gender", "Customer Gender"},
                {"EnglishEducation", "Customer Education"},
                {"Phone", "Customer Phone Number"}
            }
        ),
    SelectCustCols = 
        Table.SelectColumns(#"Renamed Columns",
            {
                "CustomerKey", "Customer Name", "Annual Income", 
                "Customer Gender", "Customer Education", "MaritalStatus", 
                "Customer Phone Number", "CommuteDistance", "AddressLine1", 
                "TotalChildren"
            }, MissingField.UseNull
        )
in
    SelectCustCols

How it works

The Table.AddColumn function concatenates the FirstName and LastName columns and includes an optional final parameter that specifies the column type as text.

The Table.SelectColumns function specifies the columns to retrieve from the data source. Columns not specified are excluded from retrieval.

A different method of accomplishing this same effect would be to use the Table.RemoveColumns function. However, in this case, 20 columns would need to be removed versus explicitly defining 10 columns to keep. To avoid query failure if one of the source columns changes or is missing, it is better to specify and name 10 than 20 columns. Query resilience can further be improved by using the optional parameter for Table.SelectColumns, MissingField.UseNull. Using this parameter, if the column selected is not available, the query still succeeds and simply inserts null values for this column for all rows.

Another advantage of using the Table.SelectColumns function is that columns can be reordered as selected columns are retrieved and presented in the order specified. This can be helpful for the query design process and avoids the need for an additional expression with a Table.ReorderColumns function. The initial column order of a query loaded to the data model is respected in the Data view. However, the field list exposed in the Fields pane in both the Report and Data views of Power BI Desktop is automatically alphabetized.

For import data models, you might consider removing a column that represents a simple expression of other columns from the same table. For example, if the Extended Amount column is equal to the multiplication of the Unit Price and Order Quantity columns, you can choose to only import these latter two columns. A DAX measure can instead compute the Extended Amount value. This might be done to keep model sizes smaller. This technique is not recommended for DirectQuery models, however.

Use the Table.RenameColumns function to rename columns in order to remove any source system indicators, add a space between words for non-key columns, and apply dimension-specific names such as Customer Gender rather than Gender. The Table.RenameColumns function also offers the MissingField.UseNull option.

There's more...

Import models are internally stored in a columnar compressed format. The compressed data for each column contributes to the total disk size of the file. The primary factor of data size is a column's cardinality. Columns with many unique values do not compress well and thus consume more space. Eliminating columns with high cardinality can reduce the size of the data model and thus the overall file size of a PBIX file. However, it is the size of the individual columns being accessed by queries that, among other factors, drives query performance for import models.

See also