Book Image

Microsoft Power BI Cookbook - Second Edition

By : Greg Deckler, Brett Powell
Book Image

Microsoft Power BI Cookbook - Second Edition

By: Greg Deckler, Brett Powell

Overview of this book

The complete everyday reference guide to Power BI, written by an internationally recognized Power BI expert duo, is back with a new and updated edition. Packed with revised practical recipes, Microsoft Power BI Cookbook, Second Edition, helps you navigate Power BI tools and advanced features. It also demonstrates the use of end-to-end solutions that integrate those features to get the most out of Power BI. With the help of the recipes in this book, you’ll gain advanced design and development insight, practical tips, and guidance on enhancing existing Power BI projects. The updated recipes will equip you with everything you need to know to implement evergreen frameworks that will stay relevant as Power BI updates. You’ll familiarize yourself with Power BI development tools and services by going deep into the data connectivity, transformation, modeling, visualization, and analytical capabilities of Power BI. By the end of this book, you’ll make the most of Power BI’s functional programming languages of DAX and M and deliver powerful solutions to common business intelligence challenges.
Table of Contents (16 chapters)
14
Other Book You May Enjoy
15
Index

Viewing and Analyzing M Functions

Every time you click on a button to connect to any of Power BI Desktop's supported data sources or apply any transformation to a data source object, such as changing a column's data type, one or multiple M expressions are created reflecting your choices. These M expressions are automatically written to dedicated M documents and, if saved, are stored within the Power BI Desktop file as Queries. M is a functional programming language like F#, and it is important that Power BI developers become familiar with analyzing, understanding, and later, writing and enhancing the M code that supports their queries.

Getting ready

To prepare for this recipe, we will first build a query through the user interface that connects to the AdventureWorksDW2019 SQL Server database, retrieves the DimGeography table, and then filters this table to a single country, such as the United States:

  1. Open Power BI Desktop and click on Get Data from the Home tab of the ribbon. Select SQL Server from the list of database sources. For future reference, if the data source is not listed in Common data sources, more data sources are available by clicking More… at the bottom of the list.
  2. A dialog window is displayed asking for connectivity information. Ensure that Data Connectivity mode is set to Import. Enter the name of your SQL server as well as the AdventureWorksDW2019 database. In Figure 2.1, my SQL server is installed locally and running under the instance MSSQLSERVERDEV. Thus, I set the server to be localhost\MSSQLSERVERDEV to specify both the server (localhost) and the instance. If you leave the Database field blank, this will simply result in an extra navigation step to select the desired database.
    Graphical user interface, application

Description automatically generated

    Figure 2.1: SQL Server Get Data dialog

  3. If this is the first time connecting to this database from Power BI, you may be prompted for some credentials. In addition, you may also be warned that an encrypted connection cannot be made to the server. Simply enter the correct credentials for connecting and click the Connect button. For the encryption warning, simply click the OK button to continue.
  4. A navigation window will appear, with the different objects and schemas of the database. Select the DimGeography table from the Navigator window and click the Transform Data button.
  5. The Power Query Editor launches in a new window with a query called DimGeography; preview data from that table is displayed in the center of the window. In the Power Query Editor window, use the scroll bar at the bottom of the central display area to find the column called EnglishCountryRegionName. You can also select a column and then click Go to Column in the ribbon of the View menu to search for and navigate to a column quickly. Click the small button in the column header next to this column to display a sorting and filtering drop-down menu.

    Uncheck the (Select All) option to deselect all values and then check the box next to a country, such as the United States, before clicking the OK button.

Figure 2.2: Filtering for United States only in the Query Editor

Note that the button for the EnglishCountryRegionName column changes to display a funnel icon. Also notice that, in the Query Settings pane on the right side of the window, a new option under APPLIED STEPS has appeared called Filtered Rows.

Figure 2.3: The Query Settings pane in the Query Editor

How to View and Analyze M Functions

There are two methods for viewing and analyzing the M functions comprising a query; they are as follows:

  • Formula bar
  • Advanced Editor

The formula bar exposes the M function for the current step only. This formula bar appears just above the column headers for the preview data in the central part of the window. If you do not see this formula bar, click the View tab and check the box next to Formula Bar in the Layout section of the ribbon. All such areas of interest are boxed in red in Figure 2.4.

Figure 2.4: The Power Query Editor formula bar

When the Source step is selected under APPLIED STEPS in the Query Settings pane, as seen in Figure 2.3, we see the connection information specified on the initial dialog after selecting Get Data and then SQL Server. The M function being used is Sql.Database. This function is accepting two parameters: the server name, localhost\MSSQLSERVERDEV, and the database name, AdventureWorksDW2019. Clicking on other steps under APPLIED STEPS exposes the formulas for those steps, which are technically individual M expressions.

The formula bar is useful to quickly understand the M code behind a particular query step. However, it is more convenient and often essential to view and edit all the expressions in a centralized window. This is the purpose of the Advanced Editor. To launch the Advanced Editor, follow these steps:

  1. Click on the Home tab and then select Advanced Editor from the Query section of the ribbon, as shown in Figure 2.5. Alternatively, the Advanced Editor can also be accessed from the View tab, shown in Figure 2.4.

    Figure 2.5: Advanced Editor on the Home tab of the Query Editor

  2. The Advanced Editor dialog is displayed, exposing all M functions and comments that comprise the query. The M code can be directly edited from within this dialog.

Figure 2.6: The Advanced Editor view of the DimGeography query

As shown in Figure 2.6, using the Advanced Editor will mean that all of the Power Query code that comprises the query can be viewed in one place.

How it works

The majority of queries created for Power BI follow the let...in structure, as per this recipe. Within the let block, there are multiple steps with dependencies among those steps. For example, the second step, dbo_DimGeography, references the previous step, Source. Individual expressions are separated by commas, and the expression referred to following the in keyword is the expression returned by the query. The individual step expressions are technically known as "variables".

Variable names in M expressions cannot have spaces without being preceded by a hash sign and enclosed in double quotes. When the Query Editor graphical interface is used to create M queries, this syntax is applied automatically, along with a name describing the M transformation applied. This behavior can be seen in the Filtered Rows step in Figure 2.6. Applying short, descriptive variable names (with no spaces) improves the readability of M queries.

Note the three lines below the let statement. These three lines correspond to the three APPLIED STEPS in our query: Source, Navigation, and Filtered Rows. The query returns the information from the last step of the query, Filtered Rows. As more steps are applied, these steps will be inserted above the in statement and the line below this will change to reference the final step in the query.

M is a case-sensitive language. This includes referencing variables in M expressions (RenameColumns versus Renamecolumns) as well as the values in M queries. For example, the values "Apple" and "apple" are considered unique values in an M query.

It is recommended to use the Power Query Editor user interface when getting started with a new query and when learning the M language. After several steps have been applied, use Advanced Editor to review and optionally enhance or customize the M query. As a rich, functional programming language, there are many M functions and optional parameters not exposed via the Power Query Editor's graphical user interface. Going beyond the limits of the Power Query Editor enables more robust data retrieval, integration, and data mashup processes.

The M engine also has powerful "lazy evaluation" logic for ignoring any redundant or unnecessary variables, as well as short-circuiting evaluation (computation) once a result is determinate, such as when one side (operand) of an OR logical operator is computed as True. Lazy evaluation allows the M query engine to reduce the required resources for a given query by ignoring any unnecessary or redundant steps (variables). The order of evaluation of the expressions is determined at runtime—it doesn't have to be sequential from top to bottom.

In the following example, presented in Figure 2.7, a step for retrieving Canada was added and the "Filtered Rows" step for filtering the results for the United States was ignored. Since the CanadaOnly variable satisfies the overall let expression of the query, only the Canada query is issued to the server as if the "Filtered Rows" step were commented out or omitted.

Figure 2.7: Revised query that ignores the "Filtered Rows" step to evaluate Canada only

As a review of the concepts covered thus far and for future reference, Table 2.1 presents a glossary of the main concepts of the M language utilized in this book.

Concept

Definition

Expression

Formulas evaluated to yield a single value. Expressions can reference other values, such as functions, and may include operators.

Value

The result of evaluating an expression. Values can be categorized into types which are either primitive, such as text ("abc"), or structured kinds, such as tables and lists.

Function

A value that produces a new one based on the mapping of input values to the parameters of the function. Functions can be invoked by passing parameter values.

Type

A value that classifies other values. The structure and behavior of values are restricted based on the classification of their type, such as Record, List, or Table.

let

An expression that allows a set of unique expressions to be assigned names (variables) and evaluated (if necessary) when evaluating the expression following the in expression in a let...in construct.

Variable

A unique, named expression within an environment to be conditionally evaluated. Variables are represented as Applied Steps in the Query Editor.

Environment

A set of variables to be evaluated. The global environment containing the M library is exposed to root expressions.

Evaluation

The computation of expressions. Lazy evaluation is applied to expressions defined within let expressions; evaluation occurs only if needed.

Operators

A set of symbols used in expressions to define the computation. The evaluation of operators depends on the values to be operated on.

Table 2.1: M Language elements

There's more...

M queries are not intended as a substitute for the data loading and transformation workloads typically handled by enterprise data integration and orchestration tools such as Azure Data Factory (ADF), Azure Databricks, or SQL Server Integration Services (SSIS). However, just as BI professionals carefully review the logic and test the performance of SQL stored procedures and ETL packages supporting their various cubes and reporting environments, they should also review the M queries created to support Power BI models and reports. When developing retrieval processes for Power BI models, consider these common ETL questions:

  • How are queries impacting the source systems?
  • Can we make our queries more resilient to changes in source data so that they avoid failure?
  • Are our queries efficient and simple to follow and support, or are there unnecessary steps and queries?
  • Are our queries delivering sufficient performance to the BI application?
  • Is our process flexible, such that we can quickly apply changes to data sources and logic?
  • Can some or all of the required transformation logic be implemented in a source system such as the loading process for a data warehouse table or a SQL view?

One of the top performance and scalability features of M's query engine is called Query Folding. If possible, the M queries developed in Power BI Desktop are converted ("folded") into SQL statements and passed to source systems for processing.

If we use the original version of the query from this recipe, as shown in Figure 2.6, we can see Query Folding in action. The query from this recipe was folded into the following SQL statement and sent to the server for processing, as opposed to the M query engine performing the processing locally. To see how this works, perform the following:

  1. Right-click on the Filtered Rows step in the APPLIED STEPS section of the Query Settings pane, and select View Native Query.

    Figure 2.8: View Native Query in Query Settings

  2. The Native Query dialog is then displayed, as shown in Figure 2.9.

Figure 2.9: The SQL statement generated from the DimGeography M query

Finding and revising queries that are not being folded to source systems is a top technique for enhancing large Power BI datasets. See the Pushing Query Processing Back to Source Systems recipe of Chapter 11, Enhancing and Optimizing Existing Power BI Solutions, for an example of this process.

The M query engine also supports partial query folding. A query can be "partially folded", in which a SQL statement is created resolving only part of an overall query. The results of this SQL statement would be returned to Power BI Desktop (or the on-premises data gateway) and the remaining logic would be computed using M's in-memory engine with local resources. M queries can be designed to maximize the use of the source system resources, by using standard expressions supported by query folding early in the query process. Minimizing the use of local or on-premises data gateway resources is a top consideration for improving query performance.

There are limits, however, to query folding. For example, no folding takes place once a native SQL query has been passed to the source system, such as when passing a SQL query directly through the Get Data dialog using the Advanced options. Figure 2.10 displays a query specified in the Get Data dialog, which is included in the Source step.

Figure 2.10: Providing a user-defined native SQL query

Any transformations applied after this native query will use local system resources. Therefore, the general implication for query development with native or user-defined SQL queries is that if they are used, try to include all required transformations (that is, joins and derived columns), or use them to utilize an important feature of the source database that is not being utilized by the folded query, such as an index.

Some other things to keep in mind regarding Query Folding are the following:

  • Not all data sources support Query Folding, such as text and Excel files.
  • Not all transformations available in the Query Editor or via M functions are directly supported by some data sources.
  • The privacy levels defined for the data sources will also impact whether folding is used or not.
  • SQL statements are not parsed before they are sent to the source system.
  • The Table.Buffer function can be used to avoid query folding. The table output of this function is loaded into local memory, and transformations against it will remain local.

See also