Book Image

Microsoft Power BI Cookbook - Second Edition

By : Greg Deckler, Brett Powell
Book Image

Microsoft Power BI Cookbook - Second Edition

By: Greg Deckler, Brett Powell

Overview of this book

The complete everyday reference guide to Power BI, written by an internationally recognized Power BI expert duo, is back with a new and updated edition. Packed with revised practical recipes, Microsoft Power BI Cookbook, Second Edition, helps you navigate Power BI tools and advanced features. It also demonstrates the use of end-to-end solutions that integrate those features to get the most out of Power BI. With the help of the recipes in this book, you’ll gain advanced design and development insight, practical tips, and guidance on enhancing existing Power BI projects. The updated recipes will equip you with everything you need to know to implement evergreen frameworks that will stay relevant as Power BI updates. You’ll familiarize yourself with Power BI development tools and services by going deep into the data connectivity, transformation, modeling, visualization, and analytical capabilities of Power BI. By the end of this book, you’ll make the most of Power BI’s functional programming languages of DAX and M and deliver powerful solutions to common business intelligence challenges.
Table of Contents (16 chapters)
14
Other Book You May Enjoy
15
Index

Using DirectQuery

One of the most valuable features of Power BI is its deep support for real-time and streaming datasets, with the ability to provide immediate visibility to business processes and events as this data is created or updated. As Power BI Desktop's data modeling engine reflects the latest Analysis Services features, it becomes feasible to design DirectQuery models or composite models (DirectQuery and import) in Power BI Desktop, and thus avoid the scalability limitations and scheduled refresh requirements of models based on importing data.

The three most common candidates for DirectQuery or composite model projects are as follows:

  • The data model would consume an exorbitant amount of memory if all tables were fully loaded into memory. Even if the memory size is technically supported by large Power BI Premium capacity nodes, this would be a very inefficient and expensive use of company resources as most BI queries only access aggregated data representing a fraction of the size. Composite models which mix DirectQuery and Dual storage mode tables with in-memory aggregation tables is the recommended architecture for large models going forward.
  • Access to near-real-time data is of actionable or required value to users or other applications, such as is the case with notifications. For example, an updateable Nonclustered Columnstore index could be created on OLTP disk-based tables or memory-optimized tables in SQL Server to provide near-real-time access to database transactions.
  • A high-performance and/or read-optimized system is available to service report queries, such as a SQL Server or Azure SQL Database, with the Clustered Columnstore index applied to fact tables.

This recipe walks through the primary steps in designing the data access layer that supports a DirectQuery model in Power BI Desktop. As these models are not cached into memory and dynamically convert the DAX queries from report visualizations to SQL statements, guidance is provided to maintain performance. Additional details, resources, and documentation on DirectQuery's current limitations and comparisons with the default import mode are also included to aid your design decision.

Getting ready

  1. Choose a database to serve as the source for the DirectQuery data model.
  2. Create a logical and physical design of the fact and dimension tables of the model including the relationship keys and granularity of the facts. The AdventureWorksDW database is a good example of data designed in this manner.
  3. Determine or confirm that each fact-to-dimension relationship has referential integrity. Providing this information to the DirectQuery model allows for more performant inner join queries.
  4. Create view objects in the source database to provide efficient access to the dimensions and facts defined in the physical design.

Be aware that DirectQuery models are limited to a single source database and not all databases are supported for DirectQuery. If multiple data sources are needed, such as SQL Server and Oracle, or Teradata and Excel, then the default Import mode model, with a scheduled refresh to the Power BI Service, will be the only option.

How to use DirectQuery

For this recipe, we will use the AdventureWorksDW2019 database that has been used thus far in this chapter. To implement this recipe, follow these steps:

  1. Create a new Power BI Desktop file.
  2. From the Home tab, click on Get Data in the ribbon and then SQL Server.
  3. In the Data Connectivity mode section, choose the DirectQuery radio option.

    Figure 2.14: Creating a DirectQuery data source

  4. Select a table or view to be used by the model via the Navigator dialog, such as the FactResellerSales table, and then click the Transform Data button.
  5. Duplicate the initial query and revise the Navigation step to reference an additional view supporting the model, such as the DimReseller. This can be done by editing the Item in the formula bar or by clicking on the gear icon next to the Navigation step under APPLIED STEPS in the Query Settings pane. Also, rename this query to reflect the data being referenced.

    Figure 2.15: Editing the Navigation step in the formula bar

  6. Repeat step 5 for all required facts and dimensions. For example:
    • DimEmployee
    • DimPromotion
    • DimCurrency
    • DimSalesTerritory
  7. Click the Close and Apply button.

The Report Canvas view will confirm that the model is in DirectQuery mode via the status bar at the bottom right (see Figure 2.16). In addition, the Data view in the left-hand pane, which is visible for import models, will not be visible.

Figure 2.16: DirectQuery Status in Power BI Desktop

How it works

The M transformation functions supported in DirectQuery are limited by compatibility with the source system. The Power Query Editor will advise when a transformation is not supported in DirectQuery mode, per Figure 2.17.

Figure 2.17: A warning in Query Editor that the IsEven function is not supported in DirectQuery mode

Given this limitation and the additional complexity the M-based transforms would add to the solution, it is recommended that you embed all the necessary logic and transforms in the source relational layer. Ideally, the base tables in the source database themselves would reflect these needs. As a secondary option, a layer of views can be created to support the DirectQuery model.

If the database objects themselves cannot be revised, the Value.Native M function can be used to directly pass the SQL statement from Power BI Desktop to the source database, as per Figure 2.18.

Figure 2.18: The Value.Native function used to pass a SQL statement to a source system

As report visualizations are refreshed or interacted with in Power BI, the DAX queries from each visualization are translated into SQL statements, utilizing the source SQL statements to return the results. Be aware that Power BI does cache query results with DirectQuery models. Therefore, when accessing a recently utilized visual, a local cache may be used rather than a new query sent to the source.

The SQL statements passed from Power BI to the DirectQuery data source include all columns from the tables referenced by the visual.

For example, a Power BI visual with SalesAmount from the FactResellerSales table grouped by ResellerName from DimReseller would result in a SQL statement that selects the columns from both tables and implements the join defined in the model. However, as the SQL statement passed embeds these source views as derived tables, the relational engine is able to generate a query plan that only scans the required columns to support the join and aggregation.

There's more...

The performance and scalability of DirectQuery models are primarily driven by the relational data source. A denormalized star schema with referential integrity and a system that is isolated from OLTP workloads is recommended if near real-time visibility is not required. Additionally, in-memory and columnar features available to supported DirectQuery sources are recommended for reporting and analytical queries.

By default, DirectQuery models generate outer join SQL queries to ensure that measures return the correct value even if there's not a related dimension. However, you can configure DirectQuery models to send inner join queries. This is done by editing the relationship between tables in the modeling view by checking the Assume referential integrity setting (see Figure 2.19). Along with source system resources, this is one of the top factors contributing to the DirectQuery model's performance.

Figure 2.19: Activating referential integrity assumption in relationships

Of course, you should ensure that there is referential integrity in the source before enabling this setting; otherwise, incorrect results could be returned.

The design of the source relational schema and the hardware resources of this system can, of course, greatly impact the performance of DirectQuery models.

A classic star-schema design with denormalized tables is recommended to reduce the required join operations at query time. Optimizing relational fact tables with column store technologies such as the Clustered Columnstore Index for SQL Server and table partitions will also significantly benefit DirectQuery models.

See also

  • Power BI Desktop DirectQuery documentation: http://bit.ly/2nUoLOG
  • The Power BI data sources documentation provides a detailed list of data sources broken down by the connectivity options supported: http://bit.ly/30N5ofG