Book Image

Mastering Microsoft Power BI

By : Brett Powell
5 (1)
Book Image

Mastering Microsoft Power BI

5 (1)
By: Brett Powell

Overview of this book

This book is intended for business intelligence professionals responsible for the design and development of Power BI content as well as managers, architects and administrators who oversee Power BI projects and deployments. The chapters flow from the planning of a Power BI project through the development and distribution of content to the administration of Power BI for an organization. BI developers will learn how to create sustainable and impactful Power BI datasets, reports, and dashboards. This includes connecting to data sources, shaping and enhancing source data, and developing an analytical data model. Additionally, top report and dashboard design practices are described using features such as Bookmarks and the Power KPI visual. BI managers will learn how Power BI’s tools work together such as with the On-premises data gateway and how content can be staged and securely distributed via Apps. Additionally, both the Power BI Report Server and Power BI Premium are reviewed. By the end of this book, you will be confident in creating effective charts, tables, reports or dashboards for any kind of data using the tools and techniques in Microsoft Power BI.
Table of Contents (15 chapters)

Data profiling

The four-step dataset design process can be immediately followed by a technical analysis of the source data for the required fact and dimension tables of the dataset. Technical metadata, including database diagrams and data profiling results, such as the existence of null values in source columns, are essential for the project planning stage. This information is used to ensure the Power BI dataset reflects the intended business definitions and is built on a sound and trusted source.

For example, the following SQL Server database diagram describes the schema for the reseller sales business process:

SQL Server Datdbase diagram: reseller sales

The foreign key constraints identify the surrogate key columns to be used in the relationships of the Power BI dataset and the referential integrity of the source database. In this schema, the product dimension is modeled as three separate dimension tables—DimProduct, DimProductSubcategory, and DimProductCategory. Given the priorities of usability, manageability, and query performance, a single denormalized product dimension table that includes essential Product Subcategory and Product Category columns is generally recommended. This will reduce the volume of source queries, relationships, and tables in the data model and will improve report query performance, as fewer relationships will need to be scanned by the dataset engine.

Clear visibility to the source system, including referential and data integrity constraints, data quality, and any MDM processes, is essential. Unlike other popular BI tools, Power BI is capable of addressing many data integration and quality issues, particularly with relational database sources which Power BI can leverage to execute data transformation operations. However, Power BI's ETL capabilities are not a substitute for data warehouse architecture and enterprise ETL tools, such as SQL Server Integration Services (SSIS). For example, it's the responsibility of the data warehouse to support historical tracking with slowly changing dimension ETL processes that generate new rows and surrogate keys for a dimension when certain columns change. To illustrate a standard implementation of slowly changing dimensions, the following query of the DimProduct table in the Adventure Works data warehouse returns three rows for one product (FR-M94B-38):

Historical tracking of dimensions via slowly changing dimension ETL processes

It's the responsibility of the Power BI team and particularly the dataset designer to accurately reflect this historical tracking via relationships and DAX measures, such as the count of distinct products not sold. Like historical tracking, the data warehouse should also reflect all master data management processes that serve to maintain accurate master data for essential dimensions, such as customers, products, and employees. In other words, despite many line of business applications and ERP, CRM, HRM, and other large corporate systems which store and process the same master data, the data warehouse should reflect the centrally governed and cleansed standard. Creating a Power BI dataset which only reflects one of these source systems may later introduce version control issues and, similar to choosing an incorrect granularity for a fact table, can ultimately require costly and invasive revisions.

Different tools are available with data profiling capabilities. If the data source is the SQL Server, SSIS can be used to analyze source data to be used in a project.

In the following image, the Data Profiling Task is used in an SSIS package to analyze the customer dimension table:

Data Profiling Task in SQL Server integration services
The Data Profiling Task requires an ADO.NET connection to the data source and can write its output to an XML file or an SSIS variable. In this example, the ADO.NET data source is the Adventure Works data warehouse database in SQL Server 2016 and the destination is an XML file (DataProfilingData.xml). Once the task is executed, the XML file can be read via the SQL Server Data Profile Viewer as per the following example. Note that this application, Data Profile Viewer, requires an installation of the SQL Server and that the Data Profiling Task only works with SQL Server data sources.

All fact and dimension table sources can be analyzed quickly for the count and distribution of unique values, the existence of null values, and other useful statistics.

Each data profiling task can be configured to write its results to an XML file on a network location for access via tools such as the Data Profile Viewer. In this example, the Data Profile Viewer is opened from within SSIS to analyze the output of the Data Profiling Task for the Customer dimension table:

Data Profile Viewer: column null ratio profiles of DimCustomer table

Identifying and documenting issues in the source data via data profiling is a critical step in the planning process. For example, the cardinality or count of unique values largely determines the data size of a column in an import mode dataset. Similarly, the severity of data quality issues identified impacts whether a DirectQuery dataset is a feasible option.