Applying Multiple Filters
The application of precise and often complex filter conditions has always been at the heart of business intelligence, and Power BI Desktop supports rich filtering capabilities across its query, data model, and visualization components. In many scenarios, filtering at the query level via the Query Editor and M functions is the optimal choice, as this reduces the workload of both Import and DirectQuery data models and eliminates the need for re-applying the same filter logic across multiple reports or visualizations.
Although the Query Editor graphical interface can be used to configure filtering conditions, this recipe demonstrates M's core filtering functions and the use of M in common multi-condition filter scenarios. The M expression queries constructed in this recipe are intended to highlight some of the most common filtering use cases.
Note that applying data transformations as part of a data warehouse ETL (extract-transform-load) or ELT (extract-load-transform) process is generally preferable to using Power Query (M). BI teams and developers should be careful to avoid creating Power BI datasets that significantly deviate from existing "sources of truth".
The following eight filtering queries will be developed in this recipe:
- United States customers only
- Customers with three or more children
- Customers with null values for either the middle name or title columns
- Customers with first purchase dates between 2012 and 2013
- Customers in management with the female gender or a bachelor's education
- The top 100 customers based on income
- A list of distinct sales territory countries
- Dates less than or equal to the current date and more than ten years prior to the current date
Getting ready
To prepare for this recipe, import the DimCustomer
and DimDate tables from the AdventureWorksDW2019 database by doing the following:
- Open Power BI Desktop and choose Transform data from the ribbon of the Home tab to open the Power Query Editor.
- Create an Import mode data source query called
AdWorksDW
. This query should be similar to the following:let Source = Sql.Database("localhost\MSSQLSERVERDEV", "AdventureWorksDW2019") in Source
- Isolate this query in a query group called Data Sources.
- Right-click
AdWorksDW
and choose Reference. - Choose the
DimCustomer
table and rename the queryDimCustomer
. - Repeat steps 4 and 5 for the
DimDate
table. - Group the dimension queries into a query group called Base Queries.
- Disable the loading of all queries.
- For the
DimCustomer
query, find theDimGeography
column. In the column header, click the diverging arrows icon, uncheck (Select All Columns), and then check the box next toCountryRegionCode
and DimSalesTerritory before clicking the OK button.Figure 2.27: Expanding DimGeography to Include CountryRegionCode and DimSalesTerritory
- Now expand DimGeography.DimSalesTerritory and only select the SalesTerritoryCountry column.
- Rename the DimGeography.CountryRegionCode column to CountryCode and the DimGeography.DimSalesTerritory.SalesTerritoryCountry column to SalesTerritoryCountry.
For additional details on performing these steps, see the Managing Queries and Data Sources recipe in this chapter.
How to Apply Multiple Functions
To implement this recipe, use the following steps:
- Right-click the DimCustomer query, choose Reference, and then open the Advanced Editor window for this query. Use the
Table.SelectRows
function to apply the US query predicate and rename the query United States Customers. The finished query should appear the same as the following:let Source = DimCustomer, USCustomers = Table.SelectRows(Source, each [CountryCode] = "US") in USCustomers
- Repeat step 1, but this time filter on the
TotalChildren
column for>=
3 and rename this query Customers w3+ Children:let Source = DimCustomer, ThreePlusChildFamilies = Table.SelectRows(Source, each [TotalChildren] >=3) in ThreePlusChildFamilies
- Repeat step 1, but this time use the conditional logic operator
or
to define the filter condition for blank values in theMiddleName
orTitle
columns. Use lowercase literalnull
to represent blank values. Name this query Missing Titles or Middle Names:let Source = DimCustomer, MissingTitleorMiddleName = Table.SelectRows( Source, each [MiddleName] = null or [Title] = null ) in MissingTitleorMiddleName
- Repeat step 1, but this time use the
#date
literal to apply the 2012-2013 filter on theDateFirstPurchase
column. Rename this query 2012-2013 First Purchase Customers:let Source = DimCustomer, BetweenDates = Table.SelectRows( Source, each [DateFirstPurchase] >= #date(2012,01,01) and [DateFirstPurchase] <= #date(2013,12,31) ) in BetweenDates
- Repeat step 1, but this time use parentheses to define the filter conditions for an
EnglishOccupation
ofManagement
, and either the female gender (F
), orBachelors
education. The parentheses ensure that theor
condition filters are isolated from the filter onOccupation
. Rename this query Management and Female or Bachelors:let Source = DimCustomer, MgmtAndFemaleOrBachelors = Table.SelectRows( Source, each [EnglishOccupation] = "Management" and ([Gender] = "F" or [EnglishEducation] = "Bachelors") ) in MgmtAndFemaleOrBachelors
- Right-click the
United States Customers
query, select Reference, and open the Advanced Editor. This time, use theTable.Sort
function to order this table by theYearlyIncome
column. Finally, use theTable.FirstN
function to retrieve the top 100 rows. Rename this query to Top US Customers by Income.let Source = #"United States Customers", SortedByIncome = Table.Sort( Source, {{"YearlyIncome", Order.Descending}} ), TopUSIncomeCustomers = Table.FirstN(SortedByIncome,100) in TopUSIncomeCustomers
- Repeat step 1, but this time use the
List.Distinct
andList.Sort
functions to retrieve a distinct list of values from theSalesTerritoryCountry
column. Rename this query Customer Sales Territory List.let Source = DimCustomer, SalesTerritoryCountryList = List.Distinct(Source[SalesTerritoryCountry]), OrderedList = List.Sort(SalesTerritoryCountryList,Order.Ascending) in OrderedList
- Group the queries created thus far into a Customer Filter Queries query group.
- Create a new query by referencing DimDate and open the Advanced Editor. Use the
DateTime.LocalNow
,DateTime.Date
, andDate.Year
functions to retrieve the trailing ten years from the current date. Rename this query Trailing Ten Years from Today and place this query in its own group, Date Filter Queries.let Source = DimDate, TrailingTenYearsFromToday = Table.SelectRows( Source, each [FullDateAlternateKey] <= DateTime.Date(DateTime.LocalNow) and [CalendarYear] >= Date.Year(DateTime.LocalNow) - 10 ) in TrailingTenYearsFromToday
How it works
The Table.SelectRows
function is the primary table-filtering function in the M language, and is functionally aligned with the FROM
and WHERE
clauses of SQL. Observe that variable names are used as inputs to M functions, such as the Source
line being used as the first parameter to the Table.SelectRows
function.
Readers should not be concerned with the each
syntax of the Table.SelectRows
function. In many languages, this would suggest row-by-row iteration, but when possible, the M engine folds the function into the WHERE
clause of the SQL query submitted to the source system.
In the queries United States Customers, Customers w3+ Children, Missing Titles or Middle Names, and Management and Female or Bachelors, notice the various forms of the each
selection condition. The syntax supports multiple comparison operators as well as complex logic, including the use of parenthesis to isolate logical tests.
In the 2012-2013 First Purchase Customers query, the #date
literal function is used to generate the comparison values. Literals are also available for DateTime
(#datetime
), Duration
(#duration
), Time
(#time
), and DateTimeZone
(#datetimezone
).
In the Top US Customers by Income query, the Table.Sort
function is used to sort the rows by a specified column and sort order. The Table.Sort
function also supports multiple columns as per the Importing Data recipe in this chapter. The Table.FirstN
function is then used to return 100 rows starting from the very top of the sorted table. In this example, the set returned is not deterministic due to ties in income.
The Customer Sales Territory List query returns a list instead of a table. This is evident from the different icon present in the Queries pane for this query versus the others. Lists are distinct from tables in M, and one must use a different set of functions when dealing with lists rather than tables. A list of distinct values can be used in multiple ways, such as a dynamic source of available input values to parameters.
Finally, in the Trailing 10 Yrs from Today query, the current date and year are retrieved from the DateTime.LocalNow
function and then compared to columns from the date dimension with these values.
There's more...
With simple filtering conditions, as well as in proof-of-concept projects, using the UI to develop filter conditions may be helpful to expedite query development. However, the developer should review the M expressions generated by these interfaces, as they are only based on the previews of data available at design time, and logical filter assumptions can be made under certain conditions.
To access the Filter Rows dialog, click on the drop-down button in a column header and then choose the Text Filters option, before specifying a starting filtering condition.
Figure 2.28: Accessing the Filter Rows dialog
The Basic option of the Filter Rows dialog only allows you to work with the currently selected column. However, by clicking on the Advanced radio button, you can work with any column in the table.
Figure 2.29: Advanced Filter Rows dialog in the Query Editor
Despite this, even the Advanced version of the Filter Rows dialog does not provide the ability to group logical filtering criteria. While the dialog in Figure 2.29 looks like it recreates the query for Management and Female or Bachelors, the generated M code does not include the parenthesis that groups the Gender and EnglishEducation clauses. Thus, the code generated would have to be edited manually in the Advanced Editor to return the same results as the original Management and Female or Bachelors query. The M code generated by the Filter Rows dialog shown in Figure 2.29 generates the following code:
Table.SelectRows(
Source,
each
[EnglishOccupation] = "Management" and
[Gender] = "F" or
[EnglishEducation] = "Bachelors"
)
See also
Table.SelectRows
: http://bit.ly/3bSkEyjTable.Sort
: http://bit.ly/3qPaeUoTable.FirstN
: http://bit.ly/3ttb0InList.Distinct
: http://bit.ly/3lnCqwqList.Sort
: http://bit.ly/30QLEb1- 10 Common Mistakes You Do In #PowerBI #PowerQuery – Pitfall #3: http://bit.ly/2nLX6QW