Grouping and summarizing information is a powerful feature of Excel pivot tables and Power BI table and matrix visualizations. However, it is often necessary to group and summarize information in DAX as well. Grouping and summarizing in DAX can be accomplished through the use of two functions, SUMMARIZE and GROUPBY. In this recipe, we will create new tables that summarize information by using the SUMMARIZE and GROUPBY functions.
Grouping and summarizing
Getting ready
To prepare for this recipe, perform the following steps:
- Create a table using the following formula:
R05_Table = GENERATESERIES(DATE(2020,1,1),DATE(2020,12,31))
- Create a column in that table using the following formula:
Month = [Value].[Month]
- Create another column in that table using the following formula:
Weekday = FORMAT([Value], "dddd")
How to do it...
The SUMMARIZE function has the following syntax:
SUMMARIZE(<table>, <groupBy_columnName>[, <groupBy_columnName>]…[, <name>, <expression>]…)
To use the SUMMARIZE function to return the number of weekdays in each month as well as the first day when each weekday occurs in each month, create a new table with the following formula:
R05_summarizedTable =
SUMMARIZE(
'R05_Table', // This is the table to summarize
[Month], // This is the column by which we want to group values
[Weekday], // This is a second column by which we want to group values
"# of Days", // Create a column called "# of Days"
COUNTROWS('R05_Table'), // Return the count of rows for "# of Days"
"First Date", // Create a second column called "First Date"
MINX('R05_Table','R05_Table'[Value]) // Return first date weekday occurs in the month in "First Date"
)
An excerpt of the table returned by this formula is as follows:
Month |
# of Days |
Weekday |
First Date |
January |
5 |
Wednesday |
1/1/2020 12:00:00 AM |
February |
4 |
Wednesday |
2/5/2020 12:00:00 AM |
March |
4 |
Wednesday |
3/4/2020 12:00:00 AM |
April |
5 |
Wednesday |
4/1/2020 12:00:00 AM |
May |
4 |
Wednesday |
5/6/2020 12:00:00 AM |
June |
4 |
Wednesday |
6/3/2020 12:00:00 AM |
July |
5 |
Wednesday |
7/1/2020 12:00:00 AM |
August |
4 |
Wednesday |
8/5/2020 12:00:00 AM |
September |
5 |
Wednesday |
9/2/2020 12:00:00 AM |
October |
4 |
Wednesday |
10/7/2020 12:00:00 AM |
November |
4 |
Wednesday |
11/4/2020 12:00:00 AM |
December |
5 |
Wednesday |
12/2/2020 12:00:00 AM |
January |
5 |
Thursday |
1/2/2020 12:00:00 AM |
February |
4 |
Thursday |
2/6/2020 12:00:00 AM |
We can also use the GROUPBY function to return the same information in a table. The GROUPBY function has the following format:
GROUPBY (<table>, [<groupBy_columnName1>], [<name>, <expression>]… )
To do this, create a new table with the following formula:
R05_groupedTable =
GROUPBY(
'R05_Table', // This is the table to group
[Month], // This is the column to group by
[Weekday], // This is a second column by which we want to group values
"# of Days", // Create a new column in this table called "# of Days"
COUNTX(CURRENTGROUP(),'R05_Table'[Value]), // Return the count of values for "# of Days"
"First Date", // Create a second column called "First Date"
MINX(CURRENTGROUP(),'R05_Table'[Value]) // Return first date weekday occurs in the month in "First Date"
)
This formula returns a table that is identical to our SUMMARIZE formula, except that the order of the rows differs slightly.
How it works...
Looking at the SUMMARIZE formula, the first parameter is the table that we want to summarize, and the next two columns are the columns according to which we want to group our data. Note that you can group by one, two, three, or nearly any number of columns.
The next four parameters are name/expression pairs. DAX understands where these pairs start when you stop referring to column names in the table and enter a text value for a parameter denoted by double quotes. These name/expression pairs specify a column name as the first part of the pair and a DAX expression to evaluate as the second portion of the pair. We first create a column called # of Days, and the value to be returned in this column is the count of the rows from our groupings. In other words, we get the number of each weekday in each month as our value. For the second column called First Date, we return the minimum date from our groupings. In other words, we get the first date of each weekday within each month as our value.
Looking at our GROUPBY formula, the first parameter is the table that we want to group. The next two columns are the columns according to which we want to group our data. This works in the same way as the SUMMARIZE function in that you can group by one, two, three, or nearly any number of columns.
Again, with the next four parameters, these are similar to the SUMMARIZE function in that they are name/expression pairs. This works exactly like the SUMMARIZE function except that instead of referring to the original table, you must refer to a special DAX function that can only be used within a GROUPBY function – the CURRENTGROUP function. In other words, you are referring to the current row of the Cartesian product created by your grouping columns.
There's more...
OK, so the most obvious question is likely something related to why there are two extremely similar functions that essentially do precisely the same thing. The answer is that while they are similar and do similar things, the way in which these functions go about what they do is fairly different. The order of the resulting table rows returned by these functions provides a hint that they are operating somewhat differently within the bowels of DAX.
Without going into excruciating detail regarding the internals of DAX calculations, the best way to describe the differences is to understand that within name/expression pairs of a SUMMARIZE function, the expression portion must always refer to an actual table name within the data model, while within the name/expression pairs of a GROUPBY function, the expression portion must always refer to CURRENTGROUP. This means that if you were to nest two SUMMARIZE functions, you could not refer to the table or columns created by the first SUMMARIZE function within the second (nested) SUMMARIZE function.
However, with GROUPBY, it would be possible to refer to a column created by the first GROUPBY function within the second (nested) GROUPBY function. This can be fairly powerful under the right circumstances. Conversely, because the SUMMARIZE expressions do not need to refer to CURRENTGROUP, this provides a certain flexibility that is not possible when using GROUPBY. For example, if you were to change the last parameter of each formula, within SUMMARIZE, as shown in the following line of code:
DAY(MINX('R05_Table',[Value]))
This would work and you would return just the day of the first date of the weekday within each month instead of the full date. However, you would receive an error in the GROUPBY formula because the specified expression does not aggregate over the current group.
See also
For more details regarding this recipe, refer to the following links:
- SUMMARIZE: https://docs.microsoft.com/en-us/dax/summarize-function-dax
- GROUPBY: https://docs.microsoft.com/en-us/dax/groupby-function-dax
- Nested grouping using GROUPBY versus SUMMARIZE: https://www.sqlbi.com/articles/nested-grouping-using-groupby-vs-summarize/