Aggregate functions are used in dataflow Query transforms to perform aggregation on the grouped dataset.
You should be familiar with these functions as they are the same ones used in the SQL language: avg()
, min()
, max()
, count()
, count_distinct()
, and sum()
.
To demonstrate the use of aggregate functions, we will perform a simple analysis of one of our tables. Import the DimGeography
table into the DWH datastore and create a new job with a single dataflow inside it using these steps:
Your dataflow should include the
DimGeography
source table and theDimGeography
target template table in aSTAGE
database to send the output to:Open the Query transform and create the following output structure:
The
COUNTRYREGIONCODE
column contains country code values and will be the column on which we perform the grouping of the dataset. It is mapped from the input dataset to the output. Also, drag and drop it to theGROUP BY
tab of the Query transform from the input dataset...