We can also use the count and groupBy functions to aggregate individual variables.
Here is an example that tallies the number of observations by outcome. Since the result is another dataframe, we can use the head function to write the results to the console.
Note
You might have to alter the number of rows returned by head if you change the query. It is always a good idea to filter results using a function such as head, to make sure that you are not printing hundreds of rows (or more).
However, you also need to ensure that you do not cut off any of your output. If you are unsure of the number of rows, first assign the result to a dataframe and then check the number of rows (with nrow):
This line of code counts the number of rows by outcome. I know that there should be only two outcomes, but I place the count function within a head statement just to program defensively.
head(SparkR::count(groupBy(out_sd, "outcome")))...
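The advice in the note, assigning the aggregate to a dataframe and checking its size with nrow before printing, can be sketched as follows. This is only an illustration under stated assumptions: toy_sd is a hypothetical stand-in for out_sd, counts_sd and n_groups are names invented here, the cap of 20 rows is arbitrary, and a working local Spark installation is assumed.

```r
library(SparkR)
sparkR.session()  # assumes Spark is installed and reachable locally

# Hypothetical stand-in for out_sd, built only for this illustration
toy_sd <- createDataFrame(data.frame(outcome = c(0, 1, 1, 0, 1)))

# Aggregate first and keep the result as a Spark dataframe
counts_sd <- SparkR::count(groupBy(toy_sd, "outcome"))

# Check how many rows the aggregate contains before printing anything
n_groups <- nrow(counts_sd)

# Print every group, but never more than an (arbitrary) cap of 20 rows
head(counts_sd, min(n_groups, 20))
```

Checking nrow first means the head call neither floods the console nor silently drops groups when the query changes.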