Programming MapReduce with Scalding
The groupAll and groupBy operations are essential building blocks of Scalding applications; both generate groups. groupAll generates a single group containing all the available tuples. groupBy generates m groups, where m is the number of unique keys in the data.
For example, if groupBy('color) is executed and three unique colors exist in the data, then three groups are generated. Once the data is grouped, a number of group operations can be applied to each group.
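The difference between the two grouping operations can be sketched with plain Scala collections. This is only an analogy and does not use the Scalding API; the sample data and the names GroupingSketch, items, byColor, and all are hypothetical.

```scala
// Plain Scala collections, used only as an analogy for Scalding's grouping semantics.
object GroupingSketch {
  // Hypothetical sample data: (color, price) pairs with three unique colors.
  val items: List[(String, Double)] = List(
    ("red", 10.0), ("blue", 4.0), ("red", 20.0), ("green", 7.5), ("blue", 6.0)
  )

  // Analogue of groupBy('color): one group per unique key.
  val byColor: Map[String, List[Double]] =
    items.groupBy(_._1).map { case (color, rows) => color -> rows.map(_._2) }

  // Analogue of groupAll: a single group holding every tuple.
  val all: List[(String, Double)] = items

  def main(args: Array[String]): Unit = {
    println(s"groups from groupBy: ${byColor.size}")                 // prints 3
    println(s"tuples in the single groupAll group: ${all.size}")     // prints 5
  }
}
```

Three unique colors give three groups, while the groupAll analogue keeps all five tuples together in one group.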
The first seven group operations (average, count, min, max, sum, size, and sizeAveStdev) extract statistics from data. Their syntax is as follows:
    group.average(field -> newField)
    group.count(field -> newField) { function }
    group.min(field -> newField)
    group.max(field -> newField)
    group.sum(field -> newField)
    group.size(newField)
    group.sizeAveStdev(field -> (sizeField, averageField, stdField))
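As a minimal sketch of how some of these operations look inside a job, the following assumes the Scalding fields-based API, a tab-separated input with two columns, and command-line arguments named input and output. The class name ColorStatsJob, the field names, and the price threshold are hypothetical, and exact signatures can differ between Scalding versions.

```scala
import com.twitter.scalding._

// Hypothetical job: per-color statistics over (color, price) rows.
class ColorStatsJob(args: Args) extends Job(args) {
  Tsv(args("input"), ('color, 'price))
    .read
    .groupBy('color) { group =>
      group
        // Number of tuples in each color group.
        .size('items)
        // Mean of the price field within each group.
        .average('price -> 'avgPrice)
        // Number of tuples whose price satisfies the predicate.
        .count('price -> 'expensive) { price: Double => price > 10.0 }
    }
    .write(Tsv(args("output")))
}
```

Each output row would carry the grouping key followed by the new fields declared in the group block, one row per unique color.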
We can also apply multiple group operations on the same group. To calculate...