Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Operations on groups


The groupAll and groupBy operations are essential building blocks of Scalding applications, and both generate groups. groupAll generates a single group containing all the available tuples. groupBy generates m groups, where m is the number of unique keys in the data.

For example, if groupBy('color) is executed and three unique colors exist in the data, then three groups are generated. Once the data is grouped, a number of group operations can be applied to the groups.
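
The difference between the two grouping operations can be illustrated without a Hadoop cluster. The following is only an analogy in plain Scala collections, not Scalding code: tuples are modelled by a hypothetical Row case class with color and price fields, a constant key stands in for groupAll, and grouping on the color field stands in for groupBy('color).

```scala
object GroupingSketch {
  // Hypothetical tuple type with two fields (not part of Scalding).
  case class Row(color: String, price: Double)

  val rows: List[Row] = List(
    Row("red", 3.0), Row("blue", 5.0), Row("red", 7.0),
    Row("green", 2.0), Row("blue", 1.0))

  // Analogy for groupAll: a constant key puts every (non-empty) input in one group.
  val allGroups: Map[Unit, List[Row]] = rows.groupBy(_ => ())

  // Analogy for groupBy('color): one group per distinct color.
  val colorGroups: Map[String, List[Row]] = rows.groupBy(_.color)

  def main(args: Array[String]): Unit = {
    println(s"groupAll: ${allGroups.size} group")          // prints "groupAll: 1 group"
    println(s"groupBy(color): ${colorGroups.size} groups")  // prints "groupBy(color): 3 groups"
    for ((color, group) <- colorGroups.toSeq.sortBy(_._1))
      println(s"  $color -> ${group.size} tuple(s)")
  }
}
```

With three distinct colors in the input, the color grouping yields three groups, while the constant key yields exactly one, matching the behavior described for groupBy and groupAll.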

The first seven group operations, average, count, min, max, sum, size, and sizeAveStdev, are useful for extracting statistics from data. Their syntax is as follows:

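group.average(field -> newField)                // mean of field
group.count(field -> newField) { function }     // number of tuples for which function returns true
group.min(field -> newField)                    // smallest value of field
group.max(field -> newField)                    // largest value of field
group.sum(field -> newField)                    // total of field
group.size(newField)                            // number of tuples in the group
group.sizeAveStdev(field -> (sizeField, averageField, stdField)) // size, mean, and standard deviation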
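
To make the meaning of these operations concrete, here is a minimal sketch in plain Scala collections, not the Scalding API, that computes the same statistics for each color group. The Row and Stats classes, the expensive field, and the 4.0 threshold in the count predicate are hypothetical. The standard deviation is computed as a population standard deviation (dividing by n); that is an assumption of this sketch and may not match the exact variant Scalding uses.

```scala
object GroupStatsSketch {
  // Hypothetical tuple type, same shape as in a color/price dataset.
  case class Row(color: String, price: Double)

  // One field per statistic; sizeAveStdev corresponds to (size, average, stdev).
  case class Stats(average: Double, expensive: Int, min: Double, max: Double,
                   sum: Double, size: Int, stdev: Double)

  val rows: List[Row] = List(
    Row("red", 3.0), Row("blue", 5.0), Row("red", 7.0),
    Row("green", 2.0), Row("blue", 1.0))

  // Summarises one group; groups produced by groupBy are never empty,
  // so min, max and the division by n are safe here.
  def stats(prices: List[Double]): Stats = {
    val n = prices.size
    val avg = prices.sum / n
    // Assumption: population standard deviation (divide by n, not n - 1).
    val std = math.sqrt(prices.map(p => (p - avg) * (p - avg)).sum / n)
    Stats(
      average = avg,
      expensive = prices.count(_ > 4.0), // predicate count; 4.0 is an arbitrary threshold
      min = prices.min,
      max = prices.max,
      sum = prices.sum,
      size = n,
      stdev = std)
  }

  // Group by color, then summarise the prices of each group.
  val byColor: Map[String, Stats] =
    rows.groupBy(_.color).map { case (color, rs) => color -> stats(rs.map(_.price)) }

  def main(args: Array[String]): Unit =
    for ((color, st) <- byColor.toSeq.sortBy(_._1)) println(s"$color: $st")
}
```

For the red group (prices 3.0 and 7.0) this gives an average of 5.0, a sum of 10.0, a size of 2, and a standard deviation of 2.0. In an actual Scalding job the same calculations would be expressed with the group operations listed above inside a groupBy('color) { ... } block, running over a pipe rather than an in-memory list.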

We can also apply multiple group operations on the same group. To calculate, for...