Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Understanding the core capabilities of Scalding


Scalding provides a rich set of core operations for transforming data. Map-like operations apply a function to each tuple in a pipe. Join operations combine data from multiple pipes. Pipe operations let us concatenate pipes or debug them. Grouping/reducing operations bring related data together, and once data has been grouped, a further rich set of group operations can be applied to it.
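To make the four families concrete, here is a minimal sketch that mimics them with ordinary Scala collections. It is an analogy only and does not use Scalding's API. Modelling a pipe as a `List` of rows, along with the names `Sale`, `Store` and `CoreOpsModel`, is a hypothetical choice made for illustration.

```scala
// Plain-Scala analogy of Scalding's operation families (NOT Scalding's API).
// Assumption: a "pipe" is modelled as an in-memory List of case-class rows.
object CoreOpsModel {
  final case class Sale(product: String, store: String, price: Double)
  final case class Store(store: String, city: String)

  val salesA: List[Sale] = List(Sale("apple", "s1", 1.0), Sale("pear", "s2", 2.0))
  val salesB: List[Sale] = List(Sale("apple", "s2", 1.5))
  val stores: List[Store] = List(Store("s1", "London"), Store("s2", "Leeds"))

  // Pipe operation: concatenate two pipes that share the same schema.
  val allSales: List[Sale] = salesA ++ salesB

  // Map-like operation: apply a function to every row independently.
  val upperCased: List[Sale] = allSales.map(s => s.copy(product = s.product.toUpperCase))

  // Join operation: combine rows from two pipes on a shared key (inner join).
  private val cityOf: Map[String, String] = stores.map(s => s.store -> s.city).toMap
  val joined: List[(Sale, String)] =
    allSales.flatMap(s => cityOf.get(s.store).map(city => (s, city)))

  // Grouping/reducing: bring related rows together, then reduce each group.
  val revenueByProduct: Map[String, Double] =
    allSales.groupBy(_.product).map { case (p, rows) => p -> rows.map(_.price).sum }
}
```

In a real Scalding job, each of these steps would run distributed across a cluster. Map-like steps stay inside the map phase, while grouping forces a shuffle to reducers.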

Map-like operations

These operations are internally translated into map phases of MapReduce and apply a function to every row of data. The syntax of the map operation is:

pipe.map(existingFields -> additionalFields) { function }

The map operation takes some of the pipe's existing fields as input, applies a function to their values, and produces a pipe that contains all of the original fields plus the newly created ones. In the following example, a new field, 'priceWithVAT, is introduced:

pipe.map('price -> 'priceWithVAT) { price: Double => price*1.20 }
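The field semantics of `map` can be modelled outside Hadoop with a small sketch: read one named field, apply the function, and append the result under a new name while keeping every existing field. The helper `mapField` and the `Row` type alias are hypothetical and not part of Scalding. Representing a tuple as a `Map[String, Any]` is an assumption made only to keep the example dependency-free.

```scala
// Hypothetical in-memory model of pipe.map(existingField -> newField) { fn }.
// Assumption: a tuple is a Map from field name to value; this is NOT Scalding.
object FieldsMapModel {
  type Row = Map[String, Any]

  // Reads `from`, applies `fn`, and appends the result as field `to`.
  // All original fields are kept, mirroring how map adds fields to a pipe.
  def mapField[A, B](pipe: List[Row], from: String, to: String)(fn: A => B): List[Row] =
    pipe.map(row => row + (to -> fn(row(from).asInstanceOf[A])))

  val pipe: List[Row] = List(
    Map("product" -> "apple", "price" -> 10.0),
    Map("product" -> "pear", "price" -> 5.0)
  )

  // Equivalent in spirit to: pipe.map('price -> 'priceWithVAT) { price: Double => price * 1.20 }
  val withVat: List[Row] =
    mapField[Double, Double](pipe, "price", "priceWithVAT")(price => price * 1.20)
}
```

Because each row is transformed independently, without looking at any other row, an operation like this needs no shuffle and can run entirely in the map phase.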

Operations can be executed to multiple...