Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

A simple example


To see how Scalding operations can be chained together to implement a complete pipeline, look at the following example:

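// Read tab-separated rows with the fields 'kid, 'age and 'fruits
Tsv(args("input"), ('kid, 'age, 'fruits))
  .read
  // Emit one tuple per comma-separated entry, stored in the new field 'fruit
  .flatMap('fruits -> 'fruit) { text: String => text.split(",") }
  // Keep only the two fields needed in the output
  .project('kid, 'fruit)
  .write(Tsv("results.tsv"))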
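
To make the effect of the flatMap and project steps concrete, the following is a minimal sketch that models the same transformation with plain Scala collections instead of Scalding pipes. The Row case class, the kidFruits helper, and the sample rows are assumptions made for illustration and do not come from the book; the sketch also collapses flatMap and project into one step, whereas Scalding first adds the 'fruit field and then drops the others.

```scala
// Plain Scala collections standing in for a Scalding pipe (no Scalding dependency).
object FruitsSketch {
  // One input row with the three fields 'kid, 'age and 'fruits from the example.
  final case class Row(kid: String, age: Int, fruits: String)

  // Models flatMap('fruits -> 'fruit) followed by project('kid, 'fruit):
  // every comma-separated fruit becomes its own (kid, fruit) tuple.
  def kidFruits(rows: Seq[Row]): Seq[(String, String)] =
    rows.flatMap(row => row.fruits.split(",").toSeq.map(fruit => (row.kid, fruit)))

  def main(args: Array[String]): Unit = {
    // Hypothetical sample rows, not taken from the book.
    val input = Seq(Row("anna", 7, "apple,pear"), Row("ben", 9, "kiwi"))
    kidFruits(input).foreach { case (kid, fruit) => println(s"$kid\t$fruit") }
  }
}
```

A row listing two fruits yields two output tuples, which is why the number of output rows can exceed the number of input rows.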

The same chaining approach can be spread across a number of named pipes, where we assemble and control how each pipe connects to the others:

val logs = Tsv(args("logfiles"), LogsOperations.schema)
  .read
  .extractSomeUserInfo

val customers = Tsv(args("cust_log"), COperations.schema)
  .read
  .extractSomeCustomerInfo

val joined = logs.joinWithSmaller(customers, 'user)

val result = joined.filter(
  .write(Tsv(args("output")))

Structuring the code this way lets us whiteboard the design of a data processing flow before implementing it. Refer to the pipeline definition in the first chapter to see how the theory matches the implementation.
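
As a rough illustration of the multi-pipe shape (two named inputs, a join on the user key, then a filter), here is a sketch using plain Scala collections rather than the Scalding API. All names in it (LogEntry, Customer, Joined, join, keep) and the sample data are hypothetical. In particular, the filter condition is not given in the example above, so the keep predicate is purely an assumption for demonstration.

```scala
// Plain Scala model of "two pipes, join, filter" (no Scalding dependency).
object JoinSketch {
  final case class LogEntry(user: String, page: String)
  final case class Customer(user: String, country: String)
  final case class Joined(user: String, page: String, country: String)

  // Inner join on the shared user key: the smaller side (customers) is
  // indexed by user, and the larger side (logs) is streamed past the index.
  def join(logs: Seq[LogEntry], customers: Seq[Customer]): Seq[Joined] = {
    val byUser = customers.groupBy(_.user)
    for {
      log      <- logs
      customer <- byUser.getOrElse(log.user, Seq.empty)
    } yield Joined(log.user, log.page, customer.country)
  }

  // Hypothetical predicate; the original example does not state its condition.
  def keep(row: Joined): Boolean = row.country == "UK"

  def main(args: Array[String]): Unit = {
    // Hypothetical sample data, not taken from the book.
    val logs = Seq(LogEntry("u1", "/home"), LogEntry("u2", "/cart"), LogEntry("u3", "/home"))
    val customers = Seq(Customer("u1", "UK"), Customer("u2", "FR"))

    val joined = join(logs, customers)
    val result = joined.filter(keep)
    result.foreach(println)
  }
}
```

Each intermediate value has a name, so every stage maps onto one box of a whiteboard diagram and can be inspected or tested on its own.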