Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Running our first Scalding job


After adding Scalding as a project dependency, we can create our first Scalding job in the file src/main/scala/WordCountJob.scala:

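import com.twitter.scalding._

class WordCountJob(args: Args) extends Job(args) {
  // Read the file passed as --input; each record exposes its text in the 'line field
  TextLine(args("input"))
    // Lowercase each line and split it on whitespace, emitting one 'word per token
    .flatMap('line -> 'word) { line: String =>
      line.toLowerCase.split("\\s+")
    }
    // Group identical words and count the size of each group
    .groupBy('word) { _.size }
    // Write the (word, count) pairs as tab-separated values to the --output path
    .write(Tsv(args("output")))
}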

The Scalding code above implements a Cascading flow that reads an input file as its source tap and writes the results to another file, which serves as the output tap. The pipeline lowercases each line, splits it into words on whitespace, and counts how many times each word appears in the input text. Because the split is on whitespace only, punctuation stays attached to the adjacent word.

Note

Find complete project files in the code accompanying this book at http://github.com/scalding-io/ProgrammingWithScalding.

We can create a dummy file to use as input with the following command:

$ echo "This is a happy day. A day to remember" > input.txt
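To get a feel for what the job should compute on this input, the same tokenize-and-count logic can be sketched with plain Scala collections, without Scalding or Hadoop. This is only an illustration: the names WordCountSketch, tokenize, and wordCount are hypothetical and not part of Scalding, and the real job writes its results to a tab-separated file rather than printing them.

```scala
// A plain Scala collections sketch of the word-count logic.
// WordCountSketch, tokenize and wordCount are hypothetical names used for
// illustration only; nothing here uses the Scalding API.
object WordCountSketch {

  // Lowercase the line and split it on whitespace, as the job's flatMap step does.
  def tokenize(line: String): Seq[String] =
    line.toLowerCase.split("\\s+").toSeq

  // Group identical words and count each group, mirroring groupBy('word) { _.size }.
  def wordCount(lines: Seq[String]): Map[String, Int] =
    lines
      .flatMap(tokenize)
      .groupBy(identity)
      .map { case (word, occurrences) => (word, occurrences.size) }

  def main(args: Array[String]): Unit = {
    // The same sentence that was written to input.txt above.
    val counts = wordCount(Seq("This is a happy day. A day to remember"))

    // Print one "word<TAB>count" pair per line, sorted by word for readability.
    counts.toSeq.sortBy(_._1).foreach { case (word, n) => println(s"$word\t$n") }
  }
}
```

With this input, "a" is counted twice, while "day." and "day" are counted as two different words, because the tokenizer splits on whitespace only and does not strip punctuation.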

Scalding supports two types of execution modes: local mode and...