Learning Hadoop 2
Processing data with Apache Spark

In this section, we will implement the examples from Chapter 3, Processing – MapReduce and Beyond, using the Scala API. We will consider both the batch and real-time processing scenarios. We will show you how Spark Streaming can be used to compute statistics on the live Twitter stream.
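As a preview of the streaming scenario, a job that counts hashtags on the live Twitter stream might look like the following sketch. This is illustrative rather than the book's exact example: it assumes the `spark-streaming-twitter` artifact is on the classpath and that Twitter OAuth credentials are supplied via the standard `twitter4j` system properties.

```scala
import org.apache.spark.SparkConf
import org.apache.spark.streaming.{Seconds, StreamingContext}
import org.apache.spark.streaming.twitter.TwitterUtils

// Illustrative sketch: count hashtags seen on the live Twitter
// stream in 10-second batches. The object and application names
// are placeholders, not taken from the book's source.
object TwitterHashtagCount {
  def main(args: Array[String]): Unit = {
    val conf = new SparkConf().setAppName("TwitterHashtagCount")
    val ssc = new StreamingContext(conf, Seconds(10))

    // None => use OAuth credentials from twitter4j system properties
    val tweets = TwitterUtils.createStream(ssc, None)

    // Split tweet text into words and keep only hashtags
    val hashtags = tweets
      .flatMap(status => status.getText.split(" "))
      .filter(_.startsWith("#"))

    // Count occurrences of each hashtag within the batch
    hashtags.map(tag => (tag, 1))
      .reduceByKey(_ + _)
      .print()

    ssc.start()
    ssc.awaitTermination()
  }
}
```

The job would be submitted to a running Spark installation; it cannot produce output without live stream credentials, so treat it as a shape of the solution rather than a runnable recipe.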

Building and running the examples

Scala source code for the examples can be found in the book's accompanying code repository. We will be using sbt to build, manage, and execute the code.

The build.sbt file describes the codebase metadata and software dependencies: the version of the Scala interpreter that Spark links against, the Akka package repository used to resolve transitive dependencies, and the dependencies on the Spark and Hadoop libraries themselves.
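Such a build.sbt might look like the following sketch. The project name and the exact version numbers are assumptions; match the Scala, Spark, and Hadoop versions to your own installation.

```scala
// Minimal build.sbt sketch -- name and versions are placeholders.
name := "learning-hadoop2-spark-examples"

version := "1.0"

// Spark 1.x artifacts are built against Scala 2.10
scalaVersion := "2.10.4"

// Repository used to resolve Akka, a transitive dependency of Spark
resolvers += "Akka Repository" at "http://repo.akka.io/releases/"

libraryDependencies ++= Seq(
  "org.apache.spark" %% "spark-core"      % "1.1.0",
  "org.apache.spark" %% "spark-streaming" % "1.1.0",
  "org.apache.hadoop" %  "hadoop-client"  % "2.5.0"
)
```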

The source code for all examples can be compiled with:

$ sbt compile

Or, it can be packaged into a JAR file with:

$ sbt package

A helper script to execute compiled classes can be generated with: