Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

Scalding execution throttling


Scalding execution throttling is a Hadoop-specific technique. It is worth highlighting here because Scalding applications running in production may read billions of rows of data.

For resource management, Hadoop offers a number of schedulers. Each cluster has a specific capacity, for example, 600 simultaneous map tasks and 300 simultaneous reduce tasks. The most common scheduler used in Hadoop is the Fair Scheduler. It attempts to assign resources to jobs so that, on average, each job gets an equal share of the cluster.
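
To make the idea of an equal share concrete, the following is a minimal sketch of the arithmetic, not the scheduler's actual algorithm. It assumes every running job has the same weight and enough pending tasks to use its whole share; the object name `FairShareSketch` and the function `equalShare` are hypothetical names introduced only for this illustration.

```scala
// Simplified model only: the real Fair Scheduler also takes pools,
// weights, minimum shares, and each job's actual demand into account.
object FairShareSketch {

  // Number of slots each job would hold on average under an even split.
  def equalShare(totalSlots: Int, runningJobs: Int): Int = {
    require(runningJobs > 0, "need at least one running job")
    totalSlots / runningJobs
  }

  def main(args: Array[String]): Unit = {
    val mapSlots = 600    // example cluster capacity from the text
    val reduceSlots = 300 // example cluster capacity from the text

    // Show how the per-job share shrinks as more jobs run concurrently.
    for (jobs <- Seq(1, 2, 4)) {
      val maps = equalShare(mapSlots, jobs)
      val reduces = equalShare(reduceSlots, jobs)
      println(s"$jobs job(s): $maps map slots and $reduces reduce slots each")
    }
  }
}
```

Under these assumptions, every additional concurrent job reduces the share of all the others, which is why a large job can noticeably slow down everything else on the cluster.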

There are occasions, however, when we want to protect some resources for business-critical jobs, or to throttle a particular job. Sometimes, we may need to limit the resources available to newer members of the team, or to a new beta release of an application.

To do this, we can log in to the JobTracker node using ssh and add a new pool to the fair-scheduler.xml file, as shown in the following code:

<pool name="staging_pool">
  <maxMaps>50</maxMaps...