Programming MapReduce with Scalding

By: Antonios Chalkiopoulos

SQL databases


It is a common scenario for a Scalding job to process files from HDFS and join them with data fetched from a SQL database. Similarly, we will often have to implement a MapReduce job that writes some results into a SQL database.

For SQL, and in the context of MapReduce, we want support for all access patterns, for many SQL dialects, and for batching. Batching is the technique of aggregating multiple (possibly hundreds of) SQL statements and executing them against the database system as a single batch command.

Batching is especially important because a MapReduce application can easily scale to hundreds of Java virtual machines (JVMs) running the map and reduce tasks. Hundreds of nodes communicating with a database system at the same time can stress it to its limits, and batching reduces the number of round trips each task makes to the database.
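To illustrate why batching reduces the load, here is a minimal, self-contained Scala sketch of the buffer-and-flush idea. It does not use Scalding's API: SqlSink, CountingSink and BatchingWriter are hypothetical names introduced only for this example, and SqlSink stands in for a real database connection, with each executeBatch call representing one round trip. With plain JDBC, the corresponding calls would be PreparedStatement.addBatch() and Statement.executeBatch().

```scala
import scala.collection.mutable.ArrayBuffer

// Hypothetical stand-in for a database connection: each call to
// executeBatch represents ONE network round trip to the database.
trait SqlSink {
  def executeBatch(statements: Seq[String]): Unit
}

// Hypothetical sink that only counts round trips and statements sent.
class CountingSink extends SqlSink {
  var roundTrips = 0
  var statementsSent = 0
  def executeBatch(statements: Seq[String]): Unit = {
    roundTrips += 1
    statementsSent += statements.size
  }
}

// Buffers statements and sends them in groups of batchSize,
// instead of making one round trip per statement.
class BatchingWriter(sink: SqlSink, batchSize: Int) {
  require(batchSize > 0, "batchSize must be positive")
  private val buffer = ArrayBuffer.empty[String]

  def add(statement: String): Unit = {
    buffer += statement
    if (buffer.size >= batchSize) flush()
  }

  // Sends whatever is buffered. Call it once more when the task finishes
  // so that a final, partially filled batch is not lost.
  def flush(): Unit = {
    if (buffer.nonEmpty) {
      sink.executeBatch(buffer.toList) // copy before clearing the buffer
      buffer.clear()
    }
  }
}

// One task writing 2,500 rows with a batch size of 1,000.
val sink = new CountingSink
val writer = new BatchingWriter(sink, batchSize = 1000)
(1 to 2500).foreach(i => writer.add(s"INSERT INTO results VALUES ($i)"))
writer.flush()
println(s"round trips: ${sink.roundTrips}, statements: ${sink.statementsSent}")
```

In this sketch a single task sends its 2,500 statements in 3 round trips rather than 2,500. When hundreds of map or reduce tasks each write to the same database, that per-task reduction is what keeps the total number of requests manageable. The batch size is a tuning assumption: larger batches mean fewer round trips but more memory held per task.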

In SQL, the available access patterns are as follows:

  • SELECT: This is used to read data from a database and add it into a pipe

  • INSERT: This is used to insert new records from a pipe into...