Learning Bayesian Models with R

By : Hari Manassery Koduvely

Distributed computing using Hadoop

In the last decade, tremendous progress has been made in distributed computing since two research engineers at Google developed a computing paradigm called the MapReduce framework and an associated distributed filesystem called the Google File System (reference 2 in the References section of this chapter). Later, an open source implementation of this framework, named Hadoop, was developed at Yahoo and became the hallmark of Big Data computing. Hadoop is ideal for processing amounts of data too large to fit into the memory of a single machine: the data is distributed across multiple computers, and each node performs the computation locally on the portion stored on its own disk. A typical example is extracting relevant information from log files, where a single month's data can be on the order of terabytes.

To use Hadoop, one has to write programs using the MapReduce framework to parallelize the computation. A Map operation splits the data into multiple key-value...
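To make the paradigm concrete, here is a minimal, single-machine sketch of the MapReduce pattern applied to word counting. This is an illustrative example only, written in Python rather than against the real Hadoop API; the function names `map_phase` and `reduce_phase` are hypothetical, and a real Hadoop job would distribute these phases across many nodes with a shuffle-and-sort step between them.

```python
from itertools import groupby
from operator import itemgetter

def map_phase(lines):
    """Map: emit a (word, 1) key-value pair for every word in the input."""
    for line in lines:
        for word in line.split():
            yield (word.lower(), 1)

def reduce_phase(pairs):
    """Reduce: after sorting by key (the 'shuffle'), sum the counts per word."""
    for word, group in groupby(sorted(pairs), key=itemgetter(0)):
        yield (word, sum(count for _, count in group))

# Toy input standing in for lines read from a distributed filesystem
lines = ["the quick brown fox", "the lazy dog", "the fox"]
counts = dict(reduce_phase(map_phase(lines)))
print(counts)  # e.g. 'the' occurs 3 times, 'fox' occurs 2 times
```

In a real cluster, the mappers and reducers run on different machines, and Hadoop handles partitioning the key-value pairs, sorting them by key, and routing each key's group to a single reducer.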