R High Performance Programming

Chapter 10. R and Big Data

We have come to the final chapter of this book, where we will go to the very limits of large-scale data processing. The term Big Data has been used to describe the ever-growing volume, velocity, and variety of data being generated on the Internet, in connected devices, and in many other places. Many organizations now have massive datasets that measure in petabytes (one petabyte is 1,048,576 gigabytes), more than ever before. Processing and analyzing Big Data is extremely challenging for traditional data processing tools and database architectures.

In 2005, Doug Cutting and Mike Cafarella at Yahoo! developed Hadoop, based on earlier work by Google, to address these challenges. They set out to develop a new data platform to process, index, and query billions of web pages efficiently. With Hadoop, work that would previously have required very expensive supercomputers can now be done on large clusters of inexpensive standard servers. As the volume of data grows, more...