There are many ways to set up a Hadoop cluster. We can install Hadoop on a single server in pseudo-distributed mode to simulate a cluster, or on an actual cluster of servers or virtual machines in fully distributed mode. Several distributions of Hadoop are also available, ranging from the vanilla open source version provided by the Apache Software Foundation to commercial distributions such as Cloudera, Hortonworks, and MapR. Covering all the different ways of setting up Hadoop is beyond the scope of this book. Instead, we provide instructions for one way to set up Hadoop and the other tools needed for the examples in this chapter. If you are using an existing Hadoop cluster or setting one up in a different way, you might have to modify some of the steps.
R High Performance Programming