Optimizing Hadoop for MapReduce

By: Khaled Tannir

Table of Contents (15 chapters)

Hadoop best practices and recommendations


To improve Hadoop performance, here are some configuration tips and recommendations that form a compendium of best practices for applications running on the Hadoop framework.

Deploying Hadoop

Hadoop can be installed manually by downloading its archive files from the official website and copying them to each node of the cluster. This works, but it is not recommended for clusters of more than four nodes. Installing Hadoop manually on a large cluster leads to maintenance and troubleshooting issues, because every configuration change must then be applied by hand to all nodes using Secure Copy Protocol (SCP) or Secure Shell (SSH).
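
To make the cost of the manual approach concrete, here is a minimal Python sketch that builds (and, with `dry_run=False`, runs) one `scp` command per node to push a single configuration file. The host names, the `hadoop` user, and the `/etc/hadoop/conf` target directory are hypothetical values chosen for illustration; the point is only that the work, and the chance of a node being missed or left half-updated, grows with every node added.

```python
import shlex
import subprocess
from typing import List


def build_push_commands(config_file: str, hosts: List[str], remote_dir: str,
                        user: str = "hadoop") -> List[List[str]]:
    """Build one scp command per host that copies config_file into remote_dir."""
    # Every node needs its own copy: the work grows linearly with cluster size.
    return [["scp", config_file, f"{user}@{host}:{remote_dir}/"] for host in hosts]


def push_config(config_file: str, hosts: List[str], remote_dir: str,
                user: str = "hadoop", dry_run: bool = True) -> List[List[str]]:
    """Print the scp commands (dry run) or execute them one node at a time."""
    commands = build_push_commands(config_file, hosts, remote_dir, user)
    for cmd in commands:
        if dry_run:
            print(shlex.join(cmd))
        else:
            # check=True raises CalledProcessError if scp fails on a node,
            # leaving the remaining nodes with the old configuration.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    # Hypothetical host names and target directory, for illustration only.
    nodes = [f"node{i:02d}.example.com" for i in range(1, 6)]
    push_config("core-site.xml", nodes, "/etc/hadoop/conf")
```

In dry-run mode the first printed line is `scp core-site.xml hadoop@node01.example.com:/etc/hadoop/conf/`. Note that nothing here verifies what each node actually ends up with, which is exactly the gap that configuration management tools close.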

To deploy Hadoop on a large cluster, it is recommended (and good practice) to use a configuration management system and/or automated deployment tools, such as those provided by Cloudera (http://www.cloudera.com), Hortonworks (http://hortonworks.com), and MapR (http://www.mapr.com). For additional...
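
The core idea behind such tools can be sketched generically: keep one desired version of each configuration file and update only the nodes whose copy has drifted from it. The Python sketch below is an assumption-laden simplification, not how Cloudera, Hortonworks, or MapR tooling is implemented; the `fingerprint` and `nodes_needing_update` helpers, the node names, and the sample `dfs.replication` drift are all hypothetical.

```python
import hashlib
from typing import Dict, List


def fingerprint(content: str) -> str:
    """SHA-256 digest of a configuration file's contents."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def nodes_needing_update(desired: str, current: Dict[str, str]) -> List[str]:
    """Return the hosts whose config differs from the single desired version."""
    target = fingerprint(desired)
    return sorted(host for host, content in current.items()
                  if fingerprint(content) != target)


# Desired state, kept in one place instead of being edited on every node.
DESIRED = (
    "<configuration>\n"
    "  <property><name>dfs.replication</name><value>3</value></property>\n"
    "</configuration>\n"
)

if __name__ == "__main__":
    # Hypothetical snapshot of what each node currently has on disk.
    current = {
        "node01": DESIRED,
        "node02": DESIRED.replace("<value>3</value>", "<value>2</value>"),
        "node03": DESIRED,
    }
    print(nodes_needing_update(DESIRED, current))  # → ['node02']
```

Because the check compares against a single desired state, it is idempotent: once the drifted node is fixed, running it again returns an empty list and nothing is touched.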