Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Overview of this book

Table of Contents (19 chapters)
Hadoop MapReduce v2 Cookbook Second Edition
Credits
About the Author
Acknowledgments
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Setting the file replication factor


HDFS stores files across the cluster by splitting them into coarse-grained, fixed-size blocks. These blocks are replicated to different DataNodes, primarily for fault tolerance. Block replication can also improve the data locality of MapReduce computations and increase the total data access bandwidth. Conversely, reducing the replication factor saves storage space in HDFS.
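To make the storage trade-off concrete, the following minimal Java sketch computes how many blocks a file is split into and how much raw cluster storage its replicas consume. The class ReplicationMath, its methods, and the example sizes (a 1,000 MB file and a 128 MB block size, which is the dfs.blocksize default in Hadoop 2.x) are illustrative assumptions, not part of the Hadoop API.

```java
// Hypothetical helper (not part of Hadoop) illustrating HDFS block and replica arithmetic.
public class ReplicationMath {

    // Number of fixed-size blocks a file of fileBytes is split into.
    // The last block may be only partially filled; HDFS does not pad it.
    static long blockCount(long fileBytes, long blockBytes) {
        if (fileBytes == 0) {
            return 0; // an empty file occupies no blocks
        }
        return (fileBytes + blockBytes - 1) / blockBytes; // ceiling division
    }

    // Raw cluster storage consumed: every byte of the file is stored `replication` times.
    static long rawStorageBytes(long fileBytes, int replication) {
        return fileBytes * replication;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long file = 1000 * mb;  // illustrative 1,000 MB file
        long block = 128 * mb;  // assumed 128 MB block size
        long blocks = blockCount(file, block);
        System.out.println("blocks: " + blocks);
        for (int r : new int[] {3, 2}) {
            System.out.println("replication " + r + ": "
                + blocks * r + " block replicas, "
                + rawStorageBytes(file, r) / mb + " MB raw storage");
        }
    }
}
```

With these assumed sizes, the file occupies 8 blocks. At a replication factor of 3 it uses 24 block replicas and 3000 MB of raw storage, while at a factor of 2 it uses 16 replicas and 2000 MB, so lowering the factor by one saves the equivalent of one full copy of the file.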

The HDFS replication factor is a file-level property that can be set on a per-file basis. This recipe shows you how to change the default replication factor of an HDFS deployment, which affects only files created afterwards; how to specify a custom replication factor when creating a file in HDFS; and how to change the replication factor of existing files in HDFS.

How to do it...

Follow these instructions to set the file replication factor using the NameNode configuration:

  1. Add or modify the dfs.replication property...