
Setting the HDFS block size


HDFS stores files across the cluster by breaking them down into coarse-grained, fixed-size blocks. The default HDFS block size in Hadoop v2 is 128 MB (earlier Hadoop v1 releases defaulted to 64 MB). The block size of a dataset can affect the performance of filesystem operations, where larger block sizes are more effective if you are storing and processing very large files. The block size of a dataset can also affect the performance of MapReduce computations, as the default behavior of Hadoop is to create one map task for each data block of the input files.
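To make the relationship between block size and map task count concrete, the following is a minimal sketch (not from this recipe) that reads a file's block size and length through Hadoop's FileSystem API and estimates the number of map tasks under the default one-task-per-block behavior. The class name BlockCount and the command-line path argument are illustrative assumptions:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileStatus;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class BlockCount {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Path of the HDFS file to inspect, for example /user/foo/data.in
            FileStatus status = fs.getFileStatus(new Path(args[0]));
            long blockSize = status.getBlockSize();
            long length = status.getLen();

            // The default input formats create roughly one map task per block
            long blocks = (length + blockSize - 1) / blockSize;
            System.out.printf("block size: %d bytes, length: %d bytes, "
                    + "blocks (approximate map tasks): %d%n",
                    blockSize, length, blocks);
        }
    }

For example, a 1 GB file stored with a 128 MB block size occupies 8 blocks and will, by default, be processed by 8 map tasks.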

How to do it...

The following steps show you how to use the hdfs-site.xml configuration file to set the HDFS block size:

  1. Add or modify the following property in the $HADOOP_HOME/etc/hadoop/hdfs-site.xml file. The block size is specified in bytes. This change does not alter the block size of files that are already stored in HDFS; only files copied in after the change will use the new block size.

    <property>
            <name>dfs.blocksize</name>
            <value>134217728</value> <!-- 128 MB -->
    </property>
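The block size is a per-file attribute that is fixed when a file is created, which is why existing files are unaffected by this change. As an illustration of that behavior (a sketch, not a step from this recipe), the following code uses the Java FileSystem API's create() overload to write a file with an explicit block size that overrides the configured dfs.blocksize; the output path and the 64 MB value are assumptions for the example:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FSDataOutputStream;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class CreateWithBlockSize {
        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            FileSystem fs = FileSystem.get(conf);

            // Hypothetical output path; adjust to your cluster
            Path out = new Path("/user/foo/custom-block.dat");
            long blockSize = 64L * 1024 * 1024; // 64 MB, for this file only

            // create(path, overwrite, bufferSize, replication, blockSize)
            try (FSDataOutputStream stream = fs.create(out, true, 4096,
                    fs.getDefaultReplication(out), blockSize)) {
                stream.writeUTF("written with a per-file block size");
            }
        }
    }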