Book Image

Mastering Hadoop

By : Sandeep Karanth
Book Image

Mastering Hadoop

By: Sandeep Karanth

Overview of this book

Table of Contents (21 chapters)
Mastering Hadoop
Credits
About the Author
Acknowledgments
About the Reviewers
www.PacktPub.com
Preface
Index

The RecordReader class


Unlike InputSplit, the RecordReader class presents a record view of the data to the Map task. RecordReader works within each InputSplit class and generates records from the data in the form of key-value pairs. The InputSplit boundary is a guideline for RecordReader and is not enforced. On one extreme, a custom RecordReader class can be written to read an entire file (though this is not encouraged). Most often, a RecordReader class will have to read from a subsequent InputSplit class to present the complete record to the Map task. This happens when records overlap InputSplit classes.

The reading of bytes from a subsequent InputSplit class happens via the FSDataInputS tream objects. Though this reading does not respect locality in itself, generally, it gathers only a few bytes from the next split and there is not a significant performance overhead. But in some cases where record sizes are huge, this can have a bearing on the performance due to significant byte transfers...