Book Image

Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Book Image

Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Overview of this book

Table of Contents (19 chapters)
Hadoop MapReduce v2 Cookbook Second Edition
Credits
About the Author
Acknowledgments
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Formatting the results of MapReduce computations – using Hadoop OutputFormats


Often the output of your MapReduce computation will be consumed by other applications. Hence, it is important to store the result of a MapReduce computation in a format that can be consumed efficiently by the target application. It is also important to store and organize the data in a location that is efficiently accessible by your target application. We can use Hadoop OutputFormat interface to define the data storage format, data storage location, and the organization of the output data of a MapReduce computation. An OutputFormat prepares the output location and provides a RecordWriter implementation to perform the actual serialization and storage of data.

Hadoop uses the org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<K,V> abstract class as the default OutputFormat for the MapReduce computations. TextOutputFormat writes the records of the output data to plain text files in HDFS using a separate line...