The output of a MapReduce computation is often consumed by other applications. Hence, it is important to store the result in a format that the target application can consume efficiently, and to organize the data in a location that the target application can access efficiently. We can use the Hadoop OutputFormat interface to define the data storage format, the data storage location, and the organization of the output data of a MapReduce computation. An OutputFormat prepares the output location and provides a RecordWriter implementation to perform the actual serialization and storage of the data.
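To make this division of labor concrete, here is a minimal plain-Java sketch that runs without Hadoop on the classpath. It does not use the Hadoop API: SimpleOutputFormat, SimpleRecordWriter, LineOutputFormat, the part-00000 file name, and the tab separator are all hypothetical stand-ins chosen for illustration. The sketch only models the idea that one object prepares the output location and hands out a writer, while the writer serializes and stores each record.

```java
import java.io.BufferedWriter;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.List;

class OutputFormatSketch {

    /** Hypothetical stand-in for a record writer: serializes and stores records. */
    interface SimpleRecordWriter<K, V> extends AutoCloseable {
        void write(K key, V value) throws IOException;

        @Override
        void close() throws IOException;
    }

    /** Hypothetical stand-in for an output format: prepares the location, supplies a writer. */
    interface SimpleOutputFormat<K, V> {
        SimpleRecordWriter<K, V> getRecordWriter(Path outputDir, int taskId) throws IOException;
    }

    /** Writes one record per line as key, separator, value (an assumed layout). */
    static class LineOutputFormat<K, V> implements SimpleOutputFormat<K, V> {
        private final String separator;

        LineOutputFormat(String separator) {
            this.separator = separator;
        }

        @Override
        public SimpleRecordWriter<K, V> getRecordWriter(Path outputDir, int taskId) throws IOException {
            // Prepare the output location: make sure the directory exists.
            Files.createDirectories(outputDir);
            // One file per task, so parallel tasks would not overwrite each other.
            Path file = outputDir.resolve(String.format("part-%05d", taskId));
            BufferedWriter out = Files.newBufferedWriter(file, StandardCharsets.UTF_8);

            return new SimpleRecordWriter<K, V>() {
                @Override
                public void write(K key, V value) throws IOException {
                    // Serialization step: render the pair as a single text line.
                    out.write(key + separator + value);
                    out.write('\n');
                }

                @Override
                public void close() throws IOException {
                    out.close();
                }
            };
        }
    }

    /** Writes two sample records for task 0 and returns the lines that were stored. */
    public static List<String> writeSample(Path outputDir) throws IOException {
        SimpleOutputFormat<String, Integer> format = new LineOutputFormat<>("\t");
        try (SimpleRecordWriter<String, Integer> writer = format.getRecordWriter(outputDir, 0)) {
            writer.write("apple", 3);
            writer.write("banana", 5);
        }
        return Files.readAllLines(outputDir.resolve("part-00000"), StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        Path dir = Files.createTempDirectory("mr-output");
        for (String line : writeSample(dir)) {
            System.out.println(line);
        }
    }
}
```

In this sketch, changing the storage format means swapping the writer returned by getRecordWriter, and changing the location or file layout means changing how the output path is prepared. The code that produces the records is unaffected in both cases.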
Hadoop uses org.apache.hadoop.mapreduce.lib.output.TextOutputFormat<K,V> as the default OutputFormat for MapReduce computations. TextOutputFormat
writes the records of the output data to plain text files in HDFS using a separate line for...