Hadoop supports processing of many different formats and types of data through InputFormat. The InputFormat of a Hadoop MapReduce computation generates the key-value pair inputs for the Mappers by parsing the input data. InputFormat also performs the splitting of the input data into logical partitions, essentially determining the number of Map tasks of a MapReduce computation and indirectly deciding the execution location of the Map tasks. Hadoop generates a Map task for each logical data partition and invokes the respective Mappers with the key-value pairs of the logical splits as the input.
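As a rough illustration of how the splitting step determines the number of Map tasks, the following sketch models the split-sizing rule used by FileInputFormat-based formats for a single splittable file. It is a simplified model under stated assumptions, not the Hadoop implementation itself: the class name SplitCountSketch and its method names are hypothetical, the 10% tolerance on the last split reflects the behaviour of recent Hadoop releases, and details such as compressed (non-splittable) files and block locations are ignored.

```java
// Simplified model of FileInputFormat-style split sizing (hypothetical class, not a Hadoop API).
class SplitCountSketch {
    // Assumed tolerance: the final split may be up to 10% larger than splitSize.
    private static final double SPLIT_SLOP = 1.1;

    // The split size is the block size clamped between the configured minimum and maximum.
    public static long computeSplitSize(long blockSize, long minSize, long maxSize) {
        return Math.max(minSize, Math.min(maxSize, blockSize));
    }

    // Counts the logical splits (and therefore Map tasks) for one splittable file.
    public static int countSplits(long fileLength, long splitSize) {
        int splits = 0;
        long bytesRemaining = fileLength;
        while (((double) bytesRemaining) / splitSize > SPLIT_SLOP) {
            splits++;
            bytesRemaining -= splitSize;
        }
        if (bytesRemaining != 0) {
            splits++; // the remainder becomes the last split
        }
        return splits;
    }

    public static void main(String[] args) {
        long mb = 1024L * 1024L;
        long splitSize = computeSplitSize(128 * mb, 1, Long.MAX_VALUE);
        // A 300 MB file with 128 MB blocks yields 3 splits, hence 3 Map tasks.
        System.out.println(countSplits(300 * mb, splitSize));
    }
}
```

Under this model, a 130 MB file with 128 MB blocks still produces a single split, because the remainder falls within the 10% tolerance.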
The following steps show you how to use the FileInputFormat-based KeyValueTextInputFormat as the InputFormat for a Hadoop MapReduce computation:
In this example, we are going to specify KeyValueTextInputFormat as the InputFormat for a Hadoop MapReduce computation using the Job object as follows:
Configuration conf = new Configuration...
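To make concrete what the Mappers receive once KeyValueTextInputFormat is in place, the following sketch mimics how each input line is turned into a key-value pair: the line is split at the first separator character (a tab by default), the text before it becomes the key, and the rest becomes the value. A line with no separator becomes the key with an empty value. This is a plain-Java approximation under those assumptions, not Hadoop code; the class KeyValueLineSketch and its parse method are hypothetical names introduced for illustration.

```java
import java.util.AbstractMap.SimpleEntry;
import java.util.Map;

// Plain-Java approximation of KeyValueTextInputFormat line parsing (hypothetical class).
class KeyValueLineSketch {
    // Splits a line at the FIRST occurrence of the separator only.
    public static Map.Entry<String, String> parse(String line, char separator) {
        int pos = line.indexOf(separator);
        if (pos < 0) {
            // No separator: the whole line is the key and the value is empty.
            return new SimpleEntry<>(line, "");
        }
        return new SimpleEntry<>(line.substring(0, pos), line.substring(pos + 1));
    }

    public static void main(String[] args) {
        Map.Entry<String, String> kv = parse("hadoop\tmap reduce\tcookbook", '\t');
        // Later tabs stay inside the value; only the first one separates key from value.
        System.out.println("key=" + kv.getKey());
        System.out.println("value=" + kv.getValue());
    }
}
```

In recent Hadoop releases the separator can be changed through the mapreduce.input.keyvaluelinerecordreader.key.value.separator configuration property; check the exact property name against the Hadoop version you are using.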