Hadoop provides a utility class org.apache.hadoop.mapreduce.lib.output.MultipleOutputs
that simplifies writing output data to multiple files and locations.
MultipleOutputs serves two main use cases:
To emit additional outputs alongside the job's default output
To write data to different files and/or directories whose paths are supplied by the user
Each additional output, called a named output, can be configured with its own output format, key class, and value class. You define the named outputs in your Driver class and then write to them from your Reducer class to emit additional output. Each reduce task creates its own copy of every named output, the same way it does for the default job output.
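
The following is a minimal sketch of both use cases, not a definitive implementation. The class name NamedOutputsSketch, the named output "rare", the threshold of 5, and the bySize/ directory layout are all hypothetical. It also assumes the input is a SequenceFile of (Text, IntWritable) pairs, so that the default identity mapper can feed the reducer directly.

```java
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.Reducer;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.input.SequenceFileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.mapreduce.lib.output.MultipleOutputs;
import org.apache.hadoop.mapreduce.lib.output.TextOutputFormat;

public class NamedOutputsSketch {

    // Hypothetical named output; names may contain only letters and digits.
    static final String RARE = "rare";
    // Hypothetical cut-off used only to decide where a record goes.
    static final int THRESHOLD = 5;

    public static class SplitReducer
            extends Reducer<Text, IntWritable, Text, IntWritable> {

        private MultipleOutputs<Text, IntWritable> mos;

        @Override
        protected void setup(Context context) {
            // One MultipleOutputs instance per reduce task.
            mos = new MultipleOutputs<>(context);
        }

        @Override
        protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                throws IOException, InterruptedException {
            int sum = 0;
            for (IntWritable v : values) {
                sum += v.get();
            }
            IntWritable total = new IntWritable(sum);

            // Default job output.
            context.write(key, total);

            // Use case 1: an additional named output declared in the driver.
            if (sum < THRESHOLD) {
                mos.write(RARE, key, total);
            }

            // Use case 2: a caller-chosen base path under the job output directory.
            String dir = sum < THRESHOLD ? "small" : "large";
            mos.write(key, total, "bySize/" + dir + "/part");
        }

        @Override
        protected void cleanup(Context context)
                throws IOException, InterruptedException {
            // Closing flushes the extra record writers; skipping it can lose data.
            mos.close();
        }
    }

    // Driver-side configuration, split out from main() for clarity.
    public static Job configure(Configuration conf, String in, String out)
            throws IOException {
        Job job = Job.getInstance(conf, "named-outputs-sketch");
        job.setJarByClass(NamedOutputsSketch.class);
        job.setInputFormatClass(SequenceFileInputFormat.class); // assumed input type
        job.setReducerClass(SplitReducer.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(in));
        FileOutputFormat.setOutputPath(job, new Path(out));

        // Each named output gets its own output format, key class, and value class.
        MultipleOutputs.addNamedOutput(job, RARE,
                TextOutputFormat.class, Text.class, IntWritable.class);
        return job;
    }

    public static void main(String[] args) throws Exception {
        Job job = configure(new Configuration(), args[0], args[1]);
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```

With this setup, each reduce task would be expected to write its own files for the named output (for example rare-r-00000) and for each base path (for example bySize/small/part-r-00000), next to its usual part-r-00000 file.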