Rumen is a tool for extracting well-formatted information from job log files. It parses the logs and generates statistics for Hadoop jobs. The resulting job traces can be used for performance tuning and simulation.
The current Rumen implementation includes two components: TraceBuilder and folder. The TraceBuilder takes job history as input and generates easily parsed JSON files. The folder is a utility for manipulating input traces; most of the time, it is used to scale the summarized job traces produced by the TraceBuilder. For example, we can use the folder tool to scale the job runtime up (make it longer) or down (make it shorter). In this recipe, we will outline the steps to analyze job history with Rumen.
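
To make the idea of scaling a trace concrete, the following Python sketch stretches or compresses the timestamps of a small job trace by a constant factor. It is only an illustration of the concept, not the algorithm the folder actually implements. The field names `submitTime`, `launchTime`, and `finishTime`, the helper names `scale_job` and `scale_trace`, and the sample values are all assumptions made for this example.

```python
import json

# Assumed timestamp fields (milliseconds); a real Rumen trace may differ.
TIME_FIELDS = ("submitTime", "launchTime", "finishTime")


def scale_job(job, factor, origin):
    """Return a copy of one job with its timestamps scaled around origin."""
    scaled = dict(job)
    for field in TIME_FIELDS:
        if field in scaled:
            offset = scaled[field] - origin
            scaled[field] = origin + int(round(offset * factor))
    return scaled


def scale_trace(jobs, factor):
    """Scale every job in the trace; factor > 1 lengthens, factor < 1 shortens."""
    if factor <= 0:
        raise ValueError("factor must be positive")
    if not jobs:
        return []
    # Use the earliest submission as the fixed point of the scaling.
    origin = min(job["submitTime"] for job in jobs)
    return [scale_job(job, factor, origin) for job in jobs]


if __name__ == "__main__":
    # Hypothetical two-job trace, made up for illustration only.
    trace = [
        {"jobID": "job_A", "submitTime": 1000, "launchTime": 1500, "finishTime": 3000},
        {"jobID": "job_B", "submitTime": 2000, "launchTime": 2200, "finishTime": 4000},
    ]
    print(json.dumps(scale_trace(trace, 2.0), indent=2))
```

With a factor of 2.0, every interval measured from the earliest submission doubles, so the trace describes the same jobs spread over twice the time. A factor of 0.5 halves the intervals.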