We will cover a use case to find out the top 20 highly rated movies and will consider the condition that movies should have been rated by more than 100 people. The filter pattern we discussed earlier is a good fit for the use case. The format of the data is as follows:
|
|
|
|
|
|
The title code refers to the specific movie. The rating is based on a 10 point scale. Let's look into the mapper, reduce code, and driver code. The template can also be used for similar use cases.
The job of the mapper is to process the record and emit the top 20 records it has processed for input split. We are also filtering out movies that have not been rated by at least 100 people. The code is as follows:
import org.apache.Hadoop.io.LongWritable; import org.apache.Hadoop.io.Text; import org.apache.Hadoop.mapreduce.Mapper; import java.io.IOException; import java.util.Map; import java.util.TreeMap; public class MovieRatingMapper extends Mapper...