In this recipe, we are going to learn how to write a MapReduce program that joins records from two tables.
To perform this recipe, you should have a running Hadoop cluster as well as an IDE such as Eclipse.
We are aware of the various types of joins that are available in SQL: inner join, left outer join, right outer join, full outer join, and so on. Performing joins in SQL is quite easy, but in MapReduce it is a little trickier, because there is no built-in join operator and the job itself has to bring matching records together. In this recipe, we will try to perform various join operations with a MapReduce program on the following dataset.
Consider two datasets. The `Users` table has information about `userId`, username, and `deptId`. The `Department` table has `deptId` and `deptName` as columns. If we place our data in tables, it would look like this:
Users table:

User ID | Username | Department ID |
---|---|---|
1 | Tanmay | 1 |
2 | Sneha | 1 |
3 | Sakalya | 2 |
4 | Manisha | 2 |
5 | Avinash... |