This recipe shows how to use Hive to perform a join across two datasets. The first dataset is the book details dataset of the Book-Crossing database, and the second is the reviewer ratings for those books. The recipe uses Hive to find the authors with the largest number of ratings above 3 stars.
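As a rough preview of the kind of query this goal implies, the following is a minimal sketch, not the recipe's own query. The column names `isbn`, `author`, and `rating`, the `high_ratings` alias, and the `LIMIT 10` cutoff are assumptions made for illustration; adjust them to match the actual table definitions.

```sql
-- Sketch only: isbn, author, and rating are assumed column names,
-- not taken from the recipe's table definitions.
SELECT b.author AS author, COUNT(*) AS high_ratings
FROM books b
JOIN ratings r ON (b.isbn = r.isbn)   -- match each rating to its book
WHERE r.rating > 3                    -- keep only ratings above 3
GROUP BY b.author                     -- one output row per author
ORDER BY high_ratings DESC            -- most-rated authors first
LIMIT 10;                             -- arbitrary cutoff for illustration
```

The join pairs every rating with its book record, so the per-author count reflects the number of qualifying ratings rather than the number of books.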
This section demonstrates how to perform a join using Hive. Proceed with the following steps:
Start the Hive CLI and use the Book-Crossing database:
$ hive
hive> USE bookcrossing;
Create the books and book ratings tables by executing the create-book-crossing.hql Hive query file, following the earlier recipe on Hive batch mode commands using a query file. Use the following commands to verify the existence of those tables in the Book-Crossing database:
hive> SELECT * FROM books LIMIT 10;
….
hive> SELECT * FROM ratings LIMIT 10;
….
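If sampling rows is not enough, the tables can also be checked by name and schema. The following is a small sketch using the standard Hive statements `SHOW TABLES` and `DESCRIBE`. It assumes the tables are named `books` and `ratings`, as in the queries above, and makes no assumption about their columns.

```sql
-- List the tables in the current database; books and ratings should appear.
SHOW TABLES;

-- Print the column names and types of each table.
DESCRIBE books;
DESCRIBE ratings;
```

Knowing the column names that `DESCRIBE` reports is useful before writing the join, since the join condition must reference a column shared by both tables.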
Now, we...