Hive does not directly support foreign keys. Nevertheless, it is still very common to join records on identically matching keys contained in one or more tables.
This recipe will show a very simple inner join over weblog data that links each request record in the weblog_entries table to a country, based on the request IP.
For each record contained in the weblog_entries table, the query will print the record out with an additional trailing value showing the determined country.
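As a rough illustration of the kind of join described above, the following HiveQL is a minimal sketch rather than the recipe's actual query. It assumes a hypothetical lookup table named ip_to_country with columns ip and country, and it assumes that weblog_entries exposes the request IP in a column named ip. Adjust these names to match your own tables.

```sql
-- Sketch only. Assumed names (not confirmed by the recipe text):
--   ip_to_country   : lookup table with columns (ip, country)
--   weblog_entries.ip : column holding the request IP
-- Inner join: only weblog rows whose IP has an exact match in the
-- lookup table are returned, each with the country appended last.
SELECT wle.*, itc.country
FROM weblog_entries wle
JOIN ip_to_country itc
  ON wle.ip = itc.ip;
```

Because this is an inner join on exact key equality, any weblog record whose IP has no matching row in the lookup table is dropped from the output.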
Make sure that you have access to a pseudo-distributed or fully-distributed Hadoop cluster, with Apache Hive 0.7.1 installed on your client machine and on the environment path for the active user account.
This recipe depends on having the weblog_entries dataset loaded into a Hive table named weblog_entries with the following fields mapped to the respective datatypes.
Issue the following command to the Hive client:
describe weblog_entries;
You should...