There are many operations you will want to repeat across various data sources and tables in Hive. For this scenario, it makes sense to write your own user-defined function (UDF). You can write your own subroutine in Java for use on any Writable input fields and to invoke your function from Hive scripts whenever necessary. This recipe will walk you through the process of creating a very simple UDF that takes a source and returns yes
or no
for whether that source is reliable.
Make sure you have access to a pseudo-distributed or fully-distributed Hadoop cluster with Apache Hive 0.7.1 installed on your client machine and on the environment path for the active user account.
This recipe depends on having the Nigera_ACLED_cleaned.tsv
dataset loaded into a Hive table with the name acled_nigeria_cleaned
with the following fields mapped to the respective datatypes.
Issue the following command...