Apache Thrift is a cross-language serialization and RPC services framework. Thrift uses an interface definition file to generate bindings in many languages, including Java.
This recipe demonstrates how to define a Thrift interface, generate the corresponding Java bindings, and use those bindings to serialize a Java object to HDFS using MapReduce.
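As a rough preview of the first two steps, the sketch below writes a small Thrift interface file and, if the Thrift compiler is already installed, generates Java bindings from it. The file name `weblog.thrift`, the namespace `com.example.thrift`, the struct name `WeblogRecord`, and its four fields are all hypothetical placeholders chosen for illustration; they are not necessarily the definitions this recipe uses.

```shell
#!/bin/sh
# Sketch only: the file name, namespace, struct name, and fields are hypothetical.
cat > weblog.thrift <<'EOF'
namespace java com.example.thrift

struct WeblogRecord {
  1: string cookie,
  2: string page,
  3: i64 timestamp,
  4: string ip
}
EOF

# Generate Java bindings if the Thrift compiler is on the PATH;
# the compiler writes its Java output under ./gen-java by default.
if command -v thrift >/dev/null 2>&1; then
    thrift --gen java weblog.thrift
else
    echo "thrift compiler not found; skipping code generation" >&2
fi
```

Each field carries a numeric tag (`1:`, `2:`, and so on) that identifies it in the serialized form, which is why tags should stay stable once data has been written.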
You will need to download, compile, and install the following:
Hadoop LZO library
Apache Thrift Version 0.7.0, from http://thrift.apache.org/
The latest version of Elephant Bird, from https://github.com/kevinweil/elephant-bird
The test data file weblog_entries.txt, from http://www.packtpub.com/support
To compile and install Apache Thrift, first install the required dependencies using Yum:
# yum install automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel
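Before moving on, it can be useful to confirm that every dependency actually installed. The following is a minimal sketch, assuming an RPM-based system where `rpm -q` exits with a non-zero status for a package that is not installed; the helper name `missing_packages` is hypothetical, and the package list is copied from the Yum command above.

```shell
#!/bin/sh
# Package list taken from the yum command in this recipe.
PACKAGES="automake libtool flex bison pkgconfig gcc-c++ boost-devel libevent-devel zlib-devel python-devel ruby-devel openssl-devel"

# Hypothetical helper: print each package that the query command reports as missing.
# $1 (optional): the query command, assumed to exit non-zero for a missing package.
#                Defaults to "rpm -q".
missing_packages() {
    query=${1:-rpm -q}
    for pkg in $PACKAGES; do
        # $query is left unquoted on purpose so "rpm -q" splits into command and flag.
        $query "$pkg" >/dev/null 2>&1 || echo "$pkg"
    done
}
```

Running `missing_packages` with no arguments prints one package name per line for anything still missing; no output suggests that all dependencies are present.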
Next, build Elephant Bird.
$ cd /path/to/elephant-bird...