Protocol Buffers is a cross-language data serialization format. Each message is described in an interface definition file, which the Protocol Buffers compiler uses to generate bindings in many languages, including Java.
This recipe will demonstrate how to define a Protocol Buffers message, generate the corresponding Java bindings, and use these bindings to serialize a Java object to HDFS using MapReduce.
You will need to download/compile/install the following:
Hadoop LZO library
Google Protocol Buffers Version 2.3.0 from http://code.google.com/p/protobuf/
Elephant Bird (see the previous recipe)
The test data file weblog_entries.txt, from http://www.packtpub.com/support
Note
This recipe compiles Protocol Buffers from source, so the GNU C/C++ compiler collection (GCC) must be installed.
To install the GNU C/C++ compilers and the autoconf/automake build tools using Yum, run the following command as the root user from a bash shell:
# yum install gcc gcc-c++ autoconf automake
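Before moving on, it can help to confirm that the build tools are on the PATH. The following is a minimal sketch, not part of the original recipe: the function name check_tools is hypothetical, and it assumes the packages above provide executables named gcc, g++, autoconf, and automake.

```shell
# Hypothetical helper: report whether each named tool is on the PATH.
# Returns non-zero if any tool is missing.
check_tools() {
  missing=0
  for tool in "$@"; do
    if command -v "$tool" >/dev/null 2>&1; then
      echo "$tool: found"
    else
      echo "$tool: MISSING"
      missing=1
    fi
  done
  return "$missing"
}

# Assumed executable names for the packages installed above.
check_tools gcc g++ autoconf automake || echo "Install the missing tools before continuing."
```

If any line reports MISSING, rerun the yum command and check its output for errors before compiling.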
To compile and install Protocol...