Book Image

Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Book Image

Hadoop MapReduce v2 Cookbook - Second Edition: RAW

Overview of this book

Table of Contents (19 chapters)
Hadoop MapReduce v2 Cookbook Second Edition
Credits
About the Author
Acknowledgments
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Implementing a custom Hadoop Writable data type


There can be use cases where none of the inbuilt data types match your requirement or a custom data type optimized for your use case may perform better than a Hadoop built-in data type. In such scenarios, we can easily write a custom Writable data type by implementing the org.apache.hadoop.io.Writable interface to define the serialization format of your data type. The Writable interface-based types can be used as value types in Hadoop MapReduce computations.

In this recipe, we implement a sample Hadoop Writable data type for HTTP server log entries. For the purpose of this sample, we consider that a log entry consists of the five fields: request host, timestamp, request URL, response size, and the HTTP status code. The following is a sample log entry:

192.168.0.2 - - [01/Jul/1995:00:00:01 -0400] "GET /history/apollo/ HTTP/1.0" 200 6245

You can download a sample HTTP server log dataset from ftp://ita.ee.lbl.gov/traces/NASA_access_log_Jul95.gz...