So far, we have introduced the architecture of HDFS and shown how to store and retrieve data programmatically using the command-line tools and the Java API. In these examples, we have implicitly assumed that our data was stored as text files. In practice, however, some applications and datasets require purpose-built structures to hold a file's contents. Over the years, file formats have been created both to address the requirements of MapReduce processing (for instance, that data be splittable) and to satisfy the need to model structured as well as unstructured data. Recently, much effort has been directed at better capturing the use cases of relational data storage and modeling. In the remainder of this chapter, we will introduce some of the popular file formats available within the Hadoop ecosystem.
Learning Hadoop 2