Mastering Hadoop

By: Karanth

Overview of this book

Do you want to broaden your Hadoop skill set and take your knowledge to the next level? Do you wish to enhance your knowledge of Hadoop to solve challenging data processing problems? Are your Hadoop jobs, Pig scripts, or Hive queries not running as fast as you would like? Are you looking to understand the benefits of upgrading Hadoop? If the answer is yes to any of these, this book is for you. It assumes novice-level familiarity with Hadoop.

The advanced Pig operators


In this section, we will examine some of the advanced features and hints available in Pig operators.

The advanced FOREACH operator

The FOREACH operator is primarily used to transform each record of the input relation into a new record. A list of expressions defines this transformation. In some situations, FOREACH can also produce more output records than it receives. These situations are discussed in the following sections.

The FLATTEN operator

Although FLATTEN looks like a UDF in its syntax, it is an operator. It is used to un-nest nested tuples and bags. However, the way the nesting is eliminated differs between tuples and bags.

FLATTEN on a nested tuple yields a single tuple, as shown in the following snippet. The fields of the nested tuple are elevated to the topmost level.

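Consider a relation, A, that holds a record of the following nature:

(1, (2, 3, 4))

The following statement applies FLATTEN to the second field of each record:

X = FOREACH A GENERATE $0, FLATTEN($1);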

This will yield (1,2,3,4) as the resulting tuple...
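
To try this end to end, the following is a minimal sketch of a complete script; it is not taken from the book. The file names tuples.txt and bags.txt, the '|' delimiter, and all the field names are assumptions made purely for illustration. The script also flattens a bag-valued field for contrast, because FLATTEN on a bag emits one output record per element of the bag instead of a single record.

```pig
-- Hypothetical input file tuples.txt, one record per line:
--   1|(2,3,4)
A = LOAD 'tuples.txt' USING PigStorage('|')
    AS (id:int, t:tuple(a:int, b:int, c:int));

-- FLATTEN on a tuple: the nested fields move to the top level.
-- One input record produces one output record.
X = FOREACH A GENERATE id, FLATTEN(t);
DUMP X;   -- (1,2,3,4)

-- Hypothetical input file bags.txt, one record per line:
--   1|{(2),(3),(4)}
B = LOAD 'bags.txt' USING PigStorage('|')
    AS (id:int, vals:bag{r:tuple(v:int)});

-- FLATTEN on a bag: each element of the bag is paired with the
-- other generated fields, so one input record becomes three.
Y = FOREACH B GENERATE id, FLATTEN(vals);
DUMP Y;   -- three records: (1,2), (1,3), (1,4)
```

Assuming Pig is installed, saving the script as flatten_demo.pig (a hypothetical name) next to the two input files and running pig -x local flatten_demo.pig executes it without a cluster.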