Book Image

Apache Hive Essentials

By : Dayong Du
Book Image

Apache Hive Essentials

By: Dayong Du

Overview of this book

Table of Contents (17 chapters)
Apache Hive Essentials
Credits
About the Author
About the Reviewers
www.PacktPub.com
Preface
Index

Streaming


Hive can also leverage the streaming feature in Hadoop to transform data in an alternative way. The streaming API opens an I/O pipe to an external process (script). Then, the process reads data from the standard input and writes the results out through the standard output. In Hive, we can use TRANSFORM clauses in HQL directly, and embed the mapper and the reducer scripts written in commands, shell scripts, Java, or other programming languages. Although streaming brings overhead by using serialization/deserialization between processes, it is a simpler coding mode for developers, especially non-Java developers. The syntax of the TRANSFORM clause is as follows:

FROM (
    FROM src
    SELECT TRANSFORM '(' expression (',' expression)* ')'
    (inRowFormat)?
    USING 'map_user_script'
    (AS colName (',' colName)*)?
    (outRowFormat)? (outRecordReader)?
    (CLUSTER BY?|DISTRIBUTE BY? SORT BY?) src_alias
 )
 SELECT TRANSFORM '(' expression (',' expression)* ')'
 (inRowFormat)?
 USING...