Learning Hadoop 2

By: Gerald Turkington, Gabriele Modena

Chapter 7. Hadoop and SQL

MapReduce is a powerful paradigm that enables complex data processing and can reveal valuable insights. As discussed in earlier chapters, however, it requires a different mindset, along with some training and experience in breaking analytics down into a series of map and reduce steps. Several products built atop Hadoop provide higher-level or more familiar views of the data held within HDFS, and Pig is a very popular one. This chapter explores the other most common abstraction implemented atop Hadoop: SQL.
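
To make the contrast concrete, the following is a minimal sketch in plain Python (not the Hadoop API) of how a simple aggregation decomposes into map and reduce steps. The sample records, the function names, and the `hashtags` table mentioned in the comment are all hypothetical and exist only for illustration.

```python
from collections import defaultdict

# Hypothetical sample records (user, tweet text); not taken from the book's dataset.
TWEETS = [
    ("alice", "learning #hadoop today"),
    ("bob", "#hive makes #hadoop friendlier"),
    ("alice", "more #hive queries"),
]


def map_step(record):
    """Emit a (hashtag, 1) pair for every hashtag found in one tweet."""
    _user, text = record
    for word in text.split():
        if word.startswith("#"):
            yield word.lower(), 1


def shuffle(pairs):
    """Group values by key, standing in for the framework's shuffle phase."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_step(key, values):
    """Sum the counts collected for a single hashtag."""
    return key, sum(values)


def count_hashtags(records):
    """Run map, shuffle, and reduce in sequence over the records."""
    pairs = (pair for record in records for pair in map_step(record))
    return dict(reduce_step(k, v) for k, v in shuffle(pairs).items())


# Roughly the same intent in SQL, assuming a hypothetical table `hashtags`
# holding one row per hashtag occurrence:
#   SELECT hashtag, COUNT(*) FROM hashtags GROUP BY hashtag;

if __name__ == "__main__":
    print(count_hashtags(TWEETS))
```

Even for a count this simple, the map and reduce version needs three cooperating functions, whereas the SQL form in the comment states only the desired result. That gap is the motivation for the SQL abstractions this chapter covers.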

In this chapter, we will cover the following topics:

  • What the use cases for SQL on Hadoop are and why it is so popular

  • HiveQL, the SQL dialect introduced by Apache Hive

  • Using HiveQL to perform SQL-like analysis of the Twitter dataset

  • How HiveQL can approximate common features of relational databases such as joins and views

  • How HiveQL allows the incorporation of user-defined functions into its queries

  • How SQL on Hadoop complements...