HDInsight Essentials - Second Edition

By: Rajesh Nadipalli

Using Sqoop to move data from RDBMS to Data Lake


Sqoop transfers data between relational databases and Hadoop. With it, you can import data into HDInsight from any relational database that has a JDBC driver, such as SQL Server, MySQL, Oracle, or Teradata.

Key benefits

The major benefits of using Sqoop to move data are as follows:

  • It leverages RDBMS metadata to determine the column data types

  • It is simple to script and uses SQL

  • It can handle change data capture by importing daily transactional data into HDInsight

  • It runs imports and exports as MapReduce jobs, which enables parallel and efficient data movement
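
To make the change data capture and parallelism points concrete, the following is a minimal sketch in Python that assembles the arguments for an incremental Sqoop import. The function name, JDBC URL, table, column, and target directory are hypothetical placeholders, not values from this book. The flags themselves (--incremental, --check-column, --last-value, and --num-mappers) are standard Sqoop import options. Credentials are left out; a real run would typically also pass --username and --password-file.

```python
import shlex
from typing import List


def incremental_import_args(
    jdbc_url: str,
    table: str,
    target_dir: str,
    check_column: str,
    last_value: str,
    num_mappers: int = 4,
) -> List[str]:
    """Build the argument list for an incremental, parallel `sqoop import`."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,
        "--target-dir", target_dir,
        # Append mode imports only rows whose check column exceeds last_value.
        "--incremental", "append",
        "--check-column", check_column,
        "--last-value", last_value,
        # Number of parallel map tasks used for the transfer.
        "--num-mappers", str(num_mappers),
    ]


# Hypothetical daily run: pick up orders added after order_id 1000.
args = incremental_import_args(
    jdbc_url="jdbc:sqlserver://example-host:1433;databaseName=sales",
    table="orders",
    target_dir="/data/raw/orders",
    check_column="order_id",
    last_value="1000",
)
# shlex.join quotes the JDBC URL so the printed command is safe to paste.
print(shlex.join(args))
```

On a node where Sqoop is installed, the list could be passed to subprocess.run(args, check=True). Building the command as a list rather than a single string avoids shell quoting problems with the semicolon in the JDBC URL.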

Two modes of using Sqoop

Sqoop moves data both into and out of Hadoop, and it has two modes of operation:

  • Sqoop import: Data moves from RDBMS to HDInsight

  • Sqoop export: Data moves from HDInsight to RDBMS
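
The two modes map to two Sqoop tools, sqoop import and sqoop export. The following is a minimal sketch in Python that builds the argument list for each direction. The function names, JDBC URL, table names, and directory paths are hypothetical placeholders chosen for illustration. The flags shown (--connect, --table, --target-dir, and --export-dir) are standard Sqoop options, and credentials are omitted for brevity.

```python
from typing import List


def sqoop_import_args(jdbc_url: str, table: str, target_dir: str) -> List[str]:
    """Import mode: copy an RDBMS table into files under target_dir."""
    return [
        "sqoop", "import",
        "--connect", jdbc_url,
        "--table", table,            # source table in the RDBMS
        "--target-dir", target_dir,  # destination directory in Hadoop storage
    ]


def sqoop_export_args(jdbc_url: str, table: str, export_dir: str) -> List[str]:
    """Export mode: copy files under export_dir into an existing RDBMS table."""
    return [
        "sqoop", "export",
        "--connect", jdbc_url,
        "--table", table,            # destination table in the RDBMS
        "--export-dir", export_dir,  # source directory in Hadoop storage
    ]


# Placeholder connection string; replace it with your own database details.
url = "jdbc:mysql://example-host:3306/sales"
import_cmd = sqoop_import_args(url, "customers", "/data/raw/customers")
export_cmd = sqoop_export_args(url, "customer_summary", "/data/out/customer_summary")
```

The only structural difference between the two commands is the tool name and the directory flag: --target-dir names where imported data lands, while --export-dir names where the data to be exported is read from.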

The following screenshot shows the two modes of using Sqoop:

Using Sqoop to import data (SQL to Hadoop)

The following is the setup for a Sqoop import demonstration:

  • Source database: Teradata...