Sqoop is an Apache Software Foundation package designed to transfer bulk data between HDFS and relational databases. Its primary purpose is to serve as a utility for moving data between data warehouses and Hadoop clusters. It can also be used as a forensic tool when the HDFS data can be exported as relational data.
During an export, Sqoop reads data from HDFS and transfers it to a relational database. It reads entire directories of files and parses them based on the specified delimiters and qualifiers. Sqoop then loads the parsed data into the database using a series of INSERT commands, tracks errors and exceptions, and reports any failed inserts.
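The parse-then-insert flow described above can be illustrated with a small Python sketch. This is not Sqoop's implementation: it only mimics the idea on a local directory, using an in-memory SQLite database as a stand-in for the target database. The function name `export_directory`, the `employees` table, and the `part-00000` and `part-00001` file names and contents are all hypothetical choices made for this example.

```python
import csv
import sqlite3
import tempfile
from pathlib import Path


def export_directory(export_dir, conn, table, columns, delimiter=",", quotechar='"'):
    """Parse every file in export_dir and INSERT each record into table.

    Returns (inserted_count, failures), where failures is a list of
    (file_name, record_number, error_message) tuples.
    """
    placeholders = ", ".join("?" for _ in columns)
    sql = f"INSERT INTO {table} ({', '.join(columns)}) VALUES ({placeholders})"
    inserted, failures = 0, []
    # Read the whole directory, one file at a time, in a stable order.
    for path in sorted(Path(export_dir).iterdir()):
        if not path.is_file():
            continue
        with path.open(newline="") as handle:
            # Split each record on the delimiter, honoring the qualifier
            # (the character that encloses fields containing the delimiter).
            reader = csv.reader(handle, delimiter=delimiter, quotechar=quotechar)
            for record_number, fields in enumerate(reader, start=1):
                try:
                    conn.execute(sql, fields)
                    inserted += 1
                except sqlite3.Error as exc:
                    # Record the failed insert and continue with the next record.
                    failures.append((path.name, record_number, str(exc)))
    conn.commit()
    return inserted, failures


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE employees (id INTEGER PRIMARY KEY, name TEXT NOT NULL)")
    with tempfile.TemporaryDirectory() as tmp:
        # Two hypothetical part files; the second repeats a primary key.
        Path(tmp, "part-00000").write_text('1,"Doe, Jane"\n2,"Roe, Richard"\n')
        Path(tmp, "part-00001").write_text('2,"Duplicate Key"\n3,"Poe, Edgar"\n')
        inserted, failures = export_directory(tmp, conn, "employees", ["id", "name"])
    print("inserted:", inserted)
    for file_name, record_number, message in failures:
        print("failed:", file_name, record_number, message)
```

The sketch collects failed inserts and keeps going, so a single bad record does not hide the rest of the data. How Sqoop itself batches inserts and reports failures depends on its version and configuration, so treat this only as a conceptual model.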
Sqoop can export data to the following databases:
HSQLDB
MySQL
Oracle
PostgreSQL
Other databases are supported via Sqoop connectors, including MS SQL Server.
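Whichever target database is used, a Sqoop export addresses it through a JDBC connect string. The following Python sketch assembles a `sqoop export` command line as a rough illustration. The flags shown (`--connect`, `--username`, `--password-file`, `--table`, `--export-dir`, `--input-fields-terminated-by`, `--input-optionally-enclosed-by`) are taken from the Sqoop 1.4.x user guide and should be checked against the installed version. The helper name `build_sqoop_export`, the host `db.example.com`, the database `forensics`, the user, the table, and every path are hypothetical.

```python
import shlex


def build_sqoop_export(jdbc_url, username, password_file, table, export_dir,
                       field_delimiter=",", enclosed_by=None):
    """Return the argument list for a `sqoop export` invocation."""
    args = [
        "sqoop", "export",
        "--connect", jdbc_url,              # JDBC connect string of the target database
        "--username", username,
        "--password-file", password_file,   # file holding the password
        "--table", table,                   # destination table, which must already exist
        "--export-dir", export_dir,         # HDFS directory holding the files to export
        "--input-fields-terminated-by", field_delimiter,
    ]
    if enclosed_by is not None:
        # Qualifier that may enclose fields containing the delimiter.
        args += ["--input-optionally-enclosed-by", enclosed_by]
    return args


if __name__ == "__main__":
    # Hypothetical targets; only the JDBC URL scheme differs per database.
    targets = {
        "MySQL": "jdbc:mysql://db.example.com/forensics",
        "PostgreSQL": "jdbc:postgresql://db.example.com/forensics",
    }
    for name, url in targets.items():
        command = build_sqoop_export(
            url, "investigator", "/user/investigator/.db-password",
            "evidence_records", "/evidence/records", enclosed_by='"',
        )
        # shlex.join quotes each argument so the line can be pasted into a shell.
        print(name + ":", shlex.join(command))
```

Building the command as an argument list rather than a single string avoids shell-quoting mistakes with delimiters and qualifiers, and the list can be passed directly to `subprocess.run` on a machine where Sqoop is installed.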
To export HDFS data to a relational database using Sqoop, the investigator runs Sqoop from a machine that...