
Collection via Sqoop


Sqoop is an Apache Foundation package designed to transfer bulk data between HDFS and structured data stores, such as relational databases. As a data migration tool, Sqoop is used to move data to and from HDFS, and its primary purpose is to serve as a utility for transferring data between data warehouses and Hadoop clusters. It can also be used as a forensic tool when HDFS data needs to be exported in relational form.

When exporting, Sqoop reads data from HDFS and transfers it to a relational database. It reads entire directories of files, parses them based on the specified delimiters and qualifiers, and loads the parsed records into the target database using a series of INSERT statements. Sqoop tracks errors and exceptions and reports any failed inserts.
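As a minimal sketch, an export of a delimited HDFS directory into a MySQL table might look like the following; the host, database, table, and directory names are hypothetical placeholders:

    sqoop export \
      --connect jdbc:mysql://db-server.example.com/evidence_db \
      --username analyst \
      --password-file /user/analyst/.db-password \
      --table hdfs_records \
      --export-dir /data/collected/records \
      --input-fields-terminated-by '\t'

Here, --export-dir points to the HDFS directory to be read and --input-fields-terminated-by tells Sqoop how the fields are delimited; the target table must already exist in the database.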

Sqoop can export data directly into the following databases:

  • HSQLDB

  • MySQL

  • Oracle

  • PostgreSQL

Other databases are supported via Sqoop connectors, including MS SQL Server.
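For databases reached through a connector, only the connection string (and, where required, the JDBC driver) changes. A sketch for MS SQL Server, again using placeholder host and database names:

    sqoop export \
      --connect "jdbc:sqlserver://db-server.example.com;databaseName=evidence_db" \
      --username analyst -P \
      --table hdfs_records \
      --export-dir /data/collected/records

The -P option prompts for the database password at the command line rather than placing it in the command history.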

To export HDFS data to a relational database using Sqoop, the investigator runs Sqoop from a machine that...