Apache Sqoop
Apache Sqoop is a tool that has the ability to transfer data between Hadoop and structured data stores, such as a relational database. Thus, it may seem at first that Sqoop is not really a data-collection tool, but I wanted to provide details of it anyway because organizations have use cases where all they want to do is bring (also known as collect) data from their legacy data-warehouse-based relational tables into a distributed store, such as Hadoop. Thus, the discussion of Sqoop brings completes this chapter, in my opinion.
One of the primary places Apache Sqoop really shines is use cases where there is a need to perform expensive ETL processes on large amounts of data, and the enterprise data warehouse can't handle such a memory- and process-consuming task. In such cases, it makes sense to offload the execution to a distributed processing platform, such as Hadoop, and Sqoop fits nicely in this use case as the tool to transfer data between EDW and Hadoop:
The primary use cases...