HBase has an importtsv
tool to support importing data from TSV files into HBase. Using this tool to load text data into HBase is very efficient, because it runs a MapReduce job to perform the importing. Even if you are going to load data from an existing RDBMS, you can dump data into a text file somehow and then use importtsv
to import dumped data into HBase. This approach works well when importing a huge amount of data, as dumping data is much faster than executing SQL on RDBMS.
The importtsv
tool does not only load data directly into an HBase table, it also supports generating HBase internal format (HFile) files, so that you can use the HBase bulk load tool to load generated files directly into a running HBase cluster. This way, you reduce network traffic that was generated from the data transfers and your HBase load, during the migration.
This recipe describes usage of the importtsv
and bulk load tools. We first demonstrate loading...