Pentaho Data Integration Cookbook - Second Edition
HBase is another component in the Hadoop ecosystem. It is a columnar database, which stores datasets by column rather than by row. This allows for higher compression and faster searching, making columnar databases well suited to the kinds of analytical queries that can cause significant performance issues in traditional relational databases.
For this recipe, we will be using the Baseball Dataset loaded into Hadoop in the recipe Loading data into Hadoop (also in this chapter). It is recommended that you complete that recipe before continuing.
In this recipe, we will be loading the Schools.csv, Master.csv, and SchoolsPlayers.csv files. The SchoolsPlayers.csv file relates schools (found in the Schools.csv file) to players (found in the Master.csv file). This data is designed for a relational database, so we will be tweaking the data to take advantage of HBase's data store capabilities. Before...
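To make the idea of reshaping relational data for HBase concrete, here is a minimal Python sketch of one possible denormalization: each school becomes a single row keyed by its ID, and the players linked to it are folded in as extra columns. This is only an illustration, not the recipe's actual steps (which are carried out in Pentaho Data Integration). The column names (schoolID, schoolName, playerID, nameFirst, nameLast), the column families (info, players), and the sample rows are all assumptions made for this example.

```python
# Hypothetical sample rows; the column names are assumed, not taken from the recipe.
schools = [
    {"schoolID": "sch1", "schoolName": "Example State University"},
]
players = [
    {"playerID": "p1", "nameFirst": "Alex", "nameLast": "Doe"},
    {"playerID": "p2", "nameFirst": "Sam", "nameLast": "Roe"},
]
# Link table relating players to schools (the role played by SchoolsPlayers.csv).
schools_players = [
    {"playerID": "p1", "schoolID": "sch1"},
    {"playerID": "p2", "schoolID": "sch1"},
]


def denormalize(schools, players, schools_players):
    """Fold the three relational tables into one wide row per school.

    Each row is a dict of "family:qualifier" -> value, mimicking how an
    HBase row holds an open-ended set of columns grouped into families.
    """
    players_by_id = {p["playerID"]: p for p in players}

    # Start one row per school, keyed by schoolID (the assumed row key).
    rows = {s["schoolID"]: {"info:name": s["schoolName"]} for s in schools}

    # Add one column per linked player under the assumed "players" family.
    for link in schools_players:
        row = rows.get(link["schoolID"])
        player = players_by_id.get(link["playerID"])
        if row is None or player is None:
            continue  # skip links that point at unknown schools or players
        full_name = player["nameFirst"] + " " + player["nameLast"]
        row["players:" + player["playerID"]] = full_name

    return rows


hbase_rows = denormalize(schools, players, schools_players)
```

With this layout, everything known about a school, including its players, can be read with a single lookup by row key instead of a join across three tables. That is the kind of trade-off a wide-row store favors.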