Conceptually, an HBase table can be seen as a sparse set of rows, but in actual storage its data is stored per column family. Columns do not have to be declared when the table is defined; they can be added to a column family on the fly. We must decide the number and names of the column families at the time of table creation, but columns can be added as required at any point while storing data. This flexibility is the beauty of the schema-free design of HBase.
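To make the rule concrete, here is a minimal sketch in plain Python of a table whose column families are fixed at creation while columns appear on demand. It is a toy in-memory model, not the HBase client API: the `ToyTable` class, its `put` and `get` methods, and the sample values are all hypothetical names chosen for illustration.

```python
class ToyTable:
    """Toy in-memory model of an HBase-style table (NOT the HBase client API)."""

    def __init__(self, name, column_families):
        self.name = name
        # Column families are decided once, when the table is created.
        self.column_families = frozenset(column_families)
        # Sparse storage: row key -> {"family:qualifier": value}.
        # A row only holds the cells that were actually written.
        self.rows = {}

    def put(self, row_key, column, value):
        # The part before the first ':' is the family, the rest the column name.
        family, _, _qualifier = column.partition(":")
        if family not in self.column_families:
            raise KeyError(f"unknown column family: {family!r}")
        # Any column name is accepted inside a known family.
        self.rows.setdefault(row_key, {})[column] = value

    def get(self, row_key):
        # Return a copy so callers cannot mutate the stored row.
        return dict(self.rows.get(row_key, {}))


table = ToyTable("demo", ["CF1", "CF2"])
table.put("Row1", "CF1:Col 1", "Value 3")
table.put("Row2", "CF2:Col 5", "Value 10")    # a column never declared anywhere
table.put("Row2", "CF2:New col", "Value 99")  # another one, added on the fly

try:
    table.put("Row1", "CF3:Col 1", "Value 0")  # CF3 was not defined at creation
except KeyError as err:
    print("rejected:", err)
```

In this sketch, writing to a new column inside `CF1` or `CF2` needs no schema change, while writing to the undeclared family `CF3` is rejected, which mirrors the distinction the paragraph above draws.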
The following table shows the logical view of how data is stored in HBase; in actual storage, the data of each column family is stored separately:
Row key | Timestamp | CF1:Col 1 | CF1:Col 2 | CF2:Col 3 | CF2:Col 4 | CF2:Col 5 |
---|---|---|---|---|---|---|
Row1 | Time stamp 1 | Value 3 | Value 4 | Value 5 | | |
Row2 | Time stamp 2 | Value 6 | Value 7 | Value 8 | Value 9 | Value 10 |
Row2 | Time stamp 3 | Value 11 | Value 12 | Value 13 | | |
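The separation by column family can be sketched in a few lines of plain Python. This is only an illustration of the grouping idea, not how HBase lays out its files: the `split_by_family` function is a hypothetical helper, and which cell each sample value occupies is an assumption made for the example.

```python
from collections import defaultdict

# Logical view: (row key, timestamp) -> sparse {"family:qualifier": value}.
# Cell placement is assumed for illustration; absent cells are simply not stored.
logical_rows = {
    ("Row1", "Time stamp 1"): {
        "CF1:Col 1": "Value 3",
        "CF1:Col 2": "Value 4",
        "CF2:Col 3": "Value 5",
    },
    ("Row2", "Time stamp 2"): {
        "CF1:Col 1": "Value 6",
        "CF1:Col 2": "Value 7",
        "CF2:Col 3": "Value 8",
        "CF2:Col 4": "Value 9",
        "CF2:Col 5": "Value 10",
    },
}


def split_by_family(rows):
    """Group cells by column family, keeping row key and timestamp with each cell."""
    stores = defaultdict(dict)
    for key, cells in rows.items():
        for column, value in cells.items():
            family = column.split(":", 1)[0]
            stores[family].setdefault(key, {})[column] = value
    return dict(stores)


stores = split_by_family(logical_rows)
for family in sorted(stores):
    print(family)
    for key, cells in stores[family].items():
        print("  ", key, cells)
```

Each entry in `stores` carries its own copy of the row key and timestamp, so a family can be read without touching the cells of any other family.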
So, in physical storage, this table will be stored in two parts...