Learning HBase

By: Shashwat Shriparv

Chapter 1. Understanding the HBase Ecosystem

HBase is a horizontally scalable, distributed, open source, sorted map database. It runs on top of the Hadoop file system, that is, the Hadoop Distributed File System (HDFS). HBase is a NoSQL, nonrelational database that doesn't always require a predefined schema. It can be seen as a scalable, flexible, multidimensional spreadsheet into which data of any structure fits: new column fields can be added on the fly, and a column structure need not be defined before data can be inserted or queried. In other words, HBase is a column-oriented database that runs on top of HDFS and supports features such as linear scalability (scale out), automatic failover, automatic sharding, and a more flexible schema.
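The sorted map idea and the on-the-fly addition of columns described above can be illustrated with a toy in-memory model. The following Python sketch is only an illustration under stated assumptions: it is not the HBase client API, the class `ToySortedTable` and its methods are hypothetical names, and the column names (such as `info:name`) and sample values are made up for the example. It shows rows kept in sorted key order, rows that each carry a different set of columns, and a range scan over the sorted keys.

```python
from bisect import bisect_left, insort


class ToySortedTable:
    """Toy in-memory model of a sorted, sparse map (hypothetical, not HBase code)."""

    def __init__(self):
        self._row_keys = []  # row keys, always kept in sorted order
        self._rows = {}      # row key -> {column name -> value}

    def put(self, row_key, column, value):
        # Columns need no prior declaration: each row stores only
        # the columns that were actually written to it.
        if row_key not in self._rows:
            insort(self._row_keys, row_key)
            self._rows[row_key] = {}
        self._rows[row_key][column] = value

    def get(self, row_key):
        # Return a copy of the row's cells, or an empty dict if the row is absent.
        return dict(self._rows.get(row_key, {}))

    def scan(self, start_key, stop_key):
        # Yield rows with start_key <= key < stop_key, in sorted key order.
        lo = bisect_left(self._row_keys, start_key)
        hi = bisect_left(self._row_keys, stop_key)
        for key in self._row_keys[lo:hi]:
            yield key, dict(self._rows[key])


# Illustrative usage with made-up data.
table = ToySortedTable()
table.put("user2", "info:name", "Bob")
table.put("user1", "info:name", "Alice")
table.put("user1", "info:email", "alice@example.com")  # new column added on the fly
rows = list(table.scan("user1", "user3"))
```

Although `user2` was written first, the scan returns `user1` before `user2`, because the rows are ordered by key rather than by insertion time. The two rows also hold different columns, which is the sense in which the structure resembles a sparse, flexible spreadsheet.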

HBase is modeled on Google Bigtable, a compressed, high-performance, proprietary data store built on the Google File System. HBase was developed as a Hadoop subproject to support the storage of structured data, and it can take advantage of most distributed file systems (typically, the Hadoop Distributed File System, known as HDFS).

The following table contains key information about HBase and its features:

Feature	Description
Developed by	Apache Software Foundation
Written in	Java
Type	Column-oriented
License	Apache License
Relational database features it lacks	SQL support, relations, primary, foreign, and unique key constraints, normalization
Website	http://hbase.apache.org
Distributions	Apache, Cloudera
Download link	http://mirrors.advancedhosters.com/apache/hbase/
Mailing lists	The user list: `<[email protected]>` The developer list: `<[email protected]>`
Blog	http://blogs.apache.org/hbase/