Learning HBase

By: Shashwat Shriparv

HBase pros and cons

Let's now briefly discuss the pros and cons of HBase.

The following are some advantages of HBase:

Works well for analytics in combination with Hadoop MapReduce
Handles very large volumes of data
Scales out on top of the Hadoop file system (HDFS), even on commodity hardware
Fault tolerant
Open source and free of license fees
Flexible schema design with no fixed schema
Integrates with Hive for SQL-like queries, which suits DBAs who are more familiar with SQL
Auto-sharding
Automatic failover
Simple client interface
Row-level atomicity, that is, a PUT operation on a row either succeeds completely or fails completely

The following are some limitations of HBase:

Single point of failure (when only one HMaster is used)
No multi-row transaction support
JOINs are handled in the MapReduce layer rather than in the database itself
Indexed and sorted only on the row key, whereas an RDBMS can be indexed on arbitrary fields
Authentication and permissions are not enabled by default and have to be configured separately
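
The limitation of being indexed and sorted only on the key can be illustrated with a small, self-contained sketch. It uses a plain Java `TreeMap` as a toy stand-in for a key-sorted table; it is not the HBase client API, and the class name, method names, and sample rows are all hypothetical and chosen only for illustration. Under that assumption, a lookup by row-key prefix touches only a slice of the sorted data, while a lookup by value has to examine every row.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.NavigableMap;
import java.util.TreeMap;

// Toy in-memory model of a key-sorted table; this is NOT the HBase client API.
public class RowKeyIndexSketch {

    // Rows are kept sorted by row key, which is the only "index" available.
    static NavigableMap<String, String> sampleTable() {
        NavigableMap<String, String> table = new TreeMap<>();
        table.put("user#1003", "Pune");
        table.put("user#1001", "Delhi");
        table.put("user#1002", "Pune");
        table.put("order#0001", "Delhi");
        return table;
    }

    // Range scan by row-key prefix: only the matching slice of the sorted map is visited.
    static List<String> scanByPrefix(NavigableMap<String, String> table, String prefix) {
        // prefix + '\uffff' acts as an exclusive upper bound for keys starting with prefix
        return new ArrayList<>(table.subMap(prefix, true, prefix + '\uffff', false).keySet());
    }

    // Lookup by value: without a secondary index, every row has to be examined.
    static List<String> findByValue(NavigableMap<String, String> table, String value) {
        List<String> keys = new ArrayList<>();
        for (Map.Entry<String, String> row : table.entrySet()) {
            if (row.getValue().equals(value)) {
                keys.add(row.getKey());
            }
        }
        return keys;
    }

    public static void main(String[] args) {
        NavigableMap<String, String> table = sampleTable();
        System.out.println(scanByPrefix(table, "user#")); // [user#1001, user#1002, user#1003]
        System.out.println(findByValue(table, "Pune"));   // [user#1002, user#1003]
    }
}
```

In this model, the row key plays the role that a primary index plays in an RDBMS, which is why row-key design matters so much when queries need to filter on anything other than the key.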

Overall, if these limitations are acceptable for our use case, HBase is a good choice, as it provides many benefits that an RDBMS does not. It is still evolving along with Hadoop and will become more mature and feature-rich over time, which should make it one of the better tools for analytical workloads and for distributed, fault-tolerant storage. It is an open source Apache project to which users and developers can contribute new features.

HBase, combined with Hadoop and some of the other Hadoop subprojects, can do wonders in the field of data analysis. With these technologies, data that was previously stored away as a useless dump can become a hidden treasure that helps in understanding various aspects of a specific industry.