Learning HBase

By: Shashwat Shriparv

Performing capacity planning

Hadoop and HBase were designed to run on commodity hardware, so a Hadoop/HBase cluster can be built from hundreds of inexpensive machines. As the data becomes more valuable or more critical, however, we prefer better machines so that cluster operation stays robust.

We have two scenarios: one in which we have many low-end machines, and another in which we have fewer machines from which to build the cluster. In the first scenario, we can raise the replication factor, because the many machines together provide plenty of storage and memory; with more replicas of the data, it stays available even if individual machines fail frequently. In this scenario, the machine that hosts the NameNode must be well configured, because the NameNode is a crucial component of the cluster, and we need a proper backup plan for its metadata. In the second scenario, we have fewer machines, so each of them should be well configured.
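The trade-off between the two scenarios can be sketched with some rough arithmetic. The following Python snippet is only an illustration: the function names, the node counts, the disk sizes, and the 25 percent of raw disk set aside for non-HDFS use are all assumptions made for this example, not figures from a real cluster. It estimates how much unique data fits on a cluster for a given replication factor, and how many replicas of a block can be lost before that block becomes unavailable.

```python
def usable_capacity_tb(nodes, disk_tb_per_node, replication, reserved_fraction=0.25):
    """Estimate the capacity (in TB) left for unique data.

    reserved_fraction is an assumed share of raw disk kept free for
    non-HDFS use (operating system, logs, temporary files).
    """
    if replication < 1 or replication > nodes:
        raise ValueError("replication must be between 1 and the number of nodes")
    raw_tb = nodes * disk_tb_per_node
    # Every block is stored `replication` times, so divide by that factor.
    return raw_tb * (1 - reserved_fraction) / replication


def replica_losses_tolerated(replication):
    """A block stays readable as long as at least one replica survives."""
    return replication - 1


# Scenario 1: many low-end machines with a raised replication factor
# (hypothetical sizes).
many_small = usable_capacity_tb(nodes=100, disk_tb_per_node=1, replication=5)

# Scenario 2: fewer, better-configured machines with a replication factor of 3
# (hypothetical sizes).
few_large = usable_capacity_tb(nodes=10, disk_tb_per_node=12, replication=3)

print(many_small, replica_losses_tolerated(5))
print(few_large, replica_losses_tolerated(3))
```

In HDFS, the cluster-wide replication factor is controlled by the dfs.replication property in hdfs-site.xml. Raising it buys availability at the cost of usable space, which is why it suits the first scenario, where raw storage is plentiful but individual machines are unreliable.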

The following table shows...