HBase Design Patterns

By: Mark Kerzner, Sujee Maniyam

Overview of this book

With the increasing use of NoSQL in general and HBase in particular, building practical applications depends on applying the right design patterns. These patterns, distilled from extensive practical experience on multiple demanding projects, help ensure the correctness and scalability of HBase applications. They are also generally applicable to most NoSQL databases.

Starting with the basics, this book shows you how to install HBase in different node settings. You will then be introduced to key generation and management and to storing large files in HBase. Moving on, the book delves into the principles of using time-based data in HBase and presents several denormalization use cases. Finally, you will learn how to translate familiar SQL design practices into the NoSQL world. With this concise guide, you will get a better idea of typical storage patterns, application design templates, exploring HBase in multiple scenarios with minimum effort, and reading data from multiple region servers.

HBase Design Patterns

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Starting Out with HBase
- Installing HBase
- Selecting an instance
- Security groups
- Starting the instance

Reading, Writing, and Using SQL
- Inspecting the cluster
- HBase tables, families, and cells
- The HBase shell
- Project Phoenix — a SQL for HBase

Using HBase Tables for Single Entities
- Storing user information
- Sets, maps, and lists
- Generating the test data
- Analyzing your query

Dealing with Large Files
- Storing files using keys
- What to do when your binary files grow larger

Time Series Data
- Using time-based keys to store time series data
- Avoiding region hotspotting
- Tall and narrow rows versus wide rows
- OpenTSDB principles

Denormalization Use Cases
- Storing all the objects for a user
- Dealing with lost usernames and passwords
- Tables for storing videos
- A popularity contest
- The section tag index

Advanced Patterns for Data Modeling
- Many-to-many relationships in HBase
- Applying the many-to-many relationship techniques for a video site
- Event time data – keeping track of what is going on
- Dealing with transactions
- Trafodion – transactional SQL on HBase

Performance Optimization
- Loading bulk data into HBase
- Importing data into HBase using MapReduce
- Importing data from HDFS into HBase
- Profiling HBase applications
- Benchmarking or load testing HBase
- Monitoring HBase

Index

Index

A

advantages, Twitter solution
- low cost / Twitter solution to store large files
- high performance / Twitter solution to store large files
- easy to operate / Twitter solution to store large files
Amazon S3 storage
- about / Amazon S3 storage for very large objects
Amazon Web Services (AWS) / Creating a distributed HBase cluster
Apache BigTop
- project, URL / Summary
autoflush
- disabling / Turning off autoflush
AWS
- website / Creating a distributed HBase cluster

B

Baird Petrophysical
- URL / Using time-based keys to store time series data
batch puts
- about / Batch writes
batch writes
- about / Batch writes
benchmarks
- PerformanceEvaluation / HBase's built-in benchmark
- YCSB / YCSB
- JMeter / JMeter for custom workloads
Big Data cartoon series
- reference link / Avoiding region hotspotting
BigTable / Installing HBase
blobs
- about / Using Google Blobstore to store large files
Blobstore API
- about / Using Google Blobstore to store large files
bootstrapping
- about / Loading bulk data into HBase
Build Your Own Clusters (BYOC) / Installing HBase

C

Cassandra
- URL / Project Phoenix — a SQL for HBase
Cloudera blog post
- URL / Adding storage
Cloudera Manager / Inspecting the cluster
Cloudera Manager (CM) / Creating a distributed HBase cluster
cluster
- inspecting / Inspecting the cluster
collectors
- about / OpenTSDB
- reference link / OpenTSDB
column-oriented store / HBase tables, families, and cells
compactions, OpenTSDB
- about / Compactions
counters
- used, for popularity contest / A popularity contest
Coursera
- URL / Creating a many-to-many relationship for a university with students and courses
create command / The HBase shell

D

data
- generating, for performance testing / Generating data for performance testing
data files, staging into HDFS
- HBase table, creating / Creating an HBase table
- import, running / Run the import
data import, from HDFS into HBase
- about / Importing data from HDFS into HBase
- Pig for MapReduce / Pig for MapReduce
- Java MapReduce / Java MapReduce
- bulk loader utility, using / Using HBase's bulk loader utility
- bulk import scenarios / Bulk import scenarios
Devops (development and operations) / Installing HBase
distributed computing environment (DCE) / Using UUID
distributed HBase cluster
- creating / Creating a distributed HBase cluster
downloadurl map<varchar,varchar>
- about / A practical lab

E

Effective C++
- about / Profiling HBase applications
Elastic Compute Cloud (EC2) web service / Creating a distributed HBase cluster
events
- tracking / Event time data – keeping track of what is going on

F

Facebook's Haystack
- about / Facebook's Haystack for the storage of large files
- reference link / Facebook's Haystack for the storage of large files
files
- storing, keys used / Storing files using keys
- larger binary files, handling / What to do when your binary files grow larger
Flume module / OpenTSDB principles

G

Ganglia
- about / Ganglia
- metrics, enabling / Ganglia
generators lab / Generating data for performance testing
GitHub
- URL / The HBase shell, Installing Phoenix, Generating data for performance testing
GitHub repository
- URL / Dealing with transactions
globally unique identifier (GUID) / Using UUID
gnuplot
- using / OpenTSDB principles
Google Blobstore
- about / Using Google Blobstore to store large files
- reference link / Using Google Blobstore to store large files

H

Hadoop Illuminated
- book, URL / Summary
Hadoop Illuminated forum
- URL / Starting the instance
HBase
- installing / Installing HBase
- many-to-many relationships / Many-to-many relationships in HBase
- transactional SQL / Trafodion – transactional SQL on HBase
- ImportTsv / Using HBase's bulk loader utility
- monitoring / Monitoring HBase
- metrics, collecting via JMX / Collecting metrics via the JMX interface
HBase's built-in benchmark
- about / HBase's built-in benchmark
HBase Apache
- URL / Creating a single-node HBase cluster
HBase applications
- profiling / Profiling HBase applications
- high-performing HBase writes, tips / More tips for high-performing HBase writes
- high-performing HBase reads, tips / More tips for high-performing HBase reads
HBase cells
- about / HBase tables, families, and cells
HBase documentation
- URL / The HBase shell
HBase families
- about / HBase tables, families, and cells
HBase Shell
- about / The HBase shell
- status command / The HBase shell
- create command / The HBase shell
HBase tables
- about / HBase tables, families, and cells
HDFS
- data, importing into HBase / Importing data from HDFS into HBase
HDPP
- URL / Generating the test data
high-performing HBase reads
- tips / More tips for high-performing HBase reads
- scan cache, setting / The scan cache
- family, specifying / Only read the families or columns needed
- columns, specifying / Only read the families or columns needed
- block cache, disabling / The block cache
high-performing HBase writes
- tips / More tips for high-performing HBase writes
- batch writes / Batch writes
- memory buffers, setting / Setting memory buffers
- autoflush, disabling / Turning off autoflush
- WAL, disabling / Turning off WAL
Hortonworks / Summary
HTrace library / Generating data for performance testing

I

ImportTsv
- about / Using HBase's bulk loader utility
- using / Using HBase's bulk loader utility
- data files, staging into HDFS / Staging data files into HDFS
- parameters / Run the import
index tables / Dealing with lost usernames and passwords
initial loading
- about / Loading bulk data into HBase
instance
- selecting / Selecting an instance
- spot instances / Spot instances
- starting / Starting the instance

J

Java MapReduce
- about / Java MapReduce
JMeter
- about / Benchmarking or load testing HBase, JMeter for custom workloads
JProfiler
- about / Profiling HBase applications

K

KairosDB
- reference link / The UID table schema
Kiji / The HBase shell

L

larger binary files, handling
- approaches / What to do when your binary files grow larger
- Google Blobstore / Using Google Blobstore to store large files
- Facebook's Haystack / Facebook's Haystack for the storage of large files
- Twitter solution / Twitter solution to store large files
- Amazon S3 storage / Amazon S3 storage for very large objects
- practical approach / A practical approach
- multistep approach / Practical recommendations
- practical lab / A practical lab
lists
- about / Sets, maps, and lists
lookup tables / Dealing with lost usernames and passwords

M

Mac / Selecting an instance
many-to-many relationship
- about / Many-to-many relationships in HBase
- creating, for university with students and courses / Creating a many-to-many relationship for a university with students and courses
- creating, for social network / Creating a many-to-many relationship for a social network
- applying, for video site / Applying the many-to-many relationship techniques for a video site
MapR / Summary
MapReduce
- about / Importing data into HBase using MapReduce
- used, for importing data into HBase / Importing data into HBase using MapReduce
maps
- about / Sets, maps, and lists
memory buffers
- setting / Setting memory buffers
metrics
- collecting, via JMX interface / Collecting metrics via the JMX interface
MongoDB
- URL / Project Phoenix — a SQL for HBase
multistep approach, for storing larger files
- about / Practical recommendations

N

NoSQL / Installing HBase

O

objects
- storing, for user / Storing all the objects for a user
Open Software Foundation (OSF) / Using UUID
OpenTSDB
- about / OpenTSDB principles, OpenTSDB
- architecture / OpenTSDB principles
- tools, using / OpenTSDB principles
- overall design / The overall design of TSDB
- row key / The row key
- timestamp / The timestamp
- compactions / Compactions
- UID table schema / The UID table schema
- reference link, for design documentation / The UID table schema
overall design, TSDB
- about / The overall design of TSDB

P

passwords
- lost, dealing with / Dealing with lost usernames and passwords
PerformanceEvaluation
- about / HBase's built-in benchmark
performance optimization, HBase
- bulk data, loading / Loading bulk data into HBase
- data, importing with MapReduce / Importing data into HBase using MapReduce
- data, importing from HDFS / Importing data from HDFS into HBase
- HBase applications, profiling / Profiling HBase applications
- HBase, benchmarking / Benchmarking or load testing HBase
- HBase, load testing / Benchmarking or load testing HBase
- HBase, monitoring / Monitoring HBase
performance testing
- data, generating for / Generating data for performance testing
Phoenix
- about / Project Phoenix — a SQL for HBase
- URL / Project Phoenix — a SQL for HBase
- installing / Installing Phoenix
Pig
- about / Pig for MapReduce
popularity contest
- counters used / A popularity contest
practical approach, for storing larger files
- about / A practical approach
practical lab, for storing larger files
- about / A practical lab

Q

query
- analyzing / Analyzing your query, Exercise, Solution
- analyzing, URL / Solution

R

region hotspotting, time series data
- avoiding / Avoiding region hotspotting
- indices, avoiding / Avoiding region hotspotting
- writes, randomizing / Avoiding region hotspotting
- shard identifier, prefixing to key / Avoiding region hotspotting
RegionServer / Inspecting the cluster
relational database management system (RDBMS)
- about / A solution for storing user information
row key / HBase tables, families, and cells
row key, OpenTSDB
- about / The row key

S

section tag index
- about / The section tag index
security groups
- about / Security groups
sets
- about / Sets, maps, and lists
single-node HBase cluster
- creating / Creating a single-node HBase cluster
sink / Importing data into HBase using MapReduce
social network
- many-to-many relationships, creating for / Creating a many-to-many relationship for a social network
source / Importing data into HBase using MapReduce
Spanner
- about / Trafodion – transactional SQL on HBase
spot instances
- about / Spot instances
Sqoop / Bulk import scenarios
SQuirreL / Installing Phoenix
Statsd publisher / OpenTSDB principles
status command / The HBase shell
storage
- adding / Adding storage
STORE command
- about / Pig for MapReduce

T

tables
- for, storing videos / Tables for storing videos
tags set<varchar>
- about / A practical lab
test data
- generating / Generating the test data
time-based keys
- used, for storing time series data / Using time-based keys to store time series data
time series data
- storing, time-based keys used / Using time-based keys to store time series data
- region hotspotting, avoiding / Avoiding region hotspotting
- tall and narrow rows, versus wide rows / Tall and narrow rows versus wide rows
- OpenTSDB principles / OpenTSDB principles
- best practices / The timestamp
timestamp, OpenTSDB
- about / The timestamp
Trafodion
- about / Trafodion – transactional SQL on HBase
transactional SQL
- on HBase / Trafodion – transactional SQL on HBase
transactions
- about / Dealing with transactions
- URL / Dealing with transactions
Twitter solution
- for storing large files / Twitter solution to store large files
- reference link / Twitter solution to store large files
- advantages / Twitter solution to store large files

U

UID table schema, OpenTSDB
- about / The UID table schema
user information
- storing / Storing user information, A solution for storing user information
usernames
- lost, dealing with / Dealing with lost usernames and passwords
username varchar
- about / A practical lab
users
- objects, storing / Storing all the objects for a user
UUID
- using / Using UUID
- about / Using UUID

V

Vacuumetrix / OpenTSDB principles
videoid uuid
- about / A practical lab
videos
- storing, tables for / Tables for storing videos
- row, inserting manually / Manual exercises
- data, generating for performance testing / Generating data for performance testing
- popularity contest / A popularity contest
video site
- many-to-many relationship techniques, applying for / Applying the many-to-many relationship techniques for a video site

W

WAL
- disabling / Turning off WAL
WibiData / The HBase shell

Y

YCSB
- about / Benchmarking or load testing HBase, YCSB
- reference link / YCSB

Z

ZooKeeper
- URL / Twitter solution to store large files