HBase Design Patterns

By: Mark Kerzner, Sujee Maniyam

Overview of this book

With the increasing use of NoSQL in general and HBase in particular, building practical applications depends on knowing how to apply design patterns. These patterns, distilled from extensive practical experience on multiple demanding projects, help ensure the correctness and scalability of an HBase application. They are also generally applicable to most NoSQL databases.

Starting with the basics, this book shows you how to install HBase in different node settings. You will then be introduced to key generation and management and to the storage of large files in HBase. Moving on, the book delves into the principles of using time-based data in HBase and walks through some cases of data denormalization. Finally, you will learn how to translate familiar SQL design practices into the NoSQL world. With this concise guide, you will get a better idea of typical storage patterns, application design templates, exploring HBase in multiple scenarios with minimum effort, and reading data from multiple region servers.

Credits

About the Authors

About the Reviewers

www.PacktPub.com

Preface

Starting Out with HBase

Installing HBase

Selecting an instance

Adding storage

Security groups

Starting the instance

Summary

Reading, Writing, and Using SQL

Inspecting the cluster

HBase tables, families, and cells

The HBase shell

Project Phoenix — a SQL for HBase

Summary

Using HBase Tables for Single Entities

Storing user information

Sets, maps, and lists

Generating the test data

Analyzing your query

Exercise

Summary

Dealing with Large Files

Storing files using keys

Using UUID

What to do when your binary files grow larger

Exercises

Summary

Time Series Data

Using time-based keys to store time series data

Avoiding region hotspotting

Tall and narrow rows versus wide rows

OpenTSDB principles

Summary

Denormalization Use Cases

Storing all the objects for a user

Dealing with lost usernames and passwords

Tables for storing videos

A popularity contest

The section tag index

Summary

Advanced Patterns for Data Modeling

Many-to-many relationships in HBase

Applying the many-to-many relationship techniques for a video site

Event time data – keeping track of what is going on

Dealing with transactions

Trafodion – transactional SQL on HBase

Summary

Performance Optimization

Loading bulk data into HBase

Importing data into HBase using MapReduce

Importing data from HDFS into HBase

Profiling HBase applications

Benchmarking or load testing HBase

Monitoring HBase

Summary

Index

About the Reviewers

Ricky Ho is a data scientist and programmer, providing advisory and development services in big data analytics, machine learning, and distributed system design projects. He has a wide range of technical interests but is especially passionate about the intersection of machine learning and big data. He has served as the Principal Architect of Microsoft advertising, implementing scalable prediction systems to optimize advertising revenue within a large, web-scale deployment. Prior to this, he was a researcher at Adobe's lab, where he processed web log data for research related to predictive analytics. Before that, he was a distinguished architect in PayPal's risk management team, where he developed a fraud detection system using machine learning and anomaly detection algorithms. Ricky holds 10 patents in distributed computing and cloud resource optimization. He is also an active technical blogger and shares what he learns on his blog at http://horicky.blogspot.com.

Raghu Sakleshpur is a technologist at heart who works in the field of big data, developing and designing solutions specifically in the Hadoop ecosystem. He started his career in distributed (clustered) systems and moved into the Enterprise Java application (middleware) space, only to return to his true passion of handling big data in both scale-up and scale-out architectures. He currently works at Intel in the field of big data and spends a good portion of his time working with customers and partners to define optimal architectures for specific big data needs.

Sergey Tatarenko is a senior software developer at a major legal e-discovery company in Austin, TX. He received his MSc in Computer Science from Ben-Gurion University of the Negev in Israel and has worked as a software developer since 1999. He started his professional career at Clockwork Solutions, Israel, where he worked on a product used to build discrete event simulation models. Later, he led a team of software developers at HyperRoll, but being further removed from actual software development was not much fun. In 2008, Sergey agreed to relocate to the USA and help his previous employer finish building their product. In April 2013, he decided to gain more exposure to big data and started working for a leading legal e-discovery company in Austin, TX. In addition to being a software developer, Sergey is a proud father of three beautiful kids (Ilia, Antony, and Emilia) and a happy husband to his beautiful wife, Ilona. He is also a very active member of the Russian-speaking community of Austin, an enthusiastic builder of Arduino projects at home, and an occasional fisherman.
