Ceph: Designing and Implementing Scalable Storage Systems

By : Michael Hackett, Vikhyat Umrao, Karan Singh, Nick Fisk, Anthony D'Atri, Vaibhav Bhembre

Overview of this book

This Learning Path takes you from the basics of Ceph to an in-depth understanding of its advanced features. You’ll gain the skills to plan, deploy, and manage your Ceph cluster. After an introduction to the Ceph architecture and its core projects, you’ll be able to set up a Ceph cluster and learn how to monitor its health, improve its performance, and troubleshoot any issues. By following the step-by-step approach of this Learning Path, you’ll learn how Ceph integrates with OpenStack and its Glance, Manila, Swift, and Cinder services. With knowledge of federated architecture and CephFS, you’ll use Calamari and VSM to monitor the Ceph environment. In the upcoming chapters, you’ll study the key areas of Ceph, including BlueStore, erasure coding, and cache tiering. More specifically, you’ll discover what they can do for your storage system. In the concluding chapters, you will develop applications that use Librados and distributed computations with shared object classes, and see how Ceph and its supporting infrastructure can be optimized. By the end of this Learning Path, you'll have the practical knowledge to operate Ceph in a production environment.

This Learning Path includes content from the following Packt products:

• Ceph Cookbook by Michael Hackett, Vikhyat Umrao and Karan Singh
• Mastering Ceph by Nick Fisk
• Learning Ceph, Second Edition by Anthony D'Atri, Vaibhav Bhembre and Karan Singh

Title Page

About Packt

Contributors

Preface

Ceph - Introduction and Beyond

Introduction

Ceph – the beginning of a new era

RAID – the end of an era

Ceph – the architectural overview

Planning a Ceph deployment

Setting up a virtual infrastructure

Installing and configuring Ceph

Scaling up your Ceph cluster

Using the Ceph cluster with a hands-on approach

Working with Ceph Block Device

Introduction

Configuring Ceph client

Creating Ceph Block Device

Mapping Ceph Block Device

Resizing Ceph RBD

Working with RBD snapshots

Working with RBD clones

Disaster recovery replication using RBD mirroring

Configuring pools for RBD mirroring with one way replication

Configuring image mirroring

Configuring two-way mirroring

Recovering from a disaster!

Working with Ceph and OpenStack

Introduction

Ceph – the best match for OpenStack

Setting up OpenStack

Configuring OpenStack as Ceph clients

Configuring Glance for Ceph backend

Configuring Cinder for Ceph backend

Configuring Nova to boot instances from Ceph RBD

Configuring Nova to attach Ceph RBD

Working with Ceph Object Storage

Introduction

Understanding Ceph object storage

RADOS Gateway standard setup, installation, and configuration

Creating the radosgw user

Accessing the Ceph object storage using S3 API

Accessing the Ceph object storage using the Swift API

Integrating RADOS Gateway with OpenStack Keystone

Integrating RADOS Gateway with Hadoop S3A plugin

Working with Ceph Object Storage Multi-Site v2

Introduction

Functional changes from Hammer federated configuration

RGW multi-site v2 requirement

Installing the Ceph RGW multi-site v2 environment

Configuring Ceph RGW multi-site v2

Testing user, bucket, and object sync between master and secondary sites

Working with the Ceph Filesystem

Introduction

Understanding the Ceph Filesystem and MDS

Deploying Ceph MDS

Accessing Ceph FS through kernel driver

Accessing Ceph FS through FUSE client

Exporting the Ceph Filesystem as NFS

Ceph FS – a drop-in replacement for HDFS

Operating and Managing a Ceph Cluster

Introduction

Understanding Ceph service management

Managing the cluster configuration file

Running Ceph with systemd

Scale-up versus scale-out

Scaling out your Ceph cluster

Scaling down your Ceph cluster

Replacing a failed disk in the Ceph cluster

Upgrading your Ceph cluster

Maintaining a Ceph cluster

Ceph under the Hood

Introduction

Ceph scalability and high availability

Understanding the CRUSH mechanism

CRUSH map internals

CRUSH tunables

Ceph cluster map

High availability monitors

Ceph authentication and authorization

I/O path from a Ceph client to a Ceph cluster

Ceph Placement Group

Placement Group states

Creating Ceph pools on specific OSDs

The Virtual Storage Manager for Ceph

Introduction

Understanding the VSM architecture

Setting up the VSM environment

Getting ready for VSM

Installing VSM

Creating a Ceph cluster using VSM

Exploring the VSM dashboard

Upgrading the Ceph cluster using VSM

VSM roadmap

VSM resources

RAID – the end of an era

RAID technology has been the fundamental building block of storage systems for years. It has proven successful for almost every kind of data generated in the last three decades. But all eras must come to an end, and this time it's RAID's turn. RAID systems have started to show their limitations and cannot meet future storage needs. Over the last few years, cloud infrastructures have gained strong momentum, imposing new requirements on storage and challenging traditional RAID systems. In this section, we will uncover the limitations of RAID systems.

RAID rebuilds are painful

The most painful aspect of RAID technology is its lengthy rebuild process. Disk manufacturers are packing more and more capacity into each disk, and they now produce very large drives at a fraction of the former price. We no longer talk about 450 GB, 600 GB, or even 1 TB disks, as much larger disks are available today. Current enterprise disk specifications offer drives of 4 TB, 6 TB, and even 10 TB, and capacities keep increasing year by year.

Think of an enterprise RAID-based storage system made up of numerous 4 TB or 6 TB disk drives. When such a drive fails, RAID can take several hours, or even days, to rebuild it. If another drive in the same RAID group fails in the meantime, the group is at risk of losing data, depending on how many failures its RAID level tolerates. Repairing multiple large disk drives with RAID is a cumbersome process.
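To see why rebuild time grows with drive capacity, the following minimal sketch computes a lower bound on rebuild time: the drive's capacity divided by a sustained rebuild throughput. The throughput figures are assumptions chosen for illustration, not measurements; real rebuilds are often slower because the array keeps serving client I/O while it rebuilds.

```python
def rebuild_hours(capacity_tb: float, rebuild_mb_per_s: float) -> float:
    """Lower-bound rebuild time: every byte of the replacement drive must be rewritten.

    capacity_tb: drive capacity in decimal terabytes (10**12 bytes).
    rebuild_mb_per_s: assumed sustained rebuild throughput in MB/s (10**6 bytes/s).
    """
    capacity_bytes = capacity_tb * 10**12
    seconds = capacity_bytes / (rebuild_mb_per_s * 10**6)
    return seconds / 3600


# Hypothetical throughputs: an idle array (100 MB/s) vs. one busy with client I/O (30 MB/s).
for size_tb in (1, 4, 6, 10):
    for rate in (100, 30):
        print(f"{size_tb:>2} TB at {rate:>3} MB/s: {rebuild_hours(size_tb, rate):6.1f} h")
```

Under these assumptions, a 6 TB drive needs about 16.7 hours even at 100 MB/s, and a 10 TB drive at 30 MB/s needs roughly 93 hours, close to four days, which is consistent with the "hours to days" range described above.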

RAID spare disks increase TCO

A RAID system requires a few disks to be set aside as hot spares. These are idle disks that are used only when a disk fails; otherwise, they store no data. This adds extra cost to the system and increases TCO. Moreover, if you run out of spare disks and another disk in the RAID group then fails, you will face a severe problem.

RAID can be expensive and hardware dependent

RAID requires a set of identical disk drives in a single RAID group; you pay a penalty if you mix disks of different size, rpm, or type, because doing so adversely affects the capacity and performance of your storage system. This makes RAID highly dependent on uniform hardware.
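The capacity penalty for mixing drive sizes follows from standard RAID arithmetic: in a RAID5 or RAID6 group, each member contributes only as much space as the smallest drive. The sketch below assumes a single group built from hypothetical drives and ignores controller and filesystem overhead.

```python
# level -> (parity disks, minimum number of members)
RAID_PARAMS = {5: (1, 3), 6: (2, 4)}


def raid_usable_tb(disk_sizes_tb: list[float], level: int) -> float:
    """Usable capacity of one RAID5 or RAID6 group.

    Every member is truncated to the size of the smallest disk, and one (RAID5)
    or two (RAID6) disks' worth of that space holds parity.
    """
    parity_disks, min_members = RAID_PARAMS[level]
    if len(disk_sizes_tb) < min_members:
        raise ValueError(f"RAID{level} needs at least {min_members} disks")
    return (len(disk_sizes_tb) - parity_disks) * min(disk_sizes_tb)


group = [4, 4, 4, 6]               # hypothetical group: three 4 TB drives and one 6 TB drive
print(raid_usable_tb(group, 5))    # 12
print(raid_usable_tb(group, 6))    # 8
print(sum(group))                  # 18 (raw TB installed)
```

The extra 2 TB on the 6 TB drive is simply wasted. Ceph, by contrast, assigns each disk a weight, as discussed below, so larger disks receive proportionally more data.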

Also, enterprise RAID-based systems often require expensive hardware components, such as RAID controllers, which significantly increase the system cost. A RAID controller also becomes a single point of failure unless it is deployed redundantly.

The growing RAID group is a challenge

RAID can hit a dead end when it's not possible to grow the RAID group, which means there is no scale-out support. Beyond a certain point, you cannot grow your RAID-based system, even if you have the budget. Some systems allow the addition of disk shelves, but only up to a limited capacity, and each new shelf adds load to the existing storage controller. So you can gain some capacity, but at the cost of performance.

The RAID reliability model is no longer promising

RAID can be configured at a variety of levels; the most common are RAID5 and RAID6, which can survive the failure of one and two disks, respectively. Once more disks fail than the level tolerates (a second failure in RAID5 or a third in RAID6), RAID can no longer ensure data reliability. This is one of the biggest drawbacks of RAID systems.
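To put RAID5's single-failure limit into numbers, the following sketch estimates the chance that at least one more disk in the group fails while a rebuild is in progress. It assumes independent disk failures at a constant rate derived from a hypothetical annualized failure rate (AFR). Real-world failures are often correlated (same batch, same enclosure), and the model ignores unrecoverable read errors, so the result should be read as optimistic.

```python
import math


def p_additional_failure(group_size: int, rebuild_hours: float, afr: float) -> float:
    """Probability that at least one surviving disk fails during the rebuild window.

    Assumes independent, exponentially distributed disk lifetimes with an
    annualized failure rate `afr` (for example, 0.02 for 2% per year).
    """
    rate_per_hour = -math.log(1 - afr) / 8760   # convert AFR to a constant hourly failure rate
    survivors = group_size - 1                  # the failed disk is already out of the group
    return 1 - math.exp(-survivors * rate_per_hour * rebuild_hours)


# Hypothetical 12-disk RAID5 group with a 2% AFR and rebuilds lasting one day or four days.
for hours in (24, 96):
    print(f"{hours:>3} h rebuild: {p_additional_failure(12, hours, 0.02):.4%}")
```

The probability grows with both group size and rebuild duration, so the ever-larger drives described earlier in this section directly widen the window of exposure.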

Moreover, during a RAID rebuild, client requests are likely to be starved of I/O until the rebuild completes. Another limiting factor is that RAID protects only against disk failure; it cannot protect against failures of the network, server hardware, OS, or power, or against other data center disasters.

Having discussed RAID's drawbacks, we can conclude that we need a system that overcomes them in a performant and cost-effective way. The Ceph storage system is one of the best solutions available today to address these problems. Let's see how.

For reliability, Ceph uses data replication rather than RAID, thereby avoiding the problems found in RAID-based enterprise systems. Ceph is software-defined storage, so it requires no specialized hardware for data replication. Moreover, the replication level is highly customizable through commands: the Ceph storage administrator can set the replication factor for each pool, from a minimum of one up to a higher number, depending entirely on the underlying infrastructure.

In the event of one or more disk failures, Ceph's replication-based recovery is a better process than a RAID rebuild. When a disk drive fails, all the data that resided on it starts recovering from its peer disks. Since Ceph is a distributed system, all data copies are scattered across the entire cluster of disks in the form of objects, such that no two copies of an object reside on the same disk, and each copy must reside in a different failure zone defined by the CRUSH map. The good part is that all the cluster disks participate in data recovery. This makes recovery very fast while keeping the performance impact low. Furthermore, recovery does not require any spare disks; the data is simply replicated to other Ceph disks in the cluster. Ceph also uses a weighting mechanism for its disks, so differing disk sizes are not a problem.
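The parallel-recovery and weighting ideas above can be illustrated with a toy placement model. This is a simplified, hypothetical sketch loosely modeled on the straw2 bucket selection used by CRUSH, not Ceph's actual implementation: real CRUSH walks a hierarchy of buckets defined in the CRUSH map, uses its own hash function, and handles collisions by retrying rather than by filtering candidates. The cluster layout, OSD names, and weights below are invented for illustration, and each host is treated as a failure zone.

```python
import hashlib
import math
from collections import Counter


def unit_hash(*parts) -> float:
    """Deterministic pseudo-random value in (0, 1] derived from the inputs."""
    digest = hashlib.sha256("/".join(map(str, parts)).encode()).digest()
    return (int.from_bytes(digest[:8], "big") + 1) / 2**64


def place(obj: str, osds: dict, replicas: int = 3) -> list:
    """Choose `replicas` OSDs on distinct hosts using a straw2-style weighted draw.

    `osds` maps an OSD name to (host, weight). For each replica, every eligible
    OSD draws ln(u) / weight and the largest draw wins, so each draw is won with
    probability proportional to weight among the eligible OSDs.
    """
    chosen, used_hosts = [], set()
    for r in range(replicas):
        eligible = [o for o, (host, _) in osds.items() if host not in used_hosts]
        best = max(eligible, key=lambda o: math.log(unit_hash(obj, r, o)) / osds[o][1])
        chosen.append(best)
        used_hosts.add(osds[best][0])   # one copy per host: the host is the failure zone here
    return chosen


# Hypothetical cluster: 4 hosts x 3 OSDs; weight = disk size in TB (one 6 TB and two 4 TB per host).
osds = {f"osd.{h * 3 + d}": (f"host{h}", 6.0 if d == 0 else 4.0)
        for h in range(4) for d in range(3)}
placements = {f"obj{i}": place(f"obj{i}", osds) for i in range(2000)}

# Fail one OSD and find which peers hold surviving copies of its objects.
failed = "osd.0"
affected = [copies for copies in placements.values() if failed in copies]
sources = {o for copies in affected for o in copies if o != failed}
print(f"{len(affected)} objects had a copy on {failed}")
print(f"{len(sources)} peer OSDs hold surviving copies and can serve recovery")

# Weighting: larger disks receive proportionally more copies.
load = Counter(o for copies in placements.values() for o in copies)
big = [load[o] for o, (_, w) in osds.items() if w == 6.0]
small = [load[o] for o, (_, w) in osds.items() if w == 4.0]
print(f"avg copies per 6 TB OSD: {sum(big) / len(big):.0f}, per 4 TB OSD: {sum(small) / len(small):.0f}")
```

Because the surviving copies of the failed OSD's objects are spread across the OSDs of every other host, all of them can contribute to recovery in parallel, with no dedicated spare disk. The 6 TB OSDs also end up holding proportionally more copies than the 4 TB ones, which is the effect disk weights are meant to achieve.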

In addition to replication, Ceph supports another advanced data-reliability technique: erasure coding. Erasure-coded pools require less storage space than replicated pools. With erasure coding, lost data is recovered or regenerated algorithmically through erasure-code calculation. You can use both data-availability techniques, replication and erasure coding, in the same Ceph cluster, but on different storage pools. We will learn more about erasure coding in the upcoming chapters.
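The following sketch illustrates both claims with the simplest possible erasure code: a single XOR parity chunk (k data chunks, m = 1), which can regenerate any one lost chunk. It demonstrates the principle only. Ceph's erasure-code plugins, such as jerasure, use Reed-Solomon-style codes that tolerate m lost chunks for configurable k and m. All function names here are hypothetical.

```python
from functools import reduce


def xor_chunks(chunks: list[bytes]) -> bytes:
    """Byte-wise XOR of equal-length chunks."""
    return bytes(reduce(lambda a, b: a ^ b, column) for column in zip(*chunks))


def encode(data: bytes, k: int) -> list[bytes]:
    """Split data into k zero-padded data chunks and append one XOR parity chunk (m = 1)."""
    chunk_len = -(-len(data) // k)                      # ceiling division
    padded = data.ljust(chunk_len * k, b"\0")
    chunks = [padded[i * chunk_len:(i + 1) * chunk_len] for i in range(k)]
    return chunks + [xor_chunks(chunks)]


def recover(chunks: list, lost: int) -> bytes:
    """Regenerate the one missing chunk by XOR-ing every surviving chunk."""
    return xor_chunks([c for i, c in enumerate(chunks) if i != lost])


def raw_per_usable(k: int, m: int) -> float:
    """Raw bytes stored per usable byte in a k+m erasure-coded layout."""
    return (k + m) / k


data = b"objects stored in an erasure-coded pool"
k = 4
chunks = encode(data, k)                                # 4 data chunks + 1 parity chunk
lost = 2
damaged = chunks[:lost] + [None] + chunks[lost + 1:]    # simulate losing chunk 2
rebuilt = recover(damaged, lost)
restored = damaged[:lost] + [rebuilt] + damaged[lost + 1:]
print(b"".join(restored[:k])[:len(data)])               # b'objects stored in an erasure-coded pool'
print(f"4+2 EC: {raw_per_usable(4, 2)}x raw per usable byte; 3-way replication: 3.0x")
```

A 4+2 layout stores 1.5 bytes of raw data per usable byte and survives two lost chunks, whereas 3-way replication stores 3 bytes per usable byte to survive two lost copies. The trade-off is that regenerating a chunk requires reading and computing over several surviving chunks instead of copying a single replica.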
