Ceph Cookbook - Second Edition

By : Vikhyat Umrao, Karan Singh, Michael Hackett

Overview of this book

Ceph is a unified distributed storage system designed for reliability and scalability. This technology has been transforming the software-defined storage industry and is evolving rapidly as a leader, with wide support for popular cloud platforms such as OpenStack and CloudStack, and also for virtualized platforms. Ceph is backed by Red Hat and has been developed by a community of developers that has gained immense traction in recent years. This book will guide you right from the basics of Ceph, such as creating blocks, object storage, and filesystem access, to advanced concepts such as cloud integration solutions. The book will also cover practical and easy-to-implement recipes on CephFS, RGW, and RBD with respect to the major stable release of Ceph Jewel. Towards the end of the book, recipes based on troubleshooting and best practices will help you get to grips with managing Ceph storage in a production environment. By the end of this book, you will have practical, hands-on experience of using Ceph efficiently for your storage requirements.

Preface

What this book covers

What you need for this book

Ceph – Introduction and Beyond

Introduction

Ceph – the beginning of a new era

RAID – the end of an era

Ceph – the architectural overview

Planning a Ceph deployment

Setting up a virtual infrastructure

Installing and configuring Ceph

Scaling up your Ceph cluster

Using the Ceph cluster with a hands-on approach

Working with Ceph Block Device

Introduction

Configuring Ceph client

Creating Ceph Block Device

Mapping Ceph Block Device

Resizing Ceph RBD

Working with RBD snapshots

Working with RBD clones

Disaster recovery replication using RBD mirroring

Configuring pools for RBD mirroring with one way replication

Configuring image mirroring

Configuring two-way mirroring

Recovering from a disaster!

Working with Ceph and OpenStack

Introduction

Ceph – the best match for OpenStack

Setting up OpenStack

Configuring OpenStack as Ceph clients

Configuring Glance for Ceph backend

Configuring Cinder for Ceph backend

Configuring Nova to boot instances from Ceph RBD

Configuring Nova to attach Ceph RBD

Working with Ceph Object Storage

Introduction

Understanding Ceph object storage

RADOS Gateway standard setup, installation, and configuration

Creating the radosgw user

Accessing the Ceph object storage using S3 API

Accessing the Ceph object storage using the Swift API

Integrating RADOS Gateway with OpenStack Keystone

Integrating RADOS Gateway with Hadoop S3A plugin

Working with Ceph Object Storage Multi-Site v2

Introduction

Functional changes from Hammer federated configuration

RGW multi-site v2 requirement

Installing the Ceph RGW multi-site v2 environment

Configuring Ceph RGW multi-site v2

Testing user, bucket, and object sync between master and secondary sites

Working with the Ceph Filesystem

Introduction

Understanding the Ceph Filesystem and MDS

Deploying Ceph MDS

Accessing Ceph FS through kernel driver

Accessing Ceph FS through FUSE client

Exporting the Ceph Filesystem as NFS

Ceph FS – a drop-in replacement for HDFS

Monitoring Ceph Clusters

Introduction

Monitoring Ceph clusters – the classic way

Introducing Ceph Metrics and Grafana

Installing and configuring Ceph Metrics with the Grafana dashboard

Monitoring Ceph clusters with Ceph Metrics with the Grafana dashboard

Operating and Managing a Ceph Cluster

Introduction

Understanding Ceph service management

Managing the cluster configuration file

Running Ceph with systemd

Scale-up versus scale-out

Scaling out your Ceph cluster

Scaling down your Ceph cluster

Replacing a failed disk in the Ceph cluster

Upgrading your Ceph cluster

Maintaining a Ceph cluster

Ceph under the Hood

Introduction

Ceph scalability and high availability

Understanding the CRUSH mechanism

CRUSH map internals

CRUSH tunables

Ceph cluster map

High availability monitors

Ceph authentication and authorization

I/O path from a Ceph client to a Ceph cluster

Ceph Placement Group

Placement Group states

Creating Ceph pools on specific OSDs

Production Planning and Performance Tuning for Ceph

Introduction

The dynamics of capacity, performance, and cost

Choosing hardware and software components for Ceph

Ceph recommendations and performance tuning

Ceph erasure-coding

Creating an erasure-coded pool

Ceph cache tiering

Creating a pool for cache tiering

Creating a cache tier

Configuring a cache tier

Testing a cache tier

Cache tiering – possible dangers in production environments

The Virtual Storage Manager for Ceph

Introduction

Understanding the VSM architecture

Setting up the VSM environment

Getting ready for VSM

Installing VSM

Creating a Ceph cluster using VSM

Exploring the VSM dashboard

Upgrading the Ceph cluster using VSM

VSM roadmap

VSM resources

Ceph – the beginning of a new era

Data storage requirements have grown explosively over the last few years. Research shows that data in large organizations is growing at a rate of 40 to 60 percent annually, and many companies are doubling their data footprint each year. IDC analysts have estimated that worldwide, there were 54.4 exabytes of total digital data in the year 2000. By 2007, this reached 295 exabytes, and by 2020, it's expected to reach 44 zettabytes worldwide. Such data growth cannot be managed by traditional storage systems; we need a system such as Ceph, which is distributed, scalable, and, most importantly, economically viable. Ceph has been specially designed to handle today's data storage needs as well as those of the future.

Software-defined storage – SDS

SDS is what is needed to reduce the total cost of ownership (TCO) of your storage infrastructure. In addition to reduced storage cost, SDS can offer flexibility, scalability, and reliability. Ceph is a true SDS solution; it runs on commodity hardware with no vendor lock-in and provides low cost per GB. Unlike traditional storage systems, where hardware gets married to software, in SDS you are free to choose commodity hardware from any manufacturer and to design a heterogeneous hardware solution for your own needs. Ceph's software-defined storage on top of this hardware provides all the intelligence you need and will take care of everything, providing all the enterprise storage features right from the software layer.

Cloud storage

One of the drawbacks of a cloud infrastructure is the storage. Every cloud infrastructure needs a storage system that is reliable, low-cost, and scalable, and that integrates tightly with the other cloud components. There are many traditional storage solutions on the market that claim to be cloud-ready, but today we need not only cloud readiness but also a lot more beyond that. We need a storage system that is fully integrated with cloud systems and can provide lower TCO without any compromise on reliability and scalability. Cloud systems are software-defined and are built on top of commodity hardware; similarly, they need a storage system that follows the same methodology, that is, being software-defined on top of commodity hardware, and Ceph is the best choice available for cloud use cases.

Ceph has been rapidly evolving and bridging the gap to become a true cloud storage backend. It is taking center stage with every major open source cloud platform, namely OpenStack, CloudStack, and OpenNebula. Moreover, Ceph has succeeded in building beneficial partnerships with cloud vendors such as Red Hat, Canonical, Mirantis, SUSE, and many more. These companies strongly favor Ceph and include it as an official storage backend for their OpenStack cloud distributions, thus making Ceph a red-hot technology in the cloud storage space.

The OpenStack project is one of the finest examples of open source software powering public and private clouds. It has proven itself as an end-to-end open source cloud solution. OpenStack is a collection of programs; some of them, such as Cinder, Glance, and Swift, provide storage capabilities to OpenStack. These OpenStack components require a reliable, scalable, and all-in-one storage backend such as Ceph. For this reason, the OpenStack and Ceph communities have been working together for many years to develop a fully compatible Ceph storage backend for OpenStack.

Cloud infrastructure based on Ceph provides much-needed flexibility to service providers to build Storage-as-a-Service and Infrastructure-as-a-Service solutions, which they cannot achieve from other traditional enterprise storage solutions as they are not designed to fulfill cloud needs. Using Ceph, service providers can offer low-cost, reliable cloud storage to their customers.

Unified next-generation storage architecture

The definition of unified storage has changed lately. A few years ago, the term unified storage referred to providing file and block storage from a single system. Now, because of recent technological advancements such as cloud computing, big data, and the Internet of Things, a new kind of storage has been evolving, that is, object storage. Thus, storage systems that do not support object storage are not really unified storage solutions. Ceph is a true unified storage system; it supports block, file, and object storage from a single system.

In Ceph, the term unified storage is more meaningful than what existing storage vendors claim to provide. It has been designed from the ground up to be future-ready, and it's constructed such that it can handle enormous amounts of data. When we call Ceph future-ready, we mean to focus on its object storage capabilities, which are a better fit for today's mix of unstructured data than blocks or files. Everything in Ceph relies on intelligent objects, whether it's block storage or file storage. Rather than managing blocks and files underneath, Ceph manages objects and supports block-based and file-based storage on top of them. Objects provide enormous scaling with increased performance by eliminating metadata operations. Ceph uses an algorithm to dynamically compute where an object should be stored and retrieved from.

The traditional storage architecture of SAN and NAS systems is very limited. Basically, they follow the tradition of controller high availability; that is, if one storage controller fails, data is served from the second controller. But what if the second controller fails at the same time, or even worse, the entire disk shelf fails? In most cases, you will end up losing your data. This kind of storage architecture, which cannot sustain multiple failures, is definitely not what we want today. Another drawback of traditional storage systems is their data storage and access mechanism. They maintain a central lookup table to keep track of metadata, which means that every time a client sends a request for a read or write operation, the storage system first performs a lookup in the huge metadata table, and only after receiving the real data location does it perform the client operation. For a smaller storage system, you might not notice performance hits, but in a large storage cluster you would definitely be bound by performance limits with this approach. This would even restrict your scalability.

Ceph does not follow this traditional storage architecture; in fact, the architecture has been completely reinvented. Rather than storing and manipulating metadata, Ceph introduces a newer way: the CRUSH algorithm. CRUSH stands for Controlled Replication Under Scalable Hashing. Instead of performing a lookup in the metadata table for every client request, the CRUSH algorithm computes on demand where the data should be written to or read from. Because this metadata is computed, there is no longer any need to manage a centralized metadata table. Modern computers are amazingly fast and can perform a CRUSH lookup very quickly; moreover, this computing load, which is generally not too much, can be distributed across cluster nodes, leveraging the power of distributed storage. In addition to this, CRUSH has a unique property, which is infrastructure awareness. It understands the relationship between various components of your infrastructure and stores your data in a unique failure zone, such as a disk, node, rack, row, or data center room, among others. CRUSH stores all the copies of your data such that it is available even if a few components fail in a failure zone. It is due to CRUSH that Ceph can handle multiple component failures and provide reliability and durability.
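To make the idea of computing a location instead of looking it up more concrete, here is a minimal Python sketch. It is not the CRUSH algorithm itself; it uses rendezvous (highest-random-weight) hashing as a simple stand-in, and the device names, rack names, and the `place` function are all hypothetical. What it illustrates is that any client holding the same cluster description can compute the same placement independently, and that replicas can be spread across separate failure zones without a central table.

```python
import hashlib

# Hypothetical cluster description: device name -> failure zone (rack).
# A real Ceph cluster describes its topology in the CRUSH map instead.
DEVICES = {
    "osd.0": "rack-a", "osd.1": "rack-a",
    "osd.2": "rack-b", "osd.3": "rack-b",
    "osd.4": "rack-c", "osd.5": "rack-c",
}


def score(obj_name, device):
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj_name}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_name, devices=DEVICES, replicas=3):
    """Compute replica locations for an object, one per failure zone.

    Illustrative stand-in for CRUSH: rank devices by hash score and
    greedily pick the best-ranked device from each rack not yet used.
    """
    ranked = sorted(devices, key=lambda d: score(obj_name, d), reverse=True)
    chosen, used_racks = [], set()
    for dev in ranked:
        rack = devices[dev]
        if rack in used_racks:
            continue  # never put two copies in the same failure zone
        chosen.append(dev)
        used_racks.add(rack)
        if len(chosen) == replicas:
            break
    return chosen


if __name__ == "__main__":
    # Every caller computes the same answer; no lookup table is consulted.
    print(place("photo.jpg"))
```

Because the result depends only on the object name and the cluster description, the work of locating data is shared by all clients rather than concentrated in one metadata service.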

The CRUSH algorithm makes Ceph self-managing and self-healing. In the event of component failure in a failure zone, CRUSH senses which component has failed and determines the effect on the cluster. Without any administrative intervention, CRUSH self-manages and self-heals by performing a recovery operation for the data lost due to the failure. CRUSH regenerates the data from the replica copies that the cluster maintains. If you have configured the Ceph CRUSH map correctly, it makes sure that at least one copy of your data is always accessible. Using CRUSH, we can design a highly reliable storage infrastructure with no single point of failure. This makes Ceph a highly scalable and reliable storage system that is future-ready. CRUSH is covered in more detail in Chapter 9, Ceph under the Hood.
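The recovery behavior described above can be sketched in a few lines of Python. This is a simplified illustration under stated assumptions, not Ceph's actual recovery logic: placement is modeled with plain rendezvous hashing rather than CRUSH, and the device names, object names, and the `recovery_plan` function are hypothetical. The sketch shows that when a device fails, only the objects that had a copy on it need attention, and each lost copy can be rebuilt from the surviving replicas onto a newly computed device.

```python
import hashlib


def score(obj_name, device):
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{obj_name}:{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(obj_name, devices, replicas=3):
    """Pick the top-scoring devices for an object (stand-in for CRUSH)."""
    ranked = sorted(devices, key=lambda d: score(obj_name, d), reverse=True)
    return ranked[:replicas]


def recovery_plan(objects, devices, failed, replicas=3):
    """For each object that lost a copy, say where to copy from and to."""
    survivors = [d for d in devices if d != failed]
    plan = {}
    for obj in objects:
        before = place(obj, devices, replicas)
        if failed not in before:
            continue  # this object is unaffected by the failure
        after = place(obj, survivors, replicas)
        plan[obj] = {
            # replicas that still exist and can serve as the source
            "copy_from": [d for d in before if d != failed],
            # newly computed location for the regenerated copy
            "copy_to": [d for d in after if d not in before],
        }
    return plan


if __name__ == "__main__":
    devices = [f"osd.{i}" for i in range(6)]   # hypothetical devices
    objects = [f"obj-{i}" for i in range(20)]  # hypothetical objects
    for obj, step in recovery_plan(objects, devices, failed="osd.2").items():
        print(obj, step)
```

In this model the surviving replicas of an affected object stay where they are, so the amount of data moved is proportional to what was lost rather than to the size of the cluster.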
