Ceph Cookbook

Overview of this book

Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. This cutting-edge technology has been transforming the storage industry and is evolving rapidly as a leader in the software-defined storage space, with full support for cloud platforms such as OpenStack and CloudStack as well as for virtualization platforms. It is among the most popular storage backends for OpenStack and for public and private clouds, which makes it a natural first choice for a storage solution. Ceph is backed by Red Hat and is developed by a thriving open source community of individual developers and companies across the globe.

This book takes you from a basic knowledge of Ceph to an expert understanding of its most advanced features. It walks you through building a production-grade Ceph storage cluster and helps you develop the skills you need to plan, deploy, and effectively manage it. Beginning with the basics, you'll create a Ceph cluster and then provision block, object, and file storage. Next, you'll follow a step-by-step tutorial on integrating Ceph with OpenStack and building a Dropbox-like object storage solution. We'll also look at federated architecture and CephFS, and you'll dive into Calamari and VSM for monitoring the Ceph environment. You'll develop expert knowledge of troubleshooting and benchmarking your Ceph storage cluster. Finally, you'll get to grips with the best practices for operating Ceph in a production environment.

Ceph Cookbook

Credits

Foreword

About the Author

About the Reviewers

www.PacktPub.com

Preface

Ceph – Introduction and Beyond

Introduction

Ceph – the beginning of a new era

RAID – the end of an era

Ceph – the architectural overview

Planning the Ceph deployment

Setting up a virtual infrastructure

Installing and configuring Ceph

Scaling up your Ceph cluster

Using Ceph cluster with a hands-on approach

Working with Ceph Block Device

Introduction

Working with Ceph Block Device

Configuring Ceph client

Creating Ceph Block Device

Mapping Ceph Block Device

Ceph RBD resizing

Working with RBD snapshots

Working with RBD Clones

A quick look at OpenStack

Ceph – the best match for OpenStack

Setting up OpenStack

Configuring OpenStack as Ceph clients

Configuring Glance for Ceph backend

Configuring Cinder for Ceph backend

Configuring Nova to attach Ceph RBD

Configuring Nova to boot instances from Ceph RBD

Working with Ceph Object Storage

Introduction

Understanding Ceph object storage

RADOS Gateway standard setup, installation, and configuration

Creating the radosgw user

Accessing Ceph object storage using S3 API

Accessing Ceph object storage using the Swift API

Integrating RADOS Gateway with OpenStack Keystone

Configuring Ceph federated gateways

Testing the radosgw federated configuration

Building file sync and share service using RGW

Working with the Ceph Filesystem

Introduction

Understanding Ceph Filesystem and MDS

Deploying Ceph MDS

Accessing CephFS via kernel driver

Accessing CephFS via FUSE client

Exporting Ceph Filesystem as NFS

ceph-dokan – CephFS for Windows clients

CephFS a drop-in replacement for HDFS

Monitoring Ceph Clusters using Calamari

Introduction

Ceph cluster monitoring – the classic way

Monitoring Ceph clusters

Introducing Ceph Calamari

Building Calamari server packages

Building Calamari client packages

Setting up Calamari master server

Adding Ceph nodes to Calamari

Monitoring Ceph clusters from the Calamari dashboard

Troubleshooting Calamari

Operating and Managing a Ceph Cluster

Introduction

Understanding Ceph service management

Managing the cluster configuration file

Running Ceph with SYSVINIT

Running Ceph as a service

Scale-up versus scale-out

Scaling out your Ceph cluster

Scaling down your Ceph cluster

Replacing a failed disk in the Ceph cluster

Upgrading your Ceph cluster

Maintaining a Ceph cluster

Ceph under the Hood

Introduction

Ceph scalability and high availability

Understanding the CRUSH mechanism

CRUSH map internals

Ceph cluster map

High availability monitors

Ceph authentication and authorization

Ceph dynamic cluster management

Ceph placement group

Placement group states

Creating Ceph pools on specific OSDs

Production Planning and Performance Tuning for Ceph

Introduction

The dynamics of capacity, performance, and cost

Choosing the hardware and software components for Ceph

Ceph recommendation and performance tuning

Ceph erasure coding

Creating an erasure coded pool

Ceph cache tiering

Creating a pool for cache tiering

Creating a cache tier

Configuring a cache tier

Testing a cache tier

The Virtual Storage Manager for Ceph

Introduction

Understanding the VSM architecture

Setting up the VSM environment

Getting ready for VSM

Installing VSM

Creating a Ceph cluster using VSM

Exploring the VSM dashboard

Upgrading the Ceph cluster using VSM

VSM roadmap

VSM resources

Ceph – the beginning of a new era

Data storage requirements have grown explosively over the last few years. Research shows that data in large organizations is growing at a rate of 40 to 60 percent annually, and many companies are doubling their data footprint each year. IDC analysts estimated that there were 54.4 exabytes of digital data worldwide in the year 2000. By 2007, this had reached 295 exabytes, and by 2020 it is expected to reach 44 zettabytes. Traditional storage systems cannot manage data growth on this scale; we need a system like Ceph, which is distributed, scalable, and, most importantly, economically viable. Ceph has been designed specifically to handle both today's and tomorrow's data storage needs.

Software Defined Storage (SDS)

Software-defined storage (SDS) is what you need to reduce the total cost of ownership (TCO) of your storage infrastructure. In addition to reducing storage cost, SDS offers flexibility, scalability, and reliability. Ceph is a true SDS solution; it runs on commodity hardware with no vendor lock-in and provides a low cost per GB. Unlike traditional storage systems, where the hardware is tightly coupled to the software, SDS leaves you free to choose commodity hardware from any manufacturer and to design a heterogeneous hardware solution for your own needs. Ceph's software-defined storage on top of this hardware supplies the intelligence, providing enterprise storage features right from the software layer.

Cloud storage

Storage is one of the weak points of a cloud infrastructure. Every cloud infrastructure needs a storage system that is reliable, low-cost, and scalable, and that integrates tightly with the other cloud components. Many traditional storage solutions on the market claim to be cloud ready, but today we need a lot more than cloud readiness. We need a storage system that is fully integrated with cloud systems and can provide a lower TCO without compromising reliability or scalability. Cloud systems are software defined and built on top of commodity hardware; they need a storage system that follows the same methodology, that is, software defined on top of commodity hardware, and Ceph is the best choice available for cloud use cases.

Ceph has been evolving rapidly to fill the gap for a true cloud storage backend. It is taking center stage with every major open source cloud platform, namely OpenStack, CloudStack, and OpenNebula. Moreover, Ceph has built beneficial partnerships with cloud vendors such as Red Hat, Canonical, Mirantis, SUSE, and many more. These companies strongly favor Ceph and include it as an official storage backend in their OpenStack cloud distributions, making Ceph a red-hot technology in the cloud storage space.

The OpenStack project is one of the finest examples of open source software powering public and private clouds. It has proven itself as an end-to-end open source cloud solution. OpenStack is a collection of components, some of which, such as Cinder, Glance, and Swift, provide storage capabilities to OpenStack. These components require a reliable, scalable, all-in-one storage backend like Ceph. For this reason, the OpenStack and Ceph communities have been working together for many years to develop a fully compatible Ceph storage backend for OpenStack.

Cloud infrastructure based on Ceph gives service providers the much-needed flexibility to build Storage-as-a-Service and Infrastructure-as-a-Service solutions. Traditional enterprise storage solutions cannot offer this flexibility because they were not designed to meet cloud needs. By using Ceph, service providers can offer low-cost, reliable cloud storage to their customers.

Unified next generation storage architecture

The definition of unified storage has changed lately. A few years ago, the term "unified storage" referred to providing file and block storage from a single system. Now, because of recent technological advancements such as cloud computing, big data, and the Internet of Things, a new kind of storage has been evolving: object storage. Storage systems that do not support object storage are therefore not really unified storage solutions. A true unified storage system, like Ceph, supports block, file, and object storage from a single system.

In Ceph, the term "unified storage" is more meaningful than what existing storage vendors claim to provide. Ceph has been designed from the ground up to be future ready, and it is built to handle enormous amounts of data. When we call Ceph "future ready", we are referring to its object storage capabilities, which are a better fit for today's mix of unstructured data than blocks or files. Everything in Ceph relies on intelligent objects, whether it is block storage or file storage. Rather than managing blocks and files underneath, Ceph manages objects and supports block-based and file-based storage on top of them. Objects provide enormous scaling with increased performance by eliminating metadata operations. Ceph uses an algorithm to dynamically compute where an object should be stored and where it should be retrieved from.
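The idea of serving block storage from objects can be sketched in a few lines of Python. This is only an illustration of the principle, not Ceph's implementation: the 4 MiB object size, the function name objects_for_range, and the object naming scheme are all assumptions made for this example.

```python
# Illustrative sketch: split a byte range on a virtual block device into the
# fixed-size objects that would back it. OBJECT_SIZE and the naming scheme
# are assumptions for this example, not Ceph's actual format.
OBJECT_SIZE = 4 * 1024 * 1024  # assumed object size: 4 MiB


def objects_for_range(image_name, offset, length, object_size=OBJECT_SIZE):
    """Return a list of (object_name, offset_in_object, byte_count) tuples."""
    if offset < 0 or length < 0:
        raise ValueError("offset and length must be non-negative")
    extents = []
    end = offset + length
    while offset < end:
        index = offset // object_size          # which object holds this byte
        start_in_obj = offset % object_size    # position inside that object
        chunk = min(object_size - start_in_obj, end - offset)
        extents.append((f"{image_name}.{index:08d}", start_in_obj, chunk))
        offset += chunk
    return extents
```

For instance, a 4 MiB write starting at the 6 MiB mark of a hypothetical image named vol1 touches two objects: the second half of vol1.00000001 and the first half of vol1.00000002. The block layer only has to do this arithmetic; everything below it deals in objects.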

The traditional storage architecture of SAN and NAS systems is very limited. These systems follow the tradition of controller high availability: if one storage controller fails, data is served from the second controller. But what if the second controller fails at the same time or, even worse, the entire disk shelf fails? In most cases, you will end up losing your data. A storage architecture that cannot sustain multiple failures is definitely not what we want today. Another drawback of traditional storage systems is their data storage and access mechanism. They maintain a central lookup table to keep track of metadata, which means that every time a client sends a read or write request, the storage system first performs a lookup in the huge metadata table and, only after receiving the real data location, performs the client operation. On a smaller storage system you might not notice the performance hit, but on a large storage cluster this approach would definitely bind you to performance limits. It would also restrict your scalability.

Ceph does not follow this traditional storage architecture; in fact, the architecture has been completely reinvented. Rather than storing and manipulating metadata, Ceph introduces a newer way: the CRUSH algorithm. CRUSH stands for Controlled Replication Under Scalable Hashing. Instead of performing a lookup in a metadata table for every client request, the CRUSH algorithm computes on demand where the data should be written to or read from. Because the location is computed, there is no longer any need to manage a centralized metadata table. Modern computers are amazingly fast and can perform a CRUSH lookup very quickly; moreover, this computing load, which is generally small, can be distributed across cluster nodes, leveraging the power of distributed storage. In addition, CRUSH has a unique property: infrastructure awareness. It understands the relationship between the various components of your infrastructure and stores each copy of your data in a separate failure zone, such as a disk, node, rack, row, or datacenter room. CRUSH places the copies of your data such that the data remains available even if a few components fail within a failure zone. It is due to CRUSH that Ceph can handle multiple component failures and provide reliability and durability.
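To make "compute the location instead of looking it up" concrete, here is a minimal Python sketch. It is not the CRUSH algorithm; it substitutes a much simpler technique, rendezvous (highest random weight) hashing, with a rule that allows at most one copy per failure zone. The cluster layout, device names, and function names are hypothetical.

```python
import hashlib

# Hypothetical cluster layout: storage device -> rack (the failure zone).
CLUSTER = {
    "osd.0": "rack-a", "osd.1": "rack-a",
    "osd.2": "rack-b", "osd.3": "rack-b",
    "osd.4": "rack-c", "osd.5": "rack-c",
}


def _score(object_name, device):
    """Deterministic pseudo-random score for an (object, device) pair."""
    digest = hashlib.sha256(f"{object_name}/{device}".encode()).digest()
    return int.from_bytes(digest[:8], "big")


def place(object_name, cluster, replicas=3):
    """Choose up to `replicas` devices, at most one per failure zone."""
    ranked = sorted(cluster, key=lambda dev: _score(object_name, dev),
                    reverse=True)
    chosen, used_zones = [], set()
    for device in ranked:
        zone = cluster[device]
        if zone in used_zones:
            continue  # this zone already holds a copy of the object
        chosen.append(device)
        used_zones.add(zone)
        if len(chosen) == replicas:
            break
    return chosen
```

Any client that knows the cluster layout can call place("my-object", CLUSTER) and obtain the same three devices, each in a different rack, without consulting a central table. The real CRUSH algorithm works on a weighted hierarchy and configurable placement rules, which this sketch leaves out.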

The CRUSH algorithm makes Ceph self-managing and self-healing. In the event of a component failure in a failure zone, CRUSH senses which component has failed and determines the effect on the cluster. Without any administrative intervention, CRUSH self-manages and self-heals by performing a recovery operation for the data lost due to the failure. CRUSH regenerates the data from the replica copies that the cluster maintains. If you have configured the Ceph CRUSH map correctly, it makes sure that at least one copy of your data is always accessible. Using CRUSH, we can design a highly reliable storage infrastructure with no single point of failure. This makes Ceph a highly scalable and reliable storage system that is future ready.
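The recovery behavior described above can be imitated with a small, self-contained Python sketch. As a stand-in for CRUSH it uses simple rendezvous hashing with one copy per failure zone, so it shows the principle only; the cluster layout and the names place and recover are hypothetical. After a device fails, placement is recomputed over the surviving devices, and every object that lost a copy is re-replicated from a surviving replica to a newly computed device.

```python
import hashlib

# Hypothetical cluster layout: storage device -> rack (the failure zone).
CLUSTER = {
    "osd.0": "rack-a", "osd.1": "rack-a",
    "osd.2": "rack-b", "osd.3": "rack-b",
    "osd.4": "rack-c", "osd.5": "rack-c",
}


def place(object_name, cluster, replicas=2):
    """Rank devices by a hash of (object, device); keep one per zone."""
    def score(device):
        digest = hashlib.sha256(f"{object_name}/{device}".encode()).digest()
        return int.from_bytes(digest[:8], "big")

    chosen, used_zones = [], set()
    for device in sorted(cluster, key=score, reverse=True):
        if cluster[device] not in used_zones:
            chosen.append(device)
            used_zones.add(cluster[device])
        if len(chosen) == replicas:
            break
    return chosen


def recover(object_names, cluster, failed, replicas=2):
    """Map each affected object to (surviving_source, new_target)."""
    survivors = {d: z for d, z in cluster.items() if d != failed}
    plan = {}
    for name in object_names:
        before = place(name, cluster, replicas)
        if failed not in before:
            continue  # this object lost no copy, so nothing moves
        after = place(name, survivors, replicas)
        source = next((d for d in before if d != failed), None)
        target = next((d for d in after if d not in before), None)
        plan[name] = (source, target)  # None means no candidate was found
    return plan
```

Calling recover with a list of object names and "osd.0" as the failed device returns a copy plan that touches only the objects that had a replica on osd.0; all other objects stay where they are, which is what keeps recovery traffic proportional to the size of the failure.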
