Ceph Cookbook

Overview of this book

Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. This cutting-edge technology has been transforming the storage industry and is evolving rapidly as a leader in the software-defined storage space, with full support for cloud platforms such as OpenStack and CloudStack as well as virtualization platforms. It is the most popular storage backend for OpenStack and for public and private clouds, which makes it a first choice for a storage solution. Ceph is backed by Red Hat and is developed by a thriving open source community of individual developers and several companies across the globe. This book takes you from a basic knowledge of Ceph to an expert understanding of its most advanced features, walking you through building a production-grade Ceph storage cluster and helping you develop the skills you need to plan, deploy, and effectively manage it. Beginning with the basics, you’ll create a Ceph cluster and then provision block, object, and file storage. Next, you’ll get a step-by-step tutorial on integrating Ceph with OpenStack and building a Dropbox-like object storage solution. We’ll also take a look at federated architecture and CephFS, and you’ll dive into Calamari and VSM for monitoring the Ceph environment. You’ll develop expert knowledge of troubleshooting and benchmarking your Ceph storage cluster. Finally, you’ll get to grips with the best practices for operating Ceph in a production environment.

Ceph erasure coding


The default data protection mechanism in Ceph is replication. It is proven and is one of the most popular methods of data protection. However, the downside of replication is that it requires a multiple of the usable capacity in raw storage to provide redundancy. For instance, if you were planning to build a storage solution with 1 PB of usable capacity and a replication factor of three, you would require 3 PB of raw storage capacity, that is, an overhead of 200%. In this way, replication significantly increases the cost per gigabyte of the storage system. For a small cluster, you might ignore the replication overhead, but for large environments, it becomes significant.
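The arithmetic above can be checked with a short sketch. This is only an illustration of the capacity calculation, not part of Ceph; the function names `raw_capacity_pb` and `overhead_percent` are hypothetical, and the model assumes every object is stored as full copies with no other overhead.

```python
def raw_capacity_pb(usable_pb: float, replicas: int) -> float:
    """Raw capacity (PB) needed to hold `usable_pb` as `replicas` full copies."""
    return usable_pb * replicas


def overhead_percent(replicas: int) -> float:
    """Extra raw storage beyond the usable capacity, as a percentage."""
    return (replicas - 1) * 100.0


if __name__ == "__main__":
    # Compare a replication factor of two with the factor of three from the text.
    for replicas in (2, 3):
        raw = raw_capacity_pb(1.0, replicas)
        extra = overhead_percent(replicas)
        print(f"replicas={replicas}: raw={raw} PB, overhead={extra}%")
```

With a replication factor of three, 1 PB of usable capacity maps to 3 PB of raw capacity, which is the 200% overhead described above.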

Starting with the Firefly release, Ceph introduced another method of data protection known as erasure coding. This method is fundamentally different from replication. It guarantees data protection by dividing each object into smaller chunks known as data chunks...