Learning Ceph - Second Edition

By: Karan Singh, Vaibhav Bhembre, Anthony D'Atri

Ceph compared to other storage solutions


The enterprise storage market is experiencing a fundamental realignment. Traditional proprietary storage systems are incapable of meeting future data storage needs, especially within a reasonable budget. Appliance-based storage is declining even as data usage grows by leaps and bounds.

The high TCO of proprietary systems does not end with hardware procurement: nickel-and-dime feature licenses, yearly support, and management add up to a breathtakingly expensive bottom line. One would previously purchase a pallet-load of hardware, pay for a few years of support, then find that the initial deployment has been EOL'd and thus can't be expanded or even maintained. This perpetuates a cycle of successive rounds of en masse hardware acquisition. Concomitant support contracts to receive bug fixes and security updates often come at spiraling cost. After a few years (or even sooner) your once-snazzy solution becomes unsupported scrap metal, and the cycle repeats. Pay, lather, rinse, repeat. When the time comes to add a second deployment, the same product line may not even be available, forcing you to implement, document, and support a growing number of incompatible, one-off solutions. I daresay your organization's money and your time can be better spent elsewhere, like giving you a well-deserved raise.

With Ceph, new software releases are always available, no licenses expire, and you're welcome to read the code yourself and even contribute. You can also expand your solution along many axes, compatibly and without disruption. Unlike one-size-fits-none proprietary solutions, you can pick exactly the scale, speed, and components that make sense today while effortlessly growing tomorrow, with the highest levels of control and customization.

Open source storage technologies, however, have demonstrated performance, reliability, scalability, and lower TCO (Total Cost of Ownership) without fear of product line or model phase-outs or vendor lock-in. Many corporations, as well as government, university, research, healthcare, and HPC (High Performance Computing) organizations, are already successfully exploiting open source storage solutions.

Ceph is garnering tremendous interest and gaining popularity, increasingly winning over other open source as well as proprietary storage solutions. In the remainder of this chapter we'll compare Ceph to other open source storage solutions.

GPFS

General Parallel File System (GPFS) is a distributed filesystem developed and owned by IBM. It is a proprietary and closed-source storage system, which limits its appeal and adaptability. Licensing and support costs, added to those of the storage hardware, add up to an expensive solution. Moreover, it has a very limited set of storage interfaces: it provides neither block storage (like RBD) nor RESTful (like RGW) access to the storage system, limiting the constellation of use cases that can be served by a single backend.

In 2015 GPFS was rebranded as IBM Spectrum Scale.

iRODS

iRODS stands for Integrated Rule-Oriented Data System, an open source data-management system released under a 3-clause BSD license. iRODS is not highly available and can be bottlenecked: its iCAT metadata server is a single point of failure (SPoF) with no true high availability (HA) or scalability. Moreover, it implements a very limited set of storage interfaces, providing neither block storage nor RESTful access modalities. iRODS is more effective at storing a relatively small number of large files than a large number of mixed small and large files. iRODS implements a traditional metadata architecture, maintaining an index of the physical location of each filename.

HDFS

HDFS is a distributed, scalable filesystem written in Java for the Hadoop processing framework. HDFS is not a fully POSIX-compliant filesystem and does not offer a block interface. Its reliability is of concern because it lacks high availability: the single NameNode is a SPoF and a performance bottleneck. Like iRODS, HDFS is best suited to storing a small number of large files rather than the mix of small and large files at scale that modern deployments demand.

Lustre

Lustre is a parallel distributed filesystem driven by the open source community and is available under the GNU General Public License (GPL). Lustre relies on a single server for storing and managing metadata. Thus, I/O requests from clients are totally dependent on a single server's computing power, which can be a bottleneck for enterprise-scale consumption. Like iRODS and HDFS, Lustre is better suited to a small number of large files than to a more typical mix of large numbers of files of various sizes. Like iRODS, Lustre manages an index file that maps filenames to physical addresses, which makes its traditional architecture prone to performance bottlenecks. Lustre lacks a mechanism for failure detection and correction: when a node fails, clients must connect to another node themselves.

Gluster

GlusterFS was originally developed by Gluster Inc., which was acquired by Red Hat in 2011. GlusterFS is a scale-out network-attached filesystem in which administrators must determine the placement strategy used to store data replicas across geographically spread racks. Gluster does not provide block access, filesystem, or remote replication as intrinsic functions; rather, it provides these features as add-ons.

Ceph

Ceph stands out from the storage solution crowd by virtue of its feature set. It has been designed to overcome the limitations of existing storage systems, and it effectively replaces old and expensive proprietary solutions. Ceph is economical by being open source and software-defined, and by running on almost any commodity hardware. Clients enjoy the flexibility of Ceph's variety of access modalities, all served by a single backend.
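
As a taste of those modalities, here is a minimal sketch of exercising block, file, and object access against a single cluster. The pool, image, and user names are hypothetical, and the commands assume RBD, CephFS (with a running MDS), and RGW have already been deployed:

    # Block: create a pool and an RBD image, then map it as a local block device
    ceph osd pool create rbdpool 64
    rbd create rbdpool/demo-image --size 10240       # size in MB (10 GB)
    rbd map rbdpool/demo-image                       # exposes e.g. /dev/rbd0

    # File: mount CephFS from a client host
    mount -t ceph mon1:6789:/ /mnt/cephfs -o name=admin,secret=<admin-key>

    # Object: create an S3-capable user on the RADOS Gateway
    radosgw-admin user create --uid=demo --display-name="Demo User"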

Every Ceph component is reliable and supports high availability and scaling. A properly configured Ceph cluster is free from single points of failure and accepts an arbitrary mix of file types and sizes without performance penalties.

Ceph, by virtue of being distributed, does not follow the traditional centralized metadata method of placing and accessing data. Rather, it introduces a new paradigm in which clients independently calculate the locations of their data and then access storage nodes directly. This is a significant performance win for clients, as they need not queue up to get data locations and payloads from a central metadata server. Moreover, data placement inside a Ceph cluster is transparent and automatic; neither clients nor administrators need to manually or consciously spread data across failure domains.
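
To see this calculation in action, the ceph CLI can report where any named object would be placed without consulting a central lookup table. The pool and object names below are hypothetical, and the output shown is only an illustrative sketch:

    # Ask the cluster where an object maps: pool -> placement group -> OSDs
    ceph osd map rbdpool demo-object
    # Illustrative output (epoch, PG ID, and OSD IDs will differ per cluster):
    # osdmap e127 pool 'rbdpool' (3) object 'demo-object' -> pg 3.2f4e1b66 (3.66)
    #   -> up ([4,9,13], p4) acting ([4,9,13], p4)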

Ceph is self-healing and self-managing. In the event of disaster, when other storage systems cannot survive multiple failures, Ceph remains rock solid. Ceph detects and corrects failure at every level, managing component loss automatically and healing without impacting data availability or durability. Other storage solutions can only provide reliability at drive or at node granularity.
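
Operators observe this behavior rather than orchestrate it. As a rough illustration (exact output varies by release), after a drive or host failure these standard commands show recovery proceeding on its own:

    ceph -s              # overall status: degraded PGs and recovery throughput
    ceph health detail   # which placement groups are degraded or recovering, and why
    ceph -w              # follow the cluster log as objects are re-replicated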

Ceph also scales easily from as little as one server to thousands, and unlike many proprietary solutions, your initial investment at modest scale will not be discarded when you need to expand. A major advantage of Ceph over proprietary solutions is that you will have performed your last-ever forklift upgrade: Ceph's redundant and distributed design allows individual components to be replaced or updated piecemeal, in a rolling fashion. Neither components nor entire hosts need to be from the same manufacturer.

Examples of upgrades that the authors have performed on entire petabyte-scale production clusters, without clients skipping a beat, are as follows:

  • Migrate from one Linux distribution to another
  • Upgrade within a given Linux distribution, for example, RHEL 7.1 to RHEL 7.3
  • Replace all payload data drives
  • Update firmware
  • Migrate between journal strategies and devices
  • Hardware repairs, including entire chassis
  • Capacity expansion by swapping smaller drives for larger ones
  • Capacity expansion by adding additional servers
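
A minimal sketch of the rolling pattern behind several of the items above, assuming a systemd-based deployment and servicing one storage node at a time, looks like this:

    # Tell Ceph not to rebalance data while a node is deliberately taken down
    ceph osd set noout

    # On the node being serviced: stop its OSD daemons, do the work, restart them
    systemctl stop ceph-osd.target
    # ... kernel update, firmware flash, drive swap, or other maintenance ...
    systemctl start ceph-osd.target

    # Re-enable normal rebalancing and confirm the cluster returns to HEALTH_OK
    ceph osd unset noout
    ceph -s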

Unlike many RAID and other traditional storage solutions, Ceph is highly adaptable and does not require storage drives or hosts to be identical in type or size. A cluster that begins with 4TB drives can readily expand by adding 6TB or 8TB drives, either as replacements for the smaller drives or in incrementally added servers. A single Ceph cluster can also contain a mix of storage drive types, sizes, and speeds, either for differing workloads or to implement tiering that leverages both cost-effective slower drives for bulk storage and faster drives for reads or caching.
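
One way to implement such a split, assuming a release with CRUSH device classes (Luminous or later) and hypothetical pool and rule names, is to steer a latency-sensitive pool onto SSD-class drives:

    # Show drives and the device class (hdd/ssd) Ceph has assigned to each OSD
    ceph osd tree

    # Create a CRUSH rule that places replicas only on SSD-class OSDs
    ceph osd crush rule create-replicated fast-rule default host ssd

    # Create a pool for the latency-sensitive workload using that rule
    ceph osd pool create fast-pool 128 128 replicated fast-rule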

While there are certain administrative conveniences to a uniform set of servers and drives, it is also quite feasible to mix and match server models, generations, and even brands within a cluster.