Ceph Cookbook

Overview of this book

Ceph is a unified, distributed storage system designed for excellent performance, reliability, and scalability. This technology has been transforming the storage industry and is evolving rapidly as a leader in the software-defined storage space, with full support for cloud platforms such as OpenStack and CloudStack as well as virtualization platforms. It is among the most popular storage backends for OpenStack and for public and private clouds, which makes it a common first choice for a storage solution. Ceph is backed by Red Hat and is developed by a thriving open source community of individual developers and companies across the globe.

This book takes you from a basic knowledge of Ceph to an expert understanding of its most advanced features, walking you through building a production-grade Ceph storage cluster and helping you develop the skills you need to plan, deploy, and effectively manage it. Beginning with the basics, you’ll create a Ceph cluster and then provision block, object, and file storage. Next, you’ll follow a step-by-step tutorial on integrating Ceph with OpenStack and building a Dropbox-like object storage solution. We’ll also take a look at federated architecture and CephFS, and you’ll dive into Calamari and VSM for monitoring the Ceph environment. You’ll develop expert knowledge of troubleshooting and benchmarking your Ceph storage cluster. Finally, you’ll get to grips with the best practices for operating Ceph in a production environment.

CephFS as a drop-in replacement for HDFS


Hadoop is a programming framework that supports the processing and storage of large data sets in a distributed computing environment. The Hadoop core includes the MapReduce analytics engine and a distributed file system known as HDFS (Hadoop Distributed File System). HDFS has several weaknesses:

  • Until recent versions of HDFS, it had a single point of failure

  • It isn't POSIX compliant

  • It stores three copies of data by default

  • It has a centralized name server (the NameNode), resulting in scalability challenges
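
To put the replication point in concrete terms, the following minimal sketch estimates how much raw disk capacity a cluster needs when every piece of data is stored three times. The function names and the 100 TB figure are illustrative assumptions and do not come from Hadoop or Ceph tooling; the calculation also ignores file system overhead and free-space headroom.

```python
def raw_capacity_needed(usable_tb: float, replicas: int = 3) -> float:
    """Raw capacity (TB) required to hold `usable_tb` of data
    when each piece of data is stored `replicas` times."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return usable_tb * replicas


def usable_fraction(replicas: int = 3) -> float:
    """Fraction of raw capacity that ends up holding unique data."""
    if replicas < 1:
        raise ValueError("replicas must be at least 1")
    return 1 / replicas


if __name__ == "__main__":
    data_tb = 100.0  # hypothetical data set size
    print(f"Raw capacity for {data_tb} TB: {raw_capacity_needed(data_tb)} TB")
    print(f"Usable fraction of raw disk: {usable_fraction():.1%}")
```

With three copies, only about a third of the raw disk space holds unique data, which is why the replication count is listed among the weaknesses above.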

The Apache Hadoop project and other software vendors are working independently to fix these gaps in HDFS.

The Ceph community has also done some development in this space and provides a file system plugin for Hadoop that may overcome the limitations of HDFS and can be used as a drop-in replacement for it. There are three requirements for using CephFS in place of HDFS:

  • Running the Ceph cluster

  • Running the Hadoop cluster

  • Installing the...