How to plan a successful Ceph implementation

In order to be certain that your Ceph implementation will be successful, there are a number of rules you should follow:

  • Use 10G networking as a minimum
  • Research and test the hardware you wish to use to make sure it is correctly sized
  • Don't use the nobarrier mount option
  • Don't configure pools with size=2 or min_size=1 (a quick check for this and the nobarrier rule is shown after this list)
  • Don't use consumer SSDs
  • Don't use RAID controllers in writeback mode without battery protection
  • Don't use configuration options you don't understand
  • Implement some form of change management
  • Do carry out power loss testing
  • Do have an agreed backup and recovery plan
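
As a quick sanity check against two of these rules, the following commands show how to confirm that no filesystems are mounted with nobarrier and that no pool has had its replication settings lowered. This is a minimal sketch and assumes it is run on a node with admin access to the cluster:

    # Look for the dangerous nobarrier option on any mounted filesystem;
    # no output means the option is not in use
    grep nobarrier /proc/mounts

    # List every pool with its size and min_size so that any pool set to
    # size=2 or min_size=1 stands out
    ceph osd dump | grep 'pool'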

Understanding your requirements and how they relate to Ceph

As we have discussed, Ceph is not always the right choice for every storage requirement. This chapter should have given you the knowledge to identify your requirements and match them to Ceph's capabilities. Hopefully, Ceph is a good fit for your use case and you can proceed with the project.

Care should be taken to understand the requirements of the project including the following:

  • Identify the key stakeholders of the project; they will likely be the same people who will be able to detail how Ceph will be used.
  • Collect details of what systems Ceph will need to interact with. If it becomes apparent, for example, that unsupported operating systems are expected to be used with Ceph, this needs to be flagged at an early stage.

Defining goals so that you can gauge if the project is a success

Every project should have a series of goals that can help identify if the project has been a success. Example goals may be:

  • Cost no more than X
  • Provide X IOPS or MBps of performance
  • Survive certain failure scenarios
  • Reduce ownership costs of storage by X

These goals will need to be revisited throughout the life of the project to make sure that it is on track.
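
To make a performance goal such as providing X IOPS or MBps measurable, the cluster can be benchmarked once it is built. The following is a minimal sketch using the built-in RADOS benchmark; the pool name test-bench and the PG count of 128 are only examples for illustration:

    # Create a throwaway pool for benchmarking
    ceph osd pool create test-bench 128

    # Write 4 MB objects for 60 seconds and report bandwidth and IOPS
    rados bench -p test-bench 60 write --no-cleanup

    # Measure sequential read performance against the objects just written
    rados bench -p test-bench 60 seq

    # Remove the benchmark objects afterwards
    rados -p test-bench cleanup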

Choosing your hardware

The infrastructure section of this chapter will have given you a good idea of the hardware requirements of Ceph and the theory behind selecting the correct hardware for the project. Poor hardware choices are the second biggest cause of outages in a Ceph cluster, which makes getting these choices right early in the design stage crucial.

If possible, check with your hardware vendor to see if they have any reference designs; these are often certified by Red Hat and will take a lot of the hard work off your shoulders in trying to determine whether your hardware choices are valid. You can also ask Red Hat or your chosen Ceph support vendor to validate your hardware; they will have previous experience and will be able to guide you through any questions you may have.

Finally, if you are planning on deploying and running your Ceph cluster entirely in-house without any third-party involvement or support, consider reaching out to the Ceph community. The ceph-users mailing list has participants from vastly different backgrounds, stretching right around the globe. There is a high chance that someone somewhere is doing something similar to you and will be able to advise you on hardware choices.

Training yourself and your team to use Ceph

As with all technologies, it's essential that Ceph administrators receive some sort of training. Once the Ceph cluster goes live and becomes a business dependency, inexperienced administrators are a risk to stability. Depending on your reliance on third-party support, various levels of training may be required, which may also determine whether you look for a training course or self-teach.

Running a PoC to determine whether Ceph meets the requirements

A proof of concept (PoC) cluster should be deployed to test the design and identify any issues early on, before proceeding with full-scale hardware procurement. This should be treated as a decision point in the project; don't be afraid to revisit goals or start the design afresh if any serious issues are uncovered. If you have existing hardware of a similar specification, it should be fine to use it in the PoC, but the aim should be to test hardware as similar as possible to what you intend to build the production cluster with, so that the design can be fully tested.

As well as testing for stability, the PoC cluster should also be used to forecast if it looks likely that the goals you have set for the project will be met.

The proof of concept stage is also a good time to firm up your knowledge on Ceph, practice day-to-day operations and test out features. This will be of benefit further down the line. You should also take this opportunity to be as abusive as possible to your PoC cluster. Randomly pull out disks, power off nodes, and disconnect network cables. If designed correctly, Ceph should be able to withstand all of these events. Carrying out this testing now will give you the confidence to operate Ceph at larger scale where these events will happen and also help you understand how to troubleshoot them more easily if needed.
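
As a simple sketch of this kind of failure testing, assuming a systemd-based deployment and an OSD numbered 3 purely as an example, you might stop a daemon and watch how the cluster reacts before pulling real hardware:

    # Simulate an OSD failure by stopping its daemon on one node
    systemctl stop ceph-osd@3

    # Watch the cluster detect the failure and begin recovery
    ceph -s
    ceph health detail

    # Check which OSDs are up or down and where they sit in the CRUSH tree
    ceph osd tree

    # Bring the OSD back and watch the cluster return to HEALTH_OK
    systemctl start ceph-osd@3
    ceph -w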

Following best practices to deploy your cluster

When deploying your cluster, attention should be paid to understanding the process rather than following guided examples. This will give you better knowledge of the various components that make up Ceph and should you encounter any errors during deployment or operation, you will be much better placed to solve them. The next chapter of this book goes into more detail on deployment of Ceph, including the use of orchestration tools.

Initially, it is recommended that the default options for both the operating system and Ceph are used. It is better to start from a known state should any issues arise during deployment and initial testing.
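
If you want to confirm the values a daemon is actually running with, its admin socket can be queried. This is a minimal sketch, assuming it is run on the host where osd.0 lives; the option shown is just an example:

    # Dump the full running configuration of a daemon
    ceph daemon osd.0 config show

    # Query a single option, for example the default pool replication size
    ceph daemon osd.0 config get osd_pool_default_size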

The replication level of RADOS pools should be left at the default of 3, and the minimum replication level at the default of 2. These correspond to the pool variables size and min_size, respectively. Unless there is both a good understanding of, and a good reason for, the impact of lowering these values, it would be unwise to change them. The size value determines how many copies of the data will be stored in the cluster, and the effect of lowering it on protection against data loss should be obvious. The effect of min_size on data loss is less well understood, and it is a common cause of it.

The min_size variable controls how many copies the cluster must write before acknowledging the write back to a client. A min_size of 2 means that the cluster must be able to write two copies of the data; in a severely degraded scenario, this can mean that write operations are blocked if a PG has only one remaining copy, and they will continue to be blocked until the PG has recovered to two copies of the object. This is the reason there may be a desire to decrease min_size to 1, so that cluster operations can continue in this event; if availability is more important than consistency, this can be a valid decision. However, with a min_size of 1, data may be written to only one OSD, and there is no guarantee that the desired number of copies will be met any time soon. During that period, any component failure will likely result in loss of the data written in the degraded state. In summary, downtime is bad, but data loss is typically worse, and these two settings will probably have one of the biggest impacts on the probability of data loss.
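
The following commands show how these two values are set and inspected; this is a minimal sketch, with the pool name rbd used only as an example. As discussed above, they should normally be left at 3 and 2:

    # Set the number of copies of data the cluster will store
    ceph osd pool set rbd size 3

    # Set the minimum number of copies required to accept client writes
    ceph osd pool set rbd min_size 2

    # Verify the current values
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size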

Defining a change management process

The biggest cause of data loss and outages with a Ceph cluster is normally human error, whether it be accidentally running the wrong command or changing configuration options that may have unintended consequences. These incidents will likely become more common as the number of people in the team administering Ceph grows. A good way of reducing the risk of human error causing service interruptions or data loss is to implement some form of change control. This is covered in more detail in the next chapter.

Creating a backup and recovery plan

Ceph is highly redundant and, when properly designed, should have no single point of failure and be resilient to many types of hardware failures. However, one-in-a-million situations do occur and, as we have also discussed, human error can be very unpredictable. In both cases, there is a chance that the Ceph cluster may enter a state where it is unavailable or where data loss occurs. In many cases, it may be possible to recover some or all of the data and return the cluster to full operation. However, in all cases, a full backup and recovery plan should be discussed before putting any live data onto a Ceph cluster. Many businesses have gone under or lost the faith of their customers when it was revealed that not only had there been an extended period of downtime, but critical data had also been lost. It may be that, as a result of discussion, it is agreed that a backup and recovery plan is not required; this is fine. The important part is that the risks and possible outcomes have been discussed and agreed upon.