How to plan a successful Ceph implementation

In order to be certain that your Ceph implementation will be successful, there are a number of rules you should follow:

  • Use 10G networking as a minimum
  • Research and test the hardware you wish to use to make sure it is correctly sized
  • Don't use the nobarrier mount option
  • Don't configure pools with size=2 or min_size=1 (a quick check for this and the nobarrier rule is shown after this list)
  • Don't use consumer SSDs
  • Don't use RAID controllers in writeback mode without battery protection
  • Don't use configuration options you don't understand
  • Implement some form of change management
  • Do carry out power loss testing
  • Do have an agreed backup and recovery plan
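
As a quick sanity check against two of these rules, the following commands show how to confirm that no filesystems are mounted with nobarrier and that no pool has had its replication settings lowered. This is a minimal sketch and assumes it is run on a node with admin access to the cluster:

    # Look for the dangerous nobarrier option on any mounted filesystem;
    # no output means the option is not in use
    grep nobarrier /proc/mounts

    # List every pool with its size and min_size so that any pool set to
    # size=2 or min_size=1 stands out
    ceph osd dump | grep 'pool'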

Understanding your requirements and how they relate to Ceph

As we have discussed, Ceph is not always the right choice for every storage requirement. This chapter should have given you the knowledge to identify your requirements and match them to Ceph's capabilities. Hopefully, Ceph is a good fit for your use case and you can proceed with the project.

Care should be taken to understand the requirements of the project including the following:

  • Identify the key stakeholders of the project; they will likely be the same people who will be able to detail how Ceph will be used.
  • Collect details of what systems Ceph will need to interact with. If it becomes apparent, for example, that unsupported operating systems are expected to be used with Ceph, this needs to be flagged at an early stage.

Defining goals so that you can gauge if the project is a success

Every project should have a series of goals that can help identify if the project has been a success. Example goals may be:

  • Cost no more than X
  • Provide X IOPS or MBps of performance
  • Survive certain failure scenarios
  • Reduce ownership costs of storage by X

These goals will need to be revisited throughout the life of the project to make sure that it is on track.
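
To make a performance goal such as providing X IOPS or MBps measurable, the cluster can be benchmarked once it is built. The following is a minimal sketch using the built-in RADOS benchmark; the pool name test-bench and the PG count of 128 are only examples for illustration:

    # Create a throwaway pool for benchmarking
    ceph osd pool create test-bench 128

    # Write 4 MB objects for 60 seconds and report bandwidth and IOPS
    rados bench -p test-bench 60 write --no-cleanup

    # Measure sequential read performance against the objects just written
    rados bench -p test-bench 60 seq

    # Remove the benchmark objects afterwards
    rados -p test-bench cleanup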

Choosing your hardware

The infrastructure section of this chapter will have given you a good idea of the hardware requirements of Ceph and the theory behind selecting the correct hardware for the project. Poor hardware choices are the second biggest cause of outages in a Ceph cluster, which makes getting these choices right early in the design stage crucial.

If possible, check with your hardware vendor to see if they have any reference designs; these are often certified by Red Hat and will take a lot of the hard work off your shoulders in trying to determine whether your hardware choices are valid. You can also ask Red Hat or your chosen Ceph support vendor to validate your hardware; they will have previous experience and will be able to guide you through any questions you may have.

Finally, if you are planning on deploying and running your Ceph cluster entirely in-house without any third-party involvement or support, consider reaching out to the Ceph community. The ceph-users mailing list has participants from vastly different backgrounds, stretching right around the globe. There is a high chance that someone somewhere is doing something similar to you and will be able to advise you on hardware choices.

Training yourself and your team to use Ceph

As with all technologies, it's essential that Ceph administrators receive some sort of training. Once the Ceph cluster goes live and becomes a business dependency, inexperienced administrators are a risk to stability. Depending on your reliance on third-party support, various levels of training may be required, which may also determine whether you look for a training course or self-teach.

Running a PoC to determine whether Ceph meets the requirements

A proof of concept (PoC) cluster should be deployed to test the design and identify any issues early on, before proceeding with full-scale hardware procurement. This should be treated as a decision point in the project; don't be afraid to revisit goals or start the design afresh if any serious issues are uncovered. If you have existing hardware of a similar specification, it should be fine to use it in the PoC, but the aim should be to test hardware as similar as possible to what you intend to build the production cluster with, so that the design can be fully tested.

As well as testing for stability, the PoC cluster should also be used to forecast if it looks likely that the goals you have set for the project will be met.

The proof of concept stage is also a good time to firm up your knowledge on Ceph, practice day-to-day operations and test out features. This will be of benefit further down the line. You should also take this opportunity to be as abusive as possible to your PoC cluster. Randomly pull out disks, power off nodes, and disconnect network cables. If designed correctly, Ceph should be able to withstand all of these events. Carrying out this testing now will give you the confidence to operate Ceph at larger scale where these events will happen and also help you understand how to troubleshoot them more easily if needed.
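
As a simple sketch of this kind of failure testing, assuming a systemd-based deployment and an OSD numbered 3 purely as an example, you might stop a daemon and watch how the cluster reacts before pulling real hardware:

    # Simulate an OSD failure by stopping its daemon on one node
    systemctl stop ceph-osd@3

    # Watch the cluster detect the failure and begin recovery
    ceph -s
    ceph health detail

    # Check which OSDs are up or down and where they sit in the CRUSH tree
    ceph osd tree

    # Bring the OSD back and watch the cluster return to HEALTH_OK
    systemctl start ceph-osd@3
    ceph -w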

Following best practices to deploy your cluster

When deploying your cluster, attention should be paid to understanding the process rather than following guided examples. This will give you better knowledge of the various components that make up Ceph and should you encounter any errors during deployment or operation, you will be much better placed to solve them. The next chapter of this book goes into more detail on deployment of Ceph, including the use of orchestration tools.

Initially, it is recommended that the default options for both the operating system and Ceph are used. It is better to start from a known state should any issues arise during deployment and initial testing.
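
If you want to confirm the values a daemon is actually running with, its admin socket can be queried. This is a minimal sketch, assuming it is run on the host where osd.0 lives; the option shown is just an example:

    # Dump the full running configuration of a daemon
    ceph daemon osd.0 config show

    # Query a single option, for example the default pool replication size
    ceph daemon osd.0 config get osd_pool_default_size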

The replication level of RADOS pools should be left at the default of 3, and the minimum replication level at the default of 2. These correspond to the pool variables size and min_size, respectively. Unless there is both a good understanding of, and a good reason for, the impact of lowering these values, it would be unwise to change them. The size value determines how many copies of the data will be stored in the cluster, and the effect of lowering it on protection against data loss should be obvious. The effect of min_size on data loss is less well understood, and it is a common cause of it.

The min_size variable controls how many copies the cluster must write before acknowledging the write back to a client. A min_size of 2 means that the cluster must be able to write two copies of the data; in a severely degraded scenario, this can mean that write operations are blocked if a PG has only one remaining copy, and they will continue to be blocked until the PG has recovered to two copies of the object. This is the reason there may be a desire to decrease min_size to 1, so that cluster operations can continue in this event; if availability is more important than consistency, this can be a valid decision. However, with a min_size of 1, data may be written to only one OSD, and there is no guarantee that the desired number of copies will be met any time soon. During that period, any component failure will likely result in loss of the data written in the degraded state. In summary, downtime is bad, but data loss is typically worse, and these two settings will probably have one of the biggest impacts on the probability of data loss.
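
The following commands show how these two values are set and inspected; this is a minimal sketch, with the pool name rbd used only as an example. As discussed above, they should normally be left at 3 and 2:

    # Set the number of copies of data the cluster will store
    ceph osd pool set rbd size 3

    # Set the minimum number of copies required to accept client writes
    ceph osd pool set rbd min_size 2

    # Verify the current values
    ceph osd pool get rbd size
    ceph osd pool get rbd min_size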

Defining a change management process

The biggest cause of data loss and outages with a Ceph cluster is normally human error, whether it be accidentally running the wrong command or changing configuration options that may have unintended consequences. These incidents will likely become more common as the number of people in the team administering Ceph grows. A good way of reducing the risk of human error causing service interruptions or data loss is to implement some form of change control. This is covered in more detail in the next chapter.

Creating a backup and recovery plan

Ceph is highly redundant and, when properly designed, should have no single point of failure and be resilient to many types of hardware failures. However, one-in-a-million situations do occur and, as we have also discussed, human error can be very unpredictable. In both cases, there is a chance that the Ceph cluster may enter a state where it is unavailable or where data loss occurs. In many cases, it may be possible to recover some or all of the data and return the cluster to full operation. However, in all cases, a full backup and recovery plan should be discussed before putting any live data onto a Ceph cluster. Many businesses have gone under or lost the faith of their customers when it was revealed that not only had there been an extended period of downtime, but critical data had also been lost. It may be that, as a result of discussion, it is agreed that a backup and recovery plan is not required; this is fine. The important part is that the risks and possible outcomes have been discussed and agreed upon.