Optimally combine Kubernetes and Ceph with Rook


Rolling Out Rook

As a final step, run the kubeadm join command (generated previously by kubeadm init) on all nodes of the setup except the control plane. Kubernetes is now ready to run Rook.
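The join command printed by kubeadm init looks roughly like the following sketch; the IP address, token, and CA certificate hash here are placeholders, and you should copy the exact command from your own kubeadm init output:

```shell
# Run on every worker node (not on the control plane).
# Address, token, and hash are placeholders from this example,
# not real values - kubeadm init prints the real command.
kubeadm join 192.168.1.10:6443 \
  --token abcdef.0123456789abcdef \
  --discovery-token-ca-cert-hash sha256:<hash-from-kubeadm-init>
```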

Thanks to various preparations by Rook developers, Rook is just as easy to get up and running as Kubernetes. Before doing so, however, it makes sense to review the basic architecture of a Ceph cluster. Rolling out the necessary containers with Rook is not rocket science, but knowing what is actually happening is beneficial.

To review the basics of Ceph and how it relates to Rook, see the box "How Ceph Works."

How Ceph Works

To help understand how Rook is deployed in the overall context of Ceph (Figure 1), I will review only the most important details here.

Figure 1: Ceph offers several interfaces, including a block device, a REST interface, and a POSIX filesystem. © Inktank

Ceph is based on the principle of object storage. It treats any content that a user uploads to Ceph as a binary object. Advantageously, objects can be separated, distributed, and reassembled at will, as long as everything happens in the correct order. In this way, object storage bypasses the limitations of traditional block storage devices, which are always inextricably linked to the medium to which they belong.

The object store at the heart of Ceph is RADOS, which implicitly provides redundancy and multiplies every piece of uploaded information as often as the administrative policy defines. Moreover, RADOS has self-healing capabilities. For example, if a single hard disk or SSD fails, RADOS notices and creates new copies of the lost objects in the cluster after a configurable tolerance time.

At least two services run under the RADOS hood: object storage daemons (OSDs) and monitor servers (MONs). OSDs are the storage silos in Ceph: the block devices on which the user data is stored. MONs act as the cluster's watchdogs; more than half of them must always be active and working so that the cluster can achieve a quorum.

If a Ceph cluster splits into several parts (e.g., because a switch fails), there is a danger of uncoordinated write operations on the individual partitions of the cluster. This split-brain scenario is a horror story for any storage admin, because it always means discarding the entire cluster dataset and importing the latest backup.

However, Ceph clusters are typically no longer completely backed up because of their size, so recovery from a backup would also be problematic if worst came to worst.


Out of the box, Ceph offers three front ends for client access. The Ceph block device lets clients access a virtual hard disk (image) in Ceph as if it were a local block device; it can be implemented either with the Linux kernel module RBD or with librbd in userspace.

Another option is access through a REST interface similar to Amazon S3. This protocol is supported by Ceph's REST interface, the Ceph Object Gateway, in addition to the OpenStack Swift protocol. What RBD and the Ceph Object Gateway have in common is that they make do with the OSDs and MONs in RADOS.

The situation is different with the CephFS POSIX filesystem, which requires a third RADOS service: the metadata server (MDS). The MDS reads the POSIX metadata stored in the extended attributes of the objects and serves it to its clients, effectively acting as a metadata cache.

To roll out Rook (Figure 2) in Kubernetes, you need OSDs and MONs. Rook makes it easy, because the required resource definitions can be taken from the Rook source code in a standard configuration. Custom Resource Definitions (CRDs) are used in Kubernetes to convert the local hard drives of a system into OSDs without further action by the administrator.

Figure 2: Rook sits between Ceph and Kubernetes and takes care of Kubernetes administration almost completely automatically. © Rook

In other words, by applying the ready-made Rook definitions from the Rook Git repository to your Kubernetes instance, you automatically create a Rook cluster with a working Ceph that utilizes the unused disks on the target systems.

Experienced admins might now be thinking of using the Kubernetes Helm package manager for a fast rollout of the containers and solutions. However, it would fail because Rook only packages the operator for Helm, but not the actual cluster.

Therefore, your best approach is to check out Rook's Git repository locally (Listing 3). In the newly created ceph/ subfolder, two files are worthy of note: operator.yaml and cluster.yaml. (See also "The Container Storage Interface" box.) With kubectl, first apply operator.yaml, which installs the Rook operator that enables the operation of the Ceph cluster. The kubectl get pods command lets you check that the rollout worked: The pods should be in the Running state. Finally, roll out the actual Rook cluster with the cluster.yaml file.

Listing 3

Applying Rook Definitions

# git clone https://github.com/rook/rook.git
# cd rook/cluster/examples/kubernetes/ceph
# kubectl create -f operator.yaml
# kubectl get pods -n rook-ceph-system
# kubectl create -f cluster.yaml

The Container Storage Interface

Kubernetes is still changing fast – not least because many standards around container solutions are just emerging or are now considered necessary. For some time, the Container Storage Interface (CSI) standard for storage plugins has been in place. CSI is now implemented throughout Kubernetes, but many users simply don't use it yet.

The good news is that CSI works with Rook. The Rook examples also include an operator-with-csi.yaml file, which you can use to roll out Rook with a CSI connection instead of the previously mentioned operator.yaml. In the ceph/csi/ folder of the examples you will find CSI-compatible variants for the Ceph block device and CephFS, instead of the non-CSI variants used here. If you are rolling out a new Kubernetes cluster with Rook, you will want to take a closer look at CSI.

A second look should now show that all kubelet instances are running rook-ceph-osd pods for their local hard drives and that rook-ceph-mon pods are also running, although not on every kubelet instance. Out of the box, Rook limits the number of MON pods to three, because that is considered sufficient.
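You can verify the pod placement by filtering on the labels used in the Rook examples (the label values might differ slightly between Rook versions); the -o wide flag shows which node each pod landed on:

```shell
# List OSD and MON pods with their node assignments;
# the rook-ceph namespace and app labels follow the Rook examples.
kubectl -n rook-ceph get pods -l app=rook-ceph-osd -o wide
kubectl -n rook-ceph get pods -l app=rook-ceph-mon -o wide
```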

Integrating Rook and Kubernetes

Some observers claim that cloud computing is actually just a huge layer cake. Given that Rook introduces an additional layer between the containers and Ceph, maybe they are not that wrong. To be able to use the Rook and Ceph installation in Kubernetes, you first have to integrate it into Kubernetes, and the required steps depend on the storage type provided by Ceph: Using CephFS for storage requires different steps than using the Ceph Object Gateway.

The classic way of using storage, however, has always been block devices, on which the example in the next step is based. In the current working directory, after performing the above steps, you will find a storageclass.yaml file. In this file, replace size: 1 with size: 3 (Figure 3).

Figure 3: When creating a storage class in production for the Ceph Object Gateway, you should change size to 3.

In the next step, you use kubectl to create a pool in Ceph. In Ceph-speak, pools are something like name tags for binary objects, used for the internal organization of the cluster. Ceph stores everything as binary objects, but each object is assigned to a placement group, and binary objects belonging to the same placement group reside on the same OSDs.

Each placement group belongs to a pool, and the size parameter at the pool level determines how often each individual placement group is replicated. In other words, the size entry sets the replication level (1 would not be enough here). The mystery remains as to why the Rook developers do not simply adopt 3 as the default.
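After the edit, the relevant parts of storageclass.yaml look roughly like the following sketch. The pool name, provisioner string, and field layout here follow the Rook example files of that era and vary between Rook releases, so treat this as an illustration rather than a drop-in file:

```yaml
# Pool definition: replicate each placement group three times.
apiVersion: ceph.rook.io/v1
kind: CephBlockPool
metadata:
  name: replicapool          # pool name from the Rook examples
  namespace: rook-ceph
spec:
  replicated:
    size: 3                  # changed from 1 for production
---
# Storage class that hands out block devices from the pool above.
apiVersion: storage.k8s.io/v1
kind: StorageClass
metadata:
  name: rook-block
provisioner: ceph.rook.io/block   # non-CSI provisioner used here
parameters:
  blockPool: replicapool
  clusterNamespace: rook-ceph
```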

As soon as you have edited the file, issue the create command; then, display the new rook-block storage class:

kubectl create -f storageclass.yaml
kubectl get sc

From now on, you have the option of organizing a Ceph block device from within the working Ceph cluster, which relies on a persistent volume claim (PVC) (Listing 4).

Listing 4

PVC for Kubernetes

kind: PersistentVolumeClaim
apiVersion: v1
metadata:
  name: lm-example-volume-claim
spec:
  storageClassName: rook-block
  accessModes:
    - ReadWriteOnce
  resources:
    requests:
      storage: 10Gi

In a pod definition, you then only reference the storage claim (lm-example-volume-claim) to make the volume available locally.
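A minimal pod definition that mounts the claimed volume could look like the following sketch; the pod name, image, and mount path are arbitrary examples and not part of the Rook files:

```yaml
apiVersion: v1
kind: Pod
metadata:
  name: lm-example-pod       # arbitrary example name
spec:
  containers:
  - name: app
    image: nginx             # arbitrary example image
    volumeMounts:
    - name: data
      mountPath: /var/lib/www   # example mount path
  volumes:
  - name: data
    persistentVolumeClaim:
      claimName: lm-example-volume-claim   # the PVC from Listing 4
```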

Using CephFS

In the same directory is the filesystem.yaml file, which you will need if you want to enable CephFS in addition to the Ceph block device; the setup is pretty much the same for both. As the first step, you need to edit filesystem.yaml and correct the value for the size parameter again, which – as you know – should be set to 3 for both dataPools and metadataPool (Figure 4).

Figure 4: When creating the storage class for CephFS, you again need to change the size parameter from 1 to 3 for production.
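With both size values set to 3, filesystem.yaml looks roughly like the following sketch; the filesystem name and exact field names depend on the Rook version, so check the file shipped with your checkout:

```yaml
apiVersion: ceph.rook.io/v1
kind: CephFilesystem
metadata:
  name: myfs                 # filesystem name from the Rook examples
  namespace: rook-ceph
spec:
  metadataPool:
    replicated:
      size: 3                # changed from 1 for production
  dataPools:
  - replicated:
      size: 3                # changed from 1 for production
  metadataServer:
    activeCount: 1
    activeStandby: true
```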

To create the custom resource definition for the CephFS service, type:

kubectl create -f filesystem.yaml

To demonstrate that the pods are now running with the Ceph MDS component, look at the output from the command:

kubectl -n rook-ceph get pod -l app=rook-ceph-mds

Like the block device, CephFS can be mapped to its own storage class, which then acts as a resource for Kubernetes instances in the usual way.
