Photo by CHUTTERSNAP on Unsplash

Persistent storage management for Kubernetes

Data Logistics

Article from ADMIN 65/2021
The container storage interface (CSI) allows CSI-compliant plugins to connect their systems to Kubernetes and other orchestrated container environments for persistent data storage.

Thanks to ongoing developments in container technology in the areas of data management and persistent storage, business-critical applications now run in cloud-native environments, in containers orchestrated by Kubernetes. A container is just a software box without its own operating system. Originally, a container held not only the app or microservice but also all the drivers, dependencies, and data the application needed to run. If the container was deleted, all of that was gone. A data store was therefore needed that would survive regardless of whether a container or its pod still existed.

PVs, PVCs, and Storage Classes

Kubernetes provides two API objects for this purpose: persistent volumes (PVs) and persistent volume claims (PVCs). PVs are static and defined by the admin in advance. They belong to a Kubernetes cluster and perish with it, but they survive the deletion of individual containers. The admin assigns all of their characteristic properties: size, storage class, paths, IP addresses, users, identifiers, and the plugin to be used.
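The relationship between pre-provisioned PVs and the claims that consume them can be sketched in a few lines of Python. This is a minimal illustration under simplifying assumptions. The class and field names are hypothetical and are not the Kubernetes API. A claim is matched against free volumes by storage class, access mode, and capacity, and the smallest volume that fits is preferred. The real PV controller checks further criteria, such as label selectors, volume mode, and node affinity.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class PersistentVolume:
    name: str
    capacity_gi: int
    storage_class: str
    access_modes: frozenset
    claim: Optional[str] = None  # name of the bound claim; None while the PV is free


@dataclass
class PersistentVolumeClaim:
    name: str
    request_gi: int
    storage_class: str
    access_modes: frozenset


def bind(claim: PersistentVolumeClaim,
         volumes: List[PersistentVolume]) -> Optional[PersistentVolume]:
    """Bind the claim to the smallest free PV that satisfies it; return None if none fits."""
    candidates = [
        pv for pv in volumes
        if pv.claim is None
        and pv.storage_class == claim.storage_class
        and claim.access_modes <= pv.access_modes  # every requested mode must be offered
        and pv.capacity_gi >= claim.request_gi
    ]
    if not candidates:
        return None  # the claim would stay Pending
    best = min(candidates, key=lambda pv: pv.capacity_gi)
    best.claim = claim.name
    return best


# Usage: two NFS volumes and one local volume, pre-provisioned by the admin
volumes = [
    PersistentVolume("pv-a", 100, "nfs", frozenset({"ReadWriteMany"})),
    PersistentVolume("pv-b", 20, "nfs", frozenset({"ReadWriteMany"})),
    PersistentVolume("pv-c", 10, "local", frozenset({"ReadWriteOnce"})),
]
logs = PersistentVolumeClaim("logs", 15, "nfs", frozenset({"ReadWriteMany"}))
print(bind(logs, volumes).name)  # pv-b: the smallest NFS volume with at least 15 Gi
```

Preferring the smallest fitting volume keeps large volumes free for claims that actually need them.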

Storage classes describe characteristics such as quality of service (QoS), replication, compression, and backup. These features are supported through the container storage interface (CSI), not by Kubernetes itself. PVs in Kubernetes clusters can now draw on many different storage classes: local storage, storage attached over Network File System (NFS), iSCSI, or Fibre Channel, and block storage from Azure, AWS, or Google. All PVs of a storage class are accessed through the same API and attached to pods in the same way.

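PVCs, on the other hand, are requests for storage. The application manager issues them according to the application's requirements, usually by naming a storage class. Depending on the request, a matching PV is created from the template of the corresponding storage class, bound to the claim, and mounted into the pod that references it. The claim itself exists independently of the pod. Whether the underlying volume is deleted together with the claim depends on the volume's reclaim policy. With a StatefulSet, a claim template is instantiated separately for each pod, so every replica gets its own PVC. The properties of a claim, such as size, access mode, and storage class, are declared in YAML. Ultimately, the PVC is allocated as much storage as the user requests, which is often only an estimate of what a specific application will need.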
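The per-pod claims of a StatefulSet can be illustrated with a small Python sketch. Kubernetes names each generated claim after the template and the pod (`<template>-<statefulset>-<ordinal>`), and the sketch follows that convention. The `ClaimTemplate` class and the dictionary layout are simplified, hypothetical stand-ins for the real manifest fields.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class ClaimTemplate:
    name: str           # template name, e.g. "data"
    storage_class: str
    request_gi: int


def claims_for_statefulset(sts_name: str, replicas: int,
                           templates: List[ClaimTemplate]) -> List[Dict[str, str]]:
    """Return one PVC description per (pod, template) pair."""
    claims = []
    for ordinal in range(replicas):
        pod = f"{sts_name}-{ordinal}"  # StatefulSet pods get stable ordinal names
        for t in templates:
            claims.append({
                "name": f"{t.name}-{pod}",  # <template>-<statefulset>-<ordinal>
                "pod": pod,
                "storageClassName": t.storage_class,
                "request": f"{t.request_gi}Gi",
            })
    return claims


for c in claims_for_statefulset("web", 3, [ClaimTemplate("data", "fast", 10)]):
    print(c["name"], "->", c["pod"])
# data-web-0 -> web-0
# data-web-1 -> web-1
# data-web-2 -> web-2
```

Because the names are deterministic, a rescheduled pod `web-1` finds its old claim `data-web-1` again instead of receiving a fresh, empty volume.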

If a pod is running on a host, you can also define a path to a host directory (hostPath) where the container data will end up. The data is only available to the container as long as the container actually runs on that host. Data blocks or objects related to containerized apps can also be stored locally, but again only as long as the container remains on this hardware.

Container Storage Interface

Today, the CSI [1] mentioned earlier has become very important. It is an interface developed by the Cloud Native Computing Foundation (CNCF). With it, storage system providers can connect their systems to Kubernetes and other orchestrated container environments without writing a separate driver for each orchestrator. CSI has gained market acceptance and is supported by many storage vendors.

Before CSI, plugins for volumes had to be written, linked, compiled, and shipped with the Kubernetes code – an expensive and inflexible process – because every time new storage options made their way into the systems, the Kubernetes code itself had to be changed. Thanks to CSI, this is no longer the case; the Kubernetes code base is now unaffected by changes to the supported storage systems.

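A CSI-compliant plugin comprises three components: a controller service, a node service, and an identity service. The controller service controls the storage. It includes functions such as creating, deleting, listing, and publishing volumes and taking snapshots (CreateVolume, DeleteVolume, ListVolumes, ControllerPublishVolume, CreateSnapshot, and so on). The container node accesses the storage through the node service. Its important functions include volume staging and publishing (NodeStageVolume, NodePublishVolume), volume statistics, and node properties. The identity service provides information about the plugin itself, such as its name, version, and capabilities, and reports whether it is healthy (Probe).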
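The division of labor between the three services can be sketched in Python as three abstract interfaces and a toy driver that implements all of them. This is an illustration of the structure, not a real CSI implementation. Real drivers serve these calls over gRPC, and the method names below are only lowercase renderings of the RPCs named in the text. The driver name `inmemory.example.com` and the dictionary-based state are hypothetical. The sketch does reflect one requirement of the specification: repeated CreateVolume calls with the same name must return the same volume.

```python
import uuid
from abc import ABC, abstractmethod


class IdentityService(ABC):
    """Tells the orchestrator who the plugin is and whether it is healthy."""
    @abstractmethod
    def get_plugin_info(self) -> dict: ...
    @abstractmethod
    def probe(self) -> bool: ...


class ControllerService(ABC):
    """Manages volumes on the storage back end (create, delete, attach to nodes)."""
    @abstractmethod
    def create_volume(self, name: str, capacity_bytes: int, parameters: dict) -> str: ...
    @abstractmethod
    def delete_volume(self, volume_id: str) -> None: ...
    @abstractmethod
    def controller_publish_volume(self, volume_id: str, node_id: str) -> None: ...


class NodeService(ABC):
    """Runs on every node and makes a published volume usable by containers."""
    @abstractmethod
    def node_stage_volume(self, volume_id: str, staging_path: str) -> None: ...
    @abstractmethod
    def node_publish_volume(self, volume_id: str, target_path: str) -> None: ...


class InMemoryDriver(IdentityService, ControllerService, NodeService):
    """Toy driver that keeps all state in dictionaries instead of real storage."""

    def __init__(self):
        self.volumes = {}  # volume_id -> {"name", "capacity", "node"}
        self.by_name = {}  # volume name -> volume_id, used for idempotent creation
        self.mounts = {}   # target_path -> volume_id

    def get_plugin_info(self):
        return {"name": "inmemory.example.com", "vendor_version": "0.1.0"}

    def probe(self):
        return True

    def create_volume(self, name, capacity_bytes, parameters):
        if name in self.by_name:  # repeated call: return the existing volume
            return self.by_name[name]
        volume_id = uuid.uuid4().hex
        self.volumes[volume_id] = {"name": name, "capacity": capacity_bytes, "node": None}
        self.by_name[name] = volume_id
        return volume_id

    def delete_volume(self, volume_id):
        vol = self.volumes.pop(volume_id, None)  # deleting twice is not an error
        if vol is not None:
            del self.by_name[vol["name"]]

    def controller_publish_volume(self, volume_id, node_id):
        self.volumes[volume_id]["node"] = node_id  # "attach" the volume to the node

    def node_stage_volume(self, volume_id, staging_path):
        pass  # a real driver would format and mount the device at staging_path here

    def node_publish_volume(self, volume_id, target_path):
        self.mounts[target_path] = volume_id  # a real driver would bind-mount into the pod
```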

In total, the CSI specification defines around 20 functions. If a plugin implements them in compliance with the specification, administrators have a working plugin for connecting storage to any CSI-capable container system. The controller service runs centrally on a controller node, whereas any number of nodes can be connected through the node service. If a node that was using a specific volume dies, the volume is simply published to another node, where it then becomes available again. The major container orchestration systems (Kubernetes, OpenShift, PKS, Mesos, Cloud Foundry) now support CSI. Kubernetes has done so since version 1.13.
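The failover behavior described above, where a volume is republished to a surviving node, can be modeled with a small Python sketch. The `Cluster` class and its methods are hypothetical. The model is deliberately simplified: it ignores the safety checks that real clusters perform before moving a volume away from an unreachable node. It also picks the target node deterministically instead of following the scheduler.

```python
from dataclasses import dataclass, field
from typing import Dict, Set


@dataclass
class Cluster:
    healthy_nodes: Set[str]
    attachments: Dict[str, str] = field(default_factory=dict)  # volume_id -> node_id

    def publish(self, volume_id: str, node_id: str) -> None:
        """Record that the volume is published (attached) to the given node."""
        if node_id not in self.healthy_nodes:
            raise ValueError(f"node {node_id} is not healthy")
        self.attachments[volume_id] = node_id

    def node_failed(self, node_id: str) -> None:
        """Remove a dead node and republish each of its volumes on a surviving node."""
        self.healthy_nodes.discard(node_id)
        orphaned = [v for v, n in self.attachments.items() if n == node_id]
        if orphaned and not self.healthy_nodes:
            raise RuntimeError("no healthy node left to take over the volumes")
        for volume_id in orphaned:
            # Conceptually an unpublish from the dead node plus a publish elsewhere;
            # here the attachment record is simply overwritten.
            target = min(self.healthy_nodes)  # deterministic choice for the sketch
            self.publish(volume_id, target)


cluster = Cluster({"node-a", "node-b", "node-c"})
cluster.publish("vol-1", "node-a")
cluster.node_failed("node-a")
print(cluster.attachments)  # {'vol-1': 'node-b'}
```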

Kubernetes complements CSI with its own functions, including forwarding storage class parameters to the CSI drivers. Further functions include passing identification data (secrets) referenced in the storage class on to the driver, and automatically starting the node service on newly created nodes.
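Parameter forwarding can be sketched as follows. The sketch assumes the convention used by the Kubernetes external-provisioner: storage class keys with the `csi.storage.k8s.io/` prefix, such as `provisioner-secret-name` and `provisioner-secret-namespace`, are reserved for Kubernetes. All other keys are handed to the driver unchanged. The splitting functions themselves are a simplified, hypothetical illustration, and the driver parameters `type` and `replicas` are invented for the example.

```python
from typing import Dict, Optional, Tuple

RESERVED_PREFIX = "csi.storage.k8s.io/"


def split_parameters(parameters: Dict[str, str]) -> Tuple[Dict[str, str], Dict[str, str]]:
    """Split storage class parameters into opaque driver parameters and reserved keys."""
    driver_params, reserved = {}, {}
    for key, value in parameters.items():
        if key.startswith(RESERVED_PREFIX):
            reserved[key[len(RESERVED_PREFIX):]] = value  # interpreted by Kubernetes
        else:
            driver_params[key] = value  # passed through unchanged to the driver
    return driver_params, reserved


def provisioner_secret(reserved: Dict[str, str]) -> Optional[Tuple[Optional[str], str]]:
    """Return (namespace, name) of the secret to send along with CreateVolume, if any."""
    name = reserved.get("provisioner-secret-name")
    if name is None:
        return None
    return reserved.get("provisioner-secret-namespace"), name


params = {
    "type": "ssd",      # hypothetical driver-specific parameter
    "replicas": "3",    # hypothetical driver-specific parameter
    "csi.storage.k8s.io/provisioner-secret-name": "backend-creds",
    "csi.storage.k8s.io/provisioner-secret-namespace": "storage",
}
driver_params, reserved = split_parameters(params)
print(driver_params)                 # {'type': 'ssd', 'replicas': '3'}
print(provisioner_secret(reserved))  # ('storage', 'backend-creds')
```

The driver never has to understand Kubernetes-specific keys. It only sees its own parameters plus the credentials that Kubernetes resolved for it.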

In a Kubernetes environment, multiple CSI drivers can work in parallel. This capability is important when applications in the cluster have different storage requirements. They can then choose the appropriate storage resource, because they are all equally connected to the cluster by CSI. Kubernetes uses redundancy mechanisms to ensure that at least one controller service is always running. Kubernetes thus currently offers the most comprehensive support of all orchestrators for CSI.
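The way several drivers coexist can be illustrated with a short Python sketch. Each storage class names the driver (its provisioner) responsible for it, so a claim is routed to the right back end purely through its storage class. The storage class names, the driver names under `example.com`, and the `CsiDriver` class are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class CsiDriver:
    name: str
    volumes: List[str] = field(default_factory=list)

    def create_volume(self, claim_name: str) -> str:
        """Pretend to provision a volume and return an identifier for it."""
        self.volumes.append(claim_name)
        return f"{self.name}/{claim_name}"


# Hypothetical storage classes, each pointing at a different driver
STORAGE_CLASSES: Dict[str, str] = {
    "fast-block": "block.csi.example.com",
    "shared-files": "nfs.csi.example.com",
}


def provision(claim_name: str, storage_class: str, drivers: Dict[str, CsiDriver]) -> str:
    """Route a claim to the driver registered for its storage class."""
    provisioner = STORAGE_CLASSES.get(storage_class)
    if provisioner is None:
        raise KeyError(f"unknown storage class {storage_class!r}")
    return drivers[provisioner].create_volume(claim_name)


drivers = {d.name: d for d in (CsiDriver("block.csi.example.com"),
                               CsiDriver("nfs.csi.example.com"))}
print(provision("db-data", "fast-block", drivers))   # block.csi.example.com/db-data
print(provision("uploads", "shared-files", drivers))  # nfs.csi.example.com/uploads
```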

However, in Kubernetes environments CSI requires additional middleware components outside the Kubernetes core. These are sidecar containers, such as external-provisioner, external-attacher, and node-driver-registrar, that are deployed alongside the CSI driver. These components ensure the fit between the particular CSI driver and the Kubernetes version in use. They register the driver with the kubelet on each node, trigger the creation and deletion of volumes, and attach and detach volumes. In this way, the external middleware, which is maintained by the Kubernetes project, provides the existing nodes with the required storage access.
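The middleware works as a control loop. It watches for claims that belong to its driver, calls the driver, and records the result as a PV. The Python sketch below shows one pass of such a loop. The data layout, the `create_volume` callback signature, and the function name are assumptions for illustration. The `pvc-<uid>` naming follows the pattern commonly seen for dynamically provisioned PVs. A real sidecar talks to the API server and to the driver over gRPC instead of using dictionaries.

```python
from typing import Callable, Dict, List

# Hypothetical signature of the driver's CreateVolume call: (name, bytes, params) -> volume ID
CreateVolume = Callable[[str, int, Dict[str, str]], str]


def reconcile_once(claims: List[dict], storage_classes: Dict[str, dict],
                   driver_name: str, create_volume: CreateVolume,
                   pvs: Dict[str, dict]) -> int:
    """One pass of a provisioner-style control loop; returns the number of new PVs."""
    created = 0
    for claim in claims:
        sc = storage_classes.get(claim["storageClassName"])
        if sc is None or sc["provisioner"] != driver_name:
            continue  # some other provisioner is responsible for this claim
        pv_name = f"pvc-{claim['uid']}"
        if pv_name in pvs:
            continue  # provisioned in an earlier pass, so the loop stays idempotent
        volume_id = create_volume(pv_name, claim["requestBytes"], sc.get("parameters", {}))
        pvs[pv_name] = {"volumeHandle": volume_id, "claimRef": claim["name"]}
        created += 1
    return created


calls = []


def fake_create(name, size, params):
    calls.append(name)
    return f"vol-{len(calls)}"


storage_classes = {
    "fast": {"provisioner": "fast.csi.example.com", "parameters": {"tier": "ssd"}},
    "slow": {"provisioner": "slow.csi.example.com"},
}
claims = [
    {"name": "db", "uid": "1234", "storageClassName": "fast", "requestBytes": 10 * 2**30},
    {"name": "archive", "uid": "5678", "storageClassName": "slow", "requestBytes": 2**40},
]
pvs = {}
print(reconcile_once(claims, storage_classes, "fast.csi.example.com", fake_create, pvs))  # 1
print(reconcile_once(claims, storage_classes, "fast.csi.example.com", fake_create, pvs))  # 0
```

Running the loop repeatedly without side effects is what lets such components restart at any time and simply pick up where they left off.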

Most CSI drivers are written in Go with the support of the GoCSI framework. The framework provides about a quarter of the necessary code, including predefined remote procedure call code based on gRPC. A special test tool is also available. Dell EMC, for example, uses this framework for some of its storage products.

The numerous open source projects centered around container storage fill functional gaps and deficiencies in Kubernetes, mostly in terms of managing persistent storage and data. Kubernetes needs these add-ons to provide a secure environment for business applications. Currently, about 30 storage projects are on the CNCF's project map, many of which have already been commercialized. I will be looking at Ceph, Rook, Gluster, and Swift in more detail here, in addition to short descriptions of other projects with a focus on container storage.

Ceph Management Tool

The basic form of the open source storage platform Ceph [2] was developed as early as 2004. Today, it is often used with the Rook container storage software. Currently, Red Hat, SUSE, and SanDisk are the main contributors to its development. Ceph is implemented on a distributed computing cluster and is suitable for object, block, and file storage. The system features automatic replication, self-healing, and self-management, while avoiding a single point of failure. Commodity hardware is sufficient for a Ceph environment. The object store is based on the Reliable Autonomic Distributed Object Store (RADOS).

The ceph-mon cluster monitor monitors the function and configuration of the cluster nodes. It stores information about the placement of data and the general state of the cluster. The ceph-osd object storage daemon manages directly attached disk storage (BlueStore), whose entries are recorded in a journal. The metadata server daemon ceph-mds stores the metadata for the CephFS file system. Managers (ceph-mgr) monitor and maintain the cluster and provide interfaces to external systems (e.g., load balancers or other tools). The RADOS Gateway (radosgw) provides an HTTP interface to the object storage that is compatible with Amazon Simple Storage Service (S3) and Swift.
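Automatic replication and self-healing rely on computing data placement instead of looking it up in a central table. The Python sketch below illustrates the idea. It follows Ceph's two-step design, where an object is hashed onto a placement group (PG) and each PG is mapped to a set of OSDs. For the second step, however, it swaps in simple rendezvous hashing as a stand-in for Ceph's actual CRUSH algorithm, which walks a weighted hierarchy of failure domains. The hash function, the object name, and the OSD names are assumptions.

```python
import hashlib
from typing import List


def _hash(*parts) -> int:
    """Stable 64-bit hash (Ceph uses its own hash functions; SHA-256 is a stand-in)."""
    data = "/".join(str(p) for p in parts).encode()
    return int.from_bytes(hashlib.sha256(data).digest()[:8], "big")


def object_to_pg(object_name: str, pg_num: int) -> int:
    """Step 1: hash the object name onto one of pg_num placement groups."""
    return _hash(object_name) % pg_num


def pg_to_osds(pg: int, osds: List[str], replicas: int = 3) -> List[str]:
    """Step 2 (stand-in for CRUSH): rank OSDs by a per-(pg, osd) score, keep the top N."""
    ranked = sorted(osds, key=lambda osd: _hash(pg, osd), reverse=True)
    return ranked[:replicas]


osds = [f"osd.{i}" for i in range(6)]
pg = object_to_pg("photos/cat.jpg", pg_num=64)
before = pg_to_osds(pg, osds)
after = pg_to_osds(pg, [o for o in osds if o != before[0]])  # first replica's OSD fails
print(before, after)  # the two surviving replicas stay; one new OSD is added
```

Because every client and daemon can compute the same mapping, losing an OSD moves only the replicas that lived on it. Placement groups that never used the failed OSD keep their mapping unchanged.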

Because of its ties to the later history of Ceph, I also need to mention GlusterFS, developed in 2005. The plan was an open source platform for scale-out cloud storage for private and public clouds. In 2011, Red Hat, which has since been acquired by IBM, bought Gluster. Red Hat initially marketed GlusterFS as Red Hat Storage Server. In 2014 it bought Inktank, the company behind Ceph, and it now markets the two technologies as Red Hat Gluster Storage and Red Hat Ceph Storage.
