Lead Image © efks, 123RF.com

Cloud-native storage for Kubernetes with Rook

Memory

Article from ADMIN 49/2019
By
Kubernetes is known to be inflexible when it comes to persistent storage, but Rook now offers cloud-native storage and seeks to achieve harmony.

A witticism making the rounds at IT conferences these days says that Internet providers operate servers to make their customers serverless. On the one hand, customers increasingly strive to move their setups into the cloud and avoid dealing with the operation of classic IT infrastructure. On the other hand, no software can run on air and goodwill.

Today's IT infrastructure providers, therefore, face challenges that hardly differ from those of earlier years – specifically, persistent data. Clearly, modern architectural approaches that comprise microservices are increasingly based on dynamic handling of data, fragmentation, cluster mechanisms, and, last but not least, inherent redundancy.

None of this changes the fact that every setup reaches a point at which data needs to be stored safely somewhere, for example to keep the failure of a single server from sending container and customer data into a black hole.

Cloud environments such as OpenStack [1] take a classic approach to the problem by providing components that act as intermediaries between the physical storage on one side and the virtual environments on the other. The virtual environments are typical virtual machines (VMs), so persistent storage can be connected without problems.

Containers, on the other hand, present a different battle plan: Kubernetes [2] (Figure 1), Docker [3], and the many alternatives all have some solution for persistent storage of data, but none of them really fits the concept of cloud native.

Figure 1: Kubernetes is the undisputed class leader in container orchestration. Persistent, redundant volumes, however, remain a difficult topic for the solution.

Rook [4] promises no less than cloud-native storage for persistent data in the context of Kubernetes. The idea is simple: Combine Rook with Ceph [5] (the currently recommended object store). As soon as Rook is running in Kubernetes, a complete virtual Ceph cluster is available. The Ceph volumes can then be connected by Kubernetes, providing the cluster with data silos that are then managed in the usual way.

Rook integrates perfectly into Kubernetes so that it can be controlled centrally by an API. The customer then has a new type of volume available in their Kubernetes cluster, which can be used like any other type of volume in Kubernetes.
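As a rough illustration of what "like any other type of volume" could mean in practice, the following Python sketch builds a PersistentVolumeClaim manifest and prints it as JSON. The storage class name rook-ceph-block, the claim name demo-data, and the 5Gi size are assumptions chosen for the example; the article does not name them, and a real cluster would use whatever storage class its admin has defined for Rook.

```python
import json

# Assumption: the admin has created a storage class with this name for Rook.
ROOK_STORAGE_CLASS = "rook-ceph-block"


def build_claim(name: str, size: str, storage_class: str = ROOK_STORAGE_CLASS) -> dict:
    """Return a PersistentVolumeClaim manifest as a plain dictionary."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # The only Rook-specific part: which storage class backs the claim.
            "storageClassName": storage_class,
            "accessModes": ["ReadWriteOnce"],
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Hypothetical claim name and size, for illustration only.
    print(json.dumps(build_claim("demo-data", "5Gi"), indent=2))
```

Because kubectl accepts JSON as well as YAML, the printed manifest could be piped to kubectl apply -f - on a cluster where such a storage class exists. Apart from the storage class name, nothing in the claim differs from a claim against any other volume type.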

The storage created with Rook is not just glommed on; it is integrated into all Kubernetes processes, as is the case with classic volumes and containers. Rook thus promises admins a quick and simple solution for a vitally important problem, which is reason enough to take a closer look.

Cloud Native

Before taking a look at Rook, though, a short excursion into cloud marketing is essential. What cloud-native storage actually means is only intelligible to those who deal regularly with the subject, and they probably already have Rook on their radar. Remember that cloud-native applications are generally a new type of application designed to run in the cloud from the outset, and the cloud makes different demands on apps than do conventional setups. Cloud installations assume, for example, that the applications take care of their own redundancy, instead of relying on auxiliary constructs such as cluster managers.

The "cloud native" concept implies a microservices architecture. Whereas formerly a huge, monolithic program was developed, today a number of small, self-sufficient, but very fast components are preferable, which together deliver a coherent overall service. If an application meets these requirements, it is generally considered to be cloud native.

Containers are an extremely popular tool for building such programs. Because they do without the huge overhead of VMs, they are particularly light-footed and resource-saving. Moreover, tools like Kubernetes can orchestrate containers very well – another factor that plays a major role in cloud-native applications.

However, cloud orchestrators like Kubernetes are very reluctant to deal with the topic of persistent storage. From the container environment's point of view, the facts are clear: If the application needs persistent storage, it should take care of this itself, just as MySQL and Galera do as a team, wherein the entire dataset always exists in multiple instances. If one set of data fails, a new one can be ramped up immediately; after a short while, the data is again synchronized with its cluster partners.
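The idea of application-level redundancy described above can be sketched as a toy model in Python. This is not Galera's actual replication protocol; the class name ToyCluster, the node names, and the keys are hypothetical and only illustrate the principle that every node holds the full dataset and a replacement node catches up from a surviving peer.

```python
class ToyCluster:
    """Toy model: every node holds a full copy of the dataset."""

    def __init__(self, node_names):
        self.nodes = {name: {} for name in node_names}

    def write(self, key, value):
        # Apply the write on every live node so all copies stay identical.
        for data in self.nodes.values():
            data[key] = value

    def fail(self, name):
        # Losing one node loses one copy, not the data itself.
        del self.nodes[name]

    def join(self, name):
        # A replacement node starts by copying the state of a surviving peer.
        if not self.nodes:
            raise RuntimeError("no surviving node to synchronize from")
        donor = next(iter(self.nodes.values()))
        self.nodes[name] = dict(donor)


if __name__ == "__main__":
    cluster = ToyCluster(["db1", "db2", "db3"])
    cluster.write("customer:42", "alice")
    cluster.fail("db2")                    # one instance disappears
    cluster.write("customer:43", "bob")    # the cluster keeps accepting writes
    cluster.join("db4")                    # a new instance is ramped up
    print(cluster.nodes["db4"])
```

Even in this simplified form, the application has to track membership, keep writes consistent across nodes, and handle state transfer to newcomers, which hints at why real implementations are considerably harder to build and operate.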

If you have enjoyed the experience of dealing with Galera in a full-blown production setup, you know it is not so easy. In fact, it is very complicated, on the administrative as well as the development level. Therefore, developers of cloud-native apps often refuse to consider the subject at all and prefer to point their fingers at the container environment instead.

As the admin, though, you are left out in the cold if neither the container environment nor the application addresses the topic, and it falls to you to work out how to make persistent storage possible. In recent years, some hacks have been created that add replication solutions to Kubernetes and somehow enable redundant persistent storage. That's not the way to go, though.

The Rook Remedy

The Rook developers noticed that the software needed to enable persistent storage in cloud environments already exists in the form of Ceph. Red Hat had good reason to acquire Inktank and involve one of the Ceph founders, Sage Weil, years ago. It is no accident that Ceph has become the de facto standard when software-defined storage is used to offer scalable storage in clouds, or that some of the world's largest clouds use Ceph to provide persistent storage.

Although Ceph requires block storage as a kind of data silo, all the intelligence that takes care of scalability and internal redundancy is in software components. Ceph doesn't care if what it gets as persistent storage is a real hard drive or a virtual volume attached to a container. As long as Ceph can store its user data somewhere, the world is fine from its point of view.

The Missing Link

Although it might appear that Ceph has already solved the classic problems of redundant persistent storage for the developers of cloud-native applications, there has been no way until now to manage Ceph effectively in container environments.

From the user's point of view, the thought of manually creating containers in which a Ceph cluster runs is not very attractive. These containers would not be integrated into the processes of the container environment and could not be controlled through a central API. Moreover, they would massively increase the complexity of the setup on the administration side.

From the provider's point of view, things are hardly any better. The provider could operate a Ceph cluster with the appropriate configuration in the background for customers to use, but a usable process for integrating containers and storage would still have to be devised. Docker does have a Ceph volume driver, but it is neither very well maintained nor very functional.

Rook is the link between containers and orchestration on the one hand and cloud-native applications on the other. How does this work in practice?
