Live migration of virtual machines

Go Live

An Object Store

The Ceph solution belongs in the object storage category; files stored in Ceph are split into binary objects and then distributed to a plurality of storage locations. Object stores use this approach to work around a problem that arises when you try to design distributed systems on the basis of block storage: Blocks are rigid and closely tied to the filesystem, which makes block-based data media accessible to the user in the first place.

However, it makes little sense to use a bare metal hard disk directly from the factory. Although you could store data on it, reading the data in a meaningful way later on would be impossible. The structure that allows meaningful access to block-based storage is provided by a filesystem, but because a filesystem cannot be divided easily into several strips and then distributed to different servers, blocks are basically useless for scale-out solutions.

Object stores work around this problem by handling internal management themselves and constructing binary objects that are divisible and can be reassembled. In a way, they add an intermediate layer between the filesystem and the user.

Arbitrary Front Ends

One major advantage of object stores is that they are ultimately capable of providing an arbitrary front end to the client. After all, if the object store handles its internal organization on the basis of binary objects, it can also deliver them to the clients in any format. In the case of Ceph, for which the object store is called RADOS, the developers make multiple use of this capability. In addition to the POSIX-compatible Ceph FS and the REST API, Radosgw, RADOS publishes its data in the form of a RADOS Block Device (RBD).

The good news is that current Libvirt versions already support RADOS, and current versions of Qemu come with a native connection to RADOS via RBD (by way of the Librados C library).

The effect is quite impressive: The object store can be connected directly to Libvirt or Qemu on any number of virtualization nodes, and simultaneous access to the same binary objects from multiple locations is possible, which solves the problem of live migration.

Libvirt makes sure that nothing goes wrong; that is, it prevents hostile, or even competing write access during the actual VM migration.

En Route to the Virtualization Cluster

The prerequisite for a meaningful Ceph setup is three servers. Why three? To ensure that Ceph's quorum-handling functions work in a meaningful way. Ceph treats each drive as a separate storage device, so the number of disks (OSDs, object storage daemons; Figure 1) per host ultimately plays a subordinate role. It is advisable, initially at least, to use three computers with an identical hardware configuration for the setup.

Figure 1: Ceph manages its OSDs centrally, creating an intermediate layer between users and the block device to achieve flexibility.

Admins should not look for a low-budget hardware option. In addition to the object store role, the computers simultaneously act as virtualization hosts, so that each is thus part of the object store and a virtualizer.

In combination with Pacemaker and a configuration similar to that described for DRBD, a self-governing VM cluster is created from the individual cluster nodes. On the cluster, any virtual machine can run on any host, and live migration between computers also works fine – thanks to the Qemu and RBD pairing. For more information on Ceph, check out some of my previous articles [4]-[7].

Buy ADMIN Magazine

Get it on Google Play

US / Canada

Get it on Google Play

UK / Australia

Related content

comments powered by Disqus