OpenStack Trove for users, sys admins, and database admins

Semiautomatic

Storage Alternatives

If you use Ceph and are dissatisfied with its performance for databases, you should look for alternatives. Several products, such as Quobyte [2] and StorPool [3], advertise significantly lower latencies without requiring you to give up Ethernet or replace the existing storage. The previously described solution, a separate availability zone for the nodes that use the other storage, can be integrated into an existing environment. For the provider, however, this means a good deal of effort, because they need to evaluate, test, and develop up front.

The Ceph developers are aware of their high latency problems and are already working on a solution: BlueStore is set to do everything better by trying to solve the problem at the physical storage level. Thus far, Ceph has relied on storing the binary objects on an XFS filesystem.

Originally, Inktank, the company behind Ceph, promoted Btrfs, but because Btrfs is still not mature enough, they had to make do with XFS. XFS is a poor fit for the typical Ceph use case: OSDs virtually never need large parts of the POSIX specification, yet POSIX compliance comes with a major performance penalty.

BlueStore [4] is accordingly a new storage back end, or on-disk format, that places data directly on the individual block storage devices and should perform significantly better than XFS. The next long-term support version of Ceph is supposed to support BlueStore for production use, but until then users will have to wait.

If Worst Comes to Worst

If you do not have an alternative network storage solution, you will have to fall back on local storage, but you definitely will want to mitigate the worst side effects. For example, a single-node setup could be used, with individual nodes running VMs on local storage that is replicated in the background, so that a VM can be switched over to another host during operation.

DRBD 9, with its n-node replication, would be a good candidate. However, the DRBD driver for OpenStack Cinder does not currently offer the ability to switch a Cinder volume's primary to the node on which the VM actually runs. The result would be that, although access to the storage device works, the data ultimately would still travel over the network.

Other variants exist in the form of locally connected Fibre Channel devices. Although they can handle the replication themselves, a tailor-made setup for the individual use case is absolutely essential.

Replication and Multimaster Mode

In principle, replication in MySQL is a useful way of distributing load across multiple nodes and ensuring redundancy. However, if database-level replication takes place between three MySQL instances that write to Ceph volumes in the background, the previously described latency effect multiplies – not a very useful construct.

On the other hand, it would be possible to organize replication at the MySQL level if it takes place between three VMs that have local storage, because then the maximum latency would be almost identical to Ethernet latency between the physical hosts – and definitely much lower than when accessing distributed network storage.
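The latency argument can be made concrete with a rough back-of-the-envelope model. The sketch below is only an illustration: it assumes that a commit waits for the replicas to acknowledge the write, that replicas work in parallel, and that all figures are invented placeholder values rather than measurements. The function name and constants are hypothetical.

```python
def commit_latency_ms(storage_write_ms: float, network_rtt_ms: float, replicas: int) -> float:
    """Rough latency of one commit that waits for all replicas to acknowledge.

    The primary writes to its own storage. Each replica is reached over the
    network and writes to its own storage. Because replicas are assumed to
    work in parallel, only one round trip plus one replica write is added.
    """
    if replicas < 0:
        raise ValueError("replicas must not be negative")
    latency = storage_write_ms
    if replicas > 0:
        latency += network_rtt_ms + storage_write_ms
    return latency


# Placeholder values for illustration only, not benchmark results.
ETHERNET_RTT_MS = 0.2      # round trip between two physical hosts
LOCAL_WRITE_MS = 0.1       # write to a local disk
NETWORK_STORAGE_MS = 2.0   # write to a distributed network volume

local = commit_latency_ms(LOCAL_WRITE_MS, ETHERNET_RTT_MS, replicas=2)
networked = commit_latency_ms(NETWORK_STORAGE_MS, ETHERNET_RTT_MS, replicas=2)

print(f"local storage:   {local:.1f} ms per commit")
print(f"network storage: {networked:.1f} ms per commit")
```

With local disks, the Ethernet round trip dominates the total. With network volumes, the storage latency is paid on the primary and again on the replica, so it dominates instead.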

Multinode setups in MySQL are relevant in the context of load distribution. The typical master-slave setup, as used for databases on bare metal, can also be used in a DBaaS environment, and, fittingly, Trove supports this type of setup out of the box (see the "Handling Your Own Images" box).

Handling Your Own Images

For Trove to work, it needs a special image for its virtual systems [5]. To understand why this is necessary, I'll embark on a small trip into the Trove architecture. Basically, a VM started by Trove with a database only differs from a normal, off-the-peg VM in one aspect. For Trove to work within the VM, it needs a helper: the Trove Guest Agent. The agent receives instructions from the outside and then carries out the necessary steps on the VM – in this example, creating the MySQL configuration for operation in a cluster.
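The division of labor between Trove and its agent can be pictured with a toy model. The following sketch is not Trove code: the class, the message format, and the handler name are all hypothetical and serve only to show the idea of an agent that receives an instruction from outside and turns it into a MySQL configuration on the VM. The real Guest Agent receives its instructions over Trove's own messaging layer and writes files instead of returning strings.

```python
class ToyGuestAgent:
    """Toy stand-in for an in-VM agent; not the real Trove Guest Agent."""

    def __init__(self):
        # Map instruction names (hypothetical) to the methods that handle them.
        self._handlers = {"write_cluster_config": self._write_cluster_config}

    def handle(self, message: dict) -> str:
        """Dispatch one instruction received from outside the VM."""
        handler = self._handlers.get(message["method"])
        if handler is None:
            raise ValueError(f"unknown instruction: {message['method']}")
        return handler(**message.get("args", {}))

    def _write_cluster_config(self, server_id: int, role: str) -> str:
        """Build a minimal MySQL replication config and return it as text."""
        lines = ["[mysqld]", f"server-id = {server_id}", "log-bin = mysql-bin"]
        if role == "replica":
            # Replicas should not accept ordinary writes.
            lines.append("read-only = 1")
        return "\n".join(lines) + "\n"


agent = ToyGuestAgent()
config = agent.handle(
    {"method": "write_cluster_config", "args": {"server_id": 2, "role": "replica"}}
)
print(config)
```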

Unfortunately, a standard Ubuntu image simply does not contain the Trove Guest Agent. The admin or user therefore has the task of building such an image, which must contain the Guest Agent as a program with a suitable configuration. This requires adapting the content of the /etc/trove/trove-guest-agent.conf file to the requirements of the respective setup (e.g., to the Keystone configuration).
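If you build images regularly, it can help to generate the agent configuration from a script instead of editing it by hand. The sketch below uses Python's standard configparser module to render such a file as text. The option names trove_auth_url and datastore_manager and the Keystone URL are assumptions for illustration; check the sample configuration shipped with your Trove release for the options your setup actually needs.

```python
import configparser
import io


def render_guest_agent_conf(auth_url: str, datastore_manager: str) -> str:
    """Return the text of a minimal guest agent configuration file.

    The option names are assumptions; adjust them to your Trove release.
    """
    # Disable interpolation so that '%' characters in values are kept as-is.
    parser = configparser.ConfigParser(interpolation=None)
    parser["DEFAULT"] = {
        "trove_auth_url": auth_url,            # Keystone endpoint (assumed option name)
        "datastore_manager": datastore_manager,  # database type (assumed option name)
    }
    buffer = io.StringIO()
    parser.write(buffer)
    return buffer.getvalue()


# Hypothetical Keystone endpoint of the target cloud.
conf_text = render_guest_agent_conf("http://keystone.example.com:5000/v3", "mysql")
print(conf_text)
```

The resulting text would then be written to /etc/trove/trove-guest-agent.conf inside the image as part of the build.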

The good news is that OpenStack's own standard tool for building images, Diskimage-builder, has already been adapted for Trove. On request, it creates an image that can be used directly in the respective provider's cloud. The Trove documentation contains an example for Ubuntu; however, you need to run the tool on a host where Trove already has the configuration files needed for the cloud, because only then will the configuration be adopted correctly into the image.

The Trove project also offers ready-made images on an Ubuntu base for various databases, differing from the official Ubuntu images only in that the corresponding Trove components are already integrated. However, this solution is not very convenient for users, because the configuration files contained in the finished images are, of course, generic and not yet adapted to a particular cloud.

When starting a VM with Trove, customers therefore have the task of transferring configuration files suitable for the given application case to the image as Nova metadata. The DBaaS vendor needs to provide complete and detailed documentation for this step.

If you want to save your customers the trouble of doing this work or if you are a customer that depends on handmade images for special reasons, you are better off using the described workflow and building your own images for Trove.
