How to back up in the cloud

Rescue Approach

Central Storage

When it comes to payload data, the situation is quite different. For this data, many companies rely on central storage solutions such as Ceph. Although Ceph makes it possible to build huge, scalable storage facilities quickly and easily, it also leaves you with a juggernaut when it comes to backups. Anyone coming from the old world of conventional setups often wonders how to fit this kind of storage into a backup, given that huge Ceph setups have several petabytes of capacity.

The answer is very simple: Install a second storage solution of the same size next door and mirror the master regularly, or don't protect payload data at all, or protect it only selectively for a surcharge. Logically, if you have a 5PB instance of Ceph, you actually need a 5PB cluster for a backup. In case of doubt, you can also use a trick based on erasure coding: With a 5PB gross capacity, only about 1.6PB is effectively available if you have enabled triple replication in Ceph.

In a backup, redundancy is theoretically not necessary at all, so you could reduce the gross capacity to 2PB and enable erasure coding, which is similar to RAID5. At the end of the day, a second storage solution is still necessary, although not 5PB in volume; nevertheless, it must be able to hold much of the data.
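The capacity arithmetic behind this trick can be sketched in a few lines of Python. This is a rough estimate only: It assumes three copies of every object on the primary cluster (which matches the 1.6PB figure) and a hypothetical RAID5-like erasure coding profile with four data chunks and one coding chunk on the backup cluster. Filesystem and metadata overhead are ignored.

```python
def usable_replicated(raw_pb: float, replicas: int) -> float:
    """Net capacity when every object is stored `replicas` times."""
    return raw_pb / replicas


def raw_needed_erasure_coded(net_pb: float, k: int, m: int) -> float:
    """Raw capacity needed to hold net_pb with k data and m coding chunks."""
    return net_pb * (k + m) / k


# Assumption: 5PB gross capacity with three-way replication on the primary.
primary_net = usable_replicated(5.0, replicas=3)

# Assumption: a 4+1 erasure coding profile (RAID5-like) on the backup cluster.
backup_raw = raw_needed_erasure_coded(primary_net, k=4, m=1)

print(f"net data on primary: {primary_net:.2f} PB")
print(f"raw backup capacity with 4+1 erasure coding: {backup_raw:.2f} PB")
```

Under these assumptions, the backup cluster needs roughly 2PB of raw capacity instead of 5PB. Other profiles trade capacity against the number of simultaneous disk or host failures the backup can survive.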

Provider Backups?

From the provider's point of view, the approach taken by large providers like Amazon Web Services (AWS) or Microsoft Azure is far more elegant. They see backups as the customer's responsibility and, as usual, warn customers not to rely on anything in the cloud. If you book services on AWS, you know that you are responsible for backing up your own data. In case of doubt, the data could simply be gone, and Amazon will not assume any responsibility, even if an error on its side contributed to the deletion.

Whether this approach can be sustained depends on several factors. Amazon, of course, is big enough to cope with the loss of large customers without any problems – even if it is naturally not desirable. However, the structure of your own clientele plays an important role.

If, as a provider, you want to leave backups to your cloud customers, you need to communicate this loudly and clearly at every opportunity. A paragraph buried deep in the terms and conditions of your platform is unlikely to help, because for many customers, backups by the provider are so normal that they would not even dream of them not happening. Customers need to understand that assuming responsibility for their own backups is part of migrating to the cloud.

The Customer's Point of View

The situation is different from the customer's point of view, even if the fundamental questions are not so different from those the platform admins face. One thing is sure: The degree of automation should be as high as possible, not only in the cloud underlay, but also where customers set up their own virtual worlds. The requirements here are even tougher, because classical automation is joined by orchestration.

Because orchestration builds on automation, it takes advantage of the fact that virtual hardware in a cloud can be controlled through the API of the cloud environment. It is not difficult to imagine that manually clicking together a full virtual environment in AWS or OpenStack can take some time.

At the end of the day, admins start at square one, first creating a virtual network that connects over a router to the provider's network to gain Internet access. Next, the individual VMs of the platform follow; they also need to be set up and configured after a successful boot process.

If you imagine a virtual environment with hundreds of VMs, this process would take forever. Orchestration solves the problem: The admin defines the desired state of the virtual environment with the help of a template language. When the template is executed, the cloud creates all the resources described in the template in the correct order.
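How an orchestration engine might derive the "correct order" from a template can be illustrated with a dependency graph. The following is a minimal sketch, not the implementation of any particular orchestration service: The resource names and the dictionary format are hypothetical, and the ordering relies on `graphlib.TopologicalSorter` from the Python standard library (Python 3.9 or later).

```python
from graphlib import TopologicalSorter

# Hypothetical template: each resource maps to the resources it depends on.
template = {
    "network": set(),
    "router": {"network"},
    "vm-web-1": {"network"},
    "vm-web-2": {"network"},
    "loadbalancer": {"vm-web-1", "vm-web-2", "router"},
}


def creation_order(resources):
    """Return resource names so that dependencies always come first."""
    return list(TopologicalSorter(resources).static_order())


order = creation_order(template)
for step, name in enumerate(order, start=1):
    print(f"{step}. create {name}")
```

The network is always created first and the load balancer last; the order of independent resources, such as the two VMs, is not significant. A real engine would additionally call the cloud API for each step and wait for the resource to become available.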

Users have even more automation options. If you rely on Load Balancing as a Service (LBaaS) from the outset, you do not need to distribute a specially prepared operating system (OS) image for a load balancer VM. Specific services such as load balancers, databases, and other servers can be provided by the cloud platform itself (Figure 3). Basically, the higher the percentage of a virtual environment that can be reproduced from templates, the easier it is to do the backup.

Figure 3: LBaaS turns the load balancer into a genuine resource managed by the cloud, making the restore process far easier and faster.
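The rule of thumb about templates can be made concrete with a small inventory check. The sketch below is purely illustrative: The `Resource` class, the resource names, and the `from_template` flag are hypothetical. The idea is that anything a template can recreate drops out of the backup set, leaving only the resources that hold state.

```python
from dataclasses import dataclass


@dataclass
class Resource:
    name: str
    from_template: bool  # True if re-running the template recreates it


# Hypothetical inventory of a small virtual environment.
inventory = [
    Resource("network", True),
    Resource("router", True),
    Resource("loadbalancer", True),   # provided through LBaaS
    Resource("vm-web-1", True),
    Resource("database-volume", False),  # holds payload data
]


def backup_plan(resources):
    """Return the names that need a backup and the share covered by templates."""
    needs_backup = [r.name for r in resources if not r.from_template]
    template_share = 1 - len(needs_backup) / len(resources)
    return needs_backup, template_share


needs_backup, template_share = backup_plan(inventory)
print(f"reproducible from templates: {template_share:.0%}")
print(f"must be backed up: {needs_backup}")
```

In this example, only the database volume remains in the backup set; everything else is restored by executing the template again.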

Keep in mind that nobody wants to back up, but everyone wants to restore. Admins who use the native features of a platform will typically find the restore process far more convenient than with DIY solutions. For restores, it is usually enough to start a new instance via the respective service and point it to an existing backup, and you are done.

Admins who consistently use their cloud platform's LBaaS are pleased to see new instances automatically find their way into the load balancer after starting up. The less work that restoring the individual instances takes, the more fun admins have.
