How to back up in the cloud

Rescue Approach

Identify Persistent Data

The first and most important step is, of course, identifying the data that has to be part of a backup in the first place. If your setup is built in an exemplary way and is genuinely cloud-ready, this step is quickly done. In the best case, the data comprises only the contents of the database that serves as the data store for your application in the background. Although this is an ideal world, it is a strategy worth aiming for.

If you are using AWS or OpenStack clouds, you could use your own images. Ideally, you will build the images with a continuous integration and continuous delivery (CI/CD) system such as Jenkins that is independent of the virtual environment and from which the images can be quickly restored if something goes wrong. Alternatively, you can use the standard images provided by distributors, although it is then advisable to have a central automation host from which new VMs can be provisioned quickly.

A setup of this kind will want to source all required services from the cloud to the extent possible. Having your own VMs for databases and the like is a thing of the past; users only need to make sure they use the DBaaS backup function and regularly copy the backups to a safe place. The recommendation is to run the command-line interface tools for the respective cloud from an external system and use system timers, for example, to trigger the downloads. At best, the application comes in the form of a container or can be checked out from Git and quickly put into operation.
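As a minimal sketch of such a download job, the following Python script wraps the AWS CLI command aws s3 sync. It assumes that the DBaaS backups are exported to an S3 bucket and that the AWS CLI is installed and configured on the external system; the bucket name and target directory are hypothetical. A system timer or cron job would then call the script at regular intervals.

```python
import subprocess
from pathlib import Path


def build_sync_command(bucket_uri: str, target_dir: Path) -> list[str]:
    """Return the AWS CLI call that mirrors a bucket prefix to a local directory."""
    return ["aws", "s3", "sync", bucket_uri, str(target_dir)]


def fetch_backups(bucket_uri: str, target_dir: Path) -> None:
    """Download new or changed backup files to the external system.

    Raises subprocess.CalledProcessError if the CLI exits with an error,
    so a failed download shows up in the timer's logs.
    """
    target_dir.mkdir(parents=True, exist_ok=True)
    subprocess.run(build_sync_command(bucket_uri, target_dir), check=True)


# Example call with hypothetical values (uncomment and adapt):
# fetch_backups("s3://example-dbaas-backups/nightly", Path("/srv/backups/dbaas"))
```

Other clouds ship their own CLI tools; in that case, only the command list returned by build_sync_command would need to change.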

If you build a setup that is cloud-ready in this way, the backup requirements are dramatically lower than in conventional setups, because the volume of setup-specific data is far smaller.

Reality Is More Complex

The described scenario is, of course, the ideal for applications in the cloud and, practically, only realistic for people who tailor a virtual environment to fit a cloud. Anyone migrating a conventional setup to the cloud and not shaking off the traditional standards in the process will naturally also have to deal with conventional backup strategies.

Still, a few tips and tricks are available for such scenarios. Even if your own application is not cloud-ready, it makes sense to separate user data strictly from generic data. If you move a database from a physical system to a VM, always make sure that the folder with the database data is on a separate volume; then, you only need to include the data from that volume in the backup instead of the whole VM. If something goes wrong, the volume content can be recovered quickly (e.g., from a snapshot), with no need to recover the data in a laborious manual process.
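The idea of backing up only the data volume can be sketched in a few lines of Python. The example assumes that the database directory is mounted as a separate volume and packs just that mount point into a timestamped archive; the function name and paths are hypothetical. Note that a file-level copy of a running database may be inconsistent, so the database would have to be stopped first, or the archive would have to be taken from a snapshot of the volume.

```python
import tarfile
from datetime import datetime, timezone
from pathlib import Path


def archive_data_volume(mount_point: Path, backup_dir: Path) -> Path:
    """Write a compressed tar archive of the data volume only and return its path."""
    backup_dir.mkdir(parents=True, exist_ok=True)
    stamp = datetime.now(timezone.utc).strftime("%Y%m%dT%H%M%SZ")
    archive = backup_dir / f"{mount_point.name}-{stamp}.tar.gz"
    with tarfile.open(archive, "w:gz") as tar:
        # arcname stores paths relative to the volume's directory name
        # instead of the absolute path on the VM.
        tar.add(mount_point, arcname=mount_point.name)
    return archive


# Example call with hypothetical paths (uncomment and adapt):
# archive_data_volume(Path("/mnt/dbdata"), Path("/srv/backups"))
```

Because the rest of the VM is not part of the archive, a restore consists of provisioning a fresh VM and unpacking the archive onto a new volume.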

However, admins should pay special attention to the backup solutions they use in such setups. If you have any kind of S3-compatible storage available, it is a good idea to use backup software that supports it (Figure 4), because then local backups can be created directly from the backup application, which enables a fast and uncomplicated restore in case of problems.

Figure 4: Amazon S3 can be used for backups directly from the command line, which makes automating such solutions very easy.

At the same time, the same software can be used to create a backup (even an encrypted backup) at another location, such as Amazon's actual S3 storage service. This kind of backup gives you security even if the original data center were to burn to the ground. Amazon itself provides instructions for this type of setup [1].
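If no dedicated backup software is in use, an off-site copy can also be sketched with the AWS CLI. The following Python example builds an aws s3 cp call with the --sse AES256 option, which asks S3 to encrypt the object on the server side; this is an assumption for illustration, because backup software typically encrypts on the client before uploading. The bucket name, region, and archive path are hypothetical.

```python
import subprocess
from pathlib import Path


def build_offsite_copy_command(archive: Path, bucket_uri: str, region: str) -> list[str]:
    """Return the AWS CLI call that uploads one archive with server-side encryption."""
    target = f"{bucket_uri.rstrip('/')}/{archive.name}"
    return [
        "aws", "s3", "cp", str(archive), target,
        "--sse", "AES256",   # S3-managed server-side encryption
        "--region", region,  # region of the off-site bucket
    ]


def copy_offsite(archive: Path, bucket_uri: str, region: str) -> None:
    """Upload the archive; raises subprocess.CalledProcessError on failure."""
    subprocess.run(build_offsite_copy_command(archive, bucket_uri, region), check=True)


# Example call with hypothetical values (uncomment and adapt):
# copy_offsite(Path("/srv/backups/dbdata.tar.gz"), "s3://example-offsite-backups/db", "eu-central-1")
```

Choosing a bucket in a region other than the one hosting the workload is what provides the protection against the loss of a whole data center.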

Containers: Not a Special Case

Most of the examples thus far relate to classic full virtualization. However, this does not mean that the rules and principles described are not equally applicable to containers. In fact, they apply even more: In my own experience, cloud migrations rarely convert conventional applications into containers. Instead, the normal case is for the application to be redesigned in the context of the migration into the cloud and developed from scratch.

Accordingly, for container backups, it is always a good idea to store only the actual user data, such as the content of a database, in your own backups (Figure 5). Data that can be restored from standard directories in emergencies does not belong in the container backup; in fact, it complicates the recovery of the data rather than making it easier.

Figure 5: What is true for virtual machines in cloud environments is even more true for containers in Kubernetes: more automation, more orchestration, fewer backups.
