Better Backups

For years, an open source version of Bacula has been a popular solution for managing “backup, recovery, and verification of computer data” on a network of diverse computers, operating systems, and storage media. Using the client-server model, Bacula scales from single computers to enterprise installations of hundreds of entities.

The open source version of Bacula was first published in 2002 and quickly found support in the community. Recently, less and less work has been put into the free Bacula, and new commits into the public Git project now occur only once every few months, with the developers seemingly focusing on the commercial Bacula Enterprise Edition, which is not publicly developed.

In 2010, long-standing Bacula developer Marco van Wieringen thus started to maintain, in a separate Git repository, enhancements and code cleanups that either were not accepted or were only proposed for integration into the commercial version. From this seed grew the decision by some former members of the Bacula community to continue development of an independent fork named Bareos.

The first stable release was Bareos 12.4 in April 2013 (the version number stands for the year and the quarter of the feature freeze). The current beta is version 13.2. On September 25, 2013, at the Open Source Backup Conference, formerly known as the Bacula Conference, the Bareos project was introduced to an interested audience.

Before you start working with Bacula or Bareos or start planning a test installation, you should take a look at how the tools function (Figure 1).

Figure 1: Structure of a simple Bacula/Bareos

The basic structure always consists of a control unit, the Backup Director, one or more Storage daemons, and the File daemons on the clients to be backed up.

The File daemons are responsible for backing up the data from the client or restoring the data on the client again. This daemon runs permanently on the clients and carries out the Director’s instructions.

The Director is the controller: It contains all the logic and accounts for most of the settings. Its configuration file describes the following:

  • The database configuration
  • All client systems and how they are addressed
  • Which files should be backed up (a FileSet)
  • The plugin configuration
  • The before and after jobs (i.e., programs that are started before or after a backup job, e.g., to start and stop services)
  • The storage and media pool with its properties and retention times
  • The backup schedules
  • Addresses for messages
  • Jobs and JobDefs (job defaults)

Defining storage, a FileSet, and a client is not enough. These components are brought together by jobs, which define what is where and when to back it up.
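As a rough sketch of how a job ties these resources together, the Director configuration might contain something like the following. The resource names LinuxEtc, WeeklyCycle, and client2-job are made up for illustration, and the Client, Storage, Pool, and Messages resources referenced here are assumed to be defined elsewhere in the same file.

```conf
# Hypothetical excerpt from /etc/bareos/bareos-dir.conf
FileSet {
  Name = "LinuxEtc"              # what to back up
  Include {
    Options {
      Signature = MD5            # store a checksum for every file
    }
    File = /etc
  }
}

Schedule {
  Name = "WeeklyCycle"           # when to back up
  Run = Full 1st sat at 21:00
  Run = Incremental mon-fri at 21:00
}

Job {
  Name = "client2-job"
  Type = Backup
  Client = client2-fd            # which machine
  FileSet = "LinuxEtc"
  Schedule = "WeeklyCycle"
  Storage = File                 # where the data goes
  Pool = File
  Messages = Standard
}
```

Directives shared by many jobs can be moved into a JobDefs resource so that each Job only states what differs.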

The retention period for the backup data is controlled by File Retention, Job Retention, and Volume Retention periods. It makes sense to use only Volume Retention to control the retention times, because if several retention options overlap, you might experience surprising effects.

Volume Retention is defined per pool. By defining several pools, you can also work with different retention periods, such as for different systems or different backup types (e.g., full, differential, or incremental). The specified periods are the minimum retention periods.
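One way to set this up, sketched here with invented pool names and retention values, is to define one pool per backup level and give each its own Volume Retention:

```conf
# Hypothetical pools with different minimum retention periods
Pool {
  Name = Full
  Pool Type = Backup
  Volume Retention = 12 months   # full backups are kept longest
  Recycle = yes                  # volumes may be reused once retention has expired
  AutoPrune = yes
}

Pool {
  Name = Differential
  Pool Type = Backup
  Volume Retention = 3 months
  Recycle = yes
  AutoPrune = yes
}

Pool {
  Name = Incremental
  Pool Type = Backup
  Volume Retention = 1 month
  Recycle = yes
  AutoPrune = yes
}

# Inside an existing Job (or JobDefs) resource, the level-specific
# pools are then selected with these directives:
#   Full Backup Pool = Full
#   Differential Backup Pool = Differential
#   Incremental Backup Pool = Incremental
```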

Improved Usability

One focus in Bareos’s development is keeping the obstacles for newcomers as low as possible. Because newcomers are usually overwhelmed by configuration options, the Bareos project offers package repositories for popular Linux distributions and Windows. For Windows, additional packages for the OPSI software management solution are also offered. All versions are built automatically by the project’s own instance of the Open Build Service (OBS). In comparison, Bacula.org offers only the source code, and Windows binaries are only available for cash.

On Linux, you just need to add the appropriate repository to install a Bareos server and then install the Bareos packages. Bareos supports three database back ends: MySQL, PostgreSQL, and SQLite. SQLite should only be used for test installations.

Most optimization effort in the future will flow into the PostgreSQL connection. To ensure that the desired back end really is installed, you need to select the packages bareos and bareos-database-postgresql (or bareos-database-mysql , if you prefer).

The database must be installed separately; Bareos only contains dependencies on database clients. This makes it possible for the database to run on a computer other than the Bareos server itself.

Unlike Bacula, Bareos defines the database to be used in the configuration file. In Bacula, you must build a version specifically for the respective database.
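In the Director configuration, this choice is made in the Catalog resource. The following is only a sketch: the catalog name, database name, and user are assumptions and may differ from what your installed packages set up.

```conf
# Hypothetical Catalog resource in /etc/bareos/bareos-dir.conf
Catalog {
  Name = MyCatalog
  dbdriver = "postgresql"        # back end is selected here, not at build time
  dbname = "bareos"
  dbuser = "bareos"
  dbpassword = ""
  # dbaddress = db.example.com   # assumed host name; only needed if the
  #                              # database runs on another machine
}
```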

When you first install Bareos, it populates the configuration files in the /etc/bareos directory with meaningful values. After the installation, the admin needs to initialize the database and start the services (Listing 1).

Listing 1: Starting Services

su postgres -c /usr/lib/bareos/scripts/create_bareos_database
su postgres -c /usr/lib/bareos/scripts/make_bareos_tables
su postgres -c /usr/lib/bareos/scripts/grant_bareos_privileges

service bareos-dir start
service bareos-sd start
service bareos-fd start

In the automatic configuration, backups go to disk by default (in /var/lib/bareos/storage). Bareos backs up to disk in exactly the same way as it backs up to a tape library; that is, files are created below /var/lib/bareos/storage, each corresponding to a tape. The advantage of this method is that uniform rules apply and retention times are handled in the same way for tapes and disks. The maximum file size and the maximum number of files are defined in the pool resource of the Director configuration (i.e., in the /etc/bareos/bareos-dir.conf file).
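For illustration, a pool resource that limits both the size and the number of these volume files could look roughly like this. The values are examples only and are not the defaults shipped with the packages.

```conf
# Hypothetical pool resource in /etc/bareos/bareos-dir.conf
Pool {
  Name = File
  Pool Type = Backup
  Maximum Volume Bytes = 50G     # maximum size of each volume file
  Maximum Volumes = 100          # maximum number of volume files in this pool
  Volume Retention = 365 days    # minimum time before a volume may be reused
  Recycle = yes
  AutoPrune = yes
}
```

With these example values, the pool can never occupy more than about 5TB below /var/lib/bareos/storage.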

To create a virtual tape, you need to start the bconsole program, which welcomes you with an asterisk prompt. After running label and assigning a name (in this example, file1 ), press 2 for the defined File pool (Listing 2).

Listing 2: Labeling the Virtual Tape

*label
Automatically selected Storage: File
Enter new Volume name: file1
Defined Pools:
 1: Default
 2: File
 3: Scratch
Select the Pool (1-3): 2
Connecting to Storage daemon File at bareos:9103 ...
Sending label command for Volume "file1" Slot 0 ...
3000 OK label. VolBytes=186 Volume="file1" Device="FileStorage" (/var/lib/bareos/storage)
Catalog record for Volume "file1", Slot 0 successfully created.
Requesting to mount FileStorage ...
3001 OK mount requested. Device="FileStorage" (/var/lib/bareos/storage)
*

With status director , you can view the next scheduled jobs (Listing 3).

Listing 3: Status Display

*status director
Scheduled Jobs:
Level  Type Pri Scheduled  Name  Volume
=====================================================
Incremental     Backup 10 18-Jul-13 23:05 BackupClient1 file1
Full            Backup 11 18-Jul-13 23:10 BackupCatalog file1
...

The backups are scheduled in the configuration file for 23:05 (BackupClient1: filesystem) and 23:10 (BackupCatalog: backup of the database itself). To perform a test backup, you can launch it with the run command, specifying only which client you want to back up. The results are displayed by calling the status director command (Listing 4).

Listing 4: Status Director

*status director
...
Terminated Jobs:
 JobId Level Files Bytes Status Finished Name
=====================================================
 1 Full 135 6.679 M OK 18-Jul-13 16:00 BackupClient1
 2 Incr  0  0       OK 18-Jul-13 16:01 BackupClient1

...

The status scheduler command shows when jobs are scheduled, and status scheduler days = 365 does this for an entire year in advance.

Improvements

Apart from the installation, a number of other improvements make life easier for the Bareos administrator: Anyone who has ever worked with Bacula configuration files will be glad that, with Bareos, almost everything is predefined with sensible default values. In contrast to Bacula, Bareos also supports presets for string values, which means no more worrying about entering the Pid Directory and Working Directory directives in the File daemon configuration on the client. Bareos sets meaningful values for the appropriate platform when it creates the packages.

On Windows systems, you can now easily back up not just one, but all connected drives (Windows Drive Discovery). Bacula only supports this in the commercial version. The Volume Shadow Copy Service (VSS) call now discovers Windows drives automatically.

The use of tape libraries has been simplified. Tapes can now be moved from one slot to another within bconsole. Also, any existing Import/Export slots can be addressed conveniently using the import or export commands. The tray monitor (a small icon in the system tray of the taskbar) runs on Windows and on Linux systems. The icon flashes to indicate that a backup is currently running on the system.

If a backup job fails, you can easily restart it with exactly the same parameters:

*rerun jobid=id

The backup administrator must ensure that all relevant data are retained for a specific period of time. For example, tax-related data might require a retention period of up to 10 years; you must plan carefully.

If you want to separate the data according to various properties, you can use pools in Bareos to do so. Sizes and retention times can be defined for the pools.

Complex Environments

Sometimes, calculating how big a backup will be is difficult. A first approach is to exclude certain directories and data types in the file lists that describe the backup. Alternatively, you can exclude files above a certain size. However, exclusion does not guarantee that a client does not accumulate large amounts of data that needs to be backed up.
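A FileSet that excludes a directory and certain file types might be sketched as follows. The FileSet name, paths, and patterns are invented for illustration, and the size-based exclusion is not shown here.

```conf
# Hypothetical FileSet in /etc/bareos/bareos-dir.conf
FileSet {
  Name = "client2-home"
  Include {
    Options {
      WildFile = "*.iso"         # file types to leave out
      WildFile = "*.tmp"
      Exclude = yes              # files matching the patterns above are skipped
    }
    File = /home
  }
  Exclude {
    File = /home/scratch         # skip this directory entirely
  }
}
```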

Bareos has a client quota that lets you determine the total amount of data to back up for a client. Additionally, you can use soft quotas and grace periods to learn at an early stage when a quota is nearly exhausted.
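A quota setup could be sketched as follows in the client entry of the Director configuration. The sizes and the grace period are arbitrary example values, and the comments describe the intended behavior only roughly; check the documentation for your version for the exact semantics.

```conf
# Hypothetical client entry with quotas in /etc/bareos/bareos-dir.conf
Client {
  Name = client2-fd
  Address = client2
  Password = "secret"
  Soft Quota = 50G                   # exceeding this starts the grace period
  Soft Quota Grace Period = 15 days  # time the client may stay above the soft quota
  Hard Quota = 60G                   # absolute limit for this client
}
```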

Keep in mind that large amounts of data might be transported across the network, especially during a full backup. Therefore, Bareos’s ability to limit the maximum network bandwidth used per client is useful. The directive Maximum Bandwidth Per Job needs to be added to the corresponding client entry in /etc/bareos/bareos-dir.conf :

Client {
 Name = client2-fd
 Address = client2
 Password = "secret"
 Maximum Bandwidth Per Job = 512 k/s
}

A key innovation is direct support for NDMP (Network Data Management Protocol), the native backup protocol of large NAS devices such as NetApp. Bareos version 12.4 supports full backup and restore, although restoring individual files is still in the testing phase.

A new plugin for backing up Microsoft SQL Server databases has been written that supports full, incremental, and differential backups; it also is in the testing phase.

The next project in the pipeline is backing up virtual machines via the VMware vStorage API. The first steps have already been taken.

Copy Jobs

Backup tapes are still the media of choice for backing up data, but backups on disk also have advantages. Thus, the approaches are often combined: Disk-to-disk-to-tape (D2D2T) backups are common. With this method, the data is first saved to disk, then transferred to a tape by a Migration or Copy job.

Before Bareos v13.2, Migration and Copy jobs were only supported within a Storage daemon (Figure 2).

Figure 2: Previously: Copying was only possible within Storage daemons.

This restriction has been lifted in Bareos v13.2 – data can now be transported between Storage daemons (Figure 3).

Figure 3: Now: Copying is possible between different Storage daemons across the network.

Thus, you can back up data from different firewall compartments, for example.

A corresponding Copy job can also copy data periodically to another Storage daemon. The data properties can be modified here to store the data without compression on the first Storage daemon but with compression on the second, making it possible to design scenarios such as backup-to-disk-to-cloud.
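A disk-to-tape Copy job might be configured along the following lines. This is a sketch under assumptions: the pool, storage, schedule, client, and FileSet names are hypothetical, and TapeLibrary stands for a Storage resource that could just as well point to a second Storage daemon.

```conf
# Hypothetical disk-to-tape copy setup in /etc/bareos/bareos-dir.conf
Pool {
  Name = Disk
  Pool Type = Backup
  Storage = File                 # first stage: volumes on disk
  Next Pool = Tape               # destination for Copy and Migration jobs
}

Pool {
  Name = Tape
  Pool Type = Backup
  Storage = TapeLibrary          # assumed Storage resource, possibly on another Storage daemon
}

Job {
  Name = "CopyDiskToTape"
  Type = Copy
  Selection Type = PoolUncopiedJobs  # select all jobs in the source pool not yet copied
  Pool = Disk                    # source pool; the target comes from Next Pool
  Schedule = "WeeklyCycle"
  Messages = Standard
  Client = client2-fd            # some versions expect these two directives
  FileSet = "LinuxEtc"           # even though a Copy job does not use them
}
```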

Passive Clients

Firewalls commonly cause problems when setting up the backup environment. In a normal connection in a Bareos/Bacula environment, the Backup Director would establish a connection to the client and tell it what to save and where. It also connects to the backup Storage daemon and tells it to accept and store the data from the client. Finally, the client establishes the actual data connection to the Storage daemon and sends its data to it.

If the client is behind a firewall, then packet filtering and network address translation (NAT) on the firewall can make a connection from the client to the Storage daemon difficult or impossible. The problematic connection is thus the actual data connection between the client and Storage daemon (Figure 4).

Figure 4: Previously: The data connection was initiated from the client to the Storage daemon.

As of Bareos 13.2, this behavior is now configurable. Using the Passive client option, you set up all connections to start with the server components. The client then only needs to accept connections. The process of opening connections between the Director and client and between the Director and the Storage daemon remains the same, but the actual data connection is now initiated not by the client, but by the Storage daemon. After the connection has been established, the data is, of course, sent from the client to the Storage daemon (Figure 5).

Figure 5: Passive client: The data connection is initiated by the Storage daemon to the client.

Besides its firewall friendliness, this approach offers another advantage: Because the passive client does not establish any data connections, it does not need working name resolution. In practical terms, name resolution often has been a problem with the conventional method.
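On the Director side, switching a client to passive mode comes down to one additional directive in its client entry. The client name, address, and password below are placeholders, and the File daemon on the client must itself run a Bareos version that supports the option.

```conf
# Hypothetical client entry in /etc/bareos/bareos-dir.conf
Client {
  Name = client2-fd
  Address = client2
  Password = "secret"
  Passive = yes                  # the Storage daemon opens the data connection to this client
}
```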

Security

In terms of security, Bareos continues using the familiar safety features of Bacula, such as:

  • checksum computation for each backed up file and verification during the restore, and
  • the ability to encrypt connections between the daemons with TLS.

Additionally, Bareos adds some more interesting security features; for example, you can now choose the encryption method for software encryption. Previously, only AES128 was used. Now, the following methods are available: AES128, AES192, AES256, CAMELLIA128, CAMELLIA192, CAMELLIA256, AES128HMACSHA1, AES256HMACSHA1, and Blowfish.
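The cipher is chosen in the File daemon configuration on the client, together with the usual PKI settings. In the following sketch, the daemon name and the key file paths are assumptions; the key pair and master key have to be created beforehand.

```conf
# Hypothetical excerpt from /etc/bareos/bareos-fd.conf on the client
FileDaemon {
  Name = client2-fd
  PKI Signatures = yes
  PKI Encryption = yes
  PKI Keypair = "/etc/bareos/client2-fd.pem"   # assumed path to the client key pair
  PKI Master Key = "/etc/bareos/master.cert"   # assumed path to the master public key
  PKI Cipher = aes256                          # encryption method for file data
}
```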

In addition to the encryption options in the software, you can now directly use LTO tape drive hardware encryption. Because encryption has been part of the LTO standard since LTO-4, all drives of that generation or later offer this option. Tape drive encryption relies on hardware support and thus has virtually no effect on the speed of your backup.

Whether you use LTO hardware encryption depends on your requirements. It is a particularly efficient option for those who want to store their tapes off-site and, in doing so, prevent unauthorized persons from reading them. The passive client option mentioned earlier also offers safety benefits: Because the client no longer needs to open a connection to the Storage daemon, the firewalls can block all connections into the backup network.

Previously, you could send arbitrary commands to the client through the Director. These commands were backup (run a backup), restore (perform a restore), verify (run a scan job to sync between system data and backed up data), estimate (estimate the amount of backup data), and runscript (run a script on the client system).

Now, you can use the Allowed JobCommand directive to filter these commands on the client. Commands that are not allowed are then not accepted by the client and not executed.

Running scripts on the system to be backed up poses a special security threat. If you cannot completely prohibit this scenario with Allowed JobCommand , you at least have the option of setting the directory in which scripts and commands must be located through Allowed ScriptDir . Commands that do not reside in this directory are not executed.
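On the client, both directives go into the resource that describes the Director in the File daemon configuration. The following sketch assumes a Director named bareos-dir and a hypothetical script directory. It permits backups, restores, and scripts from that one directory, so that verify and estimate requests are rejected.

```conf
# Hypothetical excerpt from /etc/bareos/bareos-fd.conf on the client
Director {
  Name = bareos-dir
  Password = "secret"
  Allowed JobCommand = backup
  Allowed JobCommand = restore
  Allowed JobCommand = runscript
  Allowed ScriptDir = "/usr/local/backup-scripts"   # assumed directory; scripts elsewhere are refused
}
```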

Integration

It must be possible to distribute a backup client as efficiently as possible to the client systems and to run it there with as little maintenance as possible, especially if many different platforms are connected. Thus, Bareos also works with old client versions and supports Bacula File daemon versions from 2.0 (from 2007).

For Univention Corporate Server (UCS) environments, a Bareos version is available in the Univention App Center. Through the UCS interface, you can specify whether or not to back up a computer. The Bareos server configuration is automatically generated from this and then prepares the client configuration.

Bareos also directly provides packages for the open source Windows software management solution OPSI. These packages can be installed on the OPSI server, assigned the appropriate settings, and then distributed to all connected Windows systems. A script then uses the OPSI JSON-RPC interface to create the appropriate Bareos Director configuration.

To complete the basic configuration of the software, Bareos offers a native installer for Windows that sets passwords and even opens the Windows Firewall. The File daemon and tray monitor are configured so that they work immediately.

A very good system for disaster recovery of Linux machines is available from the Relax-and-Recover (REAR) project. This project’s approach is twofold. Installed on the system to be backed up, the command

sudo /usr/sbin/rear -v mkrescue

creates a rescue system ISO file of about 60MB, including the active kernel, required driver modules, information on the hard disk setup, and network configuration. In the second step, the complete system is backed up using

sudo /usr/sbin/rear -v mkbackup

(e.g., to a shared NFS directory).

You could do without this second step if you use Bareos for your backups. Instead, a Bareos recovery module is built into the Rescue System so that, after booting the recovery system, you see an option for completely deleting the system and replacing it with the backup.

Quality Assurance

The entire development of Bareos occurs openly on GitHub. Communication is handled via mailing lists. Feature requests and bugs can be posted on the bug tracking system. You can find more information at the Bareos Community page.

Three different systems are used for automated quality assurance:

  • Build tests based on Travis
  • Regression tests based on CDASH
  • Tests of the various platforms based on Jenkins and virtual machines

Every commit in GitHub automatically triggers a build process for the Bareos repository on Travis CI. There, the source code is compiled, the daemons are started, and a backup and restore are performed, which basically checks after each commit whether Bareos is still functional. Further tests are carried out on a CDASH-based regression test system. Currently, about 130 different tests check specific Bareos functions.

The development workflow in Bareos envisages that a ticket should not be closed until a regression test has been created for a new property. This is then noted on the ticket.

A new release is only created when packages built for Bareos on the Open Build Service have also successfully passed a test based on Jenkins. In this test, the packages for the various platforms are tested on the corresponding virtual machines. On each platform, the package installation, data backup, and restore are checked automatically.

The Windows packages are built using OBS and cross-compilation. The results are the Windows installer and the OPSI packages.

Future

The path taken thus far has earned the Bareos project much encouragement. The decision to build the infrastructure for largely automated package generation and testing at the start of the project has proven successful. More platforms can now be added with little effort, with the certainty that problems are detected very quickly by continuous testing.

Another positive aspect is that Bareos is developed in a fully open environment. Although Bareos GmbH & Co KG offers commercial subscriptions and support, all additions and new features are developed in an open GitHub project.

The roadmap envisions keeping to the present course for future developments to provide easy access and improved usability for administrators, integration with other projects and distributions, and functional enhancements. Plans to improve the default configuration should make it even easier to get started, and whitepapers will better illuminate certain issues.

A subproject is working hard on developing a configuration API to ensure that certain configuration changes can be carried out at run time without problems (e.g., adding clients). Front ends like Webacula will then be able to expand their functionality easily.

Authors

Jörg Steffens has been working with Linux since 1995 as a consultant at SUSE Linux AG and since 2004 as the director of an open source consulting company dass IT GmbH in Cologne, Germany. In 2012, he joined forces with other long-term Bacula users to initiate the Bareos project and founded Bareos GmbH & Co KG.

Philipp Storz has focused on Linux since 1998 and on Bacula since 2007. He has worked professionally with Linux since 2001, first as a consultant with SUSE Linux AG and since 2004 as a co-founder of dass IT GmbH in Cologne. His book on Bacula was published by Open Source Press in 2012. Since the founding of the Bareos project and the company of the same name, he has pushed forward with the technical development of Bareos together with Marco van Wieringen.
