Knowledge Bank

From how to best plan and execute a server migration through to utilising MySQL replication, this section is full of white papers and best practice guidelines, produced by the experts here at ForLinux to help you get the most from your Linux server.

Life without Backups

10/08/2011

Introduction

If you run any sort of business from your website, loss of data can be catastrophic. Data loss can range from the merely inconvenient (e.g. losing an image file) to the loss of critical data such as customer details and sales transactions. And because of the small margins they operate on, small to medium businesses (SMBs) are often most at risk from data loss.

Studies have shown that around half of all SMBs suffer some form of data loss at some point. This is a particularly sobering thought when you consider that data loss is also cited as a major contributor in the failure of at least a third of SMBs. Add the statistics suggesting that 30% of businesses have no backup plan whatsoever and that fewer than 30% make daily backups (some surveys report as few as 23%), and you can see that many businesses are putting themselves at serious risk.

Almost half of all recorded cases of data loss are due to hardware failure, with human error (primarily accidental deletion) the second most common cause. Software corruption and viruses are also a risk: although they are cited as the cause in fewer than 20% of cases, they still pose a serious threat to your data.

To avoid putting your business at risk, you should ensure you implement a comprehensive backup solution. Then if the worst does happen you will be able to recover your data with the minimum of disruption. 

In this paper we will look at two popular open source backup solutions and suggest situations in which each would be best used.

Bacula

Bacula is a network backup, recovery and verification system. According to information published by SourceForge it is the most downloaded open source backup program, making it a popular choice for backups over a network.

While you could run Bacula locally, it is usually run on one or more external servers. One server runs the Director, which is the main Bacula server daemon. It directs all of Bacula's operations and holds the core configuration files and the Catalog database. Depending upon your setup, the same server might also hold the storage pools, where the backed-up data is stored. In some situations the storage may be on one or more additional servers.

The client server (the machine being backed up) runs a File daemon – sometimes referred to as FD or as the client daemon. At the start of the backup run the Director connects to the File daemon and requests the files to be backed up. The File daemon then locates the files and sends them to the Storage daemon, which is responsible for placing them on the storage media.

The File daemon can also run scripts on behalf of the Director. A common option is to configure the jobs file on the Director to run a script to dump copies of the databases on the server to a specific directory, before backing up that directory.

The backups are not usually stored indefinitely. The Director holds Job Retention values for each client, after which the stored data is pruned to make space for new backups. There is also a master File Retention value that can override the client values if exceeded. The length of the retention period is dependent on several factors, but the main limiting factor is usually available disk space. If you have multiple clients all backing up large volumes of data and retaining that data for a long period of time, it is likely that you will quickly run out of disk space.

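The backups can be either full or incremental, but if you have the disk space and bandwidth available (something that isn't usually an issue, as the backup solution should be in the same cabinet as your server), full backups are generally preferable: a restore then needs only a single backup set, rather than the last full backup plus every incremental taken since.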

Unless you host your own servers it is unlikely you will ever need to configure Bacula yourself, as your hosting company will typically do this for you as part of a paid backup solution.

Rsnapshot

Rsnapshot is a filesystem snapshot utility which can be used to make backups of local and remote systems. It uses rsync to transfer files, and remote connections are made over SSH.

It can easily be installed on most versions of Linux using their native package manager, e.g. yum for Red Hat / CentOS distributions, and apt for Debian / Ubuntu distributions.

Most of the configuration is done via a central file located at: /etc/rsnapshot.conf

The files and directories to be backed up are defined by the backup directive. Note that rsnapshot expects the fields on each line of its configuration file to be separated by tabs rather than spaces, e.g.

backup	/home/	backup/home/
backup	user@10.0.0.2:/home/	backup/home/

The first example is a local backup, copying the /home folder to a backup folder on the same server. The second example is a remote backup, connecting to an external server on 10.0.0.2 (user here is a placeholder for the account you connect as), pulling a copy of /home across from that server and writing it to the backup directory.

The backup levels, and the number of snapshots retained at each level, are set using the interval directive, e.g.

interval daily 7
interval weekly 4

With these options rsnapshot keeps 7 daily snapshots and 4 weekly snapshots. Run once a day and once a week respectively, that gives you a week of daily copies and four weeks of weekly copies.
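
To make the retention numbers more concrete, the sketch below shows the kind of directory rotation that a snapshot tool of this type performs on each run. It is a simplified illustration, not rsnapshot's actual code: the function names rotate_snapshots and take_snapshot are hypothetical, and it assumes GNU cp (for the -l hard-link option) and rsync are available.

```shell
#!/bin/sh
# Simplified sketch of numbered snapshot rotation (hypothetical helper names).

# rotate_snapshots ROOT LEVEL KEEP
# e.g. rotate_snapshots /backup daily 7
rotate_snapshots() {
    root=$1 level=$2 keep=$3
    oldest=$((keep - 1))

    # Drop the oldest snapshot so that at most KEEP copies exist afterwards.
    rm -rf "$root/$level.$oldest"

    # Shift the remaining snapshots up by one: N-1 -> N, ..., 1 -> 2.
    i=$((oldest - 1))
    while [ "$i" -ge 1 ]; do
        [ -d "$root/$level.$i" ] && mv "$root/$level.$i" "$root/$level.$((i + 1))"
        i=$((i - 1))
    done

    # Hard-link copy of the newest snapshot (GNU cp): files that have not
    # changed share the same disk blocks across snapshots.
    [ -d "$root/$level.0" ] && cp -al "$root/$level.0" "$root/$level.1"
    return 0
}

# take_snapshot SOURCE ROOT LEVEL KEEP
take_snapshot() {
    src=$1
    rotate_snapshots "$2" "$3" "$4"
    mkdir -p "$root/$level.0"
    # rsync writes changed files as new copies, which breaks the hard link,
    # so the older snapshots keep the previous version of each file.
    rsync -a --delete "$src/" "$root/$level.0/"
}
```

Because unchanged files are hard links, each snapshot looks like a full copy while only changed files take up additional space.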

The execution of the snapshots is controlled by a cron job. You can set up a cron.d file for the rsnapshot user, or just add the jobs as root. In the typical example below, the weekly job is scheduled half an hour before the daily one so that the two do not overlap:

30 3 * * * root /usr/bin/rsnapshot daily
0  3 * * 1 root /usr/bin/rsnapshot weekly

It is also possible to exclude directories and run scripts using additional directives that can be added to rsnapshot.conf.

Below is an example using the exclude_file directive:

exclude_file /etc/rsnapshot.server1.conf

This will read the contents of the file /etc/rsnapshot.server1.conf and exclude anything matching its entries from the backups. The exclude file can contain multiple entries, but each must be on a separate line, e.g.

/var/lib/php/session
/var/spool/mail/nobody/*
/tmp/cache/*

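The backup_script directive can be used to run a script as part of the snapshot. For example, if you used a script called mysql_backup.sh to take dumps of all the databases, it can be added to the backup run with a line like the one below. The directive also takes a destination directory, relative to the snapshot, as its final field; backup/mysql/ is an illustrative name:

backup_script	/root/scripts/mysql_backup.sh	backup/mysql/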
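
As a rough idea of what such a database dump script might contain, here is a minimal sketch. It is an assumption rather than the script referred to above: it supposes MySQL credentials are available to the mysql and mysqldump clients (for example via ~/.my.cnf), and the function name dump_all_databases is hypothetical. rsnapshot runs a backup_script in a temporary working directory and collects whatever files it creates there, so the dumps are written to the current directory.

```shell
#!/bin/sh
# Hypothetical mysql_backup.sh: dump each database to a gzipped file in
# the current working directory.
# Assumes the mysql/mysqldump clients can log in without prompting.

dump_all_databases() {
    # -N drops the column header, -B gives plain one-name-per-line output.
    mysql -N -B -e 'SHOW DATABASES' |
        # Skip MySQL's internal, non-dumpable schemas.
        grep -Ev '^(information_schema|performance_schema|sys)$' |
        while read -r db; do
            # --single-transaction gives a consistent dump of InnoDB tables
            # without locking them for the duration.
            mysqldump --single-transaction "$db" | gzip > "./$db.sql.gz"
        done
}

# Only attempt the dump when the client tools are installed.
if command -v mysqldump >/dev/null 2>&1; then
    dump_all_databases
else
    echo "mysqldump not found; nothing dumped" >&2
fi
```

Writing one file per database keeps restores simple, as a single database can be reloaded without touching the others.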

If you have two or more servers, rsnapshot can be used to back up data externally. For example, if you have a separate web server and database server, rsnapshot can be set to copy the web server's backup across to the database server, and the database server's backup to the web server. This adds some level of redundancy, as each of the servers contains enough live and backup data to restore both elements of the whole cluster.

This can also be combined with an off-server backup, using Bacula or a similar service, to provide a comprehensive backup solution.

Backup tips

Backing up data to an external location provides an increased measure of security by storing the backups off-server. This is the best method to guard against catastrophic hardware failures, as even RAID systems are not totally immune to failure, particularly if the controller fails.

Unless finances and/or elements of your hosting configuration preclude their use, external backups should always form the backbone of your recovery solution.

However, external backups do increase the recovery time necessary to restore files. One solution to this is to keep multiple copies of backups on the client's server itself, only retaining the most recent backup off-server for disaster recovery.

This is particularly straightforward when using a server running a control panel, such as cPanel, which can be configured to run and retain daily, weekly and monthly backups. Bacula can then be set to copy only the most recent daily backups to external storage. The /backup folder would then contain a copy of daily, weekly and monthly backups – as compressed files – that can be quickly used to restore one or multiple sites, if necessary.

This option provides a much easier and quicker way of recovering missing files than going to your external backups. And, as accidental deletion is the second most common cause of data loss, this is likely to be a recovery solution you turn to more than once.

This option does result in increased disk space usage, as keeping multiple copies of backups on the server can rapidly use up disk space as your sites grow in size. Backups of databases are particularly prone to rapid growth.
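
One simple way to keep an eye on this is a small check that warns when the backup location passes a usage threshold. The sketch below is only an illustration: the function names usage_percent and check_backup_space and the 90% figure in the usage note are assumptions, not part of any tool mentioned here.

```shell
#!/bin/sh
# Hypothetical disk-usage check for a backup location.

# Print the percentage of space used on the filesystem holding PATH.
usage_percent() {
    # df -P gives POSIX-format output; on line 2, field 5 is e.g. "42%".
    df -P "$1" | awk 'NR == 2 { sub(/%/, "", $5); print $5 }'
}

# check_backup_space PATH LIMIT
# Returns 0 when usage is below LIMIT percent, 1 when it is at or above
# the limit, and 2 when usage could not be determined.
check_backup_space() {
    path=$1 limit=$2
    used=$(usage_percent "$path")
    [ -n "$used" ] || return 2
    if [ "$used" -ge "$limit" ]; then
        echo "WARNING: $path is ${used}% full (limit ${limit}%)"
        return 1
    fi
    echo "OK: $path is ${used}% full"
}
```

Called from a daily cron job as, say, check_backup_space /backup 90, this gives some warning before the backups fill the disk.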

It's always best to optimise your backups to exclude any unnecessary directories, such as caches or other internal backups. If your site's /home folder includes the mail folders, consider whether you actually need them backed up too. Databases should also be monitored, and old or unused data archived or deleted at regular intervals to prevent them from growing excessively.

As an added safety measure, any client-side backups should be stored on a separate partition. By default, backups are usually saved to a /backup folder on the root partition, or within the /home directory, i.e. /home/backup.

Creating /backup on its own partition isolates it from the other partitions, and greatly increases its resistance to file system problems.

The best method of managing partitions is to use LVM (Logical Volume Management), which provides much greater flexibility, such as allowing partitions to be resized on-the-fly.

Normally, if a partition needs resizing, you would need to reclaim space back from another partition and then reallocate it to the partition you need to increase. This involves unmounting both partitions and shrinking one to then expand the other, which results in downtime.

Using LVM, not all available disk space is allocated to the partitions up front. Disk space is held in a Volume Group and assigned to the various partitions as it is needed. This allows for a more organic growth of disk space usage, rather than making assumptions about usage in advance that may prove to be inaccurate in a live environment.
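
As an illustration of what growing a backup partition looks like in practice, the sketch below wraps the standard lvextend command. The volume group name vg0, the logical volume name backup and the function name grow_backup_volume are all hypothetical; substitute the names used on your own system, and check the free space in the Volume Group (for example with vgs) first.

```shell
#!/bin/sh
# Hypothetical helper: grow a logical volume and its filesystem in one step.

# grow_backup_volume VG LV EXTRA
# e.g. grow_backup_volume vg0 backup 10G
grow_backup_volume() {
    vg=$1 lv=$2 extra=$3
    # -L +SIZE grows the volume by SIZE, taken from free space in the
    # volume group; -r resizes the filesystem on it at the same time.
    lvextend -r -L "+$extra" "/dev/$vg/$lv"
}
```

This needs to be run as root, and the Volume Group must have at least the requested amount of unallocated space.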

Conclusion

There are many more open source solutions available than we have space to cover, but a mixture of Bacula, rsnapshot and LVM forms the core of many enterprise-level backup solutions. If these options do not meet your needs, bespoke shell scripts (typically utilising rsync, tar and ssh) are often an attractive option, although Bacula and rsnapshot can be heavily customised to meet most requirements.

Another popular alternative to Bacula is AMANDA (Advanced Maryland Automatic Network Disk Archiver), which is also worth considering.

While this article can barely scratch the surface of this huge topic, it will hopefully encourage you to think seriously about the effectiveness of your current backup solution. If you don't yet have a backup solution then this article should give you some ideas about the options available, and the potential consequences to your business of not adopting a solution.
