When disaster strikes, you must have a plan, and you must have prepared in advance; otherwise the work of recovering your system and your files will be considerably greater. For example, if you have not previously saved the partitioning information for your hard disk, how can you properly rebuild it if the disk must be replaced?
Unfortunately, many of the steps one must take before and immediately after a disaster are very operating system dependent. As a consequence, this chapter will discuss in detail disaster recovery (also called Bare Metal Recovery) for Linux and Solaris. For Solaris, the procedures are still quite manual. For FreeBSD the same procedures may be used but they are not yet developed. For Win32, no luck. Apparently an ``emergency boot'' disk allowing access to the full system API without interference does not exist.
Here are a few important considerations concerning disaster recovery that you should take into account before a disaster strikes.
The remainder of this section concerns recovering a Linux computer, and parts of it relate to the Red Hat version of Linux. The Solaris procedures can be found below under the Solaris Bare Metal Recovery section of this chapter.
If you wish to use a floppy for restoration, please see the chapter Bare Metal Floppy Recovery on Linux with a Bacula Floppy Rescue Disk, but be aware that the Bacula floppy disk is deprecated and replaced by the CDROM rescue described in this chapter.
A so-called ``Bare Metal'' recovery is one where you start with an empty hard disk and restore your machine. There are also cases where you may lose a file or a directory and want it restored. Please see the previous chapter for details on those cases.
Bare Metal Recovery assumes that you have the following items for your system:
In addition to the above assumptions, the following conditions or restrictions apply:
To build the Bacula Rescue CDROM, you will find the necessary scripts in the rescue/linux/cdrom subdirectory of the Bacula source code. If you installed the bacula-rescue rpm package, the scripts will be found in the /etc/bacula/rescue/cdrom directory.
Before you can do a Bare Metal recovery, you must create a Bacula Rescue CDROM, which will contain everything you need to begin recovery. This assumes that you will have your Director and Storage daemon running on a different machine. If you want to recover a machine where the Director and/or the database were previously running, things will be much more complicated.
The primary goals of the Bacula rescue CD are:
One of the main advantages of a Bacula Rescue CDROM is that it contains a bootable copy of your system, so you should be familiar with it.
You should probably make a new rescue CDROM each time you make any major updates to your kernel, and every time you upgrade a major version of Bacula.
The whole process with the exception of burning the CDROM is done with the following commands:
(Build a working version of Bacula in the bacula-source directory)
cd <bacula-source>
./configure (your options)
make
cd <bacula-source>/rescue/linux/cdrom
su (become root)
make all
For users of the bacula-rescue rpm the static bacula-fd has already been built and placed in /etc/bacula/rescue/cdrom/bin/ along with a symbolic link to your /etc/bacula/bacula-fd.conf file. Rpm users only need to do the second step:
cd /etc/bacula/rescue/cdrom
su (become root)
make all
At this point, if the scripts are successful, they should have done the following things:
Once this is accomplished, you need only burn the ISO image to a CDROM. This can be done directly from the makefile with:
make burn
However, you may need to modify the Makefile to properly specify your CD burner, as the detection process is complicated, especially if you have two CDROMs or do not have cdrecord loaded on your system. Users of the rescue rpm package should definitely examine the Makefile, since it was configured on the host used to produce the rpm package. If you find that make burn does not work for you, try doing a:
make scan
and use the output of that to modify the Makefile accordingly.
The ``make all'' that you did above is actually equivalent to the following:
make kernel
make binaries
make bacula
make iso
If you wish, you can modify what you put on the CDROM and redo any part of the make that you wish. For example, if you want to add a new directory, you might do the first three makes, then add a new directory to the CDROM, and finally do a ``make iso''. Please see the README file in the rescue/linux/cdrom or /etc/bacula/rescue/cdrom directory for instructions on changing the contents of the CDROM.
At the current time, the size of the CDROM is about 50MB (compressed to about 20MB), so there is quite a bit more room for additional programs. Keep in mind that when this CDROM is booted, *everything* is in memory, so the total size cannot exceed your memory size, and even then you will need some reserve memory for running programs.
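The memory constraint above can be checked before burning. The following is a minimal sketch, not part of the Bacula rescue scripts: the function names and the idea of passing an explicit reserve are assumptions made for illustration, and /proc/meminfo describes the machine the check runs on, which is not necessarily the machine that will boot the CD.

```shell
#!/bin/sh
# Sketch only: compare the size of the tree that will be loaded into RAM
# with the memory available. All function names here are hypothetical.

# fits_in_memory TREE_KB MEM_KB RESERVE_KB
# Succeeds when the tree plus the reserve for running programs fits in memory.
fits_in_memory() {
    tree_kb=$1
    mem_kb=$2
    reserve_kb=$3
    [ $((tree_kb + reserve_kb)) -le "$mem_kb" ]
}

# tree_size_kb DIR
# Prints the size of DIR in kilobytes, as reported by du.
tree_size_kb() {
    du -sk "$1" | awk '{ print $1 }'
}

# mem_total_kb [FILE]
# Prints the MemTotal value (in kB) from /proc/meminfo or from FILE.
mem_total_kb() {
    awk '/^MemTotal:/ { print $2 }' "${1:-/proc/meminfo}"
}
```

For example, from the cdrom build directory one might run fits_in_memory "$(tree_size_kb roottree)" "$(mem_total_kb)" 65536 and treat a failure as a warning. The 64MB reserve is an arbitrary illustrative figure.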
You can put multiple systems on the same rescue CD if you wish. This is because the information that is specific to your OS will be stored in the /bacula-hostname directory, where hostname is the name of the host on which you are building the CD. Suppose, for example, that you have two systems, one named client1 and one named client2. Assume also that your CD burner is on client1, that client1 is the machine we start on, that we can ssh into client2, and that client2's disks are mounted on client1.
ssh client2
cd <bacula-source>
./configure (your options)
make
cd rescue/linux/cdrom
su (enter root password)
make bacula
exit
exit
Again, for rpm package users the above command set would be:
ssh client2
cd /etc/bacula/rescue/cdrom
su (enter root password)
make bacula
exit
exit
Thus we have just built a Bacula rescue directory on client2. Now, on client1, we copy the appropriate directory to two places (explained below), then build an ISO and burn it:
cd <bacula-source>
./configure (your options)
make
cd rescue/linux/cdrom
su (enter root password)
c=/mnt/client2/home/user/bacula/rescue/linux/cdrom
cp -a $c/roottree/bacula-client2 roottree
cp -a $c/roottree/bacula-client2 cdtree
make all
make burn
exit
And with the rpm package:
cd /etc/bacula/rescue/cdrom
su (enter root password)
c=/mnt/client2/etc/bacula/rescue/cdrom
cp -a $c/roottree/bacula-client2 roottree
cp -a $c/roottree/bacula-client2 cdtree
make all
make burn
exit
In summary, with the above commands, we first built a Bacula directory on client2 in roottree/bacula-client2, then copied the bacula-client2 directory into client1's roottree so that it is available in memory after booting, and also copied it into the cdtree so that it is on the CD as a separate directory and can be read without booting the CDROM. Then we made and burned the CDROM for client1, which of course contains the client2 data.
Now, let's assume that your hard disk has just died and that you have replaced it with a new identical drive. In addition, we assume that you have:
This is a relatively simple case, and later in this chapter, as time permits, we will discuss how you might recover from a situation where the machine that crashes is your main Bacula server (i.e. has the Director, the Catalog, and the Storage daemon).
You will take the following steps to get your system back up and running:
Now for the details ...
When the CDROM boots, you will be presented with a screen that looks like:
Welcome to the Bacula Rescue Disk 1.1.0
To proceed, press the <ENTER> key or type "linux <runlevel>"
linux 1 -> shell
linux 2 -> login (default if ENTER pressed)
linux 3 -> network started and login (network not working yet)
linux debug -> print debug during boot then login
Normally, at this point, you simply press ENTER. However, you may supply options for the boot if you wish.
Once it has booted, you will be asked to log in with a prompt something like:
Welcome to the Bacula Rescue CDROM
2.4.21-15.0.4.EL #1 Wed Aug 4 03:08:03 EDT 2004
Please login using root and your root password ...
RescueCD login:
Note, you must enter the root password for the system on which you loaded the kernel or on which you did the build of the CDROM. Once you are logged in, you will be in the home directory for root, and you can proceed to examine your system.
The complete Bacula rescue part of the CD will be in the directory /bacula-hostname, where hostname is replaced by the name of the host machine on which you did the build for the CDROM. This naming procedure allows you to put multiple restore environments, one for each of your machines, on a single CDROM if you wish. Please see the README document in the rescue/linux/cdrom directory for more information on adding to the CDROM.
At this point, you should bring up your network. Normally, this is quite simple and requires just a few commands. Please cd into the /bacula-hostname directory before continuing. To simplify your task, we have created a script that should work in most cases by typing:
cd /bacula-hostname
./start_network
You can test it by pinging another machine, or by pinging your broken machine from another machine. Do not proceed until your network is up.
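If you prefer to script the ``do not proceed until your network is up'' check, a small retry loop is enough. This is only a sketch: wait_for_host and the PROBE override are hypothetical names introduced here, not part of the rescue scripts, and the default probe simply sends one echo request with ping -c 1.

```shell
#!/bin/sh
# Sketch only: wait until a host answers before continuing.

# wait_for_host HOST [TRIES]
# Probes HOST up to TRIES times (default 10), pausing one second between
# attempts. Succeeds as soon as one probe succeeds.
# PROBE is a hypothetical override for the probe command.
wait_for_host() {
    host=$1
    tries=${2:-10}
    while [ "$tries" -gt 0 ]; do
        if ${PROBE:-ping -c 1} "$host" >/dev/null 2>&1; then
            return 0
        fi
        tries=$((tries - 1))
        [ "$tries" -gt 0 ] && sleep 1
    done
    return 1
}
```

You would call it with the address of a machine you know to be up, for example the machine running your Director, and continue with partitioning only when it succeeds.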
Assuming that your hard disk crashed and needs repartitioning, proceed with:
./partition.hda
If you have multiple disks, do the same for each of them. For SCSI disks, the repartition script will be named: partition.sda. If the script complains about the disk being in use, simply go back and redo the df command and umount commands until you no longer have your hard disk mounted. Note, in many cases, if your hard disk was seriously damaged or a new one installed, it will not automatically be mounted. If it is mounted, it is because the emergency kernel found one or more possibly valid partitions.
If for some reason this procedure does not work, you can use the information in partition.hda to re-partition your disks by hand using fdisk.
If you have repartitioned your hard disk, you must format it appropriately. The formatting script will put back swap partitions, normal Unix partitions (ext2) and journaled partitions (ext3) as well as Reiser partitions (rei). Do so by entering for each disk:
./format.hda
The format script will ask you if you want a block check done. We recommend answering yes, but realize that for very large disks this can take hours.
Once the disks are partitioned and formatted, you can remount them with the mount_drives script. All your drives must be mounted for Bacula to be able to access them. Run the script as follows:
./mount_drives
df
The df command will tell you if the drives are mounted. If not, re-run the script. It isn't always easy to figure out and create the mount points and the mounts in the proper order, so repeating the ./mount_drives command will not cause any harm and will most likely work the second time. If not, correct it by hand before continuing.
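Instead of reading the df output by eye, the same check can be made against the kernel's mount table. The sketch below assumes a Linux /proc/mounts, where the second field of each line is the mount point. The is_mounted name is hypothetical.

```shell
#!/bin/sh
# Sketch only: test whether a directory is currently a mount point.

# is_mounted DIR [MOUNTS_FILE]
# Succeeds when DIR appears as a mount point (second field) in the
# mount table, which defaults to /proc/mounts.
is_mounted() {
    awk -v dir="$1" '$2 == dir { found = 1 } END { exit !found }' \
        "${2:-/proc/mounts}"
}
```

For example, is_mounted /mnt/disk || ./mount_drives repeats the mount script only when the root of the restored system is not yet mounted.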
If you have booted with a Bacula Rescue CDROM, your statically linked Bacula File daemon and the bacula-fd.conf file will be in the /bacula-hostname/bin directory. Make sure bacula-fd and bacula-fd.conf are both there.
Edit the Bacula configuration file, create the working/pid/subsys directory if you haven't already done so above, and start Bacula. Before starting Bacula, you will need to move bacula-fd and bacula-fd.conf from /bacula-hostname/bin to the /mnt/disk/tmp directory so that they will be on your hard disk. Then start it with the following command:
chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf
The above command starts the Bacula File daemon with the proper root disk location (i.e. /mnt/disk/tmp). If Bacula does not start, correct the problem and start it again. You can check if it is running by entering:
ps fax
You can kill Bacula by entering:
kill -TERM <pid>
where pid is the first number printed in front of the first occurrence of bacula-fd in the ps fax command.
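Picking the pid out of the ps fax listing can also be done with awk. This is a sketch under the assumption that the pid is the first column of the listing, as described above. The first_fd_pid name is hypothetical, and the pattern is written as [b]acula-fd so that the awk command itself, which also shows up in the process list, does not match.

```shell
#!/bin/sh
# Sketch only: extract the pid of the first bacula-fd line from ps output.

# first_fd_pid
# Reads a `ps fax` style listing on standard input and prints the first
# column (the pid) of the first line that mentions bacula-fd.
first_fd_pid() {
    awk '/[b]acula-fd/ { print $1; exit }'
}
```

Typical use would be pid=$(ps fax | first_fd_pid) followed by kill -TERM "$pid" once you have checked that the pid is not empty.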
Now, you should be able to use another computer with Bacula installed to check the status by entering:
status client=xxxx
into the Console program, where xxxx is the name of the client you are restoring.
One common problem is that your bacula-dir.conf may contain machine addresses that are not properly resolved on the stripped down system to be restored because it is not running DNS. This is particularly true for the address in the Storage resource of the Director, which may be very well resolved on the Director's machine, but not on the machine being restored and running the File daemon. In that case, be prepared to edit bacula-dir.conf to replace the Storage daemon's domain name with its IP address.
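The edit itself is a one-line change, and it can be sketched with sed. The function below is an illustration only: use_ip_address is a hypothetical name, the host name and IP address in the usage line are made up, and the pattern handles just the simple case of an Address directive alone on its line (optionally quoted, with no trailing comment). Work on a copy of bacula-dir.conf and inspect the result before using it.

```shell
#!/bin/sh
# Sketch only: replace a host name by an IP address in Address directives.

# use_ip_address NAME IP
# Filters a configuration file from standard input to standard output,
# rewriting lines of the form "Address = NAME" to "Address = IP".
# All other lines are passed through unchanged.
use_ip_address() {
    # Escape the dots in NAME so they match literally in the regex.
    name=$(printf '%s' "$1" | sed 's/\./\\./g')
    sed "s/^\([[:space:]]*[Aa]ddress[[:space:]]*=[[:space:]]*\)\"\{0,1\}${name}\"\{0,1\}[[:space:]]*\$/\1$2/"
}
```

For example: use_ip_address sd.example.com 192.168.1.10 < bacula-dir.conf > bacula-dir.conf.new, followed by a diff of the two files.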
On the computer that is running the Director, you now run a restore command and select the files to be restored (normally everything). Before starting the restore, there is one final change you must make: just before running the job, use the mod option, select Where, and change the Where directory to be the root. Set it to:
/
then run the restore.
You might be tempted to avoid using chroot by running Bacula directly and then using a Where to specify a destination of /mnt/disk. This is possible; however, the current version of Bacula always restores files to the new location, and thus any soft links that have been specified with absolute paths will end up with /mnt/disk prefixed to them. In general this is not fatal to getting your system running, but be aware that you will have to fix these links if you do not use chroot.
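If you do end up with prefixed soft links, they can be repaired mechanically. The following is a sketch, assuming the only damage is that link targets begin with the Where prefix. The fix_prefixed_links name is hypothetical, and the -n option of ln (do not follow an existing link to a directory) is assumed to be available, as it is in GNU coreutils.

```shell
#!/bin/sh
# Sketch only: strip a restore prefix from absolute symbolic link targets.

# fix_prefixed_links ROOT PREFIX
# For every symbolic link under ROOT whose target starts with PREFIX/,
# re-creates the link with PREFIX removed from the target.
# Links with other targets (for example relative ones) are left alone.
fix_prefixed_links() {
    root=$1
    prefix=$2
    find "$root" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            "$prefix"/*)
                new=${target#"$prefix"}
                ln -sfn "$new" "$link"
                ;;
        esac
    done
}
```

In the situation described above, the call would be fix_prefixed_links /mnt/disk /mnt/disk. File names containing newlines are not handled by this sketch.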
At this point, the restore should have finished with no errors, and all your files will be restored. One last task remains and that is to write a new boot sector so that your machine will boot. For lilo, you enter the following command:
./run_lilo
If you are using grub instead of lilo, you must enter the following:
./run_grub
Note, I've had quite a number of problems with grub because it is rather complicated and not designed to install easily under a simplified system. So, if you experience errors or end up unexpectedly in a chroot shell, simply exit back to the normal shell and type in the appropriate commands from the run_grub script by hand until you get it to install. When you run the run_grub script, it will print the commands that you should manually enter if that is necessary.
First unmount all your hard disks, otherwise they will not be cleanly shut down. Then reboot your machine by entering exit until you get to the main prompt, and then enter Ctrl-D. Once back at the main CDROM prompt, you will need to turn the power to your machine off and then back on to get it to reboot.
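Nested file systems must be unmounted before the ones that contain them. As a sketch of how to get that order, the hypothetical mounts_under function below lists the mount points at or below a directory, deepest first, again assuming the Linux /proc/mounts layout with the mount point in the second field.

```shell
#!/bin/sh
# Sketch only: list mount points in an order that is safe for unmounting.

# mounts_under PREFIX [MOUNTS_FILE]
# Prints PREFIX and every mount point below it, one per line, with child
# directories before their parents. The mount table defaults to
# /proc/mounts.
mounts_under() {
    awk -v p="$1" '$2 == p || index($2, p "/") == 1 { print $2 }' \
        "${2:-/proc/mounts}" | LC_ALL=C sort -r
}
```

A loop such as for m in $(mounts_under /mnt/disk); do umount "$m"; done then unmounts everything under /mnt/disk. The loop relies on word splitting, so it assumes mount points without spaces.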
If everything went well, you should now be back up and running. If not, re-insert the emergency boot CDROM, boot, and figure out what is wrong.
Above, we considered how to recover a client machine where a valid Bacula server was running on another machine. However, what happens if your server goes down and you no longer have a running Director, Catalog, or Storage daemon? There are several solutions:
The first option is very difficult because it requires you to have created a static version of the Director and the Storage daemon as well as the Catalog. If the Catalog uses MySQL or PostgreSQL, this may or may not be possible. In addition to loading all these programs on a bare system (quite possible), you will need to make sure you have a valid driver for your tape drive.
The second suggestion is probably a much simpler solution, and one I have done myself. To do so, you might want to consider the following steps:
Since every flavor and every release of Linux is different, there are likely to be some small difficulties with the scripts, so please be prepared to edit them in a minimal environment. A rudimentary knowledge of vi is very useful. Also, these scripts do not do everything. You will need to reformat Windows partitions by hand, for example.
Getting the boot loader back can be a problem if you are using grub because it is so complicated. If all else fails, reboot your system from your floppy but using the restored disk image, then proceed to a reinstallation of grub (looking at the run_grub script can help). By contrast, lilo is a piece of cake.
The same basic techniques described above also apply to FreeBSD. Although we don't yet have a fully automated procedure, Alex Torres Molina has provided us with the following instructions with a few additions from Jesse Guardiani and Dan Langille:
The same basic techniques described above apply to Solaris:
However, during the recovery phase, the boot and disk preparation procedures are different:
Once the disk is partitioned, formatted and mounted, you can continue with bringing up the network and reloading Bacula.
As mentioned above, before a disaster strikes, you should prepare the information needed in the case of problems. To do so, in the rescue/solaris subdirectory enter:
su
./getdiskinfo
./make_rescue_disk
The getdiskinfo script will, as in the case of Linux described above, create a subdirectory diskinfo containing the output from several system utilities. In addition, it will contain the output from the SysAudit program as described in Curtis Preston's book. This file diskinfo/sysaudit.bsi will contain the disk partitioning information that will allow you to manually follow the procedures in the ``Unix Backup & Recovery'' book to repartition and format your hard disk. In addition, the getdiskinfo script will create a start_network script.
Once you have your disks repartitioned and formatted, do the following:
When a pre-1.30 version of Bacula restores a directory, it first must create the directory, then it populates the directory with its files and subdirectories. The act of creating the files and subdirectories updates both the modification and access times associated with the directory itself. As a consequence, all modification and access times of all directories will be updated to the time of the restore.
This has been corrected in Bacula version 1.30 and later. The directory modification and access times are reset to the values saved in the backup after all the files and subdirectories have been restored. This has been tested and verified on normal restore operations, but not verified during a bare metal recovery.
If any of you look closely at the bootstrap file that is produced and used for the restore (I sure do), you will probably notice that the FileIndex item does not include all the files saved to the tape. This is because in some instances there are duplicates (especially in the case of an Incremental save), and in such circumstances, Bacula restores only the last of multiple copies of a file or directory.
Due to open system files, and registry problems, Bacula cannot save and restore a complete Win2K/XP/NT environment.
A suggestion by Damian Coutts using Microsoft's NTBackup utility in conjunction with Bacula should permit a Full bare metal restore of Win2K/XP (and possibly NT systems). His suggestion is to do an NTBackup of the critical system state prior to running a Bacula backup with the following command:
ntbackup backup systemstate /F c:\systemstate.bkf
In this command, backup is the operation, systemstate says to back up only the system state and not all the user files, and /F c:\systemstate.bkf specifies where to write the state file. This file must then be saved and restored by Bacula.
To restore the system state, you first reload a base operating system, then use Bacula to restore all the users' files and to recover the c:\systemstate.bkf file, and finally run NTBackup, catalogue the system state file, and select it for restore. The documentation says you can't run a command line restore of the systemstate.
This procedure has been confirmed to work by Ludovic Strappazon - many thanks!
A new tool is provided in the form of a Bacula plugin for the BartPE rescue CD. BartPE is a self-contained Windows XP boot CD which you can make using the PeBuilder tools available at http://www.nu2.nu/pebuilder/ and a valid Windows XP SP1 CDROM. The plugin is provided as a zip archive. Unzip the file and copy the bacula directory into the plugin directory of your BartPE installation. Edit the configuration files to suit your installation and build your CD according to the instructions at Bart's site. This will permit you to boot from the CD, configure and start networking, start the Bacula file client and access your Director with the console program. The programs menu on the booted CD contains entries to install the file client service, start the file client service, and start the WX-Console. You can also open a command line window, cd to Programs\Bacula, and run the command line console bconsole.
Bacula versions after 1.31 should properly restore ownership and permissions on all WinNT/XP/2K systems. If you do experience problems, generally in restores to alternate directories because higher level directories were not backed up by Bacula, you can correct any problems with the SetACL utility, available under the GPL license at: http://sourceforge.net/projects/setacl/.
Ludovic Strappazon has suggested an interesting way to back up and restore complete Win32 partitions. Simply boot your Win32 system with a Linux Rescue disk as described above for Linux, install a statically linked Bacula, and back up any of the raw partitions you want. Then to restore the system, you simply restore the raw partition or partitions. Here is the email that Ludovic recently sent on that subject:
I've just finished testing my brand new cd LFS/Bacula with a raw Bacula backup and restore of my portable. I can't resist sending you the results: look at the rates !!!
hunt-dir: Start Backup JobId 100, Job=HuntBackup.2003-04-17_12.58.26
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:14
JobId: 100
Job: HuntBackup.2003-04-17_12.58.26
FileSet: RawPartition
Backup Level: Full
Client: sauvegarde-fd
Start time: 17-Apr-2003 12:58
End time: 17-Apr-2003 13:14
Files Written: 1
Bytes Written: 10,058,586,272
Rate: 10734.9 KB/s
Software Compression: None
Volume names(s): 000103
Volume Session Id: 2
Volume Session Time: 1050576790
Last Volume Bytes: 10,080,883,520
FD termination status: OK
SD termination status: OK
Termination: Backup OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
hunt-dir: Start Restore Job RestoreFilesHunt.2003-04-17_13.21.44
hunt-sd: Forward spacing to file 1.
hunt-dir: Bacula 1.30 (14Apr03): 17-Apr-2003 13:54
JobId: 101
Job: RestoreFilesHunt.2003-04-17_13.21.44
Client: sauvegarde-fd
Start time: 17-Apr-2003 13:21
End time: 17-Apr-2003 13:54
Files Restored: 1
Bytes Restored: 10,056,130,560
Rate: 5073.7 KB/s
FD termination status: OK
Termination: Restore OK
hunt-dir: Begin pruning Jobs.
hunt-dir: No Jobs found to prune.
hunt-dir: Begin pruning Files.
hunt-dir: No Files found to prune.
hunt-dir: End auto prune.
If for some reason you want to do a Full restore to a system that has a working kernel, you will need to take care not to overwrite the following files:
/etc/grub.conf
/etc/X11/Conf
/etc/fstab
/etc/mtab
/lib/modules
/usr/modules
/usr/X11R6
/etc/modules.conf
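One way to protect these files, shown here only as a sketch, is to archive the running system's copies before the restore so that they can be put back afterwards. The save_local_files name and the archive location are hypothetical, and the path list is simply the one given above.

```shell
#!/bin/sh
# Sketch only: archive the files that a full restore should not replace.

# save_local_files ARCHIVE [ROOT]
# Writes a tar archive containing whichever of the listed paths exist
# under ROOT (default /). Paths are stored relative to ROOT.
# Fails if none of the paths exist.
save_local_files() {
    archive=$1
    root=${2:-/}
    existing=""
    for p in etc/grub.conf etc/X11/Conf etc/fstab etc/mtab \
             lib/modules usr/modules usr/X11R6 etc/modules.conf; do
        if [ -e "$root/$p" ]; then
            existing="$existing $p"
        fi
    done
    [ -n "$existing" ] || return 1
    # $existing is deliberately unquoted: it is a space-separated list
    # and none of the paths above contain spaces.
    tar -cf "$archive" -C "$root" $existing
}
```

After the restore, tar -xf on the same archive with -C / would put the saved copies back. Whether that is appropriate for every file in the list depends on your system.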
Many thanks to Charles Curley, who wrote the Linux Complete Backup and Recovery HOWTO for The Linux Documentation Project. This is an excellent document on how to do Bare Metal Recovery on Linux systems, and it was this document that made me realize that Bacula could do the same thing.
You can find quite a few additional resources, both commercial and free, at Storage Mountain, formerly known as Backup Central.
And finally, the O'Reilly book, ``Unix Backup & Recovery'' by W. Curtis Preston covers virtually every backup and recovery topic including bare metal recovery for a large range of Unix systems.