Please note that the Bacula Rescue Floppy is now deprecated and is replaced by the Bacula Rescue CDROM described in another chapter of this manual.
When disaster strikes, you must have a plan, and you must have prepared in advance; otherwise, the work of recovering your system and your files will be considerably greater. For example, if you have not previously saved the partitioning information for your hard disk, how can you properly rebuild it if the disk must be replaced?
Unfortunately, many of the steps one must take before and immediately after a disaster are very operating system dependent. As a consequence, this chapter will discuss in detail disaster recovery (also called Bare Metal Recovery) for Linux and Solaris. For Solaris, the procedures are still quite manual. For FreeBSD the same procedures may be used but they are not yet developed. For Win32, no luck. Apparently an "emergency boot" disk allowing access to the full system API without interference does not exist.
Here are a few important considerations concerning disaster recovery that you should take into account before a disaster strikes.
Since floppies are being used less and less, the Bacula Floppy rescue disk is deprecated, which means that it is no longer really supported. For those of you who have or need floppy rescue, we include the recovery instructions here for your reference.
The remainder of this section concerns recovering a Linux computer using a floppy, and parts of it relate to the Red Hat version of Linux.
A so-called "Bare Metal" recovery is one where you start with an empty hard disk and restore your machine. There are also cases where you may lose a file or a directory and want it restored. Please see the previous chapter for more details on those cases.
Bare Metal Recovery assumes that you have the following four items for your system:
In addition to the above assumptions, the following conditions or restrictions apply:
If you are building a self-contained Bacula Rescue CDROM, you will find the necessary scripts in the rescue/linux/cdrom subdirectory of the Bacula source code.
If you wish to build the Bacula Rescue floppy disk, the scripts discussed below can be found in the rescue/linux/floppy subdirectory of the Bacula source code.
There are two things you should do immediately on all (Linux) systems for which you wish to do a bare metal recovery:
Here you have several choices:
If you have created a Bacula Rescue CDROM, you can skip this section.
If you *must* use a boot floppy, my preference is to create and use a tomsrtbt emergency boot disk because it gives you a very clean Linux environment (with a 2.2 kernel) and the most utilities. See http://www.toms.net/rb/ for more details on this. It is very easy to do and well worth the effort. However, I recommend that you create both a tomsrtbt disk and a standard Linux rescue disk, especially if you have non-standard hardware. You may find that tomsrtbt will not work with your network driver (he surely has one, but you must explicitly put it on the disk), whereas the Linux rescue is more likely to work.
If you have created a Bacula Rescue CDROM, you can skip this section.
To create a standard Linux emergency boot disk you must first know the name of the kernel, which you can find with:
ls -l /boot
and looking at the vmlinux-... line, or alternatively do an
uname -a
then become root and with a blank floppy in the drive, enter the following command:
mkbootdisk --device /dev/fd0 2.4.18-18
where you replace "2.4.18-18" with the kernel version of your system.
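The kernel version can also be picked up automatically. The following is a minimal sketch, assuming that the running kernel is the one you want on the boot disk and that uname -r reports the same version string that mkbootdisk expects. It only prints the command, so that you can check it before running it as root.

```shell
# Kernel release of the running system, for example the "2.4.18-18" used above.
kernel_version=$(uname -r)

# Print the command rather than running it, so it can be inspected first.
echo "mkbootdisk --device /dev/fd0 $kernel_version"
```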
This disk can then be booted, and you will be in an environment with a number of important tools available. Compared to tomsrtbt, this environment has some disadvantages: you must enter linux rescue at the boot prompt, or the boot will fail without a hard disk; and it requires a disk boot image or a CDROM to be mounted, so if the CDROM is released, you will lose a large number of the tools.
If you have created a Bacula Rescue CDROM, you can skip this section.
Specific to Red Hat Linux is the option of creating an Installation floppy, which can also be used as an emergency boot disk. The advantage of this method is that it works in conjunction with the installation CDROM and hence during the first part of restoring the system, you have a much larger number of tools available (on the CDROM). This can be extremely useful if you are not sure what really happened and you need to examine your system in detail.
To make a Red Hat Linux installation disk, do the following:
mount the Installation CDROM (/mnt/cdrom)
cd /mnt/cdrom/images
dd if=boot.img of=/dev/fd0 bs=1440k
Now that you have either an emergency boot disk or an installation floppy, you will be able to reboot your system in the absence of your hard disk or with a damaged hard disk. This method has the same disadvantages compared to the tomsrtbt disk as mentioned above for the Emergency Boot Disk.
If you have created a Bacula Rescue CDROM, this step will be automatically done for you.
Simply having a boot disk is not sufficient to re-create things as they were. To solve this problem, we will create a Bacula Rescue disk. Everything that will be written to this disk will first be placed into the <bacula-src>/rescue/linux directory.
The first step is done while your system is up and running normally: you use a Bacula script called getdiskinfo to capture certain important information about your hard disk configuration (partitioning, formatting, mount points, ...). getdiskinfo will also create a number of scripts using the information found that can be used in an emergency to repartition your disks, reformat them, and restore a statically linked version of the Bacula file daemon so that your disk can be restored from within a minimal boot environment.
Run getdiskinfo as follows:
su
cd <bacula-src>/rescue/linux
./getdiskinfo
getdiskinfo works for either IDE or SCSI drives and recognizes both ext2 and ext3 file systems. If you wish to restore other file systems, you will need to modify the code. This script can be run multiple times, but really only needs to be run once unless you change your hard disk configuration.
Assuming you have a single hard disk on device /dev/hda, getdiskinfo will create the following files:
/mnt/drive/. This is used just before running the statically linked Bacula so that it can access your drives for the restore.
The getdiskinfo program (actually a shell script) will also create a subdirectory named diskinfo, which contains the following files:
df.bsi
disks.bsi
fstab.bsi
ifconfig.bsi
mount.bsi
mount.ext2.bsi
mount.ext3.bsi
mtab.bsi
route.bsi
sfdisk.disks.bsi
sfdisk.hda.bsi
sfdisk.make.hda.bsi
Each of these files contains some important piece of information (sometimes redundant) about your hard disk setup or your network. Normally, you will not need this information, but it will be written to the Bacula Rescue disk just in case. Since it is normally not used, we will leave it to you to examine those files at your leisure.
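To give a feel for the kind of information involved, here is a minimal hand-rolled sketch. It is not what getdiskinfo actually does; the helper name save_disk_info and the output file names are assumptions made for illustration, and only commands that do not need root are run.

```shell
# Hypothetical helper, not part of Bacula: save a few pieces of disk
# information into the directory given as the first argument.
save_disk_info() {
    outdir=$1
    mkdir -p "$outdir" || return 1

    # Mounted file systems and their usage.
    df > "$outdir/df.txt" || echo "df reported an error" >&2

    # Static mount table, if the system has one.
    if [ -r /etc/fstab ]; then
        cp /etc/fstab "$outdir/fstab.txt"
    fi

    # As root, the partition table of a disk could additionally be dumped
    # with: sfdisk -d /dev/hda > "$outdir/sfdisk.hda.txt"
    return 0
}
```

For example, save_disk_info ./my-diskinfo writes the files into a new my-diskinfo directory, which you can then keep somewhere other than the disk it describes.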
If you have created a Bacula Rescue CDROM, this step will be automatically done for you.
The second of the three steps in creating your Bacula Rescue disk is to build a static version of the File daemon. Do so by either configuring Bacula as follows or by allowing the make_rescue_disk script described below to make it for you:
cd <bacula-src>
./configure <normal-options>
make
cd src/filed
make static-bacula-fd
strip static-bacula-fd
cp static-bacula-fd ../../rescue/linux/bacula-fd
cp bacula-fd.conf ../../rescue/linux
Note, above, we built static-bacula-fd and changed its name to bacula-fd when copying it to the rescue/linux directory.
Finally, in <bacula-src>/rescue/linux, edit bacula-fd.conf and ensure that the WorkingDirectory and PIDDirectory both point to reasonable locations on a stripped-down system. If you are using tomsrtbt, you will also want to replace machine names with IP addresses since there is no resolver running. With the Linux Rescue disk, network address mapping seems to work. Don't forget that at the time this version of the Bacula File daemon runs, your file system will not be restored. In my bacula-fd.conf, I use /var/working.
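A quick way to review these two settings is to print them from the configuration file. This is only a sketch: the helper name show_fd_dirs is made up, and the pattern assumes the directives are written on their own lines, with or without a space inside the keyword.

```shell
# Hypothetical helper: print the working and pid directory settings
# from the configuration file given as the first argument.
show_fd_dirs() {
    grep -i -E '^[[:space:]]*(working|pid)[[:space:]]*directory' "$1"
}
```

Running show_fd_dirs bacula-fd.conf in the rescue/linux directory lists the two lines, so you can confirm that each directory is one that will exist in the rescue environment.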
When you have everything you need (output of getdiskinfo, Bacula File daemon, ...), you create your rescue floppy by putting a blank floppy into your floppy disk drive and entering:
su
./make_rescue_disk
This script will reformat the floppy and write everything in the current directory and all files in the diskinfo directory to the floppy. If you supply the appropriate command line options, it will also build a static version of the Bacula file daemon and copy it along with the configuration file to the disk. Also using a command line option, you can make it write a compressed tar file containing all the files whose names are in backup.etc.list to the floppy. The list as provided contains names of files in /etc that you might need in a disaster situation. It is not needed, but in some cases such as a complex network setup, you may find it useful.
The following command line options are available for the make_rescue_disk script:
Usage: make_rescue_disk
  -h, --help             print this message
  --make-static-bacula   make static File daemon and add to diskette
  --copy-static-bacula   copy static File daemon to diskette
  --copy-etc-files       copy files in etc list to diskette
Briefly the options are:
--make-static-bacula
--copy-static-bacula
--copy-etc-files
Please examine the contents of the rescue floppy to ensure that it has everything you want and need. If not, modify the scripts as necessary and re-run make_rescue_disk until it is correct.
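One way to examine the floppy is to check for the files you expect to rely on during a recovery. The sketch below is an illustration only: the helper name check_rescue_files is made up, and the file list is an assumption based on the scripts named in this chapter for a single IDE disk, so edit it to match your own setup.

```shell
# Hypothetical helper: report which of the expected files are missing
# from the directory given as the first argument (for example /mnt/floppy).
# The file list is an assumption; adjust it for your disks and scripts.
check_rescue_files() {
    dir=$1
    missing=0
    for f in partition.hda format.hda mount_drives start_network restore_bacula; do
        if [ ! -e "$dir/$f" ]; then
            echo "missing: $f"
            missing=1
        fi
    done
    return $missing
}
```

With the floppy mounted, check_rescue_files /mnt/floppy prints one line per missing file and returns a non-zero status if anything is absent.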
Now that you have both a system boot floppy and a Bacula Rescue floppy, assuming you have a full backup of your system made by Bacula, you are ready to handle nearly any kind of emergency restoration situation.
Now, let's assume that your hard disk has just died and that you have replaced it with a new identical drive. In addition, we assume that you have:
This is a relatively simple case, and later in this chapter, as time permits, we will discuss how you might recover from a situation where the machine that crashes is your main Bacula server (i.e. has the Director, the Catalog, and the Storage daemon).
You will take the following steps to get your system back up and running:
Now for the details ...
First you will boot with your emergency floppy. If you use the Installation floppy described above, when you get to the boot prompt:
boot:
you enter linux rescue.
If you are booting from tomsrtbt simply enter the default responses.
When your machine finishes booting, you should be at the command prompt, possibly with your hard disk mounted on /mnt/sysimage (Linux emergency only). To see what is actually mounted, use:
df
Make sure that the mount point /mnt/floppy exists. If not, enter:
mkdir -p /mnt/floppy
Then mount your Bacula Rescue disk and cd to it with:
mount /dev/fd0 /mnt/floppy
cd /mnt/floppy
To simplify running the scripts make sure the current directory is on your path by:
PATH=$PATH:.
At this point, you should bring up your network. Normally, this is quite simple and requires just a few commands. If you have booted from your Bacula Rescue CDROM, please cd into the /bacula-hostname directory before continuing. To simplify your task, we have created a script that should work in most cases by typing:
./start_network
You can test it by pinging another machine, or pinging your broken machine from another machine. Do not proceed until your network is up.
Proceed with this step only when you are sure you want to repartition your disk. Normally, if your disk was damaged or if you are using tomsrtbt, your hard disk will not be mounted. However, if it is, you must first unmount it so that it is not in use. Do so by entering df and then entering the correct commands to unmount the disks. For example:
umount /mnt/sysimage/boot
umount /mnt/sysimage/usr
umount /mnt/sysimage/proc
umount /mnt/sysimage/
where you explicitly unmount (umount) each sysimage partition, the last one being the root. Do another df command to be sure you have successfully unmounted all the sysimage partitions.
This is necessary because sfdisk will refuse to partition a disk that is currently mounted. As mentioned, this should never be necessary with tomsrtbt.
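If many partitions are mounted, the umount commands can be generated rather than typed. The following is a sketch under assumptions: the helper name list_umounts is made up, the input is expected in the format of /proc/mounts (mount point in the second field), and awk, sort and sed must be available in your rescue environment. It prints the commands with the deepest mount points first so that the root is unmounted last.

```shell
# Hypothetical helper: read a mount table in /proc/mounts format on
# standard input and print umount commands for everything under
# /mnt/sysimage, children before parents.
list_umounts() {
    awk '$2 ~ /^\/mnt\/sysimage/ { print $2 }' |
        LC_ALL=C sort -r |
        sed 's/^/umount /'
}
```

For example, list_umounts < /proc/mounts shows the commands; review them before running any of them by hand.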
If you are using tomsrtbt, you will need to do the following steps to get the correct sfdisk:
rm -f sfdisk
bzip2 -d sfdisk.bz2
Do not do the above steps if you are using a standard Linux boot disk or the Bacula Rescue CDROM.
Then proceed with partitioning your hard disk by:
./partition.hda
If you have multiple disks, do the same for each of them. For SCSI disks, the repartition script will be named: partition.sda. If the script complains about the disk being in use, simply go back and redo the df command and umount commands until you no longer have your hard disk mounted. Note, in many cases, if your hard disk was seriously damaged or a new one installed, it will not automatically be mounted. If it is mounted, it is because the emergency kernel found one or more possibly valid partitions.
If for some reason this procedure does not work, you can use the information in partition.hda to re-partition your disks by hand using fdisk.
After partitioning your disk, you must format it appropriately. The formatting script will put back swap partitions, normal Unix partitions (ext2) and journaled partitions (ext3). Do so by entering for each disk:
./format.hda
The format script will ask you if you want a block check done. We recommend answering yes, but realize that for very large disks this can take hours.
Once the disks are partitioned and formatted, you can remount them with the mount_drives script. All your drives must be mounted for Bacula to be able to access them. Run the script as follows:
./mount_drives
df
The df will tell you if the drives are mounted. If not, re-run the script. It isn't always easy to figure out and create the mount points and the mounts in the proper order, so repeating the ./mount_drives command will not cause any harm and will most likely work the second time. If not, correct it by hand before continuing.
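Instead of reading the df output by eye, a specific mount point can be tested directly. This is a minimal sketch, assuming awk is available and that the mount table is readable in /proc/mounts format; the helper name is_mounted is made up for illustration.

```shell
# Hypothetical helper: succeed if the path given as the first argument
# is a mount point listed in the mount table (second argument, which
# defaults to /proc/mounts).
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}
```

For example, is_mounted /mnt/disk || echo "/mnt/disk is not mounted" gives a clear message when the root of the disk to be restored is missing.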
Next, if you are using the Red Hat installation disk, unmount the CDROM drive by doing:
umount /mnt/cdrom
This is not necessary if you are running tomsrtbt. In doing this, I find it is always busy, and I haven't figured out how to unmount it (Linux boot only).
If you have booted with a Bacula Rescue CDROM, your statically linked Bacula File daemon and the bacula-fd.conf file will be in the /bacula-hostname/bin directory. Please skip the following paragraph and continue with editing the Bacula configuration file.
If you have not used a Bacula Rescue CDROM, now change (cd) to some directory where you want to put the image of the Bacula File daemon. I use the tmp directory on my hard disk (mounted as /mnt/disk/tmp) because it is easy. Then install Bacula into the current directory by running the restore_bacula script from the floppy drive. For example:
mkdir -p /mnt/disk/tmp
mkdir -p /mnt/disk/tmp/working
cd /mnt/disk/tmp
/mnt/floppy/restore_bacula
ls -l
Make sure bacula-fd and bacula-fd.conf are both there.
Edit the Bacula configuration file, create the working/pid/subsys directory if you haven't already done so above, and start Bacula by entering:
chroot /mnt/disk /tmp/bacula-fd -c /tmp/bacula-fd.conf
The above command starts the Bacula File daemon from /tmp with /mnt/disk as the root of the file system (i.e. the daemon itself is in /mnt/disk/tmp). If Bacula does not start, correct the problem and start it again. You can check if it is running by entering:
ps fax
You can kill Bacula by entering:
kill -TERM <pid>
where pid is the first number printed in front of the first occurrence of bacula-fd in the ps fax command.
Now, you should be able to use another computer with Bacula installed to check the status by entering:
status client=xxxx
into the Console program, where xxxx is the name of the client you are restoring.
One common problem is that your bacula-dir.conf may contain machine addresses that are not properly resolved on the stripped-down system to be restored because it is not running DNS. This is particularly true for the address in the Storage resource of the Director, which may be very well resolved on the Director's machine, but not on the machine being restored and running the File daemon. In that case, be prepared to edit bacula-dir.conf to replace the Storage daemon's domain name with its IP address.
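Such a replacement can be made with sed if you prefer not to edit the file by hand. The sketch below is an illustration only: the helper name replace_host_with_ip, the host name backup.example.com and the address 192.168.1.10 are all made up. It changes only lines that mention an address, and it keeps a copy of the original file with the suffix .orig.

```shell
# Hypothetical helper: on lines that mention an address, replace a host
# name with an IP address. Arguments: config file, host name, IP address.
# Note: dots in the host name are treated as regular-expression wildcards.
replace_host_with_ip() {
    conf=$1
    name=$2
    ip=$3
    cp "$conf" "$conf.orig" || return 1    # keep a copy of the original
    sed "/[Aa]ddress/s/$name/$ip/" "$conf.orig" > "$conf"
}
```

For example, replace_host_with_ip bacula-dir.conf backup.example.com 192.168.1.10 rewrites the Address line of the Storage resource and leaves the previous version in bacula-dir.conf.orig.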
Suppose your system was damaged for one reason or another, so that the hard disk and the partitioning and much of the filesystems are intact, but you want to do a full restore. If you have booted into your system with the Red Hat Installation Disk by specifying linux rescue at the boot: prompt, you will find yourself at a shell prompt with your disks already mounted (if it was possible) in /mnt/sysimage. In this case, you can proceed much as you did above to restore your system:
cd /mnt/sysimage/tmp
mkdir -p /mnt/sysimage/tmp/working
/mnt/floppy/restore_bacula
ls -l
Make sure that bacula-fd and bacula-fd.conf are both in the current directory and that the directory names in the bacula-fd.conf correctly point to the appropriate directories. Then start Bacula with:
chroot /mnt/sysimage /tmp/bacula-fd -c /tmp/bacula-fd.conf
On the computer that is running the Director, you now run a restore command and select the files to be restored (normally everything). Before starting the restore, there is one final change you must make: use the mod option just before running the job, select Where, and change the Where directory to be the root. Set it to:
/
then run the restore.
You might be tempted to avoid using chroot and running Bacula directly and then using a Where to specify a destination of /mnt/disk. This is possible, however, the current version of Bacula always restores files to the new location, and thus any soft links that have been specified with absolute paths will end up with /mnt/disk prefixed to them. In general this is not fatal to getting your system running, but be aware that you will have to fix these links if you do not use chroot.
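If you do restore without chroot, the affected links can be located afterwards. The following is a sketch under assumptions: the helper name find_prefixed_links is made up, and find and readlink must be available in the environment where you run it. It lists every symbolic link under a directory whose target starts with that same directory, which is the signature of the prefixing described above.

```shell
# Hypothetical helper: list symbolic links under the directory given as
# the first argument whose target begins with that directory.
find_prefixed_links() {
    root=$1
    find "$root" -type l | while IFS= read -r link; do
        target=$(readlink "$link")
        case $target in
            "$root"/*) echo "$link -> $target" ;;
        esac
    done
}
```

For example, find_prefixed_links /mnt/disk prints each link that would need its /mnt/disk prefix removed.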
At this point, the restore should have finished with no errors, and all your files will be restored. One last task remains and that is to write a new boot sector so that your machine will boot. For lilo, you enter the following command:
run_lilo
If you are using grub instead of lilo, you must enter the following:
run_grub
Note, I've had quite a number of problems with grub because it is rather complicated and not designed to install easily under a simplified system. So, if you experience errors or end up unexpectedly in a chroot shell, simply exit back to the normal shell and type in the appropriate commands from the run_grub script by hand until you get it to install.
Reboot your machine by entering exit until you get to the main prompt, then enter Ctrl-D.
If everything went well, you should now be back up and running. If not, re-insert the emergency boot floppy, boot, and figure out what is wrong.
At this point, you will probably want to remove the temporary copy of Bacula that you installed. Do so with:
rm -f /tmp/bacula-fd /tmp/bacula-fd.conf
rm -rf /tmp/working
Since every flavor and every release of Linux is different, there are likely to be some small difficulties with the scripts, so please be prepared to edit them in a minimal environment. A rudimentary knowledge of vi is very useful. Also, these scripts do not do everything. You will need to reformat Windows partitions by hand, for example.
Getting the boot loader back can be a problem if you are using grub because it is so complicated. If all else fails, reboot your system from your floppy but using the restored disk image, then proceed to a reinstallation of grub (looking at the run_grub script can help). By contrast, lilo is a piece of cake.
When performing the bare metal recovery using the Red Hat emergency boot disk (actually the installation boot disk), I was never able to release the cdrom, and when the system came up /mnt/cdrom was soft linked to /mnt/disk/dev/hdd, which is not correct. I fixed this in each case by deleting and simply remaking it with mkdir -p /mnt/cdrom.
tomsrtbt is a single floppy (1.722Meg) that really has A LOT of software. For example, by default (version 2.0.103) you get:
AHA152X AHA1542 AIC7XXX BUSLOGIC DAC960 DEC_ELCP(TULIP) EATA
EEXPRESS/PRO/PRO100 EL2 EL3 EXT2 EXT3 FAT FD IDE-CD/DISK/TAPE IMM INITRD
ISO9660 JOLIET LOOP MATH_EMULATION MINIX MSDOS NCR53C8XX NE2000 NFS NTFS
PARPORT PCINE2K PCNET32 PLIP PPA RTL8139 SD SERIAL/_CONSOLE SLIP SMC_ULTRA
SR ST VFAT VID_SELECT VORTEX WD80x3 .exrc 3c589_cs agetty ash badblocks
basename boot.b buildit.s busybox bz2bzImage bzip2 cardmgr cardmgr.pid cat
chain.b chattr chgrp chmod chown chroot clear clone.s cmp common config cp
cpio cs cut date dd dd-lfs debugfs ddate df dhcpcd--
dirname dmesg domainname
ds du dumpe2fs e2fsck echo egrep elvis ex false fdflush fdformat fdisk
filesize find findsuper fmt fstab grep group gunzip gzip halt head hexdump
hexedit host.conf hostname hosts httpd i82365 ifconfig ile init inittab insmod
install.s issue kernel key.lst kill killall killall5 ld ld-linux length less
libc libcom_err libe2p libext2fs libtermcap libuuid lilo lilo.conf ln
loadkmap login ls lsattr lsmod lua luasocket man map md5sum miterm mkdir
mkdosfs mke2fs mkfifo mkfs.minix mknod mkswap more more.help mount mt mtab mv
nc necho network networks nmclan_cs nslookup passwd pax pcmcia_core
pcnet_cs pidof ping poweroff printf profile protocols ps pwd rc.0 rc.S
rc.custom rc.custom.gz rc.pcmcia reboot rescuept reset resolv.conf rm rmdir
rmmod route rsh rshd script sed serial serial_cs services setserial
settings.s sh shared slattach sleep sln sort split stab strings swapoff swapon
sync tail tar tcic tee telnet telnetd termcap test tomshexd tomsrtbt.FAQ touch
traceroute true tune2fs umount undeb--
unpack.s unrpm--
update utmp vi vi.help
view watch wc wget which xargs xirc2ps_cs yecho yes zcat
In addition, at Tom's Web Site, you can find a lot of additional kernel drivers and other software (such as sfdisk, which is used by Bacula).
Building his floppy is a piece of cake. Simply download his .tar.gz file then:
- detar the .tar.gz archive
- become root
- cd to the tomsrtbt-<version> directory
- load a blank floppy with no bad sectors
- ./install.s