This chapter is intended to be a technical discussion of the Catalog services and as such is not targeted at end users but rather at developers and system administrators that want or need to know more of the working details of Bacula.
The Bacula Catalog services consist of the programs that provide the SQL database engine for storage and retrieval of all information concerning files that were backed up and their locations on the storage media.
We have investigated the possibility of using the following SQL engines for Bacula: Beagle, mSQL, GNU SQL, PostgreSQL, SQLite, Oracle, and MySQL. Each presents certain problems with either licensing or maturity. At present, we have chosen for development purposes to use MySQL, PostgreSQL and SQLite. MySQL was chosen because it is fast, proven to be reliable, widely used, and actively being developed. MySQL is released under the GNU GPL license. PostgreSQL was chosen because it is a full-featured, very mature database, and because Dan Langille did the Bacula driver for it. PostgreSQL is distributed under the BSD license. SQLite was chosen because it is small, efficient, and can be directly embedded in Bacula thus requiring much less effort from the system administrator or person building Bacula. In our testing SQLite has performed very well, and for the functions that we use, it has never encountered any errors except that it does not appear to handle databases larger than 2GBytes.
The Bacula SQL code has been written in a manner that will allow it to be easily modified to support any of the current SQL database systems on the market (for example: mSQL, iODBC, unixODBC, Solid, OpenLink ODBC, EasySoft ODBC, InterBase, Oracle8, Oracle7, and DB2).
If you do not specify either --with-mysql or --with-postgresql or --with-sqlite on the ./configure line, Bacula will use its minimalist internal database. This database is kept for build reasons but is no longer supported. Bacula requires one of the three databases (MySQL, PostgreSQL, or SQLite) to run.
In general, MySQL, PostgreSQL, and SQLite all permit storing arbitrarily long path names and file names in the catalog database. In practice, there may still be one or two places in the Catalog interface code that restrict the maximum path length to 512 characters and the maximum file name length to 512 characters, although these restrictions are believed to have been removed. Please note that these restrictions apply only to the Catalog database and thus to your ability to list online the files saved during any job. All information received and stored by the Storage daemon (normally on tape) allows and handles arbitrarily long path names and file names.
For the details of installing and configuring MySQL, please see the Installing and Configuring MySQL chapter of this manual.
For the details of installing and configuring PostgreSQL, please see the Installing and Configuring PostgreSQL chapter of this manual.
For the details of installing and configuring SQLite, please see the Installing and Configuring SQLite chapter of this manual.
For the details of the internal Bacula database, please see the Internal Bacula Database chapter of this manual.
All discussions that follow pertain to the MySQL database. The details for the PostgreSQL and SQLite databases are essentially identical, except that all fields in the SQLite database are stored as ASCII text and some of the database creation statements are a bit different. The details of the internal Bacula catalog are not discussed here.
Because the Catalog database may contain very large amounts of data for large sites, we have made a modest attempt to normalize the data tables to reduce redundant information. While reducing the size of the database significantly, it does, unfortunately, add some complications to the structures.
In simple terms, the Catalog database must contain a record of all Jobs run by Bacula, and for each Job, it must maintain a list of all files saved, with their File Attributes (permissions, create date, ...), and the location and Media on which the file is stored. This is seemingly a simple task, but it represents a huge amount of interlinked data. Note: the list of files and their attributes is not maintained when using the internal Bacula database. The data stored in the File records, which allows the user or administrator to obtain a list of all files backed up during a job, is by far the largest volume of information put into the Catalog database.
Although the Catalog database has been designed to handle backup data for multiple clients, some users may want to maintain multiple databases, one for each machine to be backed up. This reduces the risk of accidentally restoring a file to the wrong machine and reduces the amount of data in a single database, thus increasing efficiency and limiting the impact of a lost or damaged database.
Start with StartDate, ClientName, Filename, Path, Attributes, MediaName, MediaCoordinates (PartNumber, NumParts). In the steps below, "Create new" means to create a new record whether or not it is unique. "Create unique" means each record in the database should be unique. Thus, one must first search to see if the record exists, and only if not should a new one be created; otherwise the existing RecordId should be used.
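The "Create unique" rule can be sketched as a search followed by a conditional insert. This is only an illustration using Python's sqlite3 module and a cut-down Path table; the helper name create_unique is hypothetical and is not part of the Bacula source.

```python
import sqlite3

def create_unique(conn, table, id_col, value_col, value):
    """Return the id of the row holding `value`, inserting it only if absent.

    Hypothetical helper: table and column names are trusted identifiers here.
    """
    row = conn.execute(
        f"SELECT {id_col} FROM {table} WHERE {value_col} = ?", (value,)
    ).fetchone()
    if row is not None:
        return row[0]          # record exists: reuse its RecordId
    cur = conn.execute(
        f"INSERT INTO {table} ({value_col}) VALUES (?)", (value,)
    )
    return cur.lastrowid       # id of the newly created record

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path BLOB NOT NULL)")

first = create_unique(conn, "Path", "PathId", "Path", "/etc/")
second = create_unique(conn, "Path", "PathId", "Path", "/etc/")     # same id again
other = create_unique(conn, "Path", "PathId", "Path", "/usr/bin/")  # new record
row_count = conn.execute("SELECT COUNT(*) FROM Path").fetchone()[0]
```

"Create new", by contrast, would skip the SELECT and always insert.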
Filename | ||
Column Name | Data Type | Remark |
FilenameId | integer | Primary Key |
Name | Blob | Filename |
The Filename table shown above contains the name of each file backed up with the path removed. If different directories or machines contain the same filename, only one copy will be saved in this table.
Path | ||
Column Name | Data Type | Remark |
PathId | integer | Primary Key |
Path | Blob | Full Path |
The Path table shown above contains the path or directory names of all directories on the system or systems. The filename and any MSDOS disk name are stripped off. As with the filename, only one copy of each directory name is kept regardless of how many machines or drives have the same directory. These path names should be stored in Unix path name format.
Some simple testing on a Linux file system indicates that separating the filename and the path may be more complication than is warranted by the space savings. For example, this system has a total of 89,097 files, 60,467 of which have unique filenames, and there are 4,374 unique paths.
Finding all those files and doing two stat() calls per file takes an average wall clock time of 1 min 35 seconds on a 400MHz machine running RedHat 6.1 Linux.
Finding all those files and putting them directly into a MySQL database with the path and filename defined as TEXT, which is variable length up to 65,535 characters, takes 19 mins 31 seconds and creates a 27.6 MB database.
Doing the same thing, but inserting them into Blob fields with the filename indexed on the first 30 characters and the path name indexed on the first 255 (maximum) characters takes 5 mins 18 seconds and creates a 5.24 MB database. Rerunning the job (with the database already created) takes about 2 mins 50 seconds.
Running the same test as the last one (Path and Filename Blob), but with the Filename indexed on the first 30 characters and the Path on the first 50 characters (a linear search is done thereafter) takes 5 mins on average and creates a 3.4 MB database. Rerunning with the data already in the DB takes 3 mins 35 seconds.
Finally, saving only the full path name rather than splitting the path and the file, and indexing it on the first 50 characters takes 6 mins 43 seconds and creates a 7.35 MB database.
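The file, filename, and path counts quoted above can be gathered with a simple directory walk. The following is a minimal sketch, not the program used for the original measurements; the function name count_names and the small demonstration tree are assumptions made for illustration.

```python
import os
import tempfile

def count_names(root):
    """Walk `root` and return (total files, unique filenames, unique paths)."""
    total = 0
    filenames = set()
    paths = set()
    for dirpath, _dirnames, files in os.walk(root):
        for name in files:
            total += 1
            filenames.add(name)    # filename with the path removed
            paths.add(dirpath)     # directory that holds the file
    return total, len(filenames), len(paths)

# Demonstration on a throwaway tree: two directories with the same two names.
with tempfile.TemporaryDirectory() as root:
    for sub in ("a", "b"):
        os.makedirs(os.path.join(root, sub))
        for name in ("README", "Makefile"):
            open(os.path.join(root, sub, name), "w").close()
    result = count_names(root)
```

The ratio of unique filenames to total files gives a rough idea of how much the separate Filename table can save on a given system.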
File | ||
Column Name | Data Type | Remark |
FileId | integer | Primary Key |
FileIndex | integer | The sequential file number in the Job |
JobId | integer | Link to Job Record |
PathId | integer | Link to Path Record |
FilenameId | integer | Link to Filename Record |
MarkId | integer | Used to mark files during Verify Jobs |
LStat | tinyblob | File attributes in base64 encoding |
MD5 | tinyblob | MD5 signature in base64 encoding |
The File table shown above contains one entry for each file backed up by Bacula. Thus a file that is backed up multiple times (as is normal) will have multiple entries in the File table. This will probably be the table with the largest number of records. Consequently, it is essential to keep the size of this record to an absolute minimum. At the same time, this table must contain all the information (or pointers to the information) about the file and where it is backed up. Since a file may be backed up many times without having changed, the path and filename are stored in separate tables.
This table contains by far the largest amount of information in the Catalog database, both from the standpoint of number of records and from the standpoint of total database size. As a consequence, the user must take care to periodically reduce the number of File records using the retention command in the Console program.
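To show how the File, Path, and Filename tables fit together, here is a small sketch that rebuilds the full names of the files saved by one Job. It uses Python's sqlite3 module with a reduced schema (only the columns needed for the join); the sample rows and the helper files_in_job are invented for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Filename (FilenameId INTEGER PRIMARY KEY, Name BLOB NOT NULL);
CREATE TABLE Path (PathId INTEGER PRIMARY KEY, Path BLOB NOT NULL);
CREATE TABLE File (
  FileId INTEGER PRIMARY KEY,
  FileIndex INTEGER NOT NULL DEFAULT 0,
  JobId INTEGER NOT NULL,
  PathId INTEGER NOT NULL REFERENCES Path,
  FilenameId INTEGER NOT NULL REFERENCES Filename
);
INSERT INTO Path (PathId, Path) VALUES (1, '/etc/'), (2, '/home/user/');
INSERT INTO Filename (FilenameId, Name) VALUES (1, 'passwd'), (2, 'notes.txt');
-- /etc/passwd is saved by two Jobs but its path and name are stored once.
INSERT INTO File (FileIndex, JobId, PathId, FilenameId) VALUES
  (1, 10, 1, 1), (2, 10, 2, 2), (1, 11, 1, 1);
""")

def files_in_job(conn, job_id):
    """Return the full names of the files saved by `job_id`, in FileIndex order."""
    rows = conn.execute(
        """SELECT Path.Path || Filename.Name
           FROM File
           JOIN Path ON Path.PathId = File.PathId
           JOIN Filename ON Filename.FilenameId = File.FilenameId
           WHERE File.JobId = ?
           ORDER BY File.FileIndex""",
        (job_id,),
    ).fetchall()
    return [r[0] for r in rows]

job10 = files_in_job(conn, 10)
file_rows = conn.execute("SELECT COUNT(*) FROM File").fetchone()[0]
path_rows = conn.execute("SELECT COUNT(*) FROM Path").fetchone()[0]
```

Each additional backup of an unchanged file adds only a small File row; the path and filename text is not repeated.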
Job | ||
Column Name | Data Type | Remark |
JobId | integer | Primary Key |
Job | tinyblob | Unique Job Name |
Name | tinyblob | Job Name |
Type | binary(1) | Job Type: Backup, Copy, Clone, Archive, Migration |
Level | binary(1) | Job Level |
ClientId | integer | Client index |
JobStatus | binary(1) | Job Termination Status |
SchedTime | datetime | Time/date when Job scheduled |
StartTime | datetime | Time/date when Job started |
EndTime | datetime | Time/date when Job ended |
JobTDate | bigint | Start day in Unix format but 64 bits; used for Retention period. |
VolSessionId | integer | Unique Volume Session ID |
VolSessionTime | integer | Unique Volume Session Time |
JobFiles | integer | Number of files saved in Job |
JobBytes | bigint | Number of bytes saved in Job |
JobErrors | integer | Number of errors during Job |
JobMissingFiles | integer | Number of files not saved (not yet used) |
PoolId | integer | Link to Pool Record |
FileSetId | integer | Link to FileSet Record |
PurgedFiles | tinyint | Set when all File records purged |
HasBase | tinyint | Set when Base Job run |
The Job table contains one record for each Job run by Bacula. Thus normally, there will be one per day per machine added to the database. Note, the JobId is used to index Job records in the database, and it often is shown to the user in the Console program. However, care must be taken with its use as it is not unique from database to database. For example, the user may have a database for Client data saved on machine Rufus and another database for Client data saved on machine Roxie. In this case, the two databases will each have JobIds that match those in the other database. For a unique reference to a Job, see the Job field below.
The Name field of the Job record corresponds to the Name resource record given in the Director's configuration file. Thus it is a generic name, and it will be normal to find many Jobs (or even all Jobs) with the same Name.
The Job field contains a combination of the Name and the schedule time of the Job by the Director. Thus for a given Director, even with multiple Catalog databases, the Job will contain a unique name that represents the Job.
For a given Storage daemon, the VolSessionId and VolSessionTime form a unique identification of the Job. This will be the case even if multiple Directors are using the same Storage daemon.
The Job Type (or simply Type) can have one of the following values:
Value | Meaning |
B | Backup Job |
V | Verify Job |
R | Restore Job |
C | Console program (not in database) |
D | Admin Job |
A | Archive Job (not implemented) |
The JobStatus field specifies how the job terminated, and can be one of the following:
Value | Meaning |
C | Created but not yet running |
R | Running |
B | Blocked |
T | Terminated normally |
E | Terminated in Error |
e | Non-fatal error |
f | Fatal error |
D | Verify Differences |
A | Canceled by the user |
F | Waiting on the File daemon |
S | Waiting on the Storage daemon |
m | Waiting for a new Volume to be mounted |
M | Waiting for a Mount |
s | Waiting for Storage resource |
j | Waiting for Job resource |
c | Waiting for Client resource |
d | Waiting for Maximum jobs |
t | Waiting for Start Time |
p | Waiting for higher priority job to finish |
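Because the JobStatus codes are single, case-sensitive characters (E and e, or M and m, mean different things), a lookup table is a convenient way to decode them. The sketch below simply transcribes the table above; the names JOB_STATUS, WAITING, describe_status, and is_waiting are hypothetical and do not come from the Bacula source.

```python
# JobStatus codes transcribed from the table above (case matters).
JOB_STATUS = {
    "C": "Created but not yet running",
    "R": "Running",
    "B": "Blocked",
    "T": "Terminated normally",
    "E": "Terminated in Error",
    "e": "Non-fatal error",
    "f": "Fatal error",
    "D": "Verify Differences",
    "A": "Canceled by the user",
    "F": "Waiting on the File daemon",
    "S": "Waiting on the Storage daemon",
    "m": "Waiting for a new Volume to be mounted",
    "M": "Waiting for a Mount",
    "s": "Waiting for Storage resource",
    "j": "Waiting for Job resource",
    "c": "Waiting for Client resource",
    "d": "Waiting for Maximum jobs",
    "t": "Waiting for Start Time",
    "p": "Waiting for higher priority job to finish",
}

# Codes whose meaning in the table starts with "Waiting".
WAITING = {code for code, text in JOB_STATUS.items() if text.startswith("Waiting")}

def describe_status(code):
    """Return the meaning of a JobStatus code, or a placeholder if unknown."""
    return JOB_STATUS.get(code, "Unknown status " + repr(code))

def is_waiting(code):
    """True when the Job is waiting on a daemon, resource, or schedule."""
    return code in WAITING
```

For example, describe_status("T") returns "Terminated normally", while is_waiting("m") is true and is_waiting("T") is false.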
FileSet | ||
Column Name | Data Type | Remark |
FileSetId | integer | Primary Key |
FileSet | tinyblob | FileSet name |
MD5 | tinyblob | MD5 checksum of FileSet |
CreateTime | datetime | Time and date Fileset created |
The FileSet table contains one entry for each FileSet that is used. The MD5 signature is kept to ensure that if the user changes anything inside the FileSet, it will be detected and the new FileSet will be used. This is particularly important when doing an incremental update. If the user deletes a file or adds a file, we need to ensure that a Full backup is done prior to the next incremental.
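The idea of detecting a changed FileSet through its MD5 signature can be sketched as follows. This chapter does not say exactly which bytes Bacula feeds into the MD5, so the input used here (the include lines joined by newlines) and the function names are assumptions made purely for illustration.

```python
import base64
import hashlib

def fileset_signature(lines):
    """Return a base64-encoded MD5 digest of a FileSet definition.

    Assumption: the definition is represented as a list of text lines.
    """
    data = "\n".join(lines).encode("utf-8")
    return base64.b64encode(hashlib.md5(data).digest()).decode("ascii")

def fileset_changed(stored_signature, lines):
    """True when the current definition no longer matches the stored MD5."""
    return fileset_signature(lines) != stored_signature

original = ["/etc", "/home"]
stored = fileset_signature(original)            # value kept in the FileSet record

unchanged = fileset_changed(stored, ["/etc", "/home"])
changed = fileset_changed(stored, ["/etc", "/home", "/var/log"])
```

When the signature differs, a new FileSet record would be created, and the next backup for that FileSet should be a Full rather than an incremental.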
JobMedia | ||
Column Name | Data Type | Remark |
JobMediaId | integer | Primary Key |
JobId | integer | Link to Job Record |
MediaId | integer | Link to Media Record |
FirstIndex | integer | The index (sequence number) of the first file written for this Job to the Media |
LastIndex | integer | The index of the last file written for this Job to the Media |
StartFile | integer | The physical media (tape) file number of the first block written for this Job |
EndFile | integer | The physical media (tape) file number of the last block written for this Job |
StartBlock | integer | The number of the first block written for this Job |
EndBlock | integer | The number of the last block written for this Job |
VolIndex | integer | The Volume use sequence number within the Job |
The JobMedia table contains one entry for each volume written for the current Job. If the Job spans 3 tapes, there will be three JobMedia records, each containing the information to find all the files for the given JobId on the tape.
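As an illustration of how JobMedia links a Job to its Volumes, the sketch below lists the Volumes used by a Job and finds the Volume holding a given FileIndex. It uses Python's sqlite3 module with a reduced schema; the sample data and the helper names are invented, and the FirstIndex/LastIndex ranges are kept non-overlapping for simplicity.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE Media (MediaId INTEGER PRIMARY KEY, VolumeName BLOB NOT NULL);
CREATE TABLE JobMedia (
  JobMediaId INTEGER PRIMARY KEY,
  JobId INTEGER NOT NULL,
  MediaId INTEGER NOT NULL REFERENCES Media,
  FirstIndex INTEGER NOT NULL DEFAULT 0,
  LastIndex INTEGER NOT NULL DEFAULT 0,
  VolIndex INTEGER NOT NULL DEFAULT 0
);
INSERT INTO Media (MediaId, VolumeName) VALUES
  (1, 'Vol0001'), (2, 'Vol0002'), (3, 'Vol0003');
-- A Job that spans three tapes has three JobMedia records.
INSERT INTO JobMedia (JobId, MediaId, FirstIndex, LastIndex, VolIndex) VALUES
  (10, 1, 1, 400, 1), (10, 2, 401, 900, 2), (10, 3, 901, 1200, 3);
""")

def volumes_for_job(conn, job_id):
    """Return the Volume names written by `job_id`, in order of use."""
    rows = conn.execute(
        """SELECT Media.VolumeName
           FROM JobMedia JOIN Media ON Media.MediaId = JobMedia.MediaId
           WHERE JobMedia.JobId = ?
           ORDER BY JobMedia.VolIndex""",
        (job_id,),
    ).fetchall()
    return [r[0] for r in rows]

def volumes_for_file(conn, job_id, file_index):
    """Return the Volume names whose index range covers `file_index`."""
    rows = conn.execute(
        """SELECT Media.VolumeName
           FROM JobMedia JOIN Media ON Media.MediaId = JobMedia.MediaId
           WHERE JobMedia.JobId = ?
             AND ? BETWEEN JobMedia.FirstIndex AND JobMedia.LastIndex
           ORDER BY JobMedia.VolIndex""",
        (job_id, file_index),
    ).fetchall()
    return [r[0] for r in rows]

all_volumes = volumes_for_job(conn, 10)
one_file = volumes_for_file(conn, 10, 500)
```

A restore of a single file can use the second query to decide which tape to request before reading any data.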
Media | ||
Column Name | Data Type | Remark |
MediaId | integer | Primary Key |
VolumeName | tinyblob | Volume name |
Slot | integer | Autochanger Slot number or zero |
PoolId | integer | Link to Pool Record |
MediaType | tinyblob | The MediaType supplied by the user |
FirstWritten | datetime | Time/date when first written |
LastWritten | datetime | Time/date when last written |
LabelDate | datetime | Time/date when tape labeled |
VolJobs | integer | Number of jobs written to this media |
VolFiles | integer | Number of files written to this media |
VolBlocks | integer | Number of blocks written to this media |
VolMounts | integer | Number of times media mounted |
VolBytes | bigint | Number of bytes written to this media |
VolErrors | integer | Number of errors on this media |
VolWrites | integer | Number of writes to media |
MaxVolBytes | bigint | Maximum bytes to put on this media |
VolCapacityBytes | bigint | Capacity estimate for this volume |
VolStatus | enum | Status of media: Full, Archive, Append, Recycle, Purged, Read-Only, Disabled, Error, Busy, Used, Cleaning |
Recycle | tinyint | Whether or not Bacula can recycle the Volumes: Yes, No |
VolRetention | bigint | 64 bit seconds until expiration |
VolUseDuration | bigint | 64 bit seconds volume can be used |
MaxVolJobs | integer | maximum jobs to put on Volume |
MaxVolFiles | integer | maximum EOF marks to put on Volume |
The Volume table (internally referred to as the Media table) contains one entry for each volume, that is each tape, cassette (8mm, DLT, DAT, ...), or file on which information is or was backed up. There is one Volume record created for each of the NumVols specified in the Pool resource record.
Pool | ||
Column Name | Data Type | Remark |
PoolId | integer | Primary Key |
Name | tinyblob | Pool Name |
NumVols | integer | Number of Volumes in the Pool |
MaxVols | integer | Maximum Volumes in the Pool |
UseOnce | tinyint | Use volume once |
UseCatalog | tinyint | Set to use catalog |
AcceptAnyVolume | tinyint | Accept any volume from Pool |
VolRetention | bigint | 64 bit seconds to retain volume |
VolUseDuration | bigint | 64 bit seconds volume can be used |
MaxVolJobs | integer | max jobs on volume |
MaxVolFiles | integer | max EOF marks to put on Volume |
MaxVolBytes | bigint | max bytes to write on Volume |
AutoPrune | tinyint | yes/no for autopruning |
Recycle | tinyint | yes/no for allowing auto recycling of Volume |
PoolType | enum | Backup, Copy, Cloned, Archive, Migration, Scratch |
LabelFormat | tinyblob | Label format |
The Pool table contains one entry for each media pool controlled by Bacula in this database. One media record exists for each of the NumVols contained in the Pool. The PoolType is a Bacula defined keyword. The MediaType is defined by the administrator, and corresponds to the MediaType specified in the Director's Storage definition record. The CurrentVol is the sequence number of the Media record for the current volume.
Client | ||
Column Name | Data Type | Remark |
ClientId | integer | Primary Key |
Name | tinyblob | File Services Name |
Uname | tinyblob | uname -a from Client (not yet used) |
AutoPrune | tinyint | yes/no for autopruning |
FileRetention | bigint | 64 bit seconds to retain Files |
JobRetention | bigint | 64 bit seconds to retain Job |
The Client table contains one entry for each machine backed up by Bacula in this database. Normally the Name is a fully qualified domain name.
UnsavedFiles | ||
Column Name | Data Type | Remark |
UnsavedId | integer | Primary Key |
JobId | integer | JobId corresponding to this record |
PathId | integer | Id of path |
FilenameId | integer | Id of filename |
The UnsavedFiles table contains one entry for each file that was not saved. Note: this table is not yet implemented.
Counters | ||
Column Name | Data Type | Remark |
Counter | tinyblob | Counter name |
MinValue | integer | Start/Min value for counter |
MaxValue | integer | Max value for counter |
CurrentValue | integer | Current counter value |
WrapCounter | tinyblob | Name of another counter |
The Counters table contains one entry for each permanent counter defined by the user.
Version | ||
Column Name | Data Type | Remark |
VersionId | integer | Primary Key |
The Version table defines the Bacula database version number. Bacula checks this number before reading the database to ensure that it is compatible with the Bacula binary file.
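The version check described above amounts to reading the single VersionId row and comparing it with the value the binary expects. The following sketch uses Python's sqlite3 module; the function name check_catalog_version is hypothetical, and the expected value 7 is taken from the creation script shown later in this chapter.

```python
import sqlite3

EXPECTED_VERSION = 7  # value inserted by the table creation script in this chapter

def check_catalog_version(conn, expected=EXPECTED_VERSION):
    """Return the catalog version, raising RuntimeError if it is not `expected`."""
    row = conn.execute("SELECT VersionId FROM Version").fetchone()
    if row is None:
        raise RuntimeError("Version table is empty")
    if row[0] != expected:
        raise RuntimeError(
            "catalog version %d does not match expected version %d"
            % (row[0], expected)
        )
    return row[0]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE Version (VersionId INTEGER NOT NULL)")
conn.execute("INSERT INTO Version (VersionId) VALUES (7)")

version = check_catalog_version(conn)
```

Performing this check before any other query avoids reading tables whose layout the binary does not understand.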
BaseFiles | ||
Column Name | Data Type | Remark |
BaseId | integer | Primary Key |
BaseJobId | integer | JobId of Base Job |
JobId | integer | Reference to Job |
FileId | integer | Reference to File |
FileIndex | integer | File Index number |
The BaseFiles table contains all the File references for a particular JobId that point to a Base file, i.e. files that were previously saved and hence were not saved in the current JobId but in BaseJobId under FileId. FileIndex is the index of the file, and is used for optimization of Restore jobs to prevent the need to read the FileId record when creating the in-memory tree. This table is not yet implemented.
The commands used to create the MySQL tables are as follows:
USE bacula;

CREATE TABLE Filename (
  FilenameId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  Name BLOB NOT NULL,
  PRIMARY KEY(FilenameId),
  INDEX (Name(30))
);

CREATE TABLE Path (
  PathId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  Path BLOB NOT NULL,
  PRIMARY KEY(PathId),
  INDEX (Path(50))
);

CREATE TABLE File (
  FileId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  FileIndex INTEGER UNSIGNED NOT NULL DEFAULT 0,
  JobId INTEGER UNSIGNED NOT NULL REFERENCES Job,
  PathId INTEGER UNSIGNED NOT NULL REFERENCES Path,
  FilenameId INTEGER UNSIGNED NOT NULL REFERENCES Filename,
  MarkId INTEGER UNSIGNED NOT NULL DEFAULT 0,
  LStat TINYBLOB NOT NULL,
  MD5 TINYBLOB NOT NULL,
  PRIMARY KEY(FileId),
  INDEX (JobId),
  INDEX (PathId),
  INDEX (FilenameId)
);

CREATE TABLE Job (
  JobId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  Job TINYBLOB NOT NULL,
  Name TINYBLOB NOT NULL,
  Type BINARY(1) NOT NULL,
  Level BINARY(1) NOT NULL,
  ClientId INTEGER NOT NULL REFERENCES Client,
  JobStatus BINARY(1) NOT NULL,
  SchedTime DATETIME NOT NULL,
  StartTime DATETIME NOT NULL,
  EndTime DATETIME NOT NULL,
  JobTDate BIGINT UNSIGNED NOT NULL,
  VolSessionId INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolSessionTime INTEGER UNSIGNED NOT NULL DEFAULT 0,
  JobFiles INTEGER UNSIGNED NOT NULL DEFAULT 0,
  JobBytes BIGINT UNSIGNED NOT NULL,
  JobErrors INTEGER UNSIGNED NOT NULL DEFAULT 0,
  JobMissingFiles INTEGER UNSIGNED NOT NULL DEFAULT 0,
  PoolId INTEGER UNSIGNED NOT NULL REFERENCES Pool,
  FileSetId INTEGER UNSIGNED NOT NULL REFERENCES FileSet,
  PurgedFiles TINYINT NOT NULL DEFAULT 0,
  HasBase TINYINT NOT NULL DEFAULT 0,
  PRIMARY KEY(JobId),
  INDEX (Name(128))
);

CREATE TABLE FileSet (
  FileSetId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  FileSet TINYBLOB NOT NULL,
  MD5 TINYBLOB NOT NULL,
  CreateTime DATETIME NOT NULL,
  PRIMARY KEY(FileSetId)
);

CREATE TABLE JobMedia (
  JobMediaId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  JobId INTEGER UNSIGNED NOT NULL REFERENCES Job,
  MediaId INTEGER UNSIGNED NOT NULL REFERENCES Media,
  FirstIndex INTEGER UNSIGNED NOT NULL DEFAULT 0,
  LastIndex INTEGER UNSIGNED NOT NULL DEFAULT 0,
  StartFile INTEGER UNSIGNED NOT NULL DEFAULT 0,
  EndFile INTEGER UNSIGNED NOT NULL DEFAULT 0,
  StartBlock INTEGER UNSIGNED NOT NULL DEFAULT 0,
  EndBlock INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolIndex INTEGER UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY(JobMediaId),
  INDEX (JobId, MediaId)
);

CREATE TABLE Media (
  MediaId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  VolumeName TINYBLOB NOT NULL,
  Slot INTEGER NOT NULL DEFAULT 0,
  PoolId INTEGER UNSIGNED NOT NULL REFERENCES Pool,
  MediaType TINYBLOB NOT NULL,
  FirstWritten DATETIME NOT NULL,
  LastWritten DATETIME NOT NULL,
  LabelDate DATETIME NOT NULL,
  VolJobs INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolFiles INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolBlocks INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolMounts INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolBytes BIGINT UNSIGNED NOT NULL DEFAULT 0,
  VolErrors INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolWrites INTEGER UNSIGNED NOT NULL DEFAULT 0,
  VolCapacityBytes BIGINT UNSIGNED NOT NULL,
  VolStatus ENUM('Full', 'Archive', 'Append', 'Recycle', 'Purged',
    'Read-Only', 'Disabled', 'Error', 'Busy', 'Used', 'Cleaning') NOT NULL,
  Recycle TINYINT NOT NULL DEFAULT 0,
  VolRetention BIGINT UNSIGNED NOT NULL DEFAULT 0,
  VolUseDuration BIGINT UNSIGNED NOT NULL DEFAULT 0,
  MaxVolJobs INTEGER UNSIGNED NOT NULL DEFAULT 0,
  MaxVolFiles INTEGER UNSIGNED NOT NULL DEFAULT 0,
  MaxVolBytes BIGINT UNSIGNED NOT NULL DEFAULT 0,
  InChanger TINYINT NOT NULL DEFAULT 0,
  MediaAddressing TINYINT NOT NULL DEFAULT 0,
  VolReadTime BIGINT UNSIGNED NOT NULL DEFAULT 0,
  VolWriteTime BIGINT UNSIGNED NOT NULL DEFAULT 0,
  PRIMARY KEY(MediaId),
  INDEX (PoolId)
);

CREATE TABLE Pool (
  PoolId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  Name TINYBLOB NOT NULL,
  NumVols INTEGER UNSIGNED NOT NULL DEFAULT 0,
  MaxVols INTEGER UNSIGNED NOT NULL DEFAULT 0,
  UseOnce TINYINT NOT NULL,
  UseCatalog TINYINT NOT NULL,
  AcceptAnyVolume TINYINT DEFAULT 0,
  VolRetention BIGINT UNSIGNED NOT NULL,
  VolUseDuration BIGINT UNSIGNED NOT NULL,
  MaxVolJobs INTEGER UNSIGNED NOT NULL DEFAULT 0,
  MaxVolFiles INTEGER UNSIGNED NOT NULL DEFAULT 0,
  MaxVolBytes BIGINT UNSIGNED NOT NULL,
  AutoPrune TINYINT DEFAULT 0,
  Recycle TINYINT DEFAULT 0,
  PoolType ENUM('Backup', 'Copy', 'Cloned', 'Archive', 'Migration', 'Scratch') NOT NULL,
  LabelFormat TINYBLOB,
  Enabled TINYINT DEFAULT 1,
  ScratchPoolId INTEGER UNSIGNED DEFAULT 0 REFERENCES Pool,
  RecyclePoolId INTEGER UNSIGNED DEFAULT 0 REFERENCES Pool,
  UNIQUE (Name(128)),
  PRIMARY KEY (PoolId)
);

CREATE TABLE Client (
  ClientId INTEGER UNSIGNED NOT NULL AUTO_INCREMENT,
  Name TINYBLOB NOT NULL,
  Uname TINYBLOB NOT NULL, /* full uname -a of client */
  AutoPrune TINYINT DEFAULT 0,
  FileRetention BIGINT UNSIGNED NOT NULL,
  JobRetention BIGINT UNSIGNED NOT NULL,
  UNIQUE (Name(128)),
  PRIMARY KEY(ClientId)
);

CREATE TABLE BaseFiles (
  BaseId INTEGER UNSIGNED AUTO_INCREMENT,
  BaseJobId INTEGER UNSIGNED NOT NULL REFERENCES Job,
  JobId INTEGER UNSIGNED NOT NULL REFERENCES Job,
  FileId INTEGER UNSIGNED NOT NULL REFERENCES File,
  FileIndex INTEGER UNSIGNED,
  PRIMARY KEY(BaseId)
);

CREATE TABLE UnsavedFiles (
  UnsavedId INTEGER UNSIGNED AUTO_INCREMENT,
  JobId INTEGER UNSIGNED NOT NULL REFERENCES Job,
  PathId INTEGER UNSIGNED NOT NULL REFERENCES Path,
  FilenameId INTEGER UNSIGNED NOT NULL REFERENCES Filename,
  PRIMARY KEY (UnsavedId)
);

CREATE TABLE Version (
  VersionId INTEGER UNSIGNED NOT NULL
);

-- Initialize Version
INSERT INTO Version (VersionId) VALUES (7);

CREATE TABLE Counters (
  Counter TINYBLOB NOT NULL,
  MinValue INTEGER,
  MaxValue INTEGER,
  CurrentValue INTEGER,
  WrapCounter TINYBLOB NOT NULL,
  PRIMARY KEY (Counter(128))
);