Database housekeeping and garbage collection

<< Setting the database dialect | Firebird Database Housekeeping Utility | Database startup and shutdown >>

Database housekeeping and garbage collection

Garbage

Garbage, for want of a better name, is the detritus that Firebird leaves around in the database after a rollback has been carried out. This is basically a copy of the row(s) from the table(s) that were being updated (or deleted) by the transaction prior to the rollback.

Because Firebird uses multi-generational architecture, every time a row is updated or deleted, Firebird keeps a copy in the database. These copies use space in the pages and can remain in the database for some time.

In addition to taking up space in the database, these old copies can lead to increased transaction startup times.

There are two types of garbage:

  • Remnants from a committed transaction.
  • Remnants from an aborted (rolled-back) transaction.

These remnants are simply older copies of the rows that were being updated by the respective transactions.

The differences are that:

  • Whenever a subsequent transaction reaches garbage from a committed transaction, that garbage is automatically cleared out.
  • Rolled-back garbage is never automatically cleared out.
    This means that on a database with a lot of rolled-back transactions, there could be a large build up of old copies of the rows that were updated and then rolled back.

Firebird will automatically sweep through the database and remove the remnants of rolled-back transactions and this has two effects:

  • The database size is reduced as the old copies of rows are deleted.
  • The performance of the database may be affected while the sweep is in progress.

Note: One other method of clearing out old rolled-back transactions' garbage is simply to carry out a database backup.

In the Superserver version of Firebird 2.0, garbage collection has been vastly improved. There are now three different ways of operation and these are configurable by setting the GCPOLICY parameter in the firebird.conf configuration file. By default, Superserver uses combined' while Classic Server uses cooperative. The other option is background.

Note: Classic Server ignores the setting and always uses cooperative garbage collection.

back to top of page

Cooperative garbage collection

This is the default setting, indeed the only setting, that Classic Server uses. In this mode, the normal operation - as described above - takes place. When a full scan is performed (perhaps during a backup) old versions of the rows are deleted at that point in time.

Background garbage collection

Superserver has, even since before version 1.0, performed background garbage collection where the server informs the garbage collector about old versions of updated and deleted rows when they are ready to be cleaned up. This helps avoid the need to force a full scan of each record in the database tables to get the garbage collector to remove these old versions.

When all rows in a table are read by the server, any old record versions are flagged to the garbage collector as being ready to be cleared out. They are not deleted by the scanning process as in the cooperative method. The garbage collector runs as a separate background thread and it will, at some point, remove these older record versions from the database.

Combined garbage collection

This is the default garbage collection method for Superserver installations. In this mode, both the above methods are used together.

back to top of page

Setting the sweep interval

The default sweep interval for a new database is 20,000. The sweep interval is the difference between the oldest interesting transaction or OIT and the next transaction number.

Note: This doesn't mean that every 20,000 transaction a sweep will take place. It will take place when the difference between the OIT and the next transaction is greater than the sweep interval.

An interesting transaction is one which has not yet committed. It may be still active, in limbo or may have been rolled back.

The sweep facility runs through the database and gets rid of old rows in tables that are out of date. This prevents the database from growing too big and helps reduce the time it takes to start a new transaction on the database.

Note: If you find that starting a new transaction takes a long time, it may be a good idea to run a manual sweep of the database in case the need for a sweep is causing the hold-up.

You can check if a manual sweep may be required by running the gstat utility to check the database header page and extract the oldest and next transaction numbers from the output. If the gap is small (less than the sweep interval) then a manual sweep may be in order. Alternatively, the SHOW DATABASE command in isql will also show the details you need.

A manual sweep can be run by using the -s[weep] command. (See below.)

To alter the database's automatic sweep interval, use the following command:

 gfix -h[ousekeeping] INTERVAL database_name

The INTERVAL parameter is the new value for the sweep interval. The database_name parameter is the database upon which you wish to alter the setting for automatic sweeping. The following example shows the setting being changed from the default to a new value of 1,000:

 linux> gfix -h 1000 my_employee

 linux> gstat -header my_employee | grep Sweep
 Sweep interval: 1000

back to top of page

Manual garbage collection

If automatic sweeping has been turned off, or only runs rarely because of the sweep interval setting, the DBA can manually force a sweep to be performed. The command to carry out this task is:

 gfix -s[weep] [-i[gnore]] database_name

This command will force the garbage left over from old rolled-back transactions to be removed, reducing the database size and improving the performance of new transactions.

The -i[gnore] option may be supplied. This forces Firebird to ignore checksum errors on database pages. This is not a good idea and should rarely need to be used, however, if your database has suffered some problems it might be necessary to use it.

The following example shows a manual database sweep being implemented:

 linux> gfix -sweep my_employee

back to top of page

Disabling automatic sweeping

If you set the sweep interval to zero then automatic sweeping will be disabled. This implies that there will be no automatic housekeeping done so your database performance will not suffer as a result of the processing requirements of the automatic sweep.

If you disable sweeping you are advised to run a manual sweep at regular intervals when the database is quiet. Alternatively, simply make sure that you take regular backups of the database and as this is something you should be doing anyway, it shouldn't be a problem.

See also:
Garbage collection
Firebird administration using IBExpert: Garbage collection
Garbage
Garbage collectors
How do you know if your database server garbage collection is working?
Firebird 2.1.3 Release Notes: Garbage collector rationalisation
Database sweep
Database sweeps

back to top of page
<< Setting the database dialect | Firebird Database Housekeeping Utility | Database startup and shutdown >>