Global improvements in Firebird 2.1

<< Changes to the Firebird API and ODS | Firebird 2.1.6 Release Notes | Data Definition Language (DDL) >>

Some global improvements and changes have been implemented in Firebird 2.1, as engine development moves towards the architectural changes planned for Firebird 3.

Note: Unless otherwise indicated, these improvements apply from v.2.1 forward.

Forced Writes on Linux now works!

A. Peshkov

For maximum database safety, we configure databases for synchronous writes, a.k.a. Forced Writes ON. This mode - strongly recommended for normal production usage - makes the write() system call return only after the physical write to disk is complete. In turn, it guarantees that, after a COMMIT, any data modified by the transaction is physically on the hard drive, not waiting in the operating system's cache.

Its implementation on Linux was very simple - invoke fcntl(dbFile, F_SETFL, O_SYNC).

Yet databases on Linux were sometimes corrupted anyway.

Forensics

Speed tests on Linux showed that setting O_SYNC on a file has no effect at all on performance! A fine, fast operating system, we might think? Alas, no: it's a documented bug in the Linux kernel!

According to the Linux manual, "On Linux this command (i.e. fcntl(fd, F_SETFL, flags)) can only change the O_APPEND, O_ASYNC, O_DIRECT, O_NOATIME, and O_NONBLOCK flags". Though it is not documented in any place known to me, it turns out that an attempt to set any flag other than those listed in the manual (O_SYNC, for example) has no effect, yet it does not cause fcntl() to return an error either.

For Firebird and for InterBase versions since Day One, it means that Forced Writes has never worked on Linux. It certainly works on Windows. It seems likely that this is not a problem that affects other operating systems, although we cannot guarantee that. To make sure, you can check whether the implementation of fcntl() on your OS is capable of setting the O_SYNC flag.
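One way to run that check is sketched below in Python, whose fcntl module wraps the same system call. The helper name fcntl_can_set_o_sync is made up for this illustration and is not part of Firebird. It sets the flag, reads the flags back and reports whether O_SYNC actually stuck.

```python
import fcntl
import os
import tempfile


def fcntl_can_set_o_sync(path):
    """Return True if fcntl(F_SETFL) really switches O_SYNC on for this file."""
    fd = os.open(path, os.O_RDWR)
    try:
        before = fcntl.fcntl(fd, fcntl.F_GETFL)
        # The call reports success even where the kernel silently ignores O_SYNC.
        fcntl.fcntl(fd, fcntl.F_SETFL, before | os.O_SYNC)
        after = fcntl.fcntl(fd, fcntl.F_GETFL)
        # Only the flags read back tell us whether the request took effect.
        return (after & os.O_SYNC) == os.O_SYNC
    finally:
        os.close(fd)


if __name__ == "__main__":
    with tempfile.NamedTemporaryFile() as scratch:
        print("fcntl() can set O_SYNC:", fcntl_can_set_o_sync(scratch.name))
```

A result of False means that the request was silently dropped, which is the behaviour described above for Linux.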

The technique used currently, introduced in the Beta 2 release of Firebird 2.1, is to re-open the file. It should guarantee correct operation on any OS, provided the open() system call works correctly in this respect. It appears that no such problems are reported.
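The re-open technique can be pictured with the following Python sketch. It illustrates the idea only and is not the engine's code; the function name reopen_with_forced_writes is hypothetical. Instead of changing flags on an open descriptor, the file is closed and opened again with O_SYNC requested at open() time.

```python
import fcntl
import os
import tempfile


def reopen_with_forced_writes(fd, path):
    """Close fd and return a new descriptor for path opened with O_SYNC."""
    os.close(fd)
    # Asking for O_SYNC in open() does not depend on fcntl(F_SETFL) at all.
    return os.open(path, os.O_RDWR | os.O_SYNC)


if __name__ == "__main__":
    old_fd, path = tempfile.mkstemp()
    new_fd = reopen_with_forced_writes(old_fd, path)
    flags = fcntl.fcntl(new_fd, fcntl.F_GETFL)
    print("O_SYNC active after re-open:", (flags & os.O_SYNC) == os.O_SYNC)
    os.close(new_fd)
    os.remove(path)
```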

The Firebird developers have no idea why such a bug would remain unfixed almost two years after getting into the Linux kernel's bug-tracker. Apparently, in Linux, a documented bug evolves into a feature...

Instant fix for an older Firebird

Here's a tip if you want to do an instant fix for the problem in an older version of Firebird: use the sync option when mounting any partition with a Firebird database on board. An example of a line in /etc/fstab:

 /dev/sda9   /usr/database   ext3   noatime,sync     1 2

Databases on raw devices

A. Peshkov

File system I/O can degrade performance severely when a database in Forced Writes mode grows rapidly. On Linux, which lacks the appropriate system calls to grow the database efficiently, performance with Forced Writes can be as much as three times slower than with asynchronous writes.

When such conditions prevail, performance may be greatly enhanced by bypassing the file system entirely and restoring the database directly to a raw device. A Firebird database can be recreated on any type of block device.

Moving a database to a raw device

Moving your database to a raw device can be as simple as restoring a backup directly to an unformatted partition in the local storage system. For example,

 gbak -c my.fbk /dev/sda7

will restore your database on the third logical disk in the extended partition of your first SCSI or SATA hard drive (disk0).

Note: The database does not have a "database name" other than the device name itself. In the example given, the name of the database is /dev/sda7.

Special issues for `nbak/nbackup`

The physical backup utility nbackup must be supplied with an explicit file path and name for its difference file, in order to avoid this file being written into the /dev/ directory. You can achieve this with the following statement, using isql:

 # isql /dev/sda7
 SQL> alter database add difference file '/tmp/dev_sda7';

To keep the size of the nbak copy within reasonable bounds, it is of benefit to know how much storage on the device is actually occupied. The -s switch of nbackup will return the size of the database in database pages:

 # nbackup -s -l /dev/sda7
 77173

Don't confuse the result here with the block size of the device. The figure returned, 77173, is the number of pages occupied by the database. Calculate the physical size (in bytes) as (number of pages * page size). If you are unsure of the page size, you can query it from the database header using gstat -h:

 # gstat -h /dev/sda7 
 Database "/dev/sda7"
 Database header page information:
     Flags 0
     Checksum 12345
     Generation 43
     Page size 4096 <---
     ODS version 11.1
 . . . . . . .
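As a worked example of that calculation, the short Python sketch below multiplies the page count reported by nbackup by the page size reported by gstat. The function name is invented for the illustration; the two figures are the ones shown in the sample output above.

```python
def database_size_bytes(page_count, page_size):
    """Physical size occupied by the database: pages * bytes per page."""
    return page_count * page_size


size = database_size_bytes(77173, 4096)
print(size)                           # 316100608
print(round(size / 2**20, 1), "MiB")  # 301.5 MiB
```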

Examples of `nbackup` usage with a raw device

1. A backup can be performed in a script, using the output from the -s switch directly. For example,

 # DbFile=/dev/sda7
 # DbSize=$(nbackup -L $DbFile -S) || exit 1
 # dd if=$DbFile ibs=4k count=$DbSize | # compress and record DVD
 # nbackup -N $DbFile

2. A physical backup using nbackup directly from the command line:

 # nbackup -B 0 /dev/sda7 /tmp/lvl.0

Further advice about raw devices

Although no other specific issues are known at this point about the use of raw device storage for databases, keep in mind that

the growth and potential growth of the database is less obvious to end-users than one that lives as a file within a file system. If control of the production system's environment is out of your direct reach, be certain to deploy adequate documentation for any monitoring that will be required!
the very Windows-knowledgeable might want to try out the concept of raw device storage on Windows systems. It has not been a project priority to explore how it might be achieved on that platform. However, if you think you know a way to do it, please feel welcome to test the idea in your Windows lab and report your observations - good or bad or indifferent - back to the firebird-devel list.

Tip: Maintain your raw devices in aliases.conf. That way, in the event of needing to reconfigure the storage hardware, there will be no need to alter any connection strings in your application code.

Remote interface improvements

V. Khorsun, D. Yemanov

Feature request CORE-971

The remote protocol has been slightly improved to perform better in slow networks. In order to achieve this, more advanced packet batching is now performed, along with some buffer transmission optimizations. In a real world test scenario, these changes showed about 50 per cent fewer API round trips, thus incurring about 40 per cent fewer TCP round trips.

In Firebird 2.1 the remote interface limits the size of the response packet for the various isc_XXX_info calls to the length actually used by the contained data. Previously it sent the entire buffer of the specified size back to the client, even if only 10 bytes were actually filled. The Firebird 2.1 remote interface sends back only the 10 bytes in this case.
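The effect on packet size can be pictured with a toy Python model. It is not the wire protocol: the function name and the 32768-byte buffer size are assumptions chosen for the illustration.

```python
def trim_info_response(buffer, used_length):
    """Return only the filled part of an info buffer (toy model)."""
    if not 0 <= used_length <= len(buffer):
        raise ValueError("used_length must fit inside the buffer")
    return bytes(buffer[:used_length])


requested = bytearray(32768)      # buffer size the client asked for (assumed)
requested[:10] = b"0123456789"    # only 10 bytes hold real data
reply = trim_info_response(requested, 10)
print(len(requested), "->", len(reply), "bytes sent back")
```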

Some of our users should see a benefit from the changes, especially two-tier clients accessing databases over the Internet.

The changes can be summarised as follows:

a. Batched packet delivery. Requires both server and client to be v.2.1 and is enabled upon a successful protocol handshake. Delays sending packets of certain types which can be deferred for batched transfer with the next packet. (Allocate/deallocate statement operations come into this category, for example.)

b. Pre-fetching some pieces of information about a statement or request and caching them on the client side for (probable) following API calls. Implemented on the client side only, but relies partly on the benefits of reduced round trips described in (a).

It works with any server version, even possibly providing a small benefit for badly written client applications, although best performance is not to be expected if the client is communicating with a pre-v.2.1 server.

c. Reduced information responses from the engine (no trailing zeroes). As the implementation is server-side only, it requires a v.2.1 server and any client. Even old clients will work with Firebird 2.1 and see some benefit from the reduction of round trips, although the old remote interface, unlike the new, will still send back big packets for isc_dsql_prepare().

d. Another round-trip saver, termed "defer execute", whereby SELECT requests will be held at the point just before execution of the isc_dsql_execute until the next API call on the same statement. The benefit of the saved round-trip becomes most visible where there is a bunch of SELECT requests whose result set fits into one or two network packets.

This enhancement takes effect only if both client and server are v.2.1 or higher.

Note: A faintly possible side-effect is that, if isc_dsql_execute should happen to fail with a certain exception, this exception is returned to the client in the response to the API call that was actually responsible; i.e., instead of being returned by isc_dsql_execute it would be returned by isc_dsql_fetch, isc_dsql_info, or whichever API call actually dispatched the op_execute call.

In most cases, the side-effect would be transparent: it might show up in a case where some error occurred with default values for PSQL parameters or variables and would be noticed as an exception array where the exceptions were delivered in an unusual sequence.
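To see why deferring packets saves round trips, consider the following simplified Python model. It is a sketch under stated assumptions, not a description of the real protocol: each non-deferrable operation costs one round trip, deferrable operations (allocate and free statement, as in (a) above) ride along with the next packet, and the operation list itself is invented for the example.

```python
def count_round_trips(operations, batching):
    """operations is a list of (name, deferrable) tuples."""
    round_trips = 0
    pending = 0
    for _name, deferrable in operations:
        if batching and deferrable:
            pending += 1        # hold this packet for the next send
            continue
        round_trips += 1        # this packet, plus anything pending, goes out now
        pending = 0
    if pending:
        round_trips += 1        # flush whatever is still queued at the end
    return round_trips


# Hypothetical life cycle of one statement, as seen by the client.
STATEMENT_LIFECYCLE = [
    ("allocate statement", True),
    ("prepare", False),
    ("execute", False),
    ("fetch", False),
    ("free statement", True),
]

workload = STATEMENT_LIFECYCLE * 2
print(count_round_trips(workload, batching=False))  # 10
print(count_round_trips(workload, batching=True))   # 7
```

The model leaves out the client-side caching in (b) and the deferred execute in (d), each of which would remove further round trips.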

The changes work with either TCP/IP or NetBEUI. They are backward-compatible, so existing client code will not be broken. However, when you are using a driver layer that implements its own interpretation of the remote protocol - such as the Jaybird JDBC and the FirebirdClient .NET drivers - your existing code will not enable the enhancements unless you use drivers that are updated.

API changes

`XSQLVAR`

A. dos Santos Fernandes

The identifier of the connection character set or, when the connection character set is NONE, the BLOB character set, is now passed in the XSQLVAR::sqlscale item of text BLOBs.

Optimization

Optimization for multiple index scans

V. Khorsun

Feature request CORE-1069

An optimization was done for index scanning when more than one index is to be scanned with AND conjunctions.

Optimize sparse bitmap operations

V. Khorsun

Feature request CORE-1070

Optimization was done for sparse bitmap operations (set, test and clear) when values are mostly consecutive.

Configuration and tuning

Increased Lock Manager limits & defaults

D. Yemanov

Feature requests CORE-958 and CORE-937

The maximum number of hash slots is raised from 2048 to 65,536. Because the actual setting should be a prime number, the exact supported maximum is 65,521 (the biggest prime number below 65,536). The minimum is 101.
The new default number of hash slots is 1009.
The default lock table size has been increased to 1 MB on all platforms.
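The figures quoted for hash slots are easy to check. The Python sketch below uses simple trial division to confirm that 65,521 is the largest prime below 65,536 and that the stated minimum and default are also prime. The helper names are made up for the illustration.

```python
def is_prime(n):
    """Trial division; fast enough for numbers of this size."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    divisor = 3
    while divisor * divisor <= n:
        if n % divisor == 0:
            return False
        divisor += 2
    return True


def largest_prime_below(limit):
    """Return the largest prime strictly less than limit, or None."""
    candidate = limit - 1
    while candidate >= 2:
        if is_prime(candidate):
            return candidate
        candidate -= 1
    return None


print(largest_prime_below(65536))      # 65521
print(is_prime(101), is_prime(1009))   # True True
```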

Page sizes of 1K and 2K deprecated

D. Yemanov

Feature request CORE-969

Page sizes of 1K and 2K are deprecated as inefficient.

Note: The small page restriction applies to new databases only. Old ones can be attached to regardless of their page size.

Enlarge disk allocation chunks

V. Khorsun

Feature request CORE-1229

Until v.2.1, Firebird had no special rules about allocating disk space for database file pages. Because of dependencies between pages that it maintains itself, to service its "careful write" strategy, it has just written to newly-allocated pages in indeterminate order.

For databases using ODS 11.1 and higher, Firebird servers from v.2.1 onward use a different algorithm for allocating disk space, to address two recognised problems associated with the existing approach:

1. Corruptions resulting from out-of-space conditions on disk

The indeterminate order of writes can give rise to a situation in which, at a point where the page cache contains a large number of dirty pages and Firebird needs to request space for a new page in the process of writing them out, there is insufficient disk space to fulfil the request. Under such conditions it often happens that the administrator decides to shut down the database in order to make some more disk space available, causing the remaining dirty pages in the cache to be lost. This leads to serious corruptions.

2. File fragmentation

Allocating disk space in relatively small chunks can lead to significant fragmentation of the database file at file system level, impairing the performance of large scans, as during a backup, for example.

The solution

The solution is to introduce some rules and rationales to govern page writes according to the state of available disk space, as follows:

a. Each newly allocated page is written to disk immediately, before the allocation returns to the engine. If the page cannot be written then the allocation does not happen: the bit on the PIP (page inventory page) remains uncleared and the appropriate I/O error is raised. Corruption cannot arise, since it is guaranteed that all dirty pages in cache have disk space allocated and can be written safely.

Because this change adds an extra write for each newly-allocated page, some performance penalty is to be expected. To mitigate the effect, writes of newly-allocated pages are performed in batches of up to 128 KB and Firebird keeps track of the number of these "initialized" pages in the PIP header.

Note: A page that has been allocated, released and re-allocated is already "space in hand", meaning that no further verification is required in order to "initialize" it. Hence, a newly allocated page is subjected to this double-write only if it is a block that has never been allocated before.

b. To address the issue of file fragmentation, Firebird now uses the appropriate call to the API of the file system to preallocate disk space in relatively large chunks.

Preallocation also gives room to avoid corruptions in the event of an "out of disk space" condition. Chances are that the database will have enough space preallocated to continue operating until the administrator can make some disk space available.

Important:

Windows only (for now)

Currently, only Windows file systems publish such API calls, which means that, for now, this aspect of the solution is supported only in the Windows builds of Firebird. However, similar facilities have recently been added to the Linux API, allowing the prospect that a suitable API function call will appear in such popular file systems as ext3 in future.
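Rule (a) above can be pictured with a toy allocator written in Python. It is a deliberately simplified sketch, not Firebird's code. The class and attribute names are hypothetical, the disk is reduced to a page-count limit, and neither the 128 KB write batches nor the preallocation described in (b) are modelled.

```python
import errno


class ToyPageAllocator:
    """Toy model: a never-used page must reach the disk before it is handed out."""

    def __init__(self, disk_capacity_pages):
        self.disk_capacity_pages = disk_capacity_pages
        self.initialized_pages = 0   # pages that already have disk space behind them
        self.in_use = set()          # stands in for the PIP allocation bits

    def allocate(self):
        # Pick the lowest free page number; a released page is "space in hand".
        page = 0
        while page in self.in_use:
            page += 1
        if page >= self.initialized_pages:
            # Never allocated before: the write must succeed first.
            if page >= self.disk_capacity_pages:
                # The allocation does not happen; in_use is left untouched.
                raise OSError(errno.ENOSPC, "no space left for a new page")
            self.initialized_pages = page + 1
        self.in_use.add(page)
        return page

    def release(self, page):
        self.in_use.discard(page)


allocator = ToyPageAllocator(disk_capacity_pages=2)
print(allocator.allocate(), allocator.allocate())   # 0 1
try:
    allocator.allocate()
except OSError as exc:
    print("allocation refused:", exc.strerror)
allocator.release(0)
print(allocator.allocate())                         # 0, reused without a new write
```

Because the failure is raised before the page is marked as in use, every page that the cache may later need to write already has disk space behind it.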

`DatabaseGrowthIncrement` configuration parameter

For better control of disk space preallocation, the new parameter DatabaseGrowthIncrement has been added to firebird.conf. It represents the upper limit for the preallocation chunk size in bytes.

Important: Please be sure to read the details regarding this configuration, under DatabaseGrowthIncrement in the chapter entitled New configuration parameters and changes.

Bypass file system caching on Superserver

V. Khorsun

Feature requests CORE-1381 and CORE-1480

Firebird uses and maintains its own cache in memory for page buffers. The operating system, in turn, may recache Firebird's cache in its own file system cache. If Firebird is configured to use a cache that is large relative to the available RAM and Forced Writes is on, this cache duplication drains resources for little or no benefit.

Often, when the operating system tries to cache a big file, it moves the Firebird page cache to the swap, causing intensive, unnecessary paging. In practice, if the Firebird page cache size for Superserver is set to more than 80 per cent of the available RAM, resource problems will be extreme.
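A quick way to judge a planned setting against that 80 per cent figure is to express the page cache as a share of RAM. The Python sketch below does only the arithmetic. The function name and the sample figures (200,000 buffers of 16 KB on a host with 4 GiB of RAM) are assumptions made for the example.

```python
def cache_share_of_ram(cache_pages, page_size, ram_bytes):
    """Fraction of physical RAM taken by the Firebird page cache."""
    return cache_pages * page_size / ram_bytes


share = cache_share_of_ram(cache_pages=200_000, page_size=16_384, ram_bytes=4 * 2**30)
print(f"{share:.0%} of RAM")           # 76% of RAM
print("above 80 per cent:", share > 0.80)
```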

Note: File system caching is of some benefit on file writes, but only if Forced Writes is OFF, which is not recommended for most conditions.

Now, Superserver on both Windows and POSIX can be configured by a new configuration parameter, MaxFileSystemCache, to prevent or enable file system caching. It may provide the benefit of freeing more memory for other operations such as sorting and, where there are multiple databases, reduce the demands made on host resources.

Note: For Classic, there is no escaping file system caching.

For details of the MaxFileSystemCache parameter, see MaxFileSystemCache.

Other global improvements

Garbage collector rationalisation

V. Khorsun

Feature request CORE-1071

The background garbage collector process was reading all back versions of records on a page, including those created by active transactions. Since back versions of active records cannot be considered for garbage collection, it was wasteful to read them.

Immediate release of external files

V. Khorsun

Feature request CORE-961

The engine will now release external table files as soon as they are no longer in use by user requests.

Synchronization of DSQL metadata cache objects in Classic server

A. dos Santos Fernandes

Feature request CORE-976

No details.

BLOB improvements

A. dos Santos Fernandes

Feature request CORE-1169

Conversion of temporary blobs to the destination blob type now occurs when materializing.

Type flag for stored procedures

D. Yemanov

Feature request CORE-779

Introduced a type flag for stored procedures, adding column RDB$PROCEDURE_TYPE to the table RDB$PROCEDURES. Possible values are:

`0` or `NULL`	Legacy procedure (no validation checks are performed).
`1`	Selectable procedure (one that contains a `SUSPEND` statement).
`2`	Executable procedure (no `SUSPEND` statement, cannot be selected from).
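A client tool that reads RDB$PROCEDURE_TYPE might translate the flag as in the Python sketch below. The function and dictionary names are hypothetical, and the handling of values outside the three listed above is an assumption of the example.

```python
PROCEDURE_TYPES = {
    0: "legacy",       # no validation checks are performed
    1: "selectable",   # contains a SUSPEND statement
    2: "executable",   # no SUSPEND statement, cannot be selected from
}


def describe_procedure_type(value):
    """Map an RDB$PROCEDURE_TYPE value to a label; NULL is treated like 0."""
    if value is None:
        value = 0
    return PROCEDURE_TYPES.get(value, "unknown")


for flag in (None, 0, 1, 2):
    print(flag, "->", describe_procedure_type(flag))
```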

Help for getting core dumps on Linux

A. Peshkov

Feature request CORE-1558

The configuration parameter BugcheckAbort provides the capability to make the server stop trying to continue operation after a bugcheck and instead, to call abort() immediately and dump a core file. Since a bugcheck usually occurs as a result of a problem the server does not recognise, continuing operation with an unresolved problem is not usually possible anyway, and the core dump can provide useful debug information.

In the more recent Linux distributions the default setups no longer dump core automatically when an application crashes. Users often have trouble trying to get them working. Differing rules for Classic and Superserver, combined with a lack of consistency between the OS setup tools from distro to distro, make it difficult to help out with any useful "general rule".

Code has been added for Classic and Superserver on Linux to bypass these problems and automate generation of a core dump file when an abort() on BUGCHECK occurs. The Firebird server will make the required cwd (change working directory) to an appropriate writable location (/tmp) and set the core file size limit so that the 'soft' limit equals the 'hard' limit.
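The two steps described above correspond to ordinary POSIX calls. The Python sketch below shows their effect using the standard os and resource modules. It only illustrates the idea and is not the server's code; the function name is invented for the example.

```python
import os
import resource


def prepare_for_core_dump(dump_dir="/tmp"):
    """Move to a writable directory and raise the soft core limit to the hard limit."""
    os.chdir(dump_dir)
    _soft, hard = resource.getrlimit(resource.RLIMIT_CORE)
    # An unprivileged process may raise its soft limit up to the hard limit.
    resource.setrlimit(resource.RLIMIT_CORE, (hard, hard))
    return resource.getrlimit(resource.RLIMIT_CORE)


if __name__ == "__main__":
    soft, hard = prepare_for_core_dump()
    print("core file size limits (soft, hard):", soft, hard)
```

If the hard limit itself is zero, no core file can be written however the soft limit is set. That has to be corrected in the environment that starts the server.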

Note: In a release version, the automated core-dumping is active only when the BugcheckAbort parameter in firebird.conf is set to true (1). In a debug version, it is always active.

If you need to enable the facility, don't forget that the server needs to be restarted to activate a parameter change.
