The Firebird server and VARCHARs


By Holger Klemt
IBExpert Roadshow, Prague, October 2008

To illustrate how Firebird stores VARCHAR fields, we will first create two simple tables, test1 and test2:

 create table test1
 (id bigint not null primary key, 
  txt varchar(40)) 

 create table test2
 (id bigint not null primary key, 
  txt varchar(32000)) 

The only difference between these is the size of the varchar field. The following data is then inserted into both tables:

 id	txt
 1	DOG
 2	CAT
 3	MOUSE
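
These rows can be entered with ordinary INSERT statements, for example (the same three rows go into test2):

 insert into test1 values (1, 'DOG');
 insert into test1 values (2, 'CAT');
 insert into test1 values (3, 'MOUSE');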

After committing and disconnecting/reconnecting to the database, take a look at the Database Statistics:

Both tables take up 1 page of space. However, test2 shows a 39% page fill compared to only 2% for test1, and although both tables contain the same three records, the records in test1 have an average length of 16.67 bytes while those in test2 average 514.67 bytes.

This is because the two varchar fields are stored in the database differently; a look at the hex representation of the data immediately shows the difference.

Large varchar columns are not stored inside the database in the same way as zero-terminated strings in C programming, but with a kind of run-length compression. The word MOUSE is represented by 4D 4F 55 53 45, followed by repeated 80 00 tokens (each standing for a run of 128 zero bytes) until all 32000 characters are accounted for as "empty". The difference between a CHAR and a VARCHAR column in storage is that with a char the unused characters are stored as spaces, and with a varchar they are stored as binary zeros.
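This run-length scheme can be sketched in Python. This is a simplified model for illustration only, not the exact on-disk format; the token layout (a control byte, with values below 128 introducing that many literal bytes and values from 128 upward encoding a run of a repeated byte) is an assumption chosen to match the sizes observed above:

```python
def rle_compress(data: bytes) -> bytes:
    """Simplified Firebird-style RLE (illustrative, not the exact format)."""
    out = bytearray()
    i, n = 0, len(data)
    while i < n:
        # measure the run of identical bytes starting at i (capped at 128)
        j = i
        while j < n and data[j] == data[i] and j - i < 128:
            j += 1
        if j - i >= 3:
            # worthwhile run: one control byte (256 - run length) plus the byte
            out += bytes([256 - (j - i), data[i]])
            i = j
        else:
            # gather literal bytes until the next run of 3+ identical bytes
            k = i
            while k < n and k - i < 127:
                if k + 2 < n and data[k] == data[k + 1] == data[k + 2]:
                    break
                k += 1
            out += bytes([k - i]) + data[i:k]
            i = k
    return bytes(out)

def rle_decompress(data: bytes) -> bytes:
    out = bytearray()
    i = 0
    while i < len(data):
        c = data[i]
        if c < 128:                       # literal run of c bytes
            out += data[i + 1:i + 1 + c]
            i += 1 + c
        else:                             # repeat next byte (256 - c) times
            out += bytes([data[i + 1]]) * (256 - c)
            i += 2
    return bytes(out)

# varchar(32000) holding 'MOUSE': 5 data bytes plus 31995 trailing zeros
wide = b"MOUSE" + b"\x00" * 31995
narrow = b"MOUSE" + b"\x00" * 35          # varchar(40) equivalent

print(len(rle_compress(wide)))            # 506 bytes instead of 32000
print(len(rle_compress(narrow)))          # 8 bytes instead of 40
```

In this model a varchar(32000) holding MOUSE still compresses to roughly 500 bytes (250 run tokens of 2 bytes each), which is consistent with the average record length of 514.67 bytes reported in the statistics once record overhead is added.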

The problem in the above example is that the txt field in table test2 still requires 515 bytes of storage, even though only 5 bytes are actually used; test1's txt field in comparison uses only 17 bytes. So avoid declaring char or varchar columns larger than necessary! If you really need the room to store a large number of characters in a single field, use a blob datatype and not a varchar: a blob requires only 8 bytes as an internal pointer (and it requires these 8 bytes even when it is empty). When you store, for example, 100 bytes in a blob column, the Firebird server typically uses those 100 bytes plus 3 or 4 bytes of overhead to mark the end of the blob.
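A table using a text blob instead of the oversized varchar could, for example, be defined as follows (the name test3 is purely illustrative, not part of the original example):

 create table test3
 (id bigint not null primary key,
  txt blob sub_type text)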

You can see this directly in the test tables: generate some test data for test2 (refer to Test Data Generator), for example 10,000 further records, each with a txt value four letters (A-Z) long:

and copy this data into test1:

 insert into test1 
 select * from test2 where id>3

or using the IBExpert Table Data Comparer.

If you refer again to the Database Statistics you will see how large the memory difference now is:

Whilst test1 is stored on 143 pages (585,728 bytes), test2 already uses 1,429 pages (5,853,184 bytes), and this is with only 10,000 data records. If you then test performance:

 select * from test2 order by id

works quickly because the Optimizer uses the primary key index, but:

 select * from test2 order by id desc

takes a while, as there is no descending index for the Optimizer to use, so it has to perform a temporary sort. Depending upon how much sort memory you have specified in firebird.conf (TempBlockSize and TempCacheLimit; in earlier Firebird versions, SortMem), the server will probably not only have to load all 1,429 pages into the cache, but also use the hard drive. The sort file grows to over 100 MB, because it includes all the data from test2, including all the empty space in the txt field: the server cannot know whether the empty characters are required or not, so the internal ordering is performed on the complete column width for all records.
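If the descending order is needed regularly, a descending index can be created so that the Optimizer can avoid the temporary sort; the index name here is only an example:

 create descending index test2_id_desc on test2 (id)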

So think and plan your varchar size very carefully!

If you have queries that run too slowly, check whether indices can help. You will also speed up performance by selecting only the columns that you really need.

If you have enough memory, raise the TempBlockSize and TempCacheLimit values. The major problem is usually the temporary sort file; temporary files are really an area where you can speed up your environment.

If the Firebird server is set up as an application, the temporary directory is the local user's temp directory. When it is installed as a service, it is Windows\Temp by default.
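The location of the sort files can also be set explicitly in firebird.conf; the path shown here is only an example:

 # firebird.conf - place temporary sort files on a fast drive
 TempDirectories = D:\fbtemp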

In spite of all these and any other tips for performance optimization, the best performance is achieved through an intelligent data structure!

This compression of varchar data in the database can also be helpful, for example with datasets containing repetitive characters, e.g. KLEMMMMMMMMMMT.

The Firebird server stores all data except blobs compressed in this way on the hard disk; blobs are stored as they are.

See also:
Data types
Database Statistics
Using the IBExpert Database Statistics
