Using generators to create unique row IDs

Why row IDs at all?

The answer to this question would go far beyond the scope of this article. If you see no need to have a generic, unique "handle" for every row inside a table, or don't like the idea of "meaningless" or "surrogate" keys in general, you should probably skip this section...

One for all or one for each?

OK, so you want row IDs. { author's note: congratulations! :-) }

A major, basic decision to take is whether we'll use one single generator for all the tables, or one generator for each table. This is up to you – but take the following considerations into account.

With the one for all approach, you:

+ need only a single generator for all your IDs;
+ have one integer number that does not only identify your row within its table, but within the entire database;
- have less possible ID values per table (this shouldn't really be a problem with 64bit generators...);
- will soon have to deal with large ID values even in e.g. lookup tables with only a handful of records;
- will likely see gaps in a per-table ID sequence, since generator values are spread throughout all tables.

With the one for each approach you:

- have to create a generator for every single "ID'd" table in your database;
- always need the combination of ID and table name to uniquely identify any row in any table;
+ have a simple and robust "insert counter" per table;
+ have a chronological sequence per table: if you find a gap in the ID sequence of a table, then it's caused either by a DELETE or by a failed INSERT.

Can you re-use generator values?

Well – yes, technically you can. But – NO, you shouldn't. Never. Never ever. Not only that this would destroy the nice chronological sequence (you can't judge a row's "age" by just looking at the ID any more), the more you think about it the more headaches it'll give you. Moreover it is an absolute contradiction to the entire concept of unique row identifiers.

So unless you have good reasons to re-use generator values, and a well-thought-out mechanism to make this work safely in multi-user/multi-transaction environments, JUST DON'T DO IT!

Generators for IDs or auto-increment fields

Giving a newly inserted record an ID (in the sense of a unique "serial number") is easily done with generators and BEFORE INSERT triggers, as we'll see in the following subsections. We start with the assumption that we have a table called TTEST with a column ID declared as Integer. Our generator's name is GIDTEST.

`BEFORE INSERT` trigger, version 1

 CREATE TRIGGER trgTTEST_BI_V1 for TTEST
 active before insert position 0
 as
 begin
   new.id = gen_id( gidTest, 1 );
 end

Problems with trigger version 1:

This one does the job all right – but it also "wastes" a generator value in cases where there is already an ID supplied in the INSERT statement. So it would be more efficient to only assign a value when there was none in the INSERT:

Before Insert trigger, version 2

 CREATE TRIGGER trgTTEST_BI_V2 for TTEST
 active before insert position 0
 as
 begin
   if (new.id is null) then
   begin
     new.id = gen_id( gidTest, 1 );
   end
 end

Problems with trigger version 2:

Some access components have the "bad habit" to auto-fill all the columns in an INSERT. Those not explicitly set by you get default values – usually 0 for integer columns. In that case, the above trigger would not work: it would find that the ID column does not have the state NULL, but the value 0, so it would not generate a new ID. You could post the record, though – but only one... the second one would fail. It is anyway a good idea to "ban" 0 as a normal ID value, to avoid any confusion with NULL and 0. You could e.g. use a special row with an ID of 0 to store a default record in each table.

Before Insert trigger, version 3

 CREATE TRIGGER trgTTEST_BI_V3 for TTEST
 active before insert position 0
 as
 begin
   if ((new.id is null) or (new.id = 0)) then
   begin
     new.id = gen_id( gidTest, 1 );
   end
 end

Well, now that we have a robust, working ID trigger, the following paragraphs will explain to you why mostly you won't need it at all:

The basic problem with IDs assigned in BEFORE INSERT triggers is that they are generated on the server side, after you send the Insert statement from the client. This plainly means there is no safe way to know from the client side which ID was generated for the row you just inserted.

You could grab the generator value from the client side after the INSERT, but in multi-user environments you cannot be really sure that what you get is your own row's ID (because of the transaction issue).

But if you get a new value from the generator before, and post the Insert with that value, you can simply fetch the row back with a Select ... where ID=<genvalue> to see what defaults were applied or whether columns were affected by INSERT triggers. This works especially well because you usually have a unique primary key index on the ID column, and those are about the fastest indexes you can have – they're unbeatable in selectivity, and mostly smaller than indexes on CHAR(n) colums (for n>8@@, depending on character set and collation).

The bottom line to this is:

You should create a BEFORE INSERT trigger to make absolutely sure every row gets a unique ID, even if no ID value was supplied from the client side in the INSERT statement.

If you have an "SQL-closed" database (that is, your own application code is the only source for newly inserted records), then you can leave out the trigger, but then you should always obtain a new generator value from the database before issuing the INSERT statement and include it there. The same, of course, goes for inserts from within triggers and stored procedures.