Designing an Identifier Is Surprisingly Interesting

By Gary Kennedy
July 1, 2026

A transaction ID is a dull reference number that has to appear on a contract note, in an email, or in a support call. Who would have thought there was much to say about it? Well, there is! Babylon’s transaction ID is designed to be compact, human-facing, stable across corrections, independent of database sequences, and safe to read and type. An ID such as 7KMP-RT92-WNXY is short enough for real use because it uses an alphabet of letters and digits rather than plain decimal numbers: for the same number of characters, that gives us many more possibilities. But letters introduce an unexpected problem. Helpful four-character grouping can accidentally produce rude or awkward words, so care has to be taken not to issue those IDs.

Why not use the classic database number?

The classic answer is a number from a relational database: an auto-incrementing primary key, identity column, or sequence.

That works well inside a database, but it is a poor shape for a public transaction reference. A sequence is predictable:

10421
10422
10423

Anyone who sees a few references can infer rough transaction volumes and guess other IDs. A global sequence leaks volume across the product. A ledger-level sequence leaks volume within that ledger. That may not be catastrophic, but it is unnecessary information leakage for a financial system.

It also ties the creation of the public reference to a particular kind of persistence technology. Auto-incrementing identifiers are natural in a relational database, but we prefer to be as agnostic as possible about the data store.

So Babylon avoids the classic database number. The transaction ID is not a public wrapper around an internal row number; it is a deliberately designed transaction reference.

Compact without being sequential

The next problem is compactness.

A plain decimal number only has ten possibilities per character:

0 1 2 3 4 5 6 7 8 9

Using letters as well as digits gives a much larger number of possibilities for the same number of characters.

0 1 2 3 4 5 6 7 8 9 A B C D E F G H I J K L M N O P Q R S T U V W X Y Z

In mathematical parlance, we move from base 10 to base 36. This lets Babylon keep the transaction reference compact without falling back to a simple visible sequence.
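The gain is easy to quantify. As a small illustrative sketch (the helper name `capacity` is invented here and is not part of Babylon), the number of distinct references of a given length is the alphabet size raised to that length:

```python
def capacity(alphabet_size: int, length: int) -> int:
    """Count the distinct strings of `length` characters over an alphabet."""
    return alphabet_size ** length


# Compare plain decimal digits with the 36-character alphanumeric alphabet.
for length in (4, 8, 12):
    decimal = capacity(10, length)
    base36 = capacity(36, length)
    print(f"{length:>2} chars: base 10 = {decimal:,}  base 36 = {base36:,}")
```

At every length the alphanumeric alphabet offers far more values, and the gap widens as the reference grows.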

Of particular interest is the ULID (Universally Unique Lexicographically Sortable Identifier), a 26-character identifier. Its first ten characters encode a millisecond timestamp, and the remaining sixteen are random. That gives the whole identifier partial lexicographic ordering: IDs with earlier time components sort before IDs with later time components. This is an excellent feature, and we mimic it in a smaller, ledger-specific context.

Trying a temporal prefix

Let us try using the first few characters of the ID for a temporal representation, much like a ULID does.

With the natural uppercase alphanumeric alphabet, each character has 36 possible values, so four characters give:

36^4 = 1,679,616

possible values.

This is enough to count calendar days from an epoch, say 1970-01-01, for roughly 4,600 years, but not enough for a full timestamp. This is fine: we do not need timestamp-level ordering in the public transaction reference. Partial ordering by transaction date is quite reasonable.

The first few values would be:

Date Word	Day Number	Date
`0000`	0	`1970-01-01`
`0001`	1	`1970-01-02`
`0002`	2	`1970-01-03`
`0003`	3	`1970-01-04`
`0004`	4	`1970-01-05`
`0005`	5	`1970-01-06`
…	…	…
`0FX7`	20,635	`2026-07-01`

We encode the value using the natural ASCII ordering of digits and uppercase letters, so IDs with earlier date words sort before those with later date words. This partial ordering is a very convenient feature.
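The encoding described above can be sketched in a few lines. This is a minimal illustration, not Babylon's actual code: the names `EPOCH`, `BASE36`, and `date_word` are assumptions, and the epoch is the example value 1970-01-01 used in the text.

```python
from datetime import date

EPOCH = date(1970, 1, 1)  # example epoch from the text
BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # already in ASCII order


def date_word(d: date, alphabet: str = BASE36, width: int = 4) -> str:
    """Encode the days since EPOCH as a fixed-width word over `alphabet`."""
    n = (d - EPOCH).days
    base = len(alphabet)
    if not 0 <= n < base ** width:
        raise ValueError("date is outside the range of the date word")
    chars = []
    for _ in range(width):
        n, remainder = divmod(n, base)  # peel off the least significant digit
        chars.append(alphabet[remainder])
    return "".join(reversed(chars))  # most significant digit first


print(date_word(date(1970, 1, 1)))  # 0000
print(date_word(date(2026, 7, 1)))  # 0FX7
```

Because every word has the same width and the alphabet is listed in ASCII order, comparing two date words as plain strings gives the same result as comparing their dates.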

Adding randomness

Having defined the first word, we still need to distinguish transactions that occur on the same date.

The question is: how many transactions might occur on a given date? We want a large enough domain that any randomly chosen suffix has a very low probability of colliding with another randomly chosen suffix.

Four random characters would give:

36^4 = 1,679,616

possible suffixes for a given date.

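That will be enough for many ledgers, but randomly chosen suffixes start to collide long before the space is full (the birthday problem), so eight random characters are safer: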

36^8 = 2,821,109,907,456

possible suffixes for a given date.
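To put rough numbers on "safer", the standard birthday approximation estimates the chance of at least one collision among n random picks from N values as 1 - exp(-n(n-1)/(2N)). The sketch below applies it to both suffix lengths. The figure of 1,000 transactions on one date is an assumption chosen purely for illustration, and the function name is invented here.

```python
import math


def collision_probability(n: int, space: int) -> float:
    """Birthday approximation: chance that n uniform random picks
    from `space` possible values contain at least one duplicate."""
    return -math.expm1(-n * (n - 1) / (2 * space))


transactions = 1_000  # assumed same-date volume, for illustration only
for width in (4, 8):
    p = collision_probability(transactions, 36 ** width)
    print(f"{width} random characters: p = {p:.3g}")
```

With this assumed volume, four random characters give a collision chance of roughly one in four, while eight characters bring it below one in a million.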

We also like groups of four characters because they are easier for humans to read, remember briefly, and compare. That gives the shape:

DDDD-RRRR-RRRR

The first word is the date word. The final two words are random.
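Assembling that shape is straightforward. The following is a hedged sketch using the natural 36-character alphabet at this stage of the design. The function names are hypothetical, and drawing the random characters from Python's `secrets` module is an assumption made for the example, not a statement about Babylon's implementation.

```python
import secrets

BASE36 = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"


def random_word(alphabet: str = BASE36, width: int = 4) -> str:
    """Return `width` characters drawn uniformly at random from `alphabet`."""
    return "".join(secrets.choice(alphabet) for _ in range(width))


def make_id(date_word: str) -> str:
    """Join a date word and two random words into DDDD-RRRR-RRRR."""
    return "-".join([date_word, random_word(), random_word()])


# "0FX7" is the date word for 2026-07-01 from the table above.
print(make_id("0FX7"))
```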

The word problem

This looks promising, but the alphabet creates a new problem.

The date word is deterministic. If a particular date maps to an awkward word, then every transaction on that date begins with that word.

With the natural uppercase alphanumeric alphabet, a word such as:

0ASS

is an actual possibility (day number 13,996; date 2008-04-27)!

The second and third words are purely random, so they are unlikely to produce rude or awkward words, but it is not impossible.

So we need to make the alphabet safer.

The first step is to remove vowels:

A E I O U

Without vowels, ordinary English word shapes mostly disappear. A word like:

FART

can no longer be generated because A is not available.

But digits can stand in for letters. This is often called leet spelling. For example, 4 can be read as A, so:

F4RT

is the leet equivalent of FART.

Thus, we must also remove the common vowel-like digits:

0  // O
1  // I
3  // E
4  // A

Removing vowels and their common digit substitutes breaks both ordinary words and their obvious leet equivalents.

However, some bad-looking words and abbreviations remain; they are built mostly from consonants. In practice, many of the remaining problem cases involve the S/5 pair.

The letter S is especially useful in short English fragments, acronyms, plurals, and abbreviations. The digit 5 is also an obvious visual substitute for S. So if we leave either one in the alphabet, awkward three- and four-character sequences can still reappear in ordinary or leet-like form.

We therefore remove both sides of the pair:

S
5

The usable alphabet now becomes:

2 6 7 8 9 B C D F G H J K L M N P Q R T V W X Y Z
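The three removal steps can be reproduced mechanically. This short sketch (the variable names are illustrative) starts from the natural alphabet and filters out each group in turn:

```python
import string

natural = string.digits + string.ascii_uppercase  # 36 characters, ASCII order

removed = (
    set("AEIOU")   # vowels
    | set("0134")  # vowel-like digits: O, I, E, A
    | set("S5")    # the S/5 pair
)

SAFE_ALPHABET = "".join(c for c in natural if c not in removed)

print(SAFE_ALPHABET)       # 26789BCDFGHJKLMNPQRTVWXYZ
print(len(SAFE_ALPHABET))  # 25
```

Filtering rather than retyping the alphabet keeps the characters in ASCII order, which the sortable date word depends on.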

The alphabet is now 25 characters. That changes the size of the identifier space, but not enough to damage the design. The date word now has:

25^4 = 390,625

possible values.

That is still enough for more than one thousand years of calendar days from an epoch. The random suffix has:

25^8 = 152,587,890,625

possible values for a given date word. So the alphabet has become safer for humans, while the identifier still remains compact and has a very large random space.
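As a quick check of those figures, the sketch below computes how far a 25-character date word reaches when counted from the example epoch 1970-01-01. It ignores any date words that are later skipped as unsafe, so the real range would be slightly shorter. The names are illustrative.

```python
from datetime import date, timedelta

EPOCH = date(1970, 1, 1)  # example epoch from the text
ALPHABET_SIZE = 25

date_words = ALPHABET_SIZE ** 4
suffixes = ALPHABET_SIZE ** 8

# Day numbers run from 0 to date_words - 1.
last_day = EPOCH + timedelta(days=date_words - 1)

print(f"date words:              {date_words:,}")
print(f"approx. years covered:   {date_words / 365.2425:.0f}")
print(f"last representable date: {last_day.isoformat()}")
print(f"suffixes per date word:  {suffixes:,}")
```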

Reserved words and folding

The alphabet does most of the work, but it is not the whole solution.

Even after removing vowels, vowel-like digits, and the S/5 pair, some awkward strings can remain. They may not be ordinary dictionary words. They may be acronyms, shortened forms, consonant skeletons, or strings that only look wrong when read quickly.

So we keep a small reserved word list.

A reserved word is a three- or four-character sequence that we choose not to issue inside a visible ID word. For each four-character word, written here with placeholder letters as:

ABCD

we check:

ABC   // first three characters
BCD   // last three characters
ABCD  // full four-character word

The same rule applies to each visible word in the ID:

DDDD-RRRR-RRRR

We do not scan across hyphen boundaries. The ID is designed, displayed, and spoken as three four-character words.
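The per-word check can be sketched as follows. The reserved entries `XYZ` and `BCDF` are neutral placeholders rather than real list entries, and the function names are hypothetical.

```python
RESERVED = {"XYZ", "BCDF"}  # placeholder entries, for illustration only


def windows(word: str) -> list:
    """Return the sequences checked for one four-character word:
    first three characters, last three characters, and the full word."""
    return [word[:3], word[1:], word]


def word_is_safe(word: str, reserved: set = RESERVED) -> bool:
    """True if none of the word's windows is reserved."""
    return not any(w in reserved for w in windows(word))


def id_is_safe(identifier: str, reserved: set = RESERVED) -> bool:
    """Check each hyphen-separated word on its own, never across hyphens."""
    return all(word_is_safe(w, reserved) for w in identifier.split("-"))


print(word_is_safe("WXYZ"))            # False: last three characters are XYZ
print(id_is_safe("7KMX-YZ92-WNXY"))    # True: X|YZ spans a hyphen, so it is ignored
```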

The date word needs particular care because it is deterministic. If a date naturally maps to a reserved word, we do not issue that proposed word. We skip to the next safe date word instead. Unsafe date words are akin to identifier holidays: the natural encoding can land on them, but we skip over them to the next available word.
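One possible reading of this skipping rule, offered here as an assumption rather than a description of Babylon's code, is that day numbers are assigned to the safe words in order, so that later dates shift along and every date keeps a distinct word. The reserved entry `2226` is a hypothetical placeholder chosen only to force a skip near the start of the sequence.

```python
from itertools import islice, product

ALPHABET = "26789BCDFGHJKLMNPQRTVWXYZ"  # the 25-character alphabet, ASCII order


def is_safe(word: str, reserved: set) -> bool:
    """True if neither the word nor its 3-character head or tail is reserved."""
    return not {word[:3], word[1:], word} & reserved


def safe_date_words(reserved: set):
    """Yield every safe four-character word in ascending ASCII order."""
    for chars in product(ALPHABET, repeat=4):
        word = "".join(chars)
        if is_safe(word, reserved):
            yield word


def date_word_for_day(day_number: int, reserved: set) -> str:
    """Return the safe word assigned to `day_number` (0 is the epoch)."""
    return next(islice(safe_date_words(reserved), day_number, None))


reserved = {"2226"}  # hypothetical entry
print([date_word_for_day(n, reserved) for n in range(3)])
# ['2222', '2227', '2228']
```

Because the safe words are still visited in ASCII order, the date words remain sortable by date even with gaps. A real implementation would probably precompute or cache the mapping instead of walking the sequence on every call.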

The random words are less of a concern. They are protected mostly by probability; however, each proposed random word is still checked before it is issued. If a random word contains a reserved word, we reject that candidate and generate another.
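This reject-and-retry step is a simple form of rejection sampling. A minimal sketch follows, in which the placeholder reserved entry, the function names, and the use of the `secrets` module are all assumptions made for the example:

```python
import secrets

ALPHABET = "26789BCDFGHJKLMNPQRTVWXYZ"
RESERVED = {"XYZ"}  # placeholder entry, for illustration only


def is_safe(word: str, reserved: set) -> bool:
    """True if neither the word nor its 3-character head or tail is reserved."""
    return not {word[:3], word[1:], word} & reserved


def random_safe_word(reserved: set = RESERVED, width: int = 4) -> str:
    """Draw random candidates until one passes the reserved-word check."""
    while True:
        candidate = "".join(secrets.choice(ALPHABET) for _ in range(width))
        if is_safe(candidate, reserved):
            return candidate


print(random_safe_word())
```

Since only a tiny fraction of candidates is rejected, the loop almost always finishes on the first draw.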

We also apply a display-risk fold before checking the reserved word list. Some characters can be read as other characters in certain fonts, in handwriting, or on quick visual inspection:

8 -> B
6 -> G
2 -> Z
Q -> O
V -> U

The fold is only a safety check. It does not change the issued ID.

This means the reserved word list does not need to contain every visual variant of every awkward word. We can keep the list small, while still rejecting candidates that would look wrong after common visual substitutions.
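A sketch of the fold and the check it feeds is shown below. It assumes that reserved entries are stored in their folded spelling, which is why they may contain O and U even though those letters never appear in an issued ID. The entries `BUM` and `DUMB` are mild, hypothetical examples, not the real list.

```python
# Fold table from the text: 8->B, 6->G, 2->Z, Q->O, V->U.
FOLD = str.maketrans("862QV", "BGZOU")

RESERVED = {"BUM", "DUMB"}  # hypothetical entries, in folded spelling


def fold(word: str) -> str:
    """Return the display-risk fold of a word; the input is left unchanged."""
    return word.translate(FOLD)


def word_is_safe(word: str, reserved: set = RESERVED) -> bool:
    """Fold the candidate, then check its three windows against the list."""
    folded = fold(word)
    return not {folded[:3], folded[1:], folded} & reserved


print(fold("8VM2"))          # BUMZ
print(word_is_safe("8VMX"))  # False: folds to BUMX, which starts with BUM
print(word_is_safe("7KMP"))  # True
```

A single list entry such as `BUM` therefore covers `BVM` and `8VM` without either variant being listed.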

Namespace

A Babylon transaction ID is not a naked global identifier. It lives inside a namespace.

Conceptually, the full reference is:

account owner + ledger name + transaction ID

That matters because the visible transaction ID only needs to be unique within the relevant owner-and-ledger namespace. It is not competing with every transaction ever created by Babylon.

This is one reason the ID can remain compact. The random suffix provides a very large space for each date word, and that space is scoped to a particular ledger namespace.

The namespace also reflects how people use the reference. A transaction ID printed on a report or mentioned in support is meaningful in the context of an account owner and a ledger. The short ID is the human-facing part; the namespace gives it its full identity.
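As a data-structure illustration, the full reference can be modelled as a small immutable value whose identity combines all three parts. The class and field names are hypothetical and are not taken from Babylon's code.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TransactionRef:
    """Full reference: the short ID qualified by its namespace."""
    account_owner: str
    ledger_name: str
    transaction_id: str


# The same short ID in two ledgers yields two distinct references.
a = TransactionRef("alice", "main", "7KMP-RT92-WNXY")
b = TransactionRef("alice", "savings", "7KMP-RT92-WNXY")

print(a == b)       # False
print(len({a, b}))  # 2
```

Under this model, a uniqueness check only ever compares IDs that share the same owner and ledger.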

What we end up with

After all of that, the identifier still has the same simple shape:

DDDD-RRRR-RRRR

The first word is derived from the transaction date. The final two words are random. The alphabet is smaller than the natural uppercase alphanumeric alphabet, but it is safer for human use.

The result is compact, roughly sortable by transaction date, independent of database sequences, and stable across corrections. It does not expose a simple transaction count, and it avoids issuing the most obvious awkward words.

The ID is still only an identifier. It is not a secret and it is not an access-control mechanism. Any system using it must still check that the caller is allowed to see or change the transaction it refers to.