Developing the segment ledger quick entry

By Gary Kennedy
November 18, 2025

TL;DR: We originally built a phrase-based parser for segment ledger quick entry, but it was too strict.
Refactoring led us to anchor on the verb in the sentence (“buy”, “sell”), allowing us to identify the Subject–Verb–Object (SVO) structure.
This made the parser far more flexible and capable of handling natural, messy, human descriptions — including tricky cases like “GBP”, which can be both a currency and a listing symbol!

Segment ledger quick entry is a process to extract structured data — a segment ledger entry — from unstructured data, namely a sentence typed by a human or taken from a contract note. We’ve built this capability into our API endpoint:

POST https://api.babylon.app/v1/extract/segments/from-quick-entry/

allowing apps to turn a short typed message into a complete segment ledger entry automatically.
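
For example, a client might call the endpoint like this. This is a minimal sketch in Python: the JSON request body (a "text" field) and the Authorization header are our assumptions here, not the documented schema.

    import requests

    # Hypothetical request shape: we assume the endpoint accepts a JSON
    # body with the free-text description under a "text" key.
    response = requests.post(
        "https://api.babylon.app/v1/extract/segments/from-quick-entry/",
        json={"text": "bought 53 IUSA for 2500"},
        headers={"Authorization": "Bearer <your-api-token>"},
    )
    response.raise_for_status()

    # The response carries the extracted segment ledger entry:
    # type, quantity, symbol, net amount, currency, and so on.
    print(response.json())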

“Quick” reflects the fact that it can be faster and more convenient to type a short description of the trade than to fill in cells in a spreadsheet or a strictly formatted application form. The shorter the description, the faster the entry, so it feels more like typing a chat message than composing a formal sentence.

Recall that a segment ledger entry has the following structure:

  SegmentLedger  AccountAlias  Type  Quantity  Symbol  NetAmount  Currency  SettleDate  Bourse
  MyUKLedger     AJBell-ISA    Buy   53        IUSA    2500.00    GBP       2024-01-05  LSE
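
For concreteness, the target record might be modelled like this; a minimal sketch whose field names follow the table above (the real API's types may differ):

    from dataclasses import dataclass
    from datetime import date

    # A minimal sketch of the target record. Field names follow the
    # table above; the real API's types may differ.
    @dataclass
    class SegmentLedgerEntry:
        segment_ledger: str  # e.g. "MyUKLedger"
        account_alias: str   # e.g. "AJBell-ISA"
        type: str            # "Buy" or "Sell"
        quantity: int        # e.g. 53
        symbol: str          # e.g. "IUSA"
        net_amount: float    # e.g. 2500.00
        currency: str        # e.g. "GBP"
        settle_date: date    # e.g. date(2024, 1, 5)
        bourse: str          # e.g. "LSE"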

How might one enter this data quickly? We could type:

bought 53 IUSA for 2500

That’s about as quick as we might be able to make the entry, hoping the system can infer the currency and bourse (and it can!).

If we want to be explicit we could write:

At LSE, we bought 53 IUSA for 2500 GBP, settling on 2024-01-05

We might also find a description like this in a contract note.

When we study reasonable examples of how one might describe a trade quickly, we can identify four key phrases (not all of which need to appear):

  • [Buy|Sell] [Quantity] [Symbol]
  • for [Amount] [Currency]
  • at [Bourse]
  • settle [SettlementDate]

It turns out these phrases can be written in almost any order and they will still more or less make sense:

  • For 2500 GBP, we bought 53 IUSA, settling 2024-01-05
  • Bought 53 IUSA at LSE, settling 2024-01-05
  • Settling 2024-01-05 at LSE, we bought 53 IUSA for 2500 GBP
    etc.
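
To make the phrase grammar concrete, here is a minimal sketch of the four phrases as regular expressions, searched for anywhere in the sentence (illustrative only, not our production grammar):

    import re

    # The four phrase patterns as regular expressions. This is an
    # illustrative sketch, not the production grammar.
    PHRASES = {
        "trade":  re.compile(r"\b(bought|buy|sold|sell)\s+(\d+)\s+([A-Z]+)\b", re.I),
        "price":  re.compile(r"\bfor\s+([\d.]+)(?:\s+([A-Z]{3})\b)?", re.I),
        "bourse": re.compile(r"\bat\s+([A-Z]+)\b"),
        "settle": re.compile(r"\bsettl\w*\s+(?:on\s+)?(\d{4}-\d{2}-\d{2})", re.I),
    }

    def find_phrases(text):
        # Searching (rather than parsing left to right) means the
        # phrases can appear in any order and still be found.
        return {name: m.groups()
                for name, p in PHRASES.items()
                if (m := p.search(text))}

    print(find_phrases("Settling 2024-01-05 at LSE, we bought 53 IUSA for 2500 GBP"))
    # {'trade': ('bought', '53', 'IUSA'), 'price': ('2500', 'GBP'),
    #  'bourse': ('LSE',), 'settle': ('2024-01-05',)}

    # But stray from the expected wording and nothing matches, which is
    # the strictness problem discussed below:
    print(find_phrases("purchased 53 IUSA costing 2500"))   # {}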

Version 1: a strict phrase-grammar parser

Being very familiar with tokenisers, grammars and parsers, we first designed a parser that explicitly looked for these phrases and slotted what it found into the segment ledger fields. That gave us a fairly flexible quick entry parser as long as the user stuck, more or less, to our phrase grammar.

But of course, humans typing freely in a description don’t necessarily want to learn and remember “the Babylon way” to enter a quick segment ledger entry. Our phrase parser was still quite strict. If you got the words in the “wrong” order, or wrote something slightly different, the parse could fail.

Version 2: loosening the grammar

To cope with more variety, we rebuilt the parser as what we called a probabilistic parser: instead of requiring exact phrase patterns, we:

  • relaxed the grammar rules,
  • allowed more variation in word order and extra words,
  • and scored the phrases it thought it had found.
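
A minimal sketch of the scoring idea, using the price phrase as an example (the patterns and scores here are illustrative, not our production rules):

    import re

    # Each field gets several looser patterns, each carrying a
    # confidence score; we keep the best-scoring match.
    PRICE_PATTERNS = [
        (re.compile(r"\bfor\s+([\d.]+)\b"), 0.9),       # "for 2500"
        (re.compile(r"\bcosting\s+([\d.]+)\b"), 0.8),   # "costing 2500"
        (re.compile(r"\b([\d.]+)\s+[A-Z]{3}\b"), 0.5),  # bare "2500 GBP"
    ]

    def best_price(text):
        candidates = [(score, m.group(1))
                      for pattern, score in PRICE_PATTERNS
                      if (m := pattern.search(text))]
        return max(candidates, default=None)

    print(best_price("bought 53 IUSA for 2500 GBP"))   # (0.9, '2500')
    print(best_price("bought 53 IUSA costing 2500"))   # (0.8, '2500')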

This worked better, but it was still fundamentally built around phrase patterns we had designed up front.

Version 3: leaning into SVO

Having studied some Natural Language Processing more recently, we realised we had accidentally been refactoring our original parser toward something that looks for Subject–Verb–Object (SVO) structure.

That turned out to be the crucial step.

Most of the fields can be inferred simply by classifying the individual tokens:

  • numbers that look like quantities,
  • numbers that look like monetary amounts,
  • currency codes,
  • symbols that look like instruments,
  • dates that look like settlement dates, etc.
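
A token classifier along these lines might look like the following sketch; the categories and rules are illustrative, and real rules would consult reference data such as known symbols and ISO currency codes:

    import re

    # Classify a single token into a coarse category. A trailing "?"
    # marks categories that are only a guess at this stage.
    CURRENCIES = {"GBP", "USD", "EUR"}

    def classify(token):
        if re.fullmatch(r"\d{4}-\d{2}-\d{2}", token):
            return "DATE"
        if re.fullmatch(r"\d+", token):
            return "QUANTITY?"   # a bare integer: quantity or amount?
        if re.fullmatch(r"\d+\.\d+", token):
            return "AMOUNT?"     # decimals usually look like money
        if token in CURRENCIES:
            return "CURRENCY?"   # ambiguous: GBP can also be a symbol!
        if token.isupper():
            return "SYMBOL?"
        return "WORD"

    print([(t, classify(t)) for t in "bought 53 IUSA for 2500 GBP".split()])
    # [('bought', 'WORD'), ('53', 'QUANTITY?'), ('IUSA', 'SYMBOL?'),
    #  ('for', 'WORD'), ('2500', 'QUANTITY?'), ('GBP', 'CURRENCY?')]

    # Note that both '53' and '2500' classify as bare numbers: token
    # classes alone cannot tell the quantity from the amount.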

However, the verb — typically “buy” or “sell” — is what disambiguates which number is the quantity and which is the amount. It also resolves a subtle token-classification problem: some tokens can plausibly belong to more than one category. For example, GBP is usually a currency, but on the London Stock Exchange it was also a listing code for a specific share listing (as we’ve written about before).

Once we anchored on the verb — the action — the rest of the sentence fell into place.
If “GBP” appears within the verb’s object (buy 10 GBP), it is likely acting as a symbol.
If it appears in a price phrase (for 2500 GBP), it is acting as the currency.
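
In code, the disambiguation can be as simple as checking where the token sits relative to the verb; a sketch under our own simplified rules:

    import re

    # A sketch of the rule: "GBP" directly after the verb's quantity is
    # the thing being bought (a symbol); "GBP" inside a "for <amount>"
    # price phrase qualifies the money (a currency).
    def role_of_gbp(text):
        if re.search(r"\b(?:bought|buy|sold|sell)\s+\d+\s+GBP\b", text):
            return "symbol"    # GBP is the verb's object
        if re.search(r"\bfor\s+[\d.]+\s+GBP\b", text):
            return "currency"  # GBP qualifies the monetary amount
        return "unknown"

    print(role_of_gbp("buy 10 GBP"))                    # symbol
    print(role_of_gbp("bought 53 IUSA for 2500 GBP"))   # currency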

By anchoring on the verb, we no longer need to enforce strict phrase grammar rules. Instead, we can:

  • identify the verb (bought, sold),
  • identify the object of that verb (53 IUSA),
  • treat the number in that object as the quantity,
  • treat the non-numeric token in that object as the symbol,
  • and treat the remaining numeric phrase (for 2500 GBP) as the monetary amount.
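
Putting those steps together, a stripped-down sketch of the verb-anchored extraction might look like this (much simplified from the real parser; rules and field names are illustrative):

    import re

    # A stripped-down, verb-anchored extractor.
    VERB = re.compile(r"\b(bought|buy|sold|sell)\b", re.IGNORECASE)

    def parse_quick_entry(text):
        entry = {}
        verb = VERB.search(text)
        if not verb:
            return None  # no action verb: nothing to anchor on
        entry["type"] = "Buy" if verb.group(1).lower() in ("buy", "bought") else "Sell"

        # The verb's object: the "<quantity> <symbol>" pair right after it.
        obj = re.match(r"\s+(\d+)\s+([A-Z]+)", text[verb.end():])
        if obj:
            entry["quantity"], entry["symbol"] = int(obj.group(1)), obj.group(2)

        # The remaining numeric phrase, "for <amount> [<currency>]", is the money.
        price = re.search(r"\bfor\s+([\d.]+)(?:\s+([A-Z]{3})\b)?", text)
        if price:
            entry["net_amount"] = float(price.group(1))
            if price.group(2):
                entry["currency"] = price.group(2)
        return entry

    print(parse_quick_entry("At LSE, we bought 53 IUSA for 2500 GBP"))
    # {'type': 'Buy', 'quantity': 53, 'symbol': 'IUSA',
    #  'net_amount': 2500.0, 'currency': 'GBP'}

Note how nothing in the sketch cares where the price phrase appears in the sentence; only the object must follow the verb.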

In other words, once we anchored on the verb, the parser was no longer chained to our original hand-crafted phrase patterns. It became more tolerant of natural variations in how people actually describe trades — while still producing a clean, structured segment ledger entry at the end.