A Semantic Inference Engine for Tabular Data

By Gary Kennedy
November 25, 2025

Brokerage accounts provide trade history data in many different tabular formats – each with its own naming quirks, column conventions, ordering, and hidden assumptions. Yet behind this diversity lies a consistent set of trade attributes with inherent structural relationships: quantities, prices, consideration, fees, proceeds, dates, instruments, and identifiers. This is exactly where a semantic inference engine can shine.

Our trade history extraction applies semantic inference to tabular trade data, automatically interpreting each column’s meaning regardless of format. This allows heterogeneous CSV, Excel, and other tabular exports to be robustly transformed into a consistent, canonical trade ledger.


The Problem With “Known Formats”

Historically, trade ingestion pipelines relied on:

  • broker-specific schemas
  • rigid field mappings
  • manual configuration
  • brittle header matching
  • custom code for each institution

This approach is expensive and restrictive in modern environments where:

  • formats change without notice
  • users upload ad-hoc Excel sheets
  • new customers bring new brokers with unfamiliar reports
  • column names differ by region, platform, or even individual advisor

A schema-driven solution can struggle to keep up.


Moving From Structure to Semantics

Instead of assuming a known format, our approach observes the behaviour of each column and infers its real-world meaning.

This semantic method mirrors how humans understand a spreadsheet:

  • “This looks like a quantity because the header matches and the values are integers.”
  • “This looks like a price because the column name suggests it, and the values are decimal amounts that resemble plausible prices.”
  • “This column must be consideration because it matches quantity × price exactly.”
  • “This field must be settlement date because it always follows the other date by a few days.”
  • “This must be an identifier — the values are large and stable, and clearly not quantities, prices, or cash amounts.”
  • “This amount is net proceeds — it closely tracks consideration, but differs by a small amount representing fees.”

These are not syntactic observations. They are semantic relationships rooted in the structure of trades.
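
As a concrete illustration of the date reasoning above, here is a minimal Python sketch; the function name, inputs, and the one-to-five-day window are illustrative assumptions, not the engine’s actual implementation:

    from datetime import date

    def classify_date_columns(col_a, col_b):
        """Given two parsed date columns, guess which is the trade date and which
        is the settlement date, based on the observation that settlement
        consistently follows the trade by a small number of days."""
        diffs = [(b - a).days for a, b in zip(col_a, col_b)]
        if all(1 <= d <= 5 for d in diffs):       # col_b is always a few days later
            return {"trade_date": col_a, "settlement_date": col_b}
        if all(-5 <= d <= -1 for d in diffs):     # col_a is always a few days later
            return {"trade_date": col_b, "settlement_date": col_a}
        return None                               # no consistent ordering: leave undecided

    # Example: settlement lags the trade by two days
    trades  = [date(2025, 11, 3), date(2025, 11, 4)]
    settles = [date(2025, 11, 5), date(2025, 11, 6)]
    print(classify_date_columns(trades, settles))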


Semantic Signals, Not Hard-Coded Schemas

Our engine analyses tabular trade data using a combination of:

1. Statistical Patterns

  • integer-like columns
  • decimal distributions
  • magnitude analysis
  • sign patterns
  • outlier detection
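
A minimal sketch of this kind of per-column profiling, assuming plain Python lists of raw cell values; the feature names and thresholds are illustrative only:

    import statistics

    def profile_column(values):
        """Compute simple statistical signals for a column of raw cell values:
        how integer-like it is, its typical magnitude, and its sign pattern."""
        nums = []
        for v in values:
            try:
                nums.append(float(str(v).replace(",", "")))
            except ValueError:
                pass
        if not nums:
            return {"numeric": False}
        return {
            "numeric": True,
            "integer_share": sum(n == int(n) for n in nums) / len(nums),
            "median_magnitude": statistics.median(abs(n) for n in nums),
            "all_positive": all(n > 0 for n in nums),
            "mixed_sign": any(n > 0 for n in nums) and any(n < 0 for n in nums),
        }

    print(profile_column(["100", "250", "75"]))        # integer-like: a quantity candidate
    print(profile_column(["12.34", "9.87", "101.50"])) # decimal, plausible magnitudes: a price candidate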

2. Contextual Clues

  • column naming hints (“qty”, “unitcost”, “settle”, etc.)
  • positional tendencies
  • typical broker export behaviours
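
A minimal sketch of header-name matching, assuming a small illustrative synonym table; the engine’s real vocabulary and matching rules are considerably richer:

    import re

    # Illustrative header vocabulary only, not the production mapping.
    HEADER_HINTS = {
        "quantity": ["qty", "quantity", "units", "shares", "nominal"],
        "price":    ["price", "unitcost", "px"],
        "trade_date":      ["tradedate", "dealdate"],
        "settlement_date": ["settle", "valuedate"],
        "fees":     ["fee", "commission", "charges"],
    }

    def header_hint(header):
        """Return the candidate field a header name suggests, or None."""
        normalised = re.sub(r"[^a-z]", "", header.lower())
        for field, hints in HEADER_HINTS.items():
            if any(hint in normalised for hint in hints):
                return field
        return None

    print(header_hint("Qty."))           # -> quantity
    print(header_hint("Unit Cost (p)"))  # -> price
    print(header_hint("Settle Date"))    # -> settlement_date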

3. Domain Relationships

  • gross amount = quantity × price
  • net amount = gross ± fees
  • two date columns → earlier = trade date
  • identifiers are large and stable
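
These relationships can be tested directly against candidate columns. A minimal sketch, with hypothetical column data and illustrative tolerances:

    def relation_holds(lhs, rhs, rel_tol=0.01):
        """True if two columns agree row-by-row within a small relative tolerance."""
        return all(abs(a - b) <= rel_tol * max(abs(b), 1.0) for a, b in zip(lhs, rhs))

    def check_trade_relations(quantity, price, gross, net):
        """Verify the structural relationships between candidate columns."""
        results = {}
        # gross amount = quantity × price
        results["gross_is_qty_times_price"] = relation_holds(
            [q * p for q, p in zip(quantity, price)], gross)
        # net amount = gross ± fees: the residual (fees) should be small relative to gross
        results["net_tracks_gross"] = all(
            abs(n - g) <= 0.05 * abs(g) for n, g in zip(net, gross))
        return results

    # Hypothetical candidate columns pulled from an uploaded file
    quantity = [100, 250]
    price    = [12.50, 3.20]
    gross    = [1250.00, 800.00]
    net      = [1242.05, 795.00]   # gross minus a small commission
    print(check_trade_relations(quantity, price, gross, net))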

The system does not rely on any predefined layout.
It simply reasons about the data the way a domain expert would.


Detecting the True Header Row

An easily overlooked detail is the header row itself. Broker reports often contain preamble lines: titles, disclaimers, empty rows, or introductory text. These lines are not part of the actual dataset, but they can easily confuse a traditional parser.

The semantic inference engine solves this by scoring each row to determine whether it looks like a header row, and verifying that the rows beneath it look like data. If not, the row is skipped and scoring continues.

This allows the engine to reliably locate the real table header—even when it is buried several lines into the file—before semantic analysis begins.
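
A minimal sketch of that row-scoring idea, with illustrative heuristics: a header row is mostly short text cells, and the row beneath it should be mostly numeric or date-like. The scoring functions and thresholds here are assumptions, not the engine’s actual rules:

    import re

    def looks_like_header(row):
        """Score a row on how header-like it is: mostly short, non-numeric text cells."""
        cells = [c.strip() for c in row if c.strip()]
        if len(cells) < 2:
            return 0.0
        texty = sum(1 for c in cells if not re.match(r"^-?[\d.,/]+$", c) and len(c) < 30)
        return texty / len(cells)

    def looks_like_data(row):
        """Score a row on how data-like it is: a good share of numeric or date-like cells."""
        cells = [c.strip() for c in row if c.strip()]
        if not cells:
            return 0.0
        numeric = sum(1 for c in cells if re.match(r"^-?[\d.,/]+$", c))
        return numeric / len(cells)

    def find_header_row(rows, min_score=0.8):
        """Return the index of the first row that looks like a header
        and is followed by a row that looks like data."""
        for i, row in enumerate(rows[:-1]):
            if looks_like_header(row) >= min_score and looks_like_data(rows[i + 1]) >= 0.5:
                return i
        return None

    preamble_and_table = [
        ["Acme Brokerage plc"],
        ["Trade history report", "generated 25/11/2025"],
        [],
        ["Trade Date", "Settle Date", "Qty", "Unit Cost", "Consideration"],
        ["03/11/2025", "05/11/2025", "100", "12.50", "1250.00"],
    ]
    print(find_header_row(preamble_and_table))   # -> 3 (the real header, after the preamble)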


The Result: Robustness to the Unknown

Because the engine infers semantics instead of expecting structure, it can:

  • ingest trade files it has never seen before
  • tolerate renamed or reordered columns
  • recognise new patterns without reconfiguration
  • detect mis-labelled fields
  • handle price quoting differences (e.g., pence vs pounds; see the sketch below)
  • reconcile inconsistencies
  • gracefully process incomplete data

It is format-agnostic. The engine adapts to the data — not the other way around.
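
As an example of the quoting-difference point above, one plausible way to distinguish pence from pounds (not necessarily how the engine does it) is to test which unit makes quantity × price best reproduce the consideration column:

    def infer_price_unit(prices, quantities, considerations):
        """Guess whether a price column is quoted in pounds or pence by testing
        which unit makes quantity × price best match the consideration column."""
        def mismatch(scale):
            return sum(abs(q * p * scale - c)
                       for q, p, c in zip(quantities, prices, considerations))
        return "pence" if mismatch(0.01) < mismatch(1.0) else "pounds"

    # A UK equity trade where the unit cost is quoted in pence
    prices         = [1250.0, 320.0]     # 1,250p and 320p
    quantities     = [100, 250]
    considerations = [1250.00, 800.00]   # in pounds
    print(infer_price_unit(prices, quantities, considerations))   # -> pence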


Semantic inference is a powerful foundation for reliable, automated trade history processing.