The Bank Statement Extraction Benchmark
A real-world benchmark for bank-statement transaction extraction. Holofin clears 98% of statements with zero errors. Frontier LLMs read almost every row, but on a minority of layouts they unpredictably hand back rows that aren't on the page, so a model that's 90% accurate per row returns a fully correct statement only ~75–80% of the time.
Holofin Engineering

Your Table Extractor Passed. The Numbers Didn't.
An auditor opens your extraction output for a balance sheet. The model reports 99.2% cell accuracy. Impressive. Then she totals the asset column by hand, the way auditors do, and it comes to a number that is off by one row. Assets no longer equal liabilities plus equity. The statement does not close.
Greg T

Document Fraud Detection: What a PDF Can't Hide
We used to think document fraud was a visual problem. Wrong fonts. Misaligned columns. A logo that felt slightly off. We built checks around what humans see, because what humans see is all we had.
Greg T

When Documents Fight Back
Page 1: Account summary, two columns. Page 15: Same account, three columns, different header names. Page 47: A scan with a coffee stain. Page 89: The totals page, which references transactions you extracted 70 pages ago.
Greg T

The Invisible Audit Trail
An auditor opens your export file, finds a closing balance of €47,500, and pulls up the source PDF. Page 3, bottom-right corner: €47,000. Different number. "Where does the difference come from? Who changed it?"
Greg T

HoloRecall: Show, Don't Tell
There's a moment in every classification project where you watch the model confidently get something wrong. Not a hard case. Not an ambiguous edge. Something a human would solve in half a second without thinking.
Greg T

Your LLM Isn't a Document Pipeline
There's a moment in every AI project where the demo looks so good that your brain quietly starts deleting code. You watch a model "read" a bank statement and think: this is it. We can skip OCR. We can skip layout parsing. Maybe we can skip half the pipeline. In the movie version, someone presses Enter and JSON waterfalls out of the cloud.
Greg T

PDFs Are For People, Not For Data
We love PDFs. They look the same on every device, they print beautifully at any size, and they're the closest thing we have to digital paper. But every time someone on our team says "let's just extract the data from the PDF," we feel an ancient PostScript daemon wake up and whisper: "I was born to paint pixels, not to structure your rows."
Greg T