DISA

Centre for Data Intensive Sciences and Applications

Final seminar before the licentiate thesis – Nemi Pelgrom

Posted on 28 October 2025, 13:24 by Elin Gunnarsson

When? Thursday November 6, 10-12
Where? On site in D1172 and via Zoom
Registration: No registration needed – just come by

Abstract
Transcribing numbers and Receipts with Generative AI – Nemi Pelgrom
This dissertation investigates the usability of multi-modal language models (MMLMs) as transcription tools, with a focus on their reliability, limitations, and error mechanisms in document parsing tasks.

The work addresses four research questions across three studies. First, the potential of vision-capable generative models for extracting structured information from complex financial documents is evaluated using GPT-4. Tested on 1,000 digital invoices and 1,000 photographic receipts, the model achieved near-perfect accuracy, 99.8% and 99.5% respectively, with an additional API-based trial reaching 94.4%. Second, the capacity of MMLMs to transcribe long numerical strings is explored, showing that GPT-4 and GPT-4o maintain 100% accuracy up to 75 digits, after which performance drops sharply. Third, systematic error patterns are identified in transcription of random number sequences; mistakes consistently occur in the same positions across repeated runs, and hallucinated digits account for only 23% of total errors, indicating biases and structured failure modes rather than noise. Lastly, a framework for categorisation of transcription errors is introduced, based on the analysis of 5,502 mistakes across GPT-4o and ARIA.
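To make the kind of measurement described above concrete, here is a minimal Python sketch of how exact-match accuracy and recurring error positions across repeated runs could be computed. It is an illustration only, not the method used in the thesis: the function names, the sample digit strings, and the choice to skip runs whose length differs from the ground truth are all assumptions made for this example.

```python
from collections import Counter

def exact_match_accuracy(truths, outputs):
    """Fraction of transcriptions that equal the ground truth exactly."""
    pairs = list(zip(truths, outputs))
    return sum(t == o for t, o in pairs) / len(pairs)

def mismatch_positions(truth, output):
    """Indices where two equal-length digit strings differ."""
    return [i for i, (t, o) in enumerate(zip(truth, output)) if t != o]

def recurring_error_positions(truth, runs):
    """Count, per position, how many runs got that digit wrong."""
    counts = Counter()
    for output in runs:
        # Assumption: runs with inserted or dropped digits are skipped,
        # because a position-wise comparison is not meaningful for them.
        if len(output) == len(truth):
            counts.update(mismatch_positions(truth, output))
    return counts

# Hypothetical data: one ground-truth string and three repeated transcriptions.
truth = "3141592653"
runs = ["3141592653", "3141593653", "3141593658"]

print(exact_match_accuracy([truth] * len(runs), runs))
print(recurring_error_positions(truth, runs))
```

A position that is wrong in many runs of the same input points to a systematic failure rather than random noise, which is the kind of pattern the abstract describes.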

The analysis reveals three mutually exclusive error categories and is followed by a detailed examination of ways to distinguish between them automatically, for which the Ratcliff/Obershelp similarity was found to be highly useful. Together, these findings demonstrate that state-of-the-art MMLMs can already be deployed in production settings where accuracy and scalability are critical, while also providing systematic methods for diagnosing their weaknesses and guiding future model development.
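As a rough illustration of the similarity measure mentioned above, the sketch below uses `difflib.SequenceMatcher` from the Python standard library, whose matching algorithm is documented as being based on Ratcliff/Obershelp "gestalt pattern matching". Whether the thesis uses this implementation is not stated in the abstract, so this is an assumption; the helper names and example strings are hypothetical, and no attempt is made to reproduce the three error categories, which the abstract does not name.

```python
from difflib import SequenceMatcher

def similarity(truth, output):
    """Ratcliff/Obershelp-style similarity in [0, 1]; 1.0 means identical."""
    # autojunk=False disables difflib's heuristic for long sequences with
    # very frequent elements, which could otherwise skew digit comparisons.
    return SequenceMatcher(None, truth, output, autojunk=False).ratio()

def edit_operations(truth, output):
    """List the non-matching segments as (operation, expected, produced)."""
    matcher = SequenceMatcher(None, truth, output, autojunk=False)
    return [
        (tag, truth[i1:i2], output[j1:j2])
        for tag, i1, i2, j1, j2 in matcher.get_opcodes()
        if tag != "equal"
    ]

# Hypothetical examples: one substituted digit and one inserted digit.
print(similarity("1234567890", "1234587890"))
print(edit_operations("1234567890", "1234587890"))
print(edit_operations("1234567890", "12345667890"))
```

The similarity score gives a single number per transcription, while the opcodes show whether a mistake looks like a substitution, an insertion, or a deletion, which is the kind of signal an automatic error classifier could build on.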

This entry was posted on 28 October 2025, 13:24 and is filed under General
