{"id":704,"date":"2025-10-28T13:24:09","date_gmt":"2025-10-28T12:24:09","guid":{"rendered":"https:\/\/blogg.lnu.se\/disa\/?p=704"},"modified":"2025-10-28T13:24:09","modified_gmt":"2025-10-28T12:24:09","slug":"final-seminar-before-the-licentiate-thesis-nemi-pelgrom","status":"publish","type":"post","link":"https:\/\/blogg.lnu.se\/disa\/?p=704","title":{"rendered":"Final seminar before the licentiate thesis &#8211; Nemi Pelgrom"},"content":{"rendered":"\n<p><strong>When?<\/strong> Thursday November 6, 10-12<br><strong>Where?<\/strong> Onsite D1172 and via Zoom<br><strong>Registration:<\/strong> No registration needed \u2013 just come by<br><br><strong>Abstract<\/strong><br><strong>Transcribing numbers and Receipts with Generative AI \u2013 <em>Nemi Pelgrom<\/em><\/strong><br>This dissertation investigates the usability of multi-modal language models (MMLMs) as transcription tools, with a focus on their reliability, limitations, and error mechanisms in document parsing tasks.<\/p>\n\n\n\n<p>The work addresses four research questions across three studies. First, the potential of vision-capable generative models for extracting structured information from complex financial documents is evaluated using GPT-4. Tested on 1,000 digital invoices and 1,000 photographic receipts, the model achieved near-perfect accuracy, 99.8% and 99.5% respectively, with an additional API-based trial reaching 94.4%. Second, the capacity of MMLMs to transcribe long numerical strings is explored, showing that GPT-4 and GPT-4o maintain 100% accuracy up to 75 digits, after which performance drops sharply. Third, systematic error patterns are identified in the transcription of random number sequences; mistakes consistently occur in the same positions across repeated runs, and hallucinated digits account for only 23% of total errors, indicating biases and structured failure modes rather than noise. 
Lastly, a framework for categorising transcription errors is introduced, based on an analysis of 5,502 mistakes across GPT-4o and ARIA.<br><br>The analysis reveals three mutually exclusive error categories and examines in detail how to distinguish between them automatically; the Ratcliff\/Obershelp similarity proved highly useful for this. Together, these findings demonstrate that state-of-the-art MMLMs can already be deployed in production settings where accuracy and scalability are critical, while also providing systematic methods for diagnosing their weaknesses and guiding future model development.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>When? Thursday November 6, 10-12. Where? Onsite D1172 and via Zoom. Registration: No registration needed \u2013 just come by. Abstract: Transcribing numbers and Receipts with Generative AI \u2013 Nemi Pelgrom. This dissertation investigates the usability of multi-modal language models (MMLMs) as transcription tools, with a focus on their reliability, limitations, and error mechanisms in document parsing tasks. 
The work [&hellip;]<\/p>\n","protected":false},"author":19939,"featured_media":0,"comment_status":"closed","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"footnotes":""},"categories":[93],"tags":[],"class_list":["post-704","post","type-post","status-publish","format-standard","hentry","category-blogg"],"acf":[],"yoast_head":"<!-- This site is optimized with the Yoast SEO plugin v21.7 - https:\/\/yoast.com\/wordpress\/plugins\/seo\/ -->\r\n<title>Final seminar before the licentiate thesis - Nemi Pelgrom - DISA<\/title>\r\n<meta name=\"robots\" content=\"index, follow, max-snippet:-1, max-image-preview:large, max-video-preview:-1\" \/>\r\n<link rel=\"canonical\" href=\"https:\/\/blogg.lnu.se\/disa\/?p=704\" \/>\r\n<meta property=\"og:locale\" content=\"en_US\" \/>\r\n<meta property=\"og:type\" content=\"article\" \/>\r\n<meta property=\"og:title\" content=\"Final seminar before the licentiate thesis - Nemi Pelgrom - DISA\" \/>\r\n<meta property=\"og:description\" content=\"When? Thursday November 6, 10-12. Where? Onsite D1172 and via Zoom. Registration: No registration needed \u2013 just come by. Abstract: Transcribing numbers and Receipts with Generative AI \u2013 Nemi Pelgrom. This dissertation investigates the usability of multi-modal language models (MMLMs) as transcription tools, with a focus on their reliability, limitations, and error mechanisms in document parsing tasks. The work [&hellip;]\" \/>\r\n<meta property=\"og:url\" content=\"https:\/\/blogg.lnu.se\/disa\/?p=704\" \/>\r\n<meta property=\"og:site_name\" content=\"DISA\" \/>\r\n<meta property=\"article:published_time\" content=\"2025-10-28T12:24:09+00:00\" \/>\r\n<meta name=\"author\" content=\"Elin Gunnarsson\" \/>\r\n<meta name=\"twitter:card\" content=\"summary_large_image\" \/>\r\n<meta name=\"twitter:label1\" content=\"Written by\" \/>\n\t<meta name=\"twitter:data1\" content=\"Elin Gunnarsson\" \/>\n\t<meta name=\"twitter:label2\" content=\"Est. 
reading time\" \/>\n\t<meta name=\"twitter:data2\" content=\"2 minutes\" \/>\r\n<script type=\"application\/ld+json\" class=\"yoast-schema-graph\">{\"@context\":\"https:\/\/schema.org\",\"@graph\":[{\"@type\":\"WebPage\",\"@id\":\"https:\/\/blogg.lnu.se\/disa\/?p=704\",\"url\":\"https:\/\/blogg.lnu.se\/disa\/?p=704\",\"name\":\"Final seminar before the licentiate thesis - Nemi Pelgrom - DISA\",\"isPartOf\":{\"@id\":\"https:\/\/blogg.lnu.se\/disa\/#website\"},\"datePublished\":\"2025-10-28T12:24:09+00:00\",\"dateModified\":\"2025-10-28T12:24:09+00:00\",\"author\":{\"@id\":\"https:\/\/blogg.lnu.se\/disa\/#\/schema\/person\/efff0d86afb01dd6efe2bec4e44b3fe2\"},\"inLanguage\":\"en-US\",\"potentialAction\":[{\"@type\":\"ReadAction\",\"target\":[\"https:\/\/blogg.lnu.se\/disa\/?p=704\"]}]},{\"@type\":\"WebSite\",\"@id\":\"https:\/\/blogg.lnu.se\/disa\/#website\",\"url\":\"https:\/\/blogg.lnu.se\/disa\/\",\"name\":\"DISA\",\"description\":\"Centre for Data Intensive Sciences and Applications\",\"potentialAction\":[{\"@type\":\"SearchAction\",\"target\":{\"@type\":\"EntryPoint\",\"urlTemplate\":\"https:\/\/blogg.lnu.se\/disa\/?s={search_term_string}\"},\"query-input\":\"required name=search_term_string\"}],\"inLanguage\":\"en-US\"},{\"@type\":\"Person\",\"@id\":\"https:\/\/blogg.lnu.se\/disa\/#\/schema\/person\/efff0d86afb01dd6efe2bec4e44b3fe2\",\"name\":\"Elin Gunnarsson\",\"url\":\"https:\/\/blogg.lnu.se\/disa\/?author=19939\"}]}<\/script>\r\n<!-- \/ Yoast SEO plugin. -->","yoast_head_json":{"title":"Final seminar before the licentiate thesis - Nemi Pelgrom - DISA","robots":{"index":"index","follow":"follow","max-snippet":"max-snippet:-1","max-image-preview":"max-image-preview:large","max-video-preview":"max-video-preview:-1"},"canonical":"https:\/\/blogg.lnu.se\/disa\/?p=704","og_locale":"en_US","og_type":"article","og_title":"Final seminar before the licentiate thesis - Nemi Pelgrom - DISA","og_description":"When? Thursday November 6, 10-12. Where? 
Onsite D1172 and via Zoom. Registration: No registration needed \u2013 just come by. Abstract: Transcribing numbers and Receipts with Generative AI \u2013 Nemi Pelgrom. This dissertation investigates the usability of multi-modal language models (MMLMs) as transcription tools, with a focus on their reliability, limitations, and error mechanisms in document parsing tasks. The work [&hellip;]","og_url":"https:\/\/blogg.lnu.se\/disa\/?p=704","og_site_name":"DISA","article_published_time":"2025-10-28T12:24:09+00:00","author":"Elin Gunnarsson","twitter_card":"summary_large_image","twitter_misc":{"Written by":"Elin Gunnarsson","Est. reading time":"2 minutes"},"schema":{"@context":"https:\/\/schema.org","@graph":[{"@type":"WebPage","@id":"https:\/\/blogg.lnu.se\/disa\/?p=704","url":"https:\/\/blogg.lnu.se\/disa\/?p=704","name":"Final seminar before the licentiate thesis - Nemi Pelgrom - DISA","isPartOf":{"@id":"https:\/\/blogg.lnu.se\/disa\/#website"},"datePublished":"2025-10-28T12:24:09+00:00","dateModified":"2025-10-28T12:24:09+00:00","author":{"@id":"https:\/\/blogg.lnu.se\/disa\/#\/schema\/person\/efff0d86afb01dd6efe2bec4e44b3fe2"},"inLanguage":"en-US","potentialAction":[{"@type":"ReadAction","target":["https:\/\/blogg.lnu.se\/disa\/?p=704"]}]},{"@type":"WebSite","@id":"https:\/\/blogg.lnu.se\/disa\/#website","url":"https:\/\/blogg.lnu.se\/disa\/","name":"DISA","description":"Centre for Data Intensive Sciences and Applications","potentialAction":[{"@type":"SearchAction","target":{"@type":"EntryPoint","urlTemplate":"https:\/\/blogg.lnu.se\/disa\/?s={search_term_string}"},"query-input":"required name=search_term_string"}],"inLanguage":"en-US"},{"@type":"Person","@id":"https:\/\/blogg.lnu.se\/disa\/#\/schema\/person\/efff0d86afb01dd6efe2bec4e44b3fe2","name":"Elin 
Gunnarsson","url":"https:\/\/blogg.lnu.se\/disa\/?author=19939"}]}},"_links":{"self":[{"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=\/wp\/v2\/posts\/704","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=\/wp\/v2\/users\/19939"}],"replies":[{"embeddable":true,"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=704"}],"version-history":[{"count":1,"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=\/wp\/v2\/posts\/704\/revisions"}],"predecessor-version":[{"id":705,"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=\/wp\/v2\/posts\/704\/revisions\/705"}],"wp:attachment":[{"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=704"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=%2Fwp%2Fv2%2Fcategories&post=704"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/blogg.lnu.se\/disa\/index.php?rest_route=%2Fwp%2Fv2%2Ftags&post=704"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}