DISA

Centre for Data Intensive Sciences and Applications

Welcome to our Higher Research Seminar in March

2026-03-02

When? March 20, 14.00-16.00
Where? Onsite: D2272 and via Zoom
Registration: Please sign up for the seminar via this link https://forms.gle/qicTg53xmxQJtkBa7 by March 18. This is especially important if you plan to attend onsite so we can make sure there is fika for everyone.

Agenda
14.00-14.10 Welcome and practical information
14.10-14.55 Transparent Adverse Drug Event Detection in Swedish Clinical Text: Challenges in Building Information Extraction Pipelines for a Low-Resource Language – Elizaveta Kopacheva 
14.55-15.05 Coffee break
15.05-15.50 TWIN4DEM: Strengthening democratic resilience through digital twins – Giangiacomo Bravo
15.50-16.00 Sum up and plan for our upcoming seminars


Abstracts

Transparent Adverse Drug Event Detection in Swedish Clinical Text: Challenges in Building Information Extraction Pipelines for a Low-Resource Language – Elizaveta Kopacheva  
Automatic detection of adverse drug events (ADEs) in clinical texts is an important task for pharmacovigilance and patient safety. For clinical decision support, transparency is essential: a useful system should not only classify whether a document mentions an ADE but also highlight the supporting evidence in the text. Achieving this typically requires a pipeline combining named entity recognition (NER), relation extraction (RE), and document classification. However, most prior work studies these components in isolation and primarily focuses on high-resource languages such as English and Chinese. 

This presentation discusses the challenges of developing an end-to-end ADE detection pipeline for Swedish clinical text, a low-resource language. I will discuss how errors propagate through the NER–RE pipeline and affect overall performance. Particular attention is given to the complexity of Swedish clinical NER, including discontinuous entities, partial-word spans, and tokenization mismatches—especially when models rely on English-based tokenizers. I will present a comparison of multilingual and Swedish-specific pretrained models (including a clinically tuned model), as well as encoder-only and encoder–decoder architectures, and discuss their usability for transparent ADE detection in free text. I will also highlight remaining challenges in evaluation. The talk aims to provide practical insights into why building reliable and interpretable ADE detection systems for Swedish clinical text remains difficult and what considerations are important for future work. 

 
TWIN4DEM: Strengthening democratic resilience through digital twins – Giangiacomo Bravo 
Democracy research struggles to explain why democracies backslide and to predict which countries are most vulnerable to erosion of the rule of law. TWIN4DEM, a Horizon Europe project, aims to address this issue by creating a digital twin (DT) of political systems. 

DTs are data-intensive simulation models designed as virtual copies of real-world systems. TWIN4DEM focuses on detecting vulnerabilities in democratic systems due to executive aggrandizement and on advising policymakers on preventive measures. The current version of the DT is conceptual and focuses on the decision-making process of agents. It is based on synthetic data and represents a “generic” model to be used as a base for implementing country-specific ones. The model includes three core groups of agents: (a) members of government, who initiate executive aggrandizement; (b) members of parliament; and (c) members of constitutional or administrative courts. These agents interact with two main types of influencers who shape their behavior: citizens (including context-specific interest groups) and EU institutions. 

The final goal of the project is to implement four specific country cases — the Czech Republic, France, Hungary, and the Netherlands — reflecting the specificities of the different political systems and informed by local data. The first of these cases (Hungary) is planned to be developed during the spring, and some preliminary results will be shared during the seminar. 

Welcome to our PhD-seminar in March

2026-02-27

When? March 13, 14.00-16.00
Where? Onsite: D2272 and via Zoom
Registration: Please sign up for the seminar via this link https://forms.gle/EbRUASvY9c73kMNz6 by March 11. This is especially important if you plan to attend onsite so we can make sure there is fika for everyone.

Agenda
14.00-14.10 Welcome and practical information
14.10-14.55 Palme archives as a chatbot – Tibo Bruneel
14.55-15.05 Coffee break
15.05-15.50 Gradient Tree Boosting for Regression Transfer – Dag Björnberg
15.50-16.00 Sum up and plan for our upcoming seminars

Abstracts

Palme archives as a chatbot – Tibo Bruneel  
For nearly four decades, the assassination of Swedish Prime Minister Olof Palme has remained a complex and heavily disputed case. Although the investigation officially closed in 2020, the search for answers continues within a monumental digital footprint. This dataset is a chaotic, unstructured web of typed police reports, handwritten notes, maps, and images. 

How can we potentially find previously unknown clues buried within decades of scattered data? 
This presentation introduces “PalmeNet-Chat,” an LLM-powered investigative tool developed by Softwerk AB in collaboration with the true-crime podcast Spår. We will detail the technical challenge of processing this dense archive. By engineering an on-premise OCR pipeline utilising Vision Language Models, we transformed raw history into a structured, searchable library. We will then explore how we implemented Retrieval Augmented Generation (RAG) and vector databases to build a system capable of semantic search and contextual reasoning across the entire case file. Finally, we will offer a glimpse into the next phase of our project, showcasing how we are taking this investigative tool to an entirely new level. 
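The retrieval step at the heart of a RAG system can be illustrated in miniature. The sketch below uses a toy bag-of-words embedding and cosine similarity over a few invented passages; the actual PalmeNet-Chat pipeline uses learned embeddings, a vector database, and an LLM, so all names and texts here are hypothetical:

```python
import numpy as np
from collections import Counter

# Toy passages standing in for OCR-transcribed archive pages (invented text).
DOCS = [
    "witness observed a man running on Tunnelgatan towards the stairs",
    "revolver ammunition of the type used was sold in a local store",
    "the taxi driver reported the radio call shortly after the shots",
]

def embed(text, vocab):
    # Bag-of-words vector; a production system would use learned embeddings.
    counts = Counter(text.lower().split())
    return np.array([counts[w] for w in vocab], dtype=float)

def retrieve(query, docs, k=1):
    """Return the k passages most similar to the query (cosine similarity)."""
    vocab = sorted({w for d in docs + [query] for w in d.lower().split()})
    dvecs = [embed(d, vocab) for d in docs]
    q = embed(query, vocab)
    sims = [float(q @ v / ((np.linalg.norm(q) * np.linalg.norm(v)) or 1.0))
            for v in dvecs]
    return [docs[i] for i in np.argsort(sims)[::-1][:k]]
```

In a full RAG loop, the retrieved passages would be inserted into the LLM prompt so the model can reason over, and cite, the supporting evidence.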

Gradient Tree Boosting for Regression Transfer – Dag Björnberg  
Many real-world modeling problems are hindered by limited data availability. In such cases, transfer learning leverages related source domains to improve predictions in a target domain of interest. We extend the classical gradient tree boosting paradigm to a regression transfer algorithm by modeling the weak learner as a sum of two regression trees. The trees are fitted on source data and target data, respectively, and jointly optimized for the target data. We derive optimal coefficients for the model update under the least-squares, the least-absolute deviation, and the Huber loss functions. We benchmark our approach against the widely used XGBoost algorithm in several transfer scenarios, achieving superior performance in seven out of eight cases. 
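As a rough illustration of the idea (not the authors' implementation), the sketch below builds each boosting round from two regression trees, one fitted to source residuals and one to target residuals, with their coefficients jointly optimized by least squares on the target data. It covers the squared-error case only, and all hyperparameters are illustrative:

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def transfer_boost_fit(X_src, y_src, X_tgt, y_tgt,
                       n_rounds=50, lr=0.1, max_depth=3):
    """Boosting where each weak learner is a sum of two trees: one fitted
    on source-domain residuals, one on target residuals, with coefficients
    chosen by least squares on the target data (squared loss only)."""
    F_src = np.zeros(len(y_src))
    F_tgt = np.zeros(len(y_tgt))
    ensemble = []
    for _ in range(n_rounds):
        # Fit one tree per domain to the current residuals.
        t_s = DecisionTreeRegressor(max_depth=max_depth).fit(X_src, y_src - F_src)
        t_t = DecisionTreeRegressor(max_depth=max_depth).fit(X_tgt, y_tgt - F_tgt)
        # Predictions of both trees on the target data.
        H = np.column_stack([t_s.predict(X_tgt), t_t.predict(X_tgt)])
        # Jointly optimal coefficients under squared loss on the target:
        # solve min_a || (y_tgt - F_tgt) - H a ||^2.
        a, *_ = np.linalg.lstsq(H, y_tgt - F_tgt, rcond=None)
        coef = lr * a
        ensemble.append((t_s, t_t, coef))
        F_tgt += H @ coef
        F_src += np.column_stack([t_s.predict(X_src), t_t.predict(X_src)]) @ coef
    return ensemble

def transfer_boost_predict(ensemble, X):
    F = np.zeros(len(X))
    for t_s, t_t, coef in ensemble:
        F += np.column_stack([t_s.predict(X), t_t.predict(X)]) @ coef
    return F
```

The least-absolute-deviation and Huber variants mentioned in the abstract would change the residuals and the coefficient optimization, not the two-tree structure.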

Welcome to our Higher Research Seminar in February

2026-02-02

Where? Onsite: D2272 and via Zoom
Registration: Please sign up for the seminar via this link https://forms.gle/7LK5jZfjVwvYAf4L8 by February 18. This is especially important if you plan to attend onsite so we can make sure there is fika for everyone.

Abstracts
Accelerate ML: Overlap of computation and collective communication in multi-GPU systems – Minyu Cui 
The rapid growth of large-scale machine learning (ML) has made distributed training across multiple GPUs a fundamental building block of modern ML systems. As model sizes continue to increase and computational throughput improves, communication overhead has emerged as one of the dominant performance bottlenecks in multi-GPU computing paradigms. Conventional training pipelines in multi-GPU systems perform computation and communication sequentially, which leads to idle compute resources, limited scalability, and inefficient hardware utilization. 

In my research plan, I aim to accelerate multi-GPU ML by overlapping the two dominant operations: computation (such as GEMM) and collective communication. I will explore two complementary directions. First, my research will explore overlapping computation and communication kernels to hide communication latency. Second, it will investigate fusing computation and communication into a single GPU kernel to enable efficient fine-grained overlap. These efforts will initially focus on improving operator-level performance and will subsequently be extended to enhance end-to-end training performance. 

I used to love Python… – Morgan Ericsson  
Some 15 years ago, when I did a lot of NLP, I learned Python 2, because it was the language that made the most sense (that was not Perl). I found it to be a beautiful language that made it fast and easy to translate thoughts into code. The rich ecosystem often turned coding into “gluing” things together, and since the things were often written in, e.g., C, it was fast enough. These days, I find it a lot more frustrating. The things were always silos, but back then I never found that to be a problem. Now you are stuck with things that might play well together, if the authors were aware and took the time to integrate. If you are lucky, the things will support the platform (CPU, CUDA, MPS, …) that you are running on, but if not, well, then it’s not so much fun. You also hit all kinds of performance issues and bugs in the various things, and gaps between them. So, for some work, I don’t like Python very much these days. My talk will rant about the problem but also try to find a way forward, looking at helpful tools for today and ideal solutions for tomorrow… 

Welcome to our PhD-seminar in February

Where? Onsite: D2272 and via Zoom
Registration: Please sign up for the seminar via this link https://forms.gle/QB5oiSWMpVofjBHY6 by February 11. This is especially important if you plan to attend onsite so we can make sure there is fika for everyone.

Abstracts
Novelty Detection Using Time-Series Transient Data from Lead-Acid Forklift Batteries – Zijie Feng
In industrial applications, monitoring the battery health of electrically powered machinery and detecting abnormal operating conditions is a persistent yet critical challenge. Traditionally, anomaly detection systems are developed reactively: abnormalities are identified only after they occur, often leading to operational disruptions and economic losses. Novelty detection offers an alternative by learning normal behavior and detecting previously unseen abnormalities.  

In this work, we compare a diverse set of data-driven novelty detection methods using simulated time-series transient data derived from real lead-acid forklift battery measurements, aiming to identify suitable solutions for different types of anomalies. 

Potential of Graph Neural Networks for Software Architecture Recovery – Rakshanda Jabeen
Software architecture recovery (SAR) aims to uncover a system’s modular structure directly from source code, supporting comprehension and maintenance when documentation is missing or outdated. In this work, we investigate the potential of graph neural networks (GNNs) and unsupervised learning for SAR by modeling software systems as heterogeneous, multi-relational graphs. Nodes represent software entities (e.g., files or classes) and typed edges capture structural and functional dependencies such as calls, imports, inheritance, and other code-level relations. To complement dependency structures with meaning, we integrate semantic signals from source code identifiers and related textual artifacts via code embeddings (e.g., Word2Vec), yielding representations that capture both what entities do and how they interact. 

We study heterogeneous GNN encoders that aggregate information across relation types, including heterogeneous graph convolution and heterogeneous attention mechanisms, to analyze the trade-off between fixed normalization and adaptive neighbor weighting in software graphs. On top of these encoders, we explore two unsupervised training paradigms: (i) graph autoencoding, where embeddings are learned by reconstructing observed dependency relations, and (ii) contrastive representation learning inspired by Deep Graph Infomax, which maximizes agreement between embeddings from the original graph and perturbed views. The resulting entity embeddings are clustered to recover candidate architectural modules. Preliminary results across multiple open-source systems indicate that combining semantic cues with structural and functional dependencies produces more meaningful module separation than using structure alone, demonstrating that modern graph representation learning is a promising direction for robust, automated SAR beyond heuristic baselines. 
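To make the aggregation scheme concrete, here is a minimal numpy sketch of one heterogeneous graph-convolution layer with fixed (mean) normalization and one weight matrix per relation type. Dimensions and relation names are illustrative; the attention-based variants contrasted in the abstract would replace the fixed normalization with learned neighbor weights:

```python
import numpy as np

def hetero_gcn_layer(H, adjs, weights):
    """One heterogeneous graph-convolution layer (illustrative sketch).

    H:       (n, d) node feature matrix
    adjs:    dict mapping relation type -> (n, n) adjacency matrix
    weights: dict mapping relation type -> (d, d_out) weight matrix
    """
    out = 0.0
    for rel, A in adjs.items():
        # Fixed mean normalization: each row of A is divided by the node's
        # degree under this relation (attention would learn these weights).
        deg = A.sum(axis=1, keepdims=True)
        A_norm = A / np.maximum(deg, 1.0)
        # Aggregate neighbor features for this relation, then project.
        out = out + A_norm @ H @ weights[rel]
    return np.maximum(out, 0.0)  # ReLU nonlinearity
```

Stacking two such layers and clustering the resulting node embeddings (e.g., with k-means) would mirror the module-recovery pipeline described above.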

Welcome to our PhD-seminar in January

2025-12-23

When? Friday January 23, 14-16
Where? Onsite: D2272 at Linnaeus University in Växjö and online
Registration: Please sign up for the PhD-seminar via this link https://forms.gle/y8hxAvYLzJQHGdR48 by January 21 (especially important if you plan on attending onsite so we have fika for everyone).

Abstracts

Digital product passports: A value perspective – Timmy Öberg
This presentation provides insights into an ongoing systematic mapping review of digital product passports (DPPs) from a value and ecosystem perspective. Based on existing and emerging classification schemes, the presentation maps the DPP research landscape and discusses current trends. The study explores where current DPP research is focused, including publication venues, research methods, contribution types, and industry sectors. It further analyzes the value dimensions the literature addresses, including value outcomes, value creation capabilities, and the value chain actors considered.

Designing and Evaluating Socio-Ecological Value Scorecards for Sustainable Smart Agriculture – Saeed Niksaz
Sustainable farming increasingly uses digital tools such as sensors and artificial intelligence; however, many current evaluation methods do not fully consider the social and governance aspects of agriculture. This seminar brings together results from a systematic literature review and a conceptual model to explore how socio-ecological value scorecards can support sustainable smart farming.

The socio-ecological value scorecard is introduced as an easy-to-use dashboard that combines environmental, social, and economic measures to help farmers and other stakeholders make better decisions in farming systems. The seminar shows the value of scorecards in connecting digital innovation with sustainability goals.

Welcome to the Higher Research Seminar in December

2025-11-24

When? Friday December 5, 14-16
Where? Onsite: D0073 and via Zoom

Agenda
14.00-14.10 Welcome and practical information
14.10-14.55 Presentation and discussion: 
A Comparative Evaluation of AI-Generated and Human-Written Alt Text for Image Accessibility – Mexhid Ferati
14.55-15.05 Coffee break
15.05-15.50 Presentation and discussion: 
Leveraging AI to predict review helpfulness and automate assessment in higher education – Zenun Kastrati

15.50-16.00 Sum up and plan for our upcoming seminars

Abstracts

A Comparative Evaluation of AI-Generated and Human-Written Alt Text for Image Accessibility – Mexhid Ferati
This study investigates the comparative quality of AI-generated and human-written alternative text (alt text) for images, with the goal of understanding their respective strengths, limitations, and potential for supporting digital accessibility. The study was motivated by the growing use of generative AI tools, such as ChatGPT, in content creation, and the need to evaluate their effectiveness in producing accessible image descriptions.

The study follows an experimental design assessing 15 images across five thematic categories: People, Animals, Scenery, Food, and Objects. Each description is assessed against five evaluation criteria: accuracy, conciseness, fluency, comprehensibility, and relevance. For each image, the comparison includes alt text manually written by a professional content creator and alt text generated by ChatGPT using standardized prompts. A pilot survey with eleven participants tested the clarity and functionality of the evaluation process before a final survey collected data from 101 participants in Sweden, who rated pairs of AI-generated and human-written descriptions across the five evaluation criteria.


Leveraging AI to predict review helpfulness and automate assessment in higher education – Zenun Kastrati
This presentation introduces two AI-driven approaches aimed at improving teaching and assessment in higher education. The first approach focuses on predicting the helpfulness of student reviews in online courses using a deep learning framework that combines textual features with course metadata and student satisfaction. The second explores the application of large language models (LLMs) for automated scoring and feedback generation. Together, these approaches demonstrate how AI can enhance feedback loops and assessment scalability, while also addressing challenges related to interpretability, rubric clarity, and task subjectivity.

Welcome to Manoranjan Kumar’s PhD 90% seminar

2025-11-20

You are welcome to attend Manoranjan Kumar’s final seminar (doctoral 90% control). The seminar will include a presentation and a follow-up discussion with Mirka Kans, Associate Professor at the Department of Mechanical Engineering.

When: Wednesday, December 17, 9:00–12:00
Where: Room D1167 and on Zoom

Title: Digital Twins for Construction Equipment

Abstract: The technology organization at Volvo Construction Equipment (VCE) aims to predict and verify the performance of machines such as wheel loaders (WL) and articulated haulers (AH) to enhance product development, requirements engineering, and customer service. This requires virtualizing machines and components using the existing sensors on the machines and the infrastructure of the central server. Across industries, virtualization has been achieved through digital twins (DTs), but realizing them for dynamically complex machines brings its own challenges. It is also an important step in this digital transformation journey.

What is the specific research question to be answered?

This PhD thesis describes and investigates how a digital twin (DT) needs to be developed for machines like the AH, and more specifically the WL. Further, a variety of actions need to be incorporated into the framework of the DT. The framework needs to support different machines and their predictive journeys, which can differ depending on how and where each machine is used. 

What are the means and methods used by the authors to answer the stated question?

The DT is a virtual replica of the physical machines, which feed the twin (a simulation model) with data from sensors and edge-based algorithms. The algorithms are built using machine learning (ML) models. The algorithms implemented in the machines are often called machine logs or virtual sensors. Further, a high-fidelity simulation supports the different force-driven maneuvers of different machine operators. A new co-simulation framework has been developed that integrates the operator model of the wheel loader (WL) and its interaction with the power-source model, i.e., the drive train, the hydraulics, and the material. Using the simulations and physical machine data, visualizations are built to illustrate the results, which support various departments in providing customers with predictive services.

What is the answer to the research question?

The edge-based virtual sensors predict different machine failures with good accuracy. Furthermore, the results show that the co-simulation model aligns well with measurement data, validating the model’s accuracy across different styles of machine operation. The integration of virtual sensors, machine logs, simulation, and results visualization paves the way for a successful DT of the machines.

Why is the answer important and for whom specifically?

The results are useful for engineers in product development, sales, and the aftermarket to create services and develop the machines for future generations.

How does the answer inspire future research?

The successful validation of the framework paves the way for future research to enhance virtual simulation techniques for WL and AH performance with different types of machine operators. It also inspires improvements to ML algorithms on the edge and, in turn, new services built on top of DTs. 

Welcome to the Higher Research Seminar in November

2025-10-31

When? Friday November 14, 14-16
Where? Onsite: D2273 and via Zoom

Agenda
14.00-14.10 Welcome and practical information
14.10-14.55 Presentation and discussion: 
Artificial ‘ulama – Analyzing AI-Generated Islamic Theology – Jonas Svensson 

14.55-15.05 Coffee break
15.05-15.50 Presentation and discussion: 
Socio-Technical Considerations on Inter- and Intra-Organizational Sustainability Data Sharing – Anna Sell

15.50-16.00 Sum up and plan for our seminars in December

Abstracts
Artificial ‘ulama – Analyzing AI-Generated Islamic Theology – Jonas Svensson
The presentation provides information on, and preliminary findings from, my research project Artificial ‘Ulama, which examines how artificial intelligence systems produce Islamic theological content. The study focuses on how modern Large Language Models interpret and respond to prompts based on Islamic texts, concepts, and interpretational frameworks.

The presentation will focus on preliminary results from having LLMs translate and interpret the Qur’an, produce views on interreligious dialogue, and generate synthetic data. 


Socio-Technical Considerations on Inter- and Intra-Organizational Sustainability Data Sharing – Anna Sell
In response to new sustainability reporting requirements, such as the EU Corporate Sustainability Reporting Directive (CSRD), companies are increasingly expected to collect and share sustainability data, not only within their own operations but across entire supply chains. Most manufacturing companies operate in multiple supply chains and must adapt to the varying data requirements of each. The cross-organizational scope, unclear data expectations and lack of standardization make sustainability data particularly challenging to work with. Internally, companies’ existing data infrastructures and reporting capabilities are tailored to traditional business data, making them ill-suited for the complex and heterogeneous nature of sustainability data. In this research we explore the paradoxes and barriers that companies must navigate in order to move from compliance-driven reporting to value-creating use of sustainability data.  

Final seminar before the licentiate thesis – Nemi Pelgrom

2025-10-28

When? Thursday November 6, 10-12
Where? Onsite: D1172 and via Zoom
Registration: No registration needed – just come by

Abstract
Transcribing Numbers and Receipts with Generative AI – Nemi Pelgrom
This licentiate thesis investigates the usability of multi-modal language models (MMLMs) as transcription tools, with a focus on their reliability, limitations, and error mechanisms in document parsing tasks.

The work addresses four research questions across three studies. First, the potential of vision-capable generative models for extracting structured information from complex financial documents is evaluated using GPT-4. Tested on 1,000 digital invoices and 1,000 photographic receipts, the model achieved near-perfect accuracy, 99.8% and 99.5% respectively, with an additional API-based trial reaching 94.4%. Second, the capacity of MMLMs to transcribe long numerical strings is explored, showing that GPT-4 and GPT-4o maintain 100% accuracy up to 75 digits, after which performance drops sharply. Third, systematic error patterns are identified in transcription of random number sequences; mistakes consistently occur in the same positions across repeated runs, and hallucinated digits account for only 23% of total errors, indicating biases and structured failure modes rather than noise. Lastly, a framework for categorisation of transcription errors is introduced, based on the analysis of 5,502 mistakes across GPT-4o and ARIA.

The analysis reveals three mutually exclusive error categories and examines ways to automatically distinguish between them, where the Ratcliff/Obershelp similarity measure proved highly useful. Together, these findings demonstrate that state-of-the-art MMLMs can already be deployed in production settings where accuracy and scalability are critical, while also providing systematic methods for diagnosing their weaknesses and guiding future model development.
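For readers who want to experiment, Python’s standard-library difflib.SequenceMatcher implements a Ratcliff/Obershelp-style similarity and can score a transcription against its reference (the digit strings below are made up for illustration):

```python
from difflib import SequenceMatcher  # Ratcliff/Obershelp-style matching

def transcription_similarity(reference: str, transcription: str) -> float:
    """Similarity in [0, 1]: 2 * matched characters divided by the
    total number of characters in both strings."""
    return SequenceMatcher(None, reference, transcription).ratio()
```

An exact transcription scores 1.0, while a single substituted digit in a nine-digit string scores 2·8/18 ≈ 0.889, so thresholds on this ratio can help separate small substitution errors from larger hallucinated insertions.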

Welcome to the Higher Research Seminar in October

2025-10-01

When? Friday 24 October, 14-16
Where? Onsite: D1172 and via Zoom
Registration: Please sign up for the seminar via this link, https://forms.gle/m3nRqxaQmnv8zETb6 by 20 October.

Agenda
14.00-14.10 Welcome and practical information
14.10-14.55 Presentation and discussion: 
The Self-Healing Hypochondriac: Confessions of an AI Nerd: From Skeleton Avatar Technology to medical insights and future directions in AI for eHealth – Welf Löwe
14.55-15.05 Coffee break
15.05-15.50 Presentation and discussion: The Journey and Lessons of Emerging Technologies in Education: Insights from the European Project Exten.(D.T.)² – Alisa Lincke
15.50-16.00 Sum up and plan for our seminars in November

Abstract
The Self-Healing Hypochondriac: Confessions of an AI Nerd
From Skeleton Avatar Technology to medical insights and future directions in AI for eHealth – Welf Löwe

This talk introduces our Skeleton Avatar Technology (SAT), an AI-based approach to video motion analysis. We will present ongoing research, recent results, commercialization efforts, and future directions, highlighting how SAT can provide valuable medical insights and transform applications in healthcare and elderly care. In addition, we will briefly outline our other research activities in AI and eHealth.

The Journey and Lessons of Emerging Technologies in Education: Insights from the European Project Exten.(D.T.)² – Alisa Lincke

This seminar introduces the Exten.(D.T.)² project (Extending Design Thinking with Emerging Digital Technologies), a Horizon Europe / Innovate UK initiative (2022–2025). The project enhances Design Thinking in schools by integrating technologies such as AI, augmented reality, robotics, and 3D printing, with a main focus on authorable learning analytics and dashboards. Implemented in six European countries, it explores both the opportunities and challenges of using these tools to foster creativity, collaboration, problem-solving, and digital literacy. Personal experiences from European research projects will be shared, with reflections on lessons learned in cross-national collaboration.