DISA

Centre for Data Intensive Sciences and Applications

Welcome to PhD-seminar September 2024

2024-08-29

When? Friday September 6th 14-16
Where? Onsite: D2272 at Linnaeus University in Växjö and online
Registration: Please sign up for the PhD-seminar via this link https://forms.gle/xtG9s5Qhs4SFd98E7 by September 4th (especially important if you plan on attending onsite so we have fika for everyone)

Agenda

14.00-14.10 Welcome and practical information from Welf Löwe
14.10-14.55 Presentation and discussion: Reuse of health data, combining the best of two worlds – Machine learning driven knowledge discovery from real world health data in collaboration with domain experts – Olle Björneld, Industrial PhD-student Region Kalmar
14.55-15.05 Coffee break
15.05-15.50 Presentation and discussion: Remaining useful life prediction of batteries based on historical loading-unloading cycle logs – Zijie Feng, Industrial PhD-student Micropower
15.50-16.00 Sum up and plan for our next seminar on October 4th and other ongoing activities.

Abstracts

Reuse of health data, combining the best of two worlds – Machine learning driven knowledge discovery from real world health data in collaboration with domain experts – Olle Björneld, Industrial PhD-student Region Kalmar
The main objective of the PhD project is to answer the question “How can data driven knowledge discovery in databases (KDD), performed in the medical research domain supported with domain knowledge, be more effective and efficient?” To answer this question, the following work has been performed:

Knowledge discovery from real-world data in health care can be demanding due to unstructured data and low registration quality in electronic health records (EHRs). Close collaboration between domain experts and data scientists is essential. New variables, referred to as features, are generated by domain experts and computer scientists in collaboration with medical researchers. This process is named knowledge-driven feature engineering (KDFE). (Study A, published)

A case study comprising two projects (P1 and P2) was performed to evaluate the effectiveness of manual KDFE (mKDFE). Effectiveness was measured as classification performance, more precisely the area under the receiver operating characteristic curve (AUROC). The study showed clearly that it is valuable for medical researchers to involve a data scientist when medical research is based on real-world medical data. When mKDFE was compared to baseline, the average classification performance measured by AUROC for the engineered features rose for P1 from 0.62 to 0.82 and for P2 from 0.61 to 0.89 (p-values << 0.001). (Study B, published)
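For readers unfamiliar with the metric, AUROC can be read as the probability that a randomly chosen positive case receives a higher score than a randomly chosen negative case. The sketch below computes it by direct pairwise comparison. The function name `auroc` and the toy labels and scores are illustrative assumptions only; they are not data or code from the study.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    Hypothetical helper for illustration; not taken from the study.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")

    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy example (made-up numbers): 3 of the 4 positive/negative pairs
# are ordered correctly, so the AUROC is 0.75.
toy_labels = [0, 0, 1, 1]
toy_scores = [0.1, 0.4, 0.35, 0.8]
print(auroc(toy_labels, toy_scores))
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect separation, which gives a scale for the reported increases.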

To perform KDD more effectively and efficiently, a framework for automatic Knowledge Driven Feature Engineering (aKDFE) was developed. Central to aKDFE is automated feature engineering (FE), i.e., the automated construction of new, highly informative features from those directly observed and recorded, e.g., in EHRs. The framework learns and aggregates domain knowledge to generate features that are more informative than those recorded in EHRs or manually engineered (manual FE), as done in many medical research projects today.

aKDFE is (i) more efficient than manual FE since it automates the manual knowledge discovery and FE processes. It is (ii) more effective due to its higher predictive power compared to manual KDFE. Finally, aKDFE (iii) applies and describes data pivoting and feature generation as explicit and transparent operation sequences on EHR features. (Study C, published)

Domain expert knowledge can be found in knowledge databases or expert knowledge decision support systems, in which derived and distilled knowledge has been manually entered and can be represented as risk scores or indexes. To leverage the effect of expert knowledge in aKDFE, we will investigate the following question: “How do decision support scores impact the effectiveness of aKDFE?” (Study D/E, under construction)

aKDFE saves time and resources for medical researchers and produces more informative features. However, there is still room for future enhancements: (i) evaluating more sophisticated time-series-oriented models, (ii) using LLMs to collect and structure domain knowledge, and (iii) evaluating multi-agent knowledge discovery.

Remaining useful life prediction of batteries based on historical loading-unloading cycle logs – Zijie Feng, Industrial PhD-student Micropower

As technology advances, battery usage has become increasingly prevalent in daily life. Many traditionally fuel-powered mechanical devices, such as forklifts and automated guided vehicles, are now powered by batteries. Concurrently, concerns about safety and efficiency have heightened the focus on monitoring the condition of batteries in these large devices.

During usage, the battery’s actual capacity diminishes gradually. When the capacity falls to a certain threshold, the battery becomes unusable. In general, we can measure the remaining useful life (RUL) of a battery in two ways: directly, by measuring the physical and chemical characteristics of the battery, and indirectly, by using data-driven models. Since direct measurement of batteries is very inconvenient, RUL prediction based on data models is a promising research direction. RUL is typically estimated from the battery’s condition and the customer’s usage. However, both factors are influenced by numerous variables, which introduces uncertainty into the estimated RUL and, consequently, significant fluctuations in the RUL curve.
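To make the threshold-based definition of RUL concrete, here is a minimal sketch that fits a straight line to capacity versus cycle count and extrapolates to the end-of-life threshold. This linear-fade baseline is an assumption chosen for illustration; it is not the machine learning workflow presented in the talk, it yields no confidence intervals, and the function name `estimate_rul` and all numbers are hypothetical.

```python
def estimate_rul(cycles, capacities, threshold):
    """Estimate remaining cycles until capacity reaches `threshold`.

    Fits capacity = intercept + slope * cycle by ordinary least squares
    and extrapolates. Returns None when no capacity fade is observed.
    Illustrative baseline only; not the method from the talk.
    """
    n = len(cycles)
    if n < 2 or n != len(capacities):
        raise ValueError("need two or more matching observations")

    mean_x = sum(cycles) / n
    mean_y = sum(capacities) / n
    sxx = sum((x - mean_x) ** 2 for x in cycles)
    if sxx == 0:
        raise ValueError("cycle counts must not all be identical")
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(cycles, capacities))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    if slope >= 0:
        return None  # capacity is not decreasing, so no end of life is predicted

    end_of_life_cycle = (threshold - intercept) / slope
    # Remaining life is counted from the most recent logged cycle.
    return max(0.0, end_of_life_cycle - cycles[-1])


# Synthetic log (made-up numbers): capacity in percent of nominal,
# losing 2 percentage points per 100 cycles, end of life at 80 percent.
logged_cycles = [0, 100, 200, 300]
logged_capacity = [100.0, 98.0, 96.0, 94.0]
print(estimate_rul(logged_cycles, logged_capacity, threshold=80.0))  # roughly 700 cycles
```

Real cycle logs are noisier and usage-dependent, which is where the uncertainty and the fluctuating RUL curve described above come from.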

In this presentation, we will share the progress we’ve made at Micropower in developing a workflow that predicts battery RUL with confidence intervals using machine learning algorithms on historical battery cycle logs. These results will help battery owners and suppliers plan maintenance and replacements in advance. We will also introduce our ongoing project on anomaly detection within battery logs.

Welcome to Research Seminar in Mathematics

2024-08-25

When? Wednesday September 4th 11.30-13.00
Where? D1140, Campus Växjö.
Registration: No registration needed – just come by

Sebastian Zeng from Universität Salzburg (Austria) is visiting us and will give a guest lecture with the title Latent SDEs on Homogeneous Spaces

Abstract: We consider the problem of variational Bayesian inference in a latent variable model where a (possibly complex) observed stochastic process is governed by the solution of a latent stochastic differential equation (SDE). Motivated by the challenges that arise when trying to learn an (almost arbitrary) latent neural SDE from data, such as efficient gradient computation, we take a step back and study a specific subclass instead. In our case, the SDE evolves on a homogeneous latent space and is induced by stochastic dynamics of the corresponding (matrix) Lie group. In learning problems, SDEs on the unit n-sphere are arguably the most relevant incarnation of this setup. Notably, for variational inference, the sphere not only facilitates using a truly uninformative prior, but we also obtain a particularly simple and intuitive expression for the Kullback-Leibler divergence between the approximate posterior and prior process in the evidence lower bound. Experiments demonstrate that a latent SDE of the proposed type can be learned efficiently by means of an existing one-step geometric Euler-Maruyama scheme. Despite restricting ourselves to a less rich class of SDEs, we achieve competitive or even state-of-the-art results on various time series interpolation/classification problems.
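As a rough illustration of why a Lie-group step keeps the state on the sphere, the sketch below simulates a driftless process on the unit 2-sphere by applying, at each step, the rotation exp(Ω) with a random skew-symmetric increment Ω in so(3), evaluated via Rodrigues' rotation formula. This is only written in the spirit of a geometric Euler-Maruyama step: the zero drift, the constant `sigma`, and the names `rotate` and `simulate_on_sphere` are assumptions, and nothing here is learned or taken from the speaker's implementation.

```python
import math
import random


def rotate(v, omega):
    """Apply the rotation exp(hat(omega)) to the 3-vector v (Rodrigues' formula)."""
    theta = math.sqrt(sum(w * w for w in omega))
    if theta < 1e-12:
        return list(v)  # negligible rotation
    k = [w / theta for w in omega]  # unit rotation axis
    k_dot_v = sum(a * b for a, b in zip(k, v))
    k_cross_v = [
        k[1] * v[2] - k[2] * v[1],
        k[2] * v[0] - k[0] * v[2],
        k[0] * v[1] - k[1] * v[0],
    ]
    c, s = math.cos(theta), math.sin(theta)
    return [v[i] * c + k_cross_v[i] * s + k[i] * k_dot_v * (1.0 - c) for i in range(3)]


def simulate_on_sphere(x0, sigma, dt, n_steps, rng):
    """Simulate a driftless path on the unit sphere with group-exponential steps.

    Each step draws a Gaussian increment in so(3) (scaled by sigma * sqrt(dt))
    and rotates the current point, so the norm is preserved by construction.
    """
    x = list(x0)
    path = [x]
    for _ in range(n_steps):
        omega = [sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0) for _ in range(3)]
        x = rotate(x, omega)
        path.append(x)
    return path


rng = random.Random(0)
path = simulate_on_sphere([0.0, 0.0, 1.0], sigma=0.5, dt=0.01, n_steps=1000, rng=rng)
final_norm = math.sqrt(sum(c * c for c in path[-1]))
print(final_norm)  # stays at 1 up to floating-point rounding
```

A plain Euler-Maruyama step in the ambient space would instead add the increment to the point and drift off the sphere, requiring a projection back onto it.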

For more information or questions about the seminar, please contact:
– Wolfgang Bock wolfgang.bock@lnu.se or Jonas Nordqvist jonas.nordqvist@lnu.se

Welcome to Higher Research Seminar 240816

2024-08-12

When? Friday August 16th 14-16
Where? Onsite: B1009 at Linnaeus University in Växjö and online
Registration: Please sign up for the PhD-seminar via this link https://forms.gle/aYqRMod68hVLv8EW9 by August 14th (especially important if you plan on attending onsite so we have fika for everyone)

Agenda

14.00-14.10 Welcome and practical information from Welf Löwe
14.10-14.55 Presentation and discussion: Improving Non-Indigenous Species Introduction Risk Considering Seasonality and Gravity-informed Deep Learning Models – Amilcar Soares
14.55-15.05 Coffee break
15.05-15.50 Presentation and discussion: State-of-the-art and ongoing research on the Visualization of Temporal and Multivariate Networks – Claudio Linhares
15.50-16.00 Sum up and plan for the September seminar

Abstracts

Improving Non-Indigenous Species Introduction Risk Considering Seasonality and Gravity-informed Deep Learning Models – Amilcar Soares

The introduction and spread of aquatic non-indigenous species (NIS) pose significant threats to global biodiversity, disrupt ecosystems, and cause substantial economic damage in agriculture, forestry, and fisheries. The growing complexity of international trade and transportation networks has exacerbated the risk of NIS introduction and spread. In this presentation, I will discuss the common problem of NIS management and the importance of robust risk assessment models to mitigate these threats. First, I will present a study investigating the influence of temporal variability in sea surface temperature and salinity on ballast water risk assessment (BWRA) models. By comparing global ports’ monthly and annual environmental data, the study highlights how seasonal variations can impact the environmental similarity scores between source and recipient locations, which are crucial for predicting NIS survival and establishment. The findings suggest that incorporating monthly data in BWRA models provides a more sensitive and accurate risk assessment than traditional annual average models. Next, I will introduce a novel physics-informed model designed to forecast maritime shipping traffic and assess the risk of NIS spread through global transportation networks. This model, inspired by the gravity model for international trade, integrates factors such as shipping flux density, port distance, trade flow, and centrality measures. By incorporating transformers, the model effectively captures both short- and long-term dependencies, achieving significant improvements in predicting vessel trajectories and traffic flows. The enhanced accuracy of this model aids policymakers and stakeholders in identifying high-risk invasion pathways and prioritizing management actions. 
Together, these studies advance our understanding of NIS risk assessment and underscore the need for dynamic, data-driven approaches to effectively manage and mitigate the impacts of NIS in a rapidly changing global landscape.

State-of-the-art and ongoing research on the Visualization of Temporal and Multivariate Networks – Claudio Linhares
This presentation will give an overview of current research on visualizing temporal and multivariate networks, emphasizing the challenges and advancements in representing evolving interactions and diverse attributes. It will also discuss ongoing research into temporal network visualization techniques, including static and dynamic approaches, and the incorporation of multivariate attributes, such as node features, edge weights, and temporal dynamics. Finally, it will explore the challenges of scalability, interpretability, and interactive exploration.