Welcome to Higher Research Seminar 241213
2024-12-09
When? Friday December 13th 14-16
Where? Onsite: D2272 and via Zoom
Registration: Please sign up for the PhD seminar by December 11th via this link: https://forms.gle/94Gb6pGdQ5qj2BeD7 (especially important if you plan to attend onsite, so that we have fika for everyone)
Agenda
14.00-14.10 Welcome and practical information from Welf Löwe
14.10-14.55 Presentation and discussion: The deterministic pancake forest – Jonas Nordqvist
14.55-15.05 Coffee break
15.05-15.50 Presentation and discussion – “Will It Hold? Predicting the Joinability of Metals Before Welding Them” and “A Foundational Approach for Fine-Grained Commit Quantification” – Sebastian Hönel
15.50-16.00 Sum up and plan for the spring seminars
Abstracts
The deterministic pancake forest – Jonas Nordqvist
In this talk, we will discuss a classical problem in computer science, namely sorting by prefix reversal. Its more popular name, Pancake Sorting, aside, it is actually more than just a toy problem. The long-standing question is: given a list of length n, what is the minimal number of prefix reversals needed to sort it? However, in the 70s, Conway proposed that one might study a deterministic version of this problem. Doing so, the problem, formulated as a discrete dynamical system, gives rise to an adjacency graph that is a collection of trees, i.e., a forest; more precisely, the deterministic pancake forest. Besides discussing the problem in general, I will present some results on the pancake forest and how this relates back to the original problem.
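For readers unfamiliar with the setting, here is a minimal sketch of a prefix reversal and of one possible deterministic rule. It assumes the deterministic version is the rule usually attributed to Conway (often called topswops): if the top element is k, reverse the first k elements. The abstract does not spell out the rule, so this is an assumption, and the function names are purely illustrative.

```python
from itertools import permutations


def flip(stack, k):
    """Return a copy of stack (a tuple) with its first k elements reversed."""
    return stack[:k][::-1] + stack[k:]


def step(stack):
    """One deterministic move (assumed rule): flip as many elements as the top value says."""
    return flip(stack, stack[0])


def steps_to_fixed_point(stack):
    """Count moves until the top element is 1, after which the rule no longer changes the stack."""
    count = 0
    while stack[0] != 1:
        stack = step(stack)
        count += 1
    return count


def forest_edges(n):
    """Map every non-fixed permutation of 1..n to its successor, i.e. its parent in the forest."""
    return {p: step(p) for p in permutations(range(1, n + 1)) if p[0] != 1}


if __name__ == "__main__":
    for child, parent in sorted(forest_edges(3).items()):
        print(child, "->", parent)
```

Under this rule, every permutation has exactly one successor, and the permutations starting with 1 are the fixed points, so the edges form trees rooted at those fixed points.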
Will It Hold? Predicting the Joinability of Metals Before Welding Them – Sebastian Hönel
In the context of automotive applications, a common task is to join two or more parts, such as sections of a car’s frame. The joining of dissimilar metals presents a critical challenge in automotive manufacturing due to the differing thickness, as well as thermal, mechanical, and electrical properties of the base materials.
The challenge further lies in joining a varying number of materials reliably, that is, obtaining a joint that is sufficiently large and stable. Extensive laboratory tests using spot welding were conducted to understand which materials can be welded together and with which parameters. However, performing these tests is costly, and trials need to be repeated multiple times to get robust and dependable estimates.
This study focuses on A) establishing a probabilistic understanding of selected parameters, materials, and welding outcome, and B) prediction of joint quality given the desired materials and parameters. To address these challenges, we employ deep conditional density estimation in conjunction with regression models.
Preliminary results show that joint size can be predicted within a reasonable margin of error, especially given that we have not yet considered material properties. Furthermore, a conditional normalizing flow was able to accurately capture the joint density of our dataset, allowing us to estimate the probability that a joint is sufficiently stable and to efficiently oversample the underrepresented test cases.
A Foundational Approach for Fine-Grained Commit Quantification – Sebastian Hönel
Commits are sets of changes made continuously to a software repository. Understanding commits and the purpose behind them is crucial for a wide range of applications, such as commit classification, fault prediction and localization, or automated commit message generation.
Extracting features from commits is and has historically been a challenging task. In the past, many studies were limited to commit metadata or human-engineered features specific to the downstream task at hand. Such features are almost always far inferior to semi- or unsupervised approaches used in representation learning.
With the recent advent of large language models (LLMs), the ability to capture the underlying (changed) source code in a commit has significantly improved. However, the inherent tree-like structure of a commit, together with a variable number of affected files, hunks, etc., which are also of variable length, poses a challenge for, e.g., regression or discriminative models.
We attempt to alleviate these challenges once and for all by suggesting a foundational approach that consists of A) a language-agnostic, fine-grained, and multi-scale source code and metadata commit extraction, and B) a flexible deep-learning-based framework for the embedding, reduction, and projection of commits. The framework is agnostic with regard to the choice of LLM(s) and exploits transformers as well as recurrence-based architectures.
We evaluate our framework using an enhanced version of the downstream task of commit classification. We add uncertainty estimation, which allows the trained model to quantify the risk of misclassification. The model exploits multiple-instance learning and, optionally, a stochastic version of what constitutes a commit, which not only allows classification but also enables intent disentanglement of merge and ordinary commits and classification of fractional commits.