Previous Seminars

About the Seminars

The research seminars in data science will consist of talks on fundamental and applied research within statistics, machine learning and artificial intelligence.

Simula@bi research seminars spring 2022


Date | Time | Title | Speaker | Institution
3 February | 13:30 - 14:30 | Identifying dominant units using graphical models in panel time series data | Jan Ditzen | Free University of Bozen-Bolzano
10 February | 13:30 - 14:30 | Automatic misinformation detection: results from the MediaEval multimedia evaluation challenge | Konstantin Pogorelov |
31 March | 13:30 - 14:30 | Should we stop using black box machine learning models and use interpretable models instead? | Lars Henry Berge Olsen | University of Oslo
24 March | 13:30 - 14:30 | Data assimilation: Methods, convergence, results and challenges | Håkon Hoel | Department of Mathematics, RWTH Aachen
3 March | 13:30 - 14:30 | Jumps or Staleness | Roberto Renò | Department of Economics, University of Verona
16 June | 13:30 - 14:30 | Adventures in data-driven software engineering | Leon Moonen | Simula and BI
19 May | 13:30 - 14:30 | Mean field models in physics and social science - from micro to macro | Avi Mayorcas | University of Cambridge
12 May | 13:30 - 14:30 | Back to the present: learning about the Euro-area through a now-casting model | Domenico Giannone | Amazon.com

Simula@bi research seminars autumn 2021


Date | Time | Title | Speaker | Institution
9 September | 13:30-14:30 | On the asymptotic behaviour of the variance estimator of a U-statistic | Riccardo De Bin | UiO
23 September | 13:30-14:30 | Explaining News Spreading Phenomena in Social Networks | Daniel Thilo Schroeder | Simula
14 October | 13:30-14:30 | Predicting service time to improve route optimization | Tarjei Bondevik | Oda
21 October | 13:30-14:30 | Dynamic Combination and Calibration for Climate Predictions | Francesco Ravazzolo | BI
4 November | 13:30-14:30 | Probabilistic programming (and what it means for you) | Jan Kudlicka | BI
11 November | 13:30-14:30 | Explainable AI - from game theory with love | Inga Strumke | NTNU
2 December | 13:30-14:30 | SMARTboost: Efficient Boosting of Smooth Regression Trees | Paolo Giordani | BI

Simula@bi research seminars spring 2021

Date | Time | Title | Speaker | Institution | Url
10 June | 12:30-13:30 | Bayesian network structure learning | Wei-Ting Yang | BI | BI

A Bayesian network is a probabilistic graphical model describing the conditional dependencies among variables via a directed acyclic graph (DAG). The structure of Bayesian networks can be established based on expert domain knowledge or learned from data. Learning structure from data is known to be a computationally challenging, NP-hard problem. In this seminar, two main structure learning methods will be introduced, namely constraint-based methods and score-based methods. Other approaches will be discussed as well. I will also present my previous research, which uses Bayesian networks to solve process control problems in semiconductor manufacturing. The structure is constructed based on sensor data collected from process machines, and some existing domain knowledge is also integrated into the learning procedure.
27 May | 12:30-13:30 | Multimodal learning models | Rogelio A Mancisidor | BI | BI

Multimodal learning engages multiple action systems of a learner. For example, children learn to distinguish cats from dogs by auditory inputs in addition to the obvious visual inputs. There is a field in machine learning that aims to mimic such a learning process.

The main difference in the machine learning approach for multimodal learning is that learning is not done in the input space (the auditory and visual inputs) but using a (hopefully) better representation for the inputs. Once such an alternative representation is learned, it can be used to generate or classify the inputs, e.g. images of cats or dogs.

In this seminar, I will present my own work on multimodal learning models that are able to learn and generate alternative representations of the input data even when one of the inputs is missing. That is, the generated representation contains information from both auditory and visual inputs even when it was generated, for example, from the visual input alone. I will show generative and classification results for the models that I have developed on different domains, e.g. image-to-annotation, image-to-image, acoustic-to-articulatory, and historic-to-future credit risk data.
20 May | 12:30-13:30 | Scalable changepoint and anomaly detection in cross-correlated data | Martin Tveten | Norwegian Computing Center | NR

Motivated by the problem of detecting time periods of suboptimal operation of a sensor-monitored subsea pump, this talk presents work on detecting anomalies or changes in the mean of a subset of variables in cross-correlated data. The approach is based on penalised likelihood methodology, but the maximum likelihood solution scales exponentially in the number of variables. That is, not many variables are needed before an approximation becomes necessary. We propose an approximation in terms of a binary quadratic program and derive a dynamic programming algorithm for computing its solution in linear time in the number of variables, given that the precision matrix is banded. Our simulations indicate that little power is lost by using the approximation in place of the exact maximum likelihood, and that our method performs well even if the sparsity structure of the precision matrix estimate is misspecified. Through the simulation study, we also aim to understand when, in terms of detection power and accuracy, it is worth the effort to incorporate correlations rather than assuming all variables to be independent.
29 April | 12:30-13:30 | Inference in second-order Bayesian networks | Magdalena Ivanovska | BI | BI

Bayesian networks are a well-established framework for modelling and reasoning with probabilistic information, with a variety of practical applications. However, they require the specification of exact probability values, which can be challenging in many cases due to a lack of data and/or domain expertise. In this talk I give a short introduction to Bayesian networks and their advantages as a modelling tool. Then I discuss our previous and ongoing research on inference in second-order Bayesian networks, in which the uncertainty about the probabilities is expressed by beta and Dirichlet distributions.
22 April | 12:30-13:30 | Research at OsloMet AI Lab | Hugo Hammer | OsloMet / Simula@OsloMet | OsloMet

In recent years, machine learning, and especially deep learning, has received a lot of attention in medicine. In this seminar I will present two ongoing projects at OsloMet AI Lab where deep learning is used to solve medical problems. The projects focus on improving assisted reproduction technology and detecting abnormalities in the gastrointestinal tract. I will also briefly review other research at OsloMet AI Lab.
15 April | 12:30-13:30 | The regression discontinuity design and survival analysis | Emil Aas Stoltenberg | BI | BI

In this talk I'll present ongoing work on the regression discontinuity design when applied to right-censored survival data. The regression discontinuity design is a design used to evaluate the effect of treatment (a 0-1 random variable) on some outcome. Assignment to the treatment group is determined by the value of an observable covariate lying on either side of a fixed threshold. The idea is that the individuals whose value of this covariate is in a small interval around the threshold are alike, so that by basing inference on these individuals one is controlling for unobserved confounders. The survival analysis models I'll be discussing are those where the outcome is the minimum of a true event time and a censoring variable, giving rise to right-censored survival data.
25 March | 12:30-13:30 | Developing NLP models without labelling data: a weak supervision approach | Pierre Lison | Norwegian Computing Center, NR | NR

Perhaps the most important problem we need to face when developing NLP models is the lack of training data. When in-domain labelled data is available, techniques based on transfer learning and data augmentation can certainly help. But what should we do when there is no hand-labelled data for the target domain? One powerful alternative is weak supervision, which seeks to automatically annotate target-domain texts based on a combination of various labelling functions, such as heuristics, gazetteers, out-of-domain models, and document-level constraints. Those labelling functions are then aggregated into a single, unified annotation layer through unsupervised learning, taking into account the varying accuracies and correlations between labelling functions. We illustrate this approach on a Named Entity Recognition task, where we show that such an approach is able to outcompete more complex neural models based on unsupervised domain adaptation.
18 March | 12:30-13:30 | Measuring agreement in the light of decision theory | Jonas Moss | BI | BI
There are many chance-corrected measures of agreement, such as the weighted Cohen's kappa, Krippendorff's alpha, Scott's pi, and Fleiss' kappa. I propose a general chance-corrected measure of agreement using decision theory, which I call the agreement coefficient. Most well-known measures of agreement are special cases of the agreement coefficient, but there are realistic circumstances where no established measure of agreement estimates it consistently. In this talk, I give a gentle introduction to the whys and hows of measures of agreement, and explain how everything is connected to decision theory.
11 March | 12:30-13:30 | Deep Randomized Neural Networks | Claudio Gallicchio | University of Pisa | Homepage
Deep Neural Networks (DNNs) are a fundamental tool in the modern development of Machine Learning. Beyond the merits of the training algorithms, a great part of DNNs' success is due to the inherent properties of their layered architectures, i.e., to the introduced architectural biases. This talk explores recent classes of DNN models in which the majority of connections are untrained, i.e., randomized or more generally fixed according to some specific heuristic. Limiting the training algorithms to operate on a reduced set of weights implies intriguing features. Among them, the extreme efficiency of the learning processes is undoubtedly a striking advantage with respect to fully trained counterparts. Besides, despite the simplifications involved, randomized neural systems possess remarkable properties both in practice, achieving state-of-the-art results in multiple domains, and in theory, allowing the intrinsic properties of neural architectures to be analyzed. This talk will cover the major aspects of Deep Randomized Neural Networks, with a particular focus on dynamically recurrent deep neural systems for time series and graphs.
18 February | 12:30-13:30 | Some recent contributions to psychometrics | Steffen Grønneberg | BI | BI
I will summarize some recent psychometric papers I have written jointly with Njål Foldnes and Jonas Moss. I will focus on two papers: "Partial identification of latent correlations with binary data" (2020, Psychometrika) and "The sensitivity of structural equation modeling with ordinal data to underlying non-normality and observed distributional forms" (2021, Psychological Methods, forthcoming). We consider, from a critical perspective, a class of statistical models for ordinal data much in use in applied psychology and other fields within the social sciences. A traditional and to a large extent unverifiable assumption is that the ordinal variables originating from a person's answers to a questionnaire on a scale of, say, 1 to 5 (e.g. "How content are you with your life?", "Are you happy with your life?", etc.) are generated by "chopping up" (or, more technically, discretizing) a continuous multivariate normal variable. We refused to make this assumption, and saw where it led us. The papers investigate what can be inferred from assuming only that the answers are discretizations of some variable, not necessarily normal, and to what extent existing methodology is robust towards underlying non-normality.
11 February | 12:30-13:30 | Fake news, Digital Wildfires, and Graph Neural Networks | Johannes Langguth | Simula/BI | Simula
In recent years, misinformation in social networks, often referred to as fake news, has received significant attention from media, researchers, and the general public. We introduce the topic while focusing on digital wildfires, i.e. fast-spreading online misinformation phenomena with the potential to cause harm in the physical world. We introduce possible countermeasures from the computer science point of view and show that, among these, network-based approaches are particularly suitable, due to both their generality and their robustness. We then explain the recently developed graph neural networks and show why they constitute a powerful tool for network-based tracking of misinformation.