Employee Profile

Jan Kudlicka

Associate Professor - Department of Data Science and Analytics

Room number B3Y-074
E-mail jan.kudlicka@bi.no

Biography

Please visit my LinkedIn profile or my homepage for more information.

Publications

Scientific publications

Lundén, Daniel; Hummelgren, Lars; Kudlicka, Jan; Eriksson, Oscar & Broman, David (2024)

Suspension Analysis and Selective Continuation-Passing Style for Universal Probabilistic Programming Languages

Weirich, Stephanie (ed.). 33rd European Symposium on Programming, ESOP 2024, Held as Part of the European Joint Conferences on Theory and Practice of Software, ETAPS 2024, Luxembourg City, Luxembourg, April 6–11, 2024, Proceedings, Part II

Universal probabilistic programming languages (PPLs) make it relatively easy to encode and automatically solve statistical inference problems. To solve inference problems, PPL implementations often apply Monte Carlo inference algorithms that rely on execution suspension. State-of-the-art solutions enable execution suspension either through (i) continuation-passing style (CPS) transformations or (ii) efficient, but comparatively complex, low-level solutions that are often not available in high-level languages. CPS transformations introduce overhead due to unnecessary closure allocations—a problem the PPL community has generally overlooked. To reduce overhead, we develop a new efficient selective CPS approach for PPLs. Specifically, we design a novel static suspension analysis technique that determines parts of programs that require suspension, given a particular inference algorithm. The analysis allows selectively CPS transforming the program only where necessary. We formally prove the correctness of the analysis and implement the analysis and transformation in the Miking CorePPL compiler. We evaluate the implementation for a large number of Monte Carlo inference algorithms on real-world models from phylogenetics, epidemiology, and topic modeling. The evaluation results demonstrate significant improvements across all models and inference algorithms.

Iwaszkiewicz-Eggebrecht, Elzbieta; Granqvist, Emma; Buczek, Mateusz; Prus, Monika; Kudlicka, Jan; Roslin, Tomas; Tack, Ayco J. M.; Andersson, Anders F.; Miraldo, Andreia; Ronquist, Fredrik & Łukasik, Piotr (2023)

Optimizing insect metabarcoding using replicated mock communities

Methods in Ecology and Evolution, 14(4), pp. 1130–1146. DOI: 10.1111/2041-210X.14073 - Full text in research archive

Metabarcoding (high-throughput sequencing of marker gene amplicons) has emerged as a promising and cost-effective method for characterizing insect community samples. Yet, the methodology varies greatly among studies and its performance has not been systematically evaluated to date. In particular, it is unclear how accurately metabarcoding can resolve species communities in terms of presence-absence, abundance and biomass. Here we use mock community experiments and a simple probabilistic model to evaluate the effect of different DNA extraction protocols on metabarcoding performance. Specifically, we ask four questions: (Q1) How consistent are the recovered community profiles across replicate mock communities?; (Q2) How does the choice of lysis buffer affect the recovery of the original community?; (Q3) How are community estimates affected by differing lysis times and homogenization? and (Q4) Is it possible to obtain adequate species abundance estimates through the use of biological spike-ins? We show that estimates are quite variable across community replicates. In general, a mild lysis protocol is better at reconstructing species lists and approximate counts, while homogenization is better at retrieving biomass composition. Small insects are more likely to be detected in lysates, while some tough species require homogenization to be detected. Results are less consistent across biological replicates for lysates than for homogenates. Some species are associated with strong PCR amplification bias, which complicates the reconstruction of species counts. Yet, with adequate spike-in data, species abundance can be determined with roughly 40% standard error for homogenates, and with roughly 50% standard error for lysates, under ideal conditions. In the latter case, however, this often requires species-specific reference data, while spike-in data generalize better across species for homogenates. 
We conclude that a non-destructive, mild lysis approach shows the highest promise for the presence/absence description of the community, while also allowing future morphological or molecular work on the material. However, homogenization protocols perform better for characterizing community composition, in particular in terms of biomass.

Lundén, Daniel; Öhman, Joey; Kudlicka, Jan; Senderov, Viktor; Ronquist, Fredrik & Broman, David (2022)

Compiling Universal Probabilistic Programming Languages with Efficient Parallel Sequential Monte Carlo Inference

Sergey, Ilya (ed.). Programming Languages and Systems (31st European Symposium on Programming, ESOP 2022)

Probabilistic programming languages (PPLs) allow users to encode arbitrary inference problems, and PPL implementations provide general-purpose automatic inference for these problems. However, constructing inference implementations that are efficient enough is challenging for many real-world problems. Often, this is due to PPLs not fully exploiting available parallelization and optimization opportunities. For example, handling probabilistic checkpoints in PPLs through continuation-passing style transformations or non-preemptive multitasking—as is done in many popular PPLs—often disallows compilation to low-level languages required for high-performance platforms such as GPUs. To solve the checkpoint problem, we introduce the concept of PPL control-flow graphs (PCFGs)—a simple and efficient approach to checkpoints in low-level languages. We use this approach to implement RootPPL: a low-level PPL built on CUDA and C++ with OpenMP, providing highly efficient and massively parallel SMC inference. We also introduce a general method of compiling universal high-level PPLs to PCFGs and illustrate its application when compiling Miking CorePPL—a high-level universal PPL—to RootPPL. The approach is the first to compile a universal PPL to GPUs with SMC inference. We evaluate RootPPL and the CorePPL compiler through a set of real-world experiments in the domains of phylogenetics and epidemiology, demonstrating up to 6× speedups over state-of-the-art PPLs implementing SMC inference.

Ronquist, Fredrik; Kudlicka, Jan; Senderov, Viktor; Borgström, Johannes; Lartillot, Nicolas; Lundén, Daniel; Murray, Lawrence; Schön, Thomas & Broman, David (2021)

Universal probabilistic programming offers a powerful approach to statistical phylogenetics

Communications Biology

Statistical phylogenetic analysis currently relies on complex, dedicated software packages, making it difficult for evolutionary biologists to explore new models and inference strategies. Recent years have seen more generic solutions based on probabilistic graphical models, but this formalism can only partly express phylogenetic problems. Here, we show that universal probabilistic programming languages (PPLs) solve the expressivity problem, while still supporting automated generation of efficient inference algorithms. To prove the latter point, we develop automated generation of sequential Monte Carlo (SMC) algorithms for PPL descriptions of arbitrary biological diversification (birth-death) models. SMC is a new inference strategy for these problems, supporting both parameter inference and efficient estimation of Bayes factors that are used in model testing. We take advantage of this in automatically generating SMC algorithms for several recent diversification models that have been difficult or impossible to tackle previously. Finally, applying these algorithms to 40 bird phylogenies, we show that models with slowing diversification, constant turnover and many small shifts generally explain the data best. Our work opens up several related problem domains to PPL approaches, and shows that few hurdles remain before these techniques can be effectively applied to the full range of phylogenetic models.

Academic Degrees
Year	Academic Department	Degree
2021	Uppsala University	Ph.D.

Work Experience
Year	Employer	Job Title
2023 - 2024	Aplia AS	Senior System Architect (20%)