A Bayesian two-way latent structure clustering model for genomic data integration with an application to subtyping breast cancer

  • Starts: 12:00, 1 November 2022
  • Ends: 13:00, 1 November 2022
  • Location: At BI, C2-055
  • Contact: Siri Johnsen (siri.johnsen@bi.no)

Speaker: Professor Arnoldo Frigessi

We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. Many of the methods proposed so far assume a common clustering across all data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources.

A common scaling across data sources is not required, and inference is obtained by a Gibbs sampler. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. Our formulation makes no nestedness assumptions about samples across data sources, so a sample that is missing data from one genomic source can still be clustered according to its other existing data sources.

We apply our model to a Norwegian breast cancer cohort of ductal carcinoma and invasive tumors, comprising somatic copy-number alteration, methylation and expression datasets. This is joint work with David M. Swanson, Tonje Lien, Helga Bergholtz and Therese Sørlie.