A Bayesian two-way latent structure clustering model for genomic data integration with an application to subtyping breast cancer
Speaker: Professor Arnoldo Frigessi
We propose a Bayesian parametric model for integrative, unsupervised clustering across data sources. Many of the methods proposed so far assume a common clustering across all data sources. In our two-way latent structure model, samples are clustered in relation to each specific data source, but cluster labels have across-dataset meaning, allowing cluster information to be shared between data sources.
A common scaling across data sources is not required, and inference is performed with a Gibbs sampler. Posterior interpretation allows for inference on common clusterings occurring among subsets of data sources. Our formulation makes no nestedness assumptions about samples across data sources, so a sample that is missing data from one genomic source can still be clustered according to its other data sources.
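As a rough illustration of what shared cluster labels and missing sources mean in practice, the sketch below compares source-specific labelings pair by pair. It is not the model or the sampler described in the talk: the function name, the source names, the label values and the use of None for a missing source are all assumptions made for this example, and the labels stand in for a single hypothetical posterior draw.

```python
from itertools import combinations


def pairwise_label_agreement(labels):
    """Fraction of samples given the same cluster label in each pair of sources.

    labels maps a data-source name to a list of cluster labels, one per sample,
    with None where the sample has no data from that source (an assumed encoding).
    Only samples observed in both sources of a pair are compared.
    """
    agreement = {}
    for a, b in combinations(sorted(labels), 2):
        observed = [
            (x, y)
            for x, y in zip(labels[a], labels[b])
            if x is not None and y is not None
        ]
        if observed:
            agreement[(a, b)] = sum(x == y for x, y in observed) / len(observed)
        else:
            # No sample has data from both sources, so agreement is undefined.
            agreement[(a, b)] = None
    return agreement


# Hypothetical labels for six samples across three sources.
labels = {
    "copy_number": [0, 0, 1, 1, 2, None],
    "methylation": [0, 0, 1, 2, 2, 1],
    "expression": [0, None, 1, 2, 2, 1],
}

result = pairwise_label_agreement(labels)
for pair, value in result.items():
    print(pair, value)
```

Because labels carry the same meaning in every source, agreement can be read off directly without matching clusters between sources first. Averaging such a summary over many posterior draws would be one simple way to see which subsets of sources tend to share a clustering.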
We apply our model to a Norwegian breast cancer cohort of ductal carcinoma and invasive tumors, comprising somatic copy-number alteration, methylation and expression datasets. This is joint work with David M. Swanson, Tonje Lien, Helga Bergholtz and Therese Sørlie.