I am a Senior Research Fellow and Team Leader at the MRC Biostatistics Unit, University of Cambridge (United Kingdom). I hold a PhD in Mathematics from EPFL (Switzerland). My research lies at the intersection of statistical methodology and its application to open problems in biomedicine.
My team develops Bayesian methods and accompanying software, with a focus on scalable hierarchical modelling approaches for variable selection, latent structure discovery and network estimation, in high-dimensional or temporal data settings. Our methods are motivated by research questions arising from collaborative clinical and biological studies. Our overarching goal is to provide principled statistical and computational tools to help advance our understanding of the biological processes driving disease risk and progression.
I have the pleasure of working with talented researchers at the MRC Biostatistics Unit:
Former team members:
I also regularly supervise Bachelor's and Master's theses of students from the University of Cambridge and EPFL (Lausanne), and I am a co-organiser of the MRC Biostatistics Unit Internship Programme.
Aug 7, 2024: Our Joint Graphical Horseshoe approach has just been published in the Annals of Applied Statistics! Joint work with Camilla Lingjærde, Benjamin Fairfax and Sylvia Richardson.
Jun 25, 2024: Our work on leveraging node-level information for Bayesian network inference has now been published in Biostatistics! Lead author: Xiaoyue Xi.
May 31, 2024: Joseph Feest, a third-year student at Cambridge University (and former summer intern in the team), has successfully completed a year-long project as part of his degree, supervised by Xiaoyue Xi and Camilla Lingjærde. Congratulations to him!
Apr 26, 2024: Our work will be highlighted at the International Society for Clinical Biostatistics (ISCB) Conference 2024 in Thessaloniki, Greece, with a talk on high-dimensional functional data analysis given by Marion Kerioui (programme).
Mar 25, 2024: The research of Salima Jaoua, a former EPFL Master's student, on Bayesian FPCA for longitudinal gene expression data will be presented by Daniel Temko (supervisor) at the European Mathematical Genetics Meeting 2024 in Vienna (programme).
The landscape of biomedical research is changing rapidly, driven by technological advances for quantifying clinical and molecular data at scale. This evolution not only offers a more granular view of disease mechanisms, but also parallels a growing acknowledgment that pathogenic responses are tightly coordinated at the organismal level. Consequently, there is a need for modelling approaches capable of providing a holistic understanding of complex interplays across biological systems.
My team aims to provide principled statistical methodology for tackling this challenge, guided by collaborations with clinicians and researchers in areas such as immunology, infectiology and cancer. Specifically, we develop Bayesian hierarchical modelling approaches for sparse regression, graphical modelling, latent factor modelling and functional data analysis to leverage complicated dependences within and between heterogeneous biological data sources (e.g., genetics, genomics, proteomics, metabolomics), while conveying uncertainty coherently.
We are particularly interested in addressing the tension between flexible joint estimation and practical feasibility for analysis at the scale of current biomedical studies, through dedicated efforts to enhance accuracy, robustness and computational tractability. Common threads for our methods include (i) uncovering and leveraging shared biological structures across multiple contexts (molecular entities, tissues, cell types or disease subtypes), and (ii) using approximate inference procedures, such as expectation-maximisation (EM) and model-specific variational schemes, tailored to the exploration of multimodal parameter spaces.
Our methods are designed with specific contexts in mind, yet are adaptable to a variety of scientific applications, which is facilitated by accompanying software implementations (see Software).
Our research receives generous support from the Lopez–Loreta Foundation.
Variable selection in sparse regression with hierarchically-related responses
Bayesian functional principal component analysis suite – with Tui Nolan
Source code for reproducing an example in a chapter published in the Handbook of Bayesian Variable Selection
FPCA estimates of patient disease trajectories after SARS-CoV-2 infection
Faithful replication and simulation of molecular and clinical data
Annotation-driven approach for large-scale joint regression with multiple responses
Joint graphical horseshoe for multiple network inference with shared information – creator and maintainer Camilla Lingjærde
Large-scale variational inference for variable selection in sparse multiple-response regression
Solutions and suggested code for the 1st practical session of the Bayes4Health-CoSInES Masterclass in variational inference – with Camilla Lingjærde
Source code for reproducing a numerical example using the method “locus” on simulated data
Variable-guided network inference using Bayesian graphical spike-and-slab modelling – creator and maintainer Xiaoyue Xi
Online database gathering hits from a QTL mapping of human protein abundance in plasma
Source code for assessing the sensitivity of the method “atlasqtl” to hyperparameter settings
Source code for reproducing article on the atlasqtl R package published in the Software Corner of the IBS Bulletin