Clustering Spatial Transcriptomics Data with Dirichlet Process Mixture of Random Spanning Trees

Presenter
January 12, 2026
Abstract
Spatial transcriptomics has gained tremendous popularity because it allows researchers to map gene expression directly onto tissue architecture, preserving spatial context and providing high-resolution insights into cellular interactions and biological processes within their native environments. In this paper, we introduce a novel Bayesian nonparametric framework, Dirichlet process mixture of random spanning trees (DP-RST), designed to detect an unknown number of possibly non-convex clusters in possibly non-convex spatial domains. The model’s two-layer partitioning addresses challenges posed by the intricate spatial organization of tissue samples, such as non-convex clusters and irregular sample boundaries. In simulation studies, DP-RST achieves higher clustering accuracy than existing methods. We apply DP-RST to our motivating dataset, mouse colon tissue sampled during healing from inflammatory damage, and find meaningful clusters associated with different stages of tissue repair. Differential gene expression analysis highlights key genes with spatially distinct patterns, revealing the compartmentalization of immune, metabolic, and regenerative processes during mucosal healing. To demonstrate the broad applicability of DP-RST, we analyze four additional spatial transcriptomics datasets generated by the 10x Visium platform. Supplementary materials, including extended results and the code that implements our method, are available online.
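To give a feel for why spanning trees suit non-convex clusters, the following is a minimal sketch of a spanning-tree partition, not the DP-RST model itself. It assumes spots lie on a hypothetical 4-neighbour lattice, draws a random spanning tree by running Kruskal's algorithm on shuffled edges (which is not a uniform sample over spanning trees), and cuts a few tree edges. Every resulting cluster is spatially contiguous but can take an arbitrary, possibly non-convex shape. All function names and the grid size are illustrative assumptions; the Bayesian mixture layer and any inference procedure are omitted.

```python
import random


def grid_edges(rows, cols):
    """Edges of a 4-neighbour lattice; node id = r * cols + c (assumed layout)."""
    edges = []
    for r in range(rows):
        for c in range(cols):
            u = r * cols + c
            if c + 1 < cols:
                edges.append((u, u + 1))      # right neighbour
            if r + 1 < rows:
                edges.append((u, u + cols))   # lower neighbour
    return edges


def find(parent, x):
    """Union-find root lookup with path halving."""
    while parent[x] != x:
        parent[x] = parent[parent[x]]
        x = parent[x]
    return x


def random_spanning_tree(n, edges, rng):
    """Kruskal's algorithm on randomly ordered edges of a connected graph."""
    order = edges[:]
    rng.shuffle(order)
    parent = list(range(n))
    tree = []
    for u, v in order:
        ru, rv = find(parent, u), find(parent, v)
        if ru != rv:              # edge joins two components: keep it
            parent[ru] = rv
            tree.append((u, v))
    return tree


def cut_tree(n, tree, n_clusters, rng):
    """Remove n_clusters - 1 tree edges and label the connected components."""
    removed = set(rng.sample(range(len(tree)), n_clusters - 1))
    parent = list(range(n))
    for i, (u, v) in enumerate(tree):
        if i not in removed:
            parent[find(parent, u)] = find(parent, v)
    roots = {}
    return [roots.setdefault(find(parent, x), len(roots)) for x in range(n)]


rng = random.Random(0)
rows, cols = 6, 8
tree = random_spanning_tree(rows * cols, grid_edges(rows, cols), rng)
labels = cut_tree(rows * cols, tree, n_clusters=4, rng=rng)

# Print the cluster label of each spot as a small map.
for r in range(rows):
    print(" ".join(str(labels[r * cols + c]) for c in range(cols)))
```

Removing k - 1 edges from a spanning tree always leaves exactly k connected pieces, so contiguity is guaranteed by construction rather than enforced by a penalty. On an irregular tissue outline, the lattice would be replaced by a neighbourhood graph built over the observed spots only.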