Bayesian Mixture Models for Histograms with Applications to Large Datasets
Presenter
January 13, 2026
Abstract
In many real-world scenarios, especially those involving privacy constraints or data summarization, data are available only in aggregated forms such as histograms or frequency tables. This work introduces a novel Bayesian method for inferring the underlying population distribution by fitting a mixture model to binned data. While we focus on mixtures of normal distributions, the framework is flexible and can be extended to other distributional families. We place a prior on the number of mixture components, accommodating both finite and countably infinite mixtures, and perform inference using reversible jump MCMC. The proposed approach performs well on large-scale data, showcasing the potential of nonparametric Bayesian modeling in practical applications. Furthermore, we extend the method to model multiple histograms simultaneously and to cluster them using a Dirichlet process. This enables information sharing across populations and provides a principled posterior probability for assessing homogeneity between groups. Some theoretical results supporting the performance of the proposed methodology are also discussed.
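To make the idea of fitting a mixture to binned data concrete, the following is a minimal sketch of one plausible likelihood for histogram counts under a normal mixture. It assumes (this is not stated in the abstract) that the counts are treated as multinomial, with each bin probability given by the difference of the mixture CDF at the bin edges. The function names `normal_cdf`, `mixture_cdf`, and `binned_log_likelihood` are hypothetical, and the sketch covers only the likelihood, not the prior on the number of components or the reversible jump sampler.

```python
import math


def normal_cdf(x, mean, sd):
    """CDF of a normal distribution, computed with the error function."""
    return 0.5 * (1.0 + math.erf((x - mean) / (sd * math.sqrt(2.0))))


def mixture_cdf(x, weights, means, sds):
    """CDF of a finite normal mixture with the given weights, means, and sds."""
    return sum(w * normal_cdf(x, m, s) for w, m, s in zip(weights, means, sds))


def binned_log_likelihood(edges, counts, weights, means, sds):
    """Multinomial log-likelihood of histogram counts, up to an additive constant.

    Assumption: `edges` holds len(counts) + 1 increasing bin edges, and the
    outer edges are -inf and +inf so that the bin probabilities sum to one.
    """
    if len(edges) != len(counts) + 1:
        raise ValueError("need exactly one more edge than there are bins")
    cdf = [mixture_cdf(e, weights, means, sds) for e in edges]
    loglik = 0.0
    for n, lower, upper in zip(counts, cdf[:-1], cdf[1:]):
        p = upper - lower  # probability mass the mixture assigns to this bin
        if n > 0:
            if p <= 0.0:
                return float("-inf")  # observed counts in a zero-mass bin
            loglik += n * math.log(p)
    return loglik


# Usage: compare two candidate one-component fits on a two-bin histogram.
edges = [float("-inf"), 0.0, float("inf")]
counts = [5, 5]
ll_centered = binned_log_likelihood(edges, counts, [1.0], [0.0], [1.0])
ll_shifted = binned_log_likelihood(edges, counts, [1.0], [3.0], [1.0])
```

Under these assumptions, the cost of one likelihood evaluation depends on the number of bins and components rather than on the number of underlying observations, which is one reason a histogram-based likelihood can remain cheap on large datasets.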