Explanation of variability in data through optimal transport

Presenter
October 21, 2016
Abstract
A methodology based on the theory of optimal transport is developed to attribute variability in data sets to known and unknown factors and to remove such attributable components of the variability from the data. Denoting by $x$ the quantities of interest and by $z$ the explanatory factors, the procedure transforms $x$ into filtered variables $y$ through a $z$-dependent map, so that the conditional probability distributions $\rho(x|z)$ are pushed forward into a target distribution $\mu(y)$, independent of $z$. Among all maps and target distributions that achieve this goal, the procedure selects the one that minimally distorts the original data: the barycenter of the $\rho(x|z)$. We will discuss the relevance of this methodology to medicine and biology, including the amalgamation of data sets and removal of batch effects, the analysis of time series, the analysis of dependence among variables and the discovery of previously unknown variability factors.
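
To make the idea of a $z$-dependent map concrete, here is a minimal sketch for the simplest setting only: scalar $x$, a discrete factor $z$ (for example a batch label), and the squared-distance (Wasserstein-2) cost. It is not the presenter's algorithm. It relies on the standard fact that in one dimension the quantile function of the barycenter is the weighted average of the quantile functions of the $\rho(x|z)$. The function name `remove_factor_1d` and the parameter `n_quantiles` are hypothetical, introduced for illustration.

```python
import numpy as np


def remove_factor_1d(x, z, n_quantiles=200):
    """Map scalar samples x to filtered values y whose distribution
    is (approximately) the same for every value of the label z.

    Assumptions (not from the talk): x is one-dimensional, z is a
    discrete label, and the transport cost is squared distance.
    """
    x = np.asarray(x, dtype=float)
    z = np.asarray(z)

    # Common grid of quantile levels strictly inside (0, 1).
    levels = (np.arange(n_quantiles) + 0.5) / n_quantiles

    groups, counts = np.unique(z, return_counts=True)
    weights = counts / counts.sum()  # empirical probability of each z

    # Empirical quantile function of each conditional rho(x|z).
    group_q = np.stack([np.quantile(x[z == g], levels) for g in groups])

    # 1-D barycenter: weighted average of the quantile functions.
    bary_q = weights @ group_q

    y = np.empty_like(x)
    for g, q in zip(groups, group_q):
        mask = z == g
        # Monotone map for this group: send each group quantile to the
        # barycenter quantile at the same level, interpolating between.
        # Values beyond the outermost quantiles are clamped by np.interp.
        y[mask] = np.interp(x[mask], q, bary_q)
    return y


# Usage: two batches that differ in location and scale.
rng = np.random.default_rng(0)
x = np.concatenate([rng.normal(0.0, 1.0, 2000), rng.normal(5.0, 2.0, 2000)])
z = np.repeat([0, 1], 2000)
y = remove_factor_1d(x, z)
```

After the map, the two batches of `y` should have nearly the same distribution, while each sample keeps its rank within its own batch. Higher-dimensional $x$ or continuous $z$ would need a genuinely different numerical approach than this quantile shortcut.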