
Matching for causal effects via multimarginal unbalanced optimal transport

May 12, 2023
Matching on covariates is a well-established framework for estimating causal effects in observational studies. A major challenge is that established methods like matching via nearest neighbors possess poor statistical properties when the dimension of the continuous covariates is high. This article introduces an alternative matching approach based on unbalanced optimal transport that possesses better statistical properties in high-dimensional settings. In particular, we prove that the proposed method dominates classical nearest neighbor matching in mean squared error in finite samples when the dimension of the continuous covariates is high enough. This advantage already appears in low dimensions, as we demonstrate in simulations. It follows from two properties of the new estimator. First, for any positive “matching radius”, the estimated optimal matching converges at the parametric rate, in any dimension, to the optimal population matching. This stands in contrast to classical nearest neighbor matching, which suffers from a curse of dimensionality in the continuous covariates. Second, as the matching radius converges to zero, the method is unbiased in the population for the average treatment effect on the overlapping region. The approach also possesses several other desirable properties: it is flexible in allowing for many different ways to define the matching radius and the cost of matching, can be bootstrapped for inference, provides interpretable weights based on the cost of matching individuals, can be efficiently implemented via Sinkhorn iterations, and can match several treatment arms simultaneously. Importantly, it only selects good matches from any treatment arm, thus providing unbiased estimates of average treatment effects in the region of overlapping supports.
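
To make the reference to Sinkhorn iterations concrete, the following is a minimal two-arm sketch of entropic unbalanced optimal transport with Kullback-Leibler marginal penalties, solved by the standard scaling iterations. It is an illustration under stated assumptions, not the estimator defined in this article: the function names `unbalanced_sinkhorn` and `weighted_outcome_difference`, the squared Euclidean cost, the uniform sample weights, and the parameters `eps` and `rho` are all hypothetical choices, and no correspondence between these parameters and the matching radius is claimed.

```python
import numpy as np


def unbalanced_sinkhorn(X0, X1, eps=0.5, rho=1.0, n_iter=500):
    """Entropic unbalanced OT between two samples (illustrative sketch).

    X0, X1 : arrays of shape (n0, d) and (n1, d) with covariates of the
             control and treated units.
    eps    : entropic regularization strength (assumed parameter).
    rho    : weight of the KL penalty on the marginals (assumed parameter).
    Returns a nonnegative (n0, n1) matrix of matching weights.
    """
    n0, n1 = len(X0), len(X1)
    a = np.full(n0, 1.0 / n0)  # uniform weights on control units
    b = np.full(n1, 1.0 / n1)  # uniform weights on treated units

    # Squared Euclidean cost between every control/treated pair.
    C = ((X0[:, None, :] - X1[None, :, :]) ** 2).sum(axis=2)
    K = np.exp(-C / eps)  # Gibbs kernel

    u = np.ones(n0)
    v = np.ones(n1)
    power = rho / (rho + eps)  # exponent < 1 relaxes the marginal constraints
    for _ in range(n_iter):
        u = (a / (K @ v)) ** power
        v = (b / (K.T @ u)) ** power

    # Coupling: large entries correspond to cheap (close) matches.
    return u[:, None] * K * v[None, :]


def weighted_outcome_difference(P, Y0, Y1):
    """Average treated-minus-control outcome difference, weighted by P."""
    diffs = Y1[None, :] - Y0[:, None]
    return float((P * diffs).sum() / P.sum())
```

Because the marginal constraints are only penalized rather than enforced, pairs that are far apart in covariate space receive little weight, which is the intuition behind selecting only good matches. A call such as `P = unbalanced_sinkhorn(X0, X1)` followed by `weighted_outcome_difference(P, Y0, Y1)` gives a weighted comparison of outcomes; for very small `eps` the kernel can underflow, and a log-domain implementation would be preferable.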