Videos

Is your distribution in shape?

Presenter
October 10, 2022
Abstract
Algorithms for understanding data generated from distributions over large discrete domains are of fundamental importance.  In this talk, we consider the sample complexity of *property testing algorithms* that seek to to distinguish whether or not an underlying distribution satisfies basic shape properties.   Examples of such properties include convexity, log-concavity, heavy-tailed, and approximability by k-histogram functions.  In this talk, we will focus on the property of *monotonicity*, as tools developed for distinguishing the monotonicity property have proven to be useful for the all of the above properties as well as several others.  We say a distribution p is "monotone" if for any two comparable elements x<y in the domain, we have that p(x)<p(y). For example, for the classic n-dimensional hypercube domain, in which domain elements are described via n different features, monotonicity implies that for every element, an increase in the value of one of the features can only increase its probability. We recount the development over the past nearly two decades of monotonicity testing algorithms for distributions over various discrete domains, which make no a priori assumptions on the underlying distribution.   We study the sample complexity for testing whether a distribution is monotone as a function of the size of the domain, which can vary dramatically depending on the structure of the underlying domain.  Not surprisingly, the sample complexity over high dimensional domains can be much greater than over low dimensional domains of the same size.  Nevertheless, for many important domain structures, including high dimensional domains, the sample complexity is sublinear in the size of the domain.   In contrast, when no a priori assumptions are made about the distribution,  learning the distribution requires sample complexity that is linear in the size of the domain. The techniques used draw tools from a wide spectrum of areas, including statistics, optimization, combinatorics, and computational complexity theory.