Learning models: connections between boosting, hard-core distributions, dense models, GAN, and regularity I
Presenter
November 13, 2017
Abstract
A theme that cuts across many domains of computer science and mathematics is to find simple representations of complex mathematical objects such as graphs, functions, or distributions on data. These representations need to capture how the object interacts with a class of tests, and to approximately determine the outcome of these tests.
For example, a scientist is trying to find a mathematical model explaining observed data about some phenomenon, such as kinds of butterflies in a forest. A minimal criterion for success is that the model should accurately predict the results of future observations. When is this possible? This general situation arises in many contexts in computer science and mathematics. In machine learning, the object might be a distribution on data points, high dimensional real vectors, and the tests might be half-spaces. The goal would be to learn a simple representation of the data that determines the probability of any half-space or possibly intersections of half spaces. In computational complexity, the object might be a Boolean function or distribution on strings, and the tests are functions of low circuit complexity. In graph theory, the object is a large graph, and the tests are the cuts In the graph; the representation should determine approximately the size of any cut. In additive combinatorics, the object might be a function or distribution over an Abelian group, and the tests might be correlations with linear functions or polynomials.
This talk is to illustrate the connections between these various contexts.
Boosting: From machine learning, a boosting algorithm converts a weak learner, that finds a hypothesis that has a small correlation to an unknown , to a strong learner, that finds a hypothesis agreeing with the function almost everywhere.
Hard-core distributions: From complexity theory, a hard-core distribution lemma says that any Boolean function which cannot be computed with success more than $1-\delta$ on random inputs, has a sub-distribution of size $2 \delta$ where the function is pseudo-random.
Dense model theorems: Initially from additive combinatorics. Dense model theorems say that any distribution either has a simple witness that it is small (has low min-entropy) or has a simple indistinguishable distribution that is large (high min-entropy).
GAN-type algorithms: From machine learning, generative adversarial networks (GANs) are algorithms that use a distinguisher that finds differences between a target distribution and a sampling algorithm in order to refine the sampling algorithm. Algorithmic versions of dense model theorems can be viewed as analogous to GAN's in many ways, but can have stronger provable guarantees.
Regularity lemmas: Initially from graph theory, a regularity lemma shows that an arbitrary mathematical object has a simply described approximation.
In particular, in this talk, we'll focus on how certain families of boosting algorithms can be used to construct GAN-like algorithms. This construction is based on a reduction from dense model theorems from additive number theory to the hardcore distribution lemma from computational complexity. We call these algorithms GAB, generation from adversarial boosting. These algorithms will have strong theoretical guarantees, for when they will succeed in finding a similar distribution to the samples, and a guarantee about not over-fitting the distribution generated to the specific samples. Arora, Ge, Liang, Ma and Zhang observed that standard GAN's can converge to distributions that are much lower entropy than the sampled distribution, and Arora and Zhang have performed empirical studies showing that this indeed occurs in practice, at least some of the time. In contrast, we give a version of GAB that is guaranteed to produce a distribution with entropy that is not only within a fraction of a bit of the sampled distribution, but within the same fraction from any distribution that is indistinguishable from it using small threshold circuits. If the GAB algorithm fails to produce a high entropy distribution indistinguishable from the sampled distribution, it instead produces a witness that no such distribution exists.
We'll also show some applications to generalizing weak regularity lemmas in graphs, leakage-resilient cryptography, and generalizing Chang's inequality in Fourier analysis of Boolean functions.