From clustering with graph cuts to isoperimetric inequalities: quantitative convergence rates of Cheeger cuts on data clouds
Presenter
April 24, 2020
Abstract
Nicolás García Trillos - University of Wisconsin-Madison
Graph cuts have been studied for decades in the mathematics and computer science communities, and in modern applications in machine learning have been used to formulate optimization problems for data clustering. A canonical example with historical motivation is the so called Cheeger cut problem. This problem is on the one hand intuitively motivated, but on the other, is highly non-convex with a pessimistic NP hard label stamped on it (at least in a worst case scenario setting). Despite this, in the past decade or so, several algorithmic improvements made the minimization of Cheeger cuts more feasible, and at the same time there was a renewed interest in studying statistical properties of Cheeger cuts. New analytical ideas have provided new tools to attack problems that were elusive using classical approaches from statistics and statistical learning theory. Despite the advances, several questions remain unanswered. The purpose of this talk is to present some of these theoretical developments, with emphasis on new results where, for the first time, high probability converge rates of Cheeger cuts of proximity graphs over data clouds are deduced. These quantitative convergence rates are obtained by building bridges between the original clustering problem and another field within the mathematical analysis community that has seen enormous advancements in the past few years: quantitative isoperimetric inequalities. This connection serves as a metaphor for how the mathematical analyst may be able to contribute to answer theoretical questions in machine learning, and how one may be able to deduce statistical properties of solutions to learning optimization problems that have a continuum counterpart.