Stochastic gradient descent for noise with ML-type scaling

July 9, 2021
Abstract
There are two classical types of convergence results for stochastic gradient descent (SGD): (1) SGD finds minimizers of convex objective functions, and (2) SGD finds critical points of smooth objective functions. We show that if the objective landscape and the noise possess certain properties reminiscent of deep learning problems, then we can obtain global convergence guarantees of the first type under assumptions of the second type, for a fixed (small but positive) learning rate. The convergence is exponential, but with a large random coefficient. If the learning rate exceeds a certain threshold, we instead discuss minimum selection by studying the invariant distribution of a continuous-time SGD model. We show that at a critical threshold, SGD prefers minimizers at which the objective function is 'flat' in a precise sense.
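To make the setting concrete, here is a minimal toy sketch (not taken from the talk): SGD with a fixed small learning rate on a one-dimensional quadratic objective, where the gradient noise amplitude scales with the objective value itself, so it vanishes at the minimizer, loosely mimicking "ML-type" noise when a model can fit the data exactly. The choice of objective, noise scaling, and learning rate are illustrative assumptions, not the talk's construction.

```python
import numpy as np

rng = np.random.default_rng(0)

def f(x):
    return 0.5 * x**2             # toy smooth objective with global minimum at 0

def grad(x):
    return x                      # gradient of f

eta = 0.05                        # fixed (small but positive) learning rate
x = 5.0                           # initial iterate

for step in range(2000):
    # noise amplitude scales like sqrt(f(x)), so it vanishes at the minimizer
    noise = np.sqrt(f(x)) * rng.standard_normal()
    x -= eta * (grad(x) + noise)

# the objective decays roughly exponentially in the step count,
# with a random prefactor depending on the realized noise
print(f"final objective: {f(x):.3e}")
```

In this toy run the iterates contract toward the minimizer at a geometric rate despite the fixed learning rate, because the noise shrinks together with the objective; this is only meant to illustrate the flavor of the fixed-learning-rate regime described in the abstract.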