On the existence of wide flat minima in neural network landscapes: analytic and algorithmic approaches
Presenter: Carlo Baldassi - Bocconi University
November 20, 2019
Abstract
The techniques currently used for training neural networks are often very effective at avoiding overfitting and finding solutions that generalize well, even when applied to very complex architectures in an overparametrized regime. This phenomenon is currently poorly understood. Building on a framework that we have been developing in recent years, based on a large-deviation statistical physics analysis, we have studied analytically, numerically and algorithmically the structural properties of simplified models in relation to the existence and accessibility of so-called "wide flat minima" of the loss function. We have investigated the effect of the ReLU transfer function and of the cross-entropy loss function, contrasted these devices with others that do not exhibit the same phenomena, and developed message-passing and greedy local-search algorithms that exploit the analytical findings.
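The notion of a "wide flat minimum" (a region of weight space where the training error stays low under sizeable perturbations) can be illustrated with a toy experiment. The sketch below is not the speaker's actual setup, only a hedged illustration: it trains a single perceptron on random binary labels with a cross-entropy loss, then probes the flatness of the resulting minimum by measuring training accuracy after random weight perturbations of increasing radius. The model sizes and the function names (train_perceptron, flatness_profile) are hypothetical.

```python
# Minimal sketch, assuming a single-layer perceptron on random +/-1 labels.
# Not the speaker's method; all names and sizes here are illustrative.
import numpy as np

rng = np.random.default_rng(0)

def train_perceptron(X, y, lr=0.1, epochs=500):
    """Minimize the cross-entropy (logistic) loss of a perceptron
    with plain gradient descent, using +/-1 labels."""
    n, d = X.shape
    w = rng.standard_normal(d) / np.sqrt(d)
    for _ in range(epochs):
        margins = y * (X @ w)
        # gradient of mean log(1 + exp(-margin)); clip to avoid overflow
        sig = 1.0 / (1.0 + np.exp(np.clip(margins, -30, 30)))
        grad = -(X * (y * sig)[:, None]).mean(axis=0)
        w -= lr * grad
    return w

def flatness_profile(w, X, y, radius, n_samples=200):
    """Fraction of training points still classified correctly after
    perturbing w by random vectors of a fixed norm: a crude proxy
    for the 'width' of the minimum around w."""
    accs = []
    for _ in range(n_samples):
        u = rng.standard_normal(w.shape)
        if radius > 0:
            u *= radius / np.linalg.norm(u)
        else:
            u[:] = 0.0
        accs.append(np.mean(np.sign(X @ (w + u)) == y))
    return float(np.mean(accs))

d, n = 400, 200                      # overparametrized toy sizes
X = rng.standard_normal((n, d))
y = np.sign(rng.standard_normal(n))  # random binary labels
w = train_perceptron(X, y)

for r in [0.0, 0.5, 1.0, 2.0]:
    print(f"radius {r:4.1f}: perturbed accuracy ~ {flatness_profile(w, X, y, r):.3f}")
```

In this picture, a minimum whose perturbed accuracy decays slowly with the radius is "wide and flat"; the talk's analytical framework characterizes when such regions exist and whether algorithms can reach them.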