From stochastic gradient descent to Wasserstein gradient flows

Presenter
May 8, 2020
Keywords:
  • Neural networks
  • Mean field
  • Wasserstein gradient flow
MSC:
  • 35Q68
  • 60K35
Abstract
Modern neural networks contain millions of parameters, and training them requires optimizing a highly non-convex objective. Despite the apparent complexity of this task, practitioners successfully train such models using simple first-order methods such as stochastic gradient descent (SGD). I will survey recent efforts to understand this surprising phenomenon using tools from the theory of partial differential equations. In particular, I will discuss a mean-field limit in which the number of neurons becomes large and the SGD dynamics are approximated by a certain Wasserstein gradient flow. [Joint work with Adel Javanmard, Song Mei, Theodor Misiakiewicz, Marco Mondelli, Phan-Minh Nguyen]
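
To give the flavor of the correspondence, here is a minimal sketch in my own notation, assuming the standard two-layer setting with square loss (not necessarily the exact formulation used in the talk). Write the network as an average over $N$ neurons,
\[
  \hat f\big(x;\theta^{(1)},\dots,\theta^{(N)}\big) \;=\; \frac{1}{N}\sum_{i=1}^{N} \sigma_*\!\big(x;\theta^{(i)}\big),
\]
and let $\hat\rho^{(N)}_t = \frac{1}{N}\sum_{i=1}^{N}\delta_{\theta^{(i)}_t}$ be the empirical distribution of the neuron parameters along the SGD trajectory. As $N \to \infty$ (with suitably small step size), $\hat\rho^{(N)}_t$ is well approximated by a measure-valued flow $\rho_t$ solving the continuity equation
\[
  \partial_t \rho_t \;=\; \nabla_\theta \!\cdot\! \big( \rho_t \, \nabla_\theta \Psi(\theta;\rho_t) \big),
\]
where $\Psi(\,\cdot\,;\rho)$ is the first variation of the population risk
\[
  R(\rho) \;=\; \mathbb{E}\Big[\Big( y - \int \sigma_*(x;\theta)\,\rho(\mathrm{d}\theta) \Big)^{2}\Big].
\]
This PDE is the steepest-descent (gradient) flow of $R$ with respect to the Wasserstein-2 metric on probability measures, which is the sense in which SGD is approximated by a Wasserstein gradient flow.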