
Momentum in Stochastic Gradient Descent and Deep Neural Nets

Presenter
Bao Wang, University of California, Los Angeles (UCLA), Mathematics
January 29, 2020
Abstract
Stochastic gradient-based optimization algorithms play perhaps the most important role in modern machine learning, in particular deep learning. Nesterov accelerated gradient (NAG) is a celebrated technique for accelerating gradient descent; however, the NAG technique fails in stochastic gradient descent (SGD). In this talk, I will discuss some recent progress in leveraging NAG and restart techniques to accelerate SGD. I will also discuss how to leverage momentum to design deep neural nets in a mathematically mechanistic manner. This is joint work with Tan Nguyen, Richard Baraniuk, Andrea Bertozzi, and Stan Osher.
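For context, a standard textbook formulation of the NAG iteration for minimizing a smooth convex function f with step size s (not the specific stochastic variants developed in the talk) is

x_k = y_{k-1} - s\,\nabla f(y_{k-1}), \qquad y_k = x_k + \frac{k-1}{k+2}\,(x_k - x_{k-1}),

where the momentum term \frac{k-1}{k+2}(x_k - x_{k-1}) extrapolates along the previous step. Replacing the full gradient \nabla f with a noisy stochastic estimate is the setting in which, as the abstract notes, the acceleration breaks down, which motivates the restart techniques discussed in the talk.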
Supplementary Materials