
Wu Lin - A framework for designing (non-diagonal) adaptive training methods - IPAM at UCLA

Presenter: Wu Lin, Vector Institute
October 14, 2024
Abstract
Recorded 14 October 2024. Wu Lin of the Vector Institute presents "A framework for designing (non-diagonal) adaptive training methods" at IPAM's Theory and Practice of Deep Learning Workshop.

Optimization is an essential ingredient of deep learning. Many optimization problems can be reformulated from a probabilistic perspective to exploit the Fisher-Rao (Riemannian) geometric structure of a probability family. In this talk, we show how to design new quasi-Newton methods for large-scale neural network (NN) training by leveraging this geometric structure. We first establish a second-order view of adaptive methods such as RMSProp and full-matrix AdaGrad when the square root is removed from their preconditioned gradient step. After that, we introduce and use preconditioner invariance to make non-diagonal adaptive methods inverse-free while keeping their preconditioner structures for modern mini-batch training with low precision. Finally, we propose Kronecker-factored adaptive methods to bridge the computation gap between non-diagonal and diagonal adaptive methods, and we demonstrate the advantages of our methods for training large NNs in half precision by removing numerically unstable and computationally intensive matrix decompositions and inversions.

Learn more online at: https://www.ipam.ucla.edu/programs/workshops/workshop-ii-theory-and-practice-of-deep-learning/?tab=overview
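To make the "removed square root" point concrete, here is a minimal NumPy sketch contrasting the standard RMSProp step with its square-root-free variant. The learning rate, decay, and damping values are illustrative assumptions, not values from the talk.

import numpy as np

def rmsprop_step(theta, g, v, lr=1e-3, beta=0.99, eps=1e-8, square_root=True):
    """One RMSProp-style update, with or without the square root.

    v tracks an exponential moving average of squared gradients.
    With the square root, this is the familiar RMSProp step; without
    it, v acts like a diagonal curvature estimate, which is what lets
    the talk relate such methods to (quasi-)Newton updates.
    """
    v = beta * v + (1 - beta) * g**2
    denom = np.sqrt(v) + eps if square_root else v + eps
    return theta - lr * g / denom, v

Dropping the square root changes how the step scales with the gradient: the update now divides by a second-moment (curvature-like) quantity directly, which is the second-order view the abstract refers to.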
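The inverse-free theme can be illustrated with a classical device, not necessarily the construction used in the talk: when a full-matrix preconditioner S receives a rank-one update, its inverse can be maintained directly via the Sherman-Morrison formula, so no matrix inversion or decomposition is ever performed. A minimal sketch, assuming an exponential-moving-average update of S:

import numpy as np

def inverse_free_rank_one_update(S_inv, g, beta=0.99):
    """Maintain the inverse of S <- beta * S + (1 - beta) * g g^T directly.

    Writing the update as A + u u^T with A = beta * S and
    u = sqrt(1 - beta) * g, Sherman-Morrison gives the new inverse
    from the old one using only matrix-vector products.
    """
    A_inv = S_inv / beta              # inverse of beta * S
    u = np.sqrt(1 - beta) * g         # rank-one term is u u^T
    Au = A_inv @ u
    return A_inv - np.outer(Au, Au) / (1.0 + u @ Au)

Such updates cost O(d^2) per step and can be numerically delicate in half precision; the preconditioner-invariance approach described in the talk is aimed at structured, numerically stable inverse-free updates rather than this generic identity.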
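Finally, the computational gap that Kronecker factorization closes can be seen from the cost of applying the preconditioner: instead of forming one (mn x mn) matrix for an m x n weight matrix, one keeps an m x m and an n x n factor and preconditions the matrix-shaped gradient directly, in the spirit of KFAC/Shampoo-style methods. The function below is a hypothetical sketch of that application step only, assuming symmetric positive-definite factors whose inverses are given.

import numpy as np

def kron_preconditioned_step(W, G, C_out_inv, C_in_inv, lr=1e-3):
    """Apply a Kronecker-factored preconditioner to a matrix gradient.

    For symmetric factors, (C_in kron C_out)^{-1} vec(G) equals
    vec(C_out^{-1} @ G @ C_in^{-1}), so the step never materializes
    the (m*n) x (m*n) full-matrix preconditioner.
    """
    return W - lr * (C_out_inv @ G @ C_in_inv)

Working with the two small factors costs on the order of m^3 + n^3 rather than (mn)^3, which is the sense in which Kronecker-factored methods sit between diagonal and full-matrix adaptive methods.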