Leena Vankadara - Beyond muP: Scaling Insights from Infinite-Width Theory for Next Generation Architectures and Learning Paradigms
October 18, 2024
Abstract
Leena Vankadara of Amazon Research presents "Beyond muP: Scaling Insights from Infinite-Width Theory for Next Generation Architectures and Learning Paradigms" at IPAM's Theory and Practice of Deep Learning Workshop.
Abstract: Scaling is pivotal to the success of modern machine learning. However, scaling up also introduces new challenges, such as increased training instability. In this talk, I will discuss how infinite-width theory can be utilized to establish optimal scaling rules across various architectures and learning paradigms. I will begin by discussing the scaling behaviour of Multilayer Perceptrons (MLPs) under Sharpness-Aware Minimization—a min-max learning formulation designed to enhance generalization. The analysis extends naturally to other architectures such as transformers, ResNets, and CNNs. Additionally, I will discuss the scaling behaviour of structured state space models (SSMs), which have emerged as efficient alternatives to transformers. Owing to the unique structure of their transition matrices, SSMs defy conventional scaling analyses and necessitate specialized approaches. I will discuss the scaling of SSMs within the standard minimization framework, highlighting the need for and implications of specialized scaling strategies.
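For readers unfamiliar with the min-max formulation mentioned above, the following is a minimal, illustrative sketch of a Sharpness-Aware Minimization step (following Foret et al.'s first-order approximation, not the talk's own analysis): the inner maximization is approximated by a gradient-ascent step of radius rho, and the outer minimization then descends using the gradient at that perturbed point. The quadratic loss and all parameter values here are hypothetical choices for illustration.

```python
import numpy as np

def sam_step(w, grad_fn, lr=0.1, rho=0.05):
    """One SAM update: approximate the worst-case perturbation within
    an L2 ball of radius rho (inner max), then take a descent step
    using the gradient evaluated at the perturbed weights (outer min)."""
    g = grad_fn(w)
    # First-order solution of the inner maximization: step of length rho
    # in the direction of the loss gradient.
    eps = rho * g / (np.linalg.norm(g) + 1e-12)
    g_adv = grad_fn(w + eps)   # gradient at the adversarially perturbed point
    return w - lr * g_adv      # outer minimization step

# Toy example: L(w) = 0.5 * ||w||^2, so grad L(w) = w.
w = np.array([1.0, -2.0])
for _ in range(50):
    w = sam_step(w, lambda w: w)
```

The scaling question the talk addresses is how hyperparameters such as `lr` and `rho` must be rescaled with network width so that this two-step update remains stable in the infinite-width limit.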
Learn more online at: https://www.ipam.ucla.edu/programs/workshops/workshop-ii-theory-and-practice-of-deep-learning/?tab=overview