Misha Belkin - Emergence and grokking in "simple" architectures - IPAM at UCLA
Presenter
October 18, 2024
Abstract
Recorded 18 October 2024. Misha Belkin of the University of California, San Diego, presents "Emergence and grokking in 'simple' architectures" at IPAM's Theory and Practice of Deep Learning Workshop.
Abstract: In recent years, transformers have become a dominant machine learning methodology. A key element of transformer architectures is a standard neural network, the multilayer perceptron (MLP). I argue that MLPs alone already exhibit many of the remarkable behaviors observed in modern LLMs, including emergent phenomena. Furthermore, despite a large body of work, we are still far from understanding how 2-layer MLPs learn relatively simple problems, such as "grokking" modular arithmetic. I will discuss recent progress and argue that feature-learning kernel machines (Recursive Feature Machines) isolate some key computational aspects of modern neural architectures and are preferable to MLPs as a model for the analysis of emergent phenomena.
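For readers unfamiliar with the grokking setup referenced in the abstract, the sketch below is a minimal illustration, not the speaker's code, of training a 2-layer MLP on modular addition in PyTorch. All choices here (p = 97, hidden width 512, ReLU, AdamW with weight decay, the 40% train split) are assumptions picked to resemble common grokking experiments.

```python
# Illustrative sketch (assumed setup, not the speaker's): a 2-layer MLP trained on
# modular addition f(a, b) = (a + b) mod p, the standard task where "grokking" --
# test accuracy jumping long after the training set is memorized -- is observed.
import torch
import torch.nn as nn

p = 97                      # modulus (assumed value)
train_frac = 0.4            # fraction of all p*p pairs used for training (assumed)

# Build the full dataset of (a, b) pairs; one-hot encode the two inputs.
pairs = torch.cartesian_prod(torch.arange(p), torch.arange(p))
labels = (pairs[:, 0] + pairs[:, 1]) % p
x = torch.cat([nn.functional.one_hot(pairs[:, 0], p),
               nn.functional.one_hot(pairs[:, 1], p)], dim=1).float()

perm = torch.randperm(p * p)
n_train = int(train_frac * p * p)
train_idx, test_idx = perm[:n_train], perm[n_train:]

# A "simple" architecture: a 2-layer MLP (one hidden layer with ReLU).
model = nn.Sequential(
    nn.Linear(2 * p, 512),
    nn.ReLU(),
    nn.Linear(512, p),
)
opt = torch.optim.AdamW(model.parameters(), lr=1e-3, weight_decay=1.0)
loss_fn = nn.CrossEntropyLoss()

for step in range(20000):
    opt.zero_grad()
    loss = loss_fn(model(x[train_idx]), labels[train_idx])
    loss.backward()
    opt.step()
    if step % 1000 == 0:
        with torch.no_grad():
            train_acc = (model(x[train_idx]).argmax(1) == labels[train_idx]).float().mean()
            test_acc = (model(x[test_idx]).argmax(1) == labels[test_idx]).float().mean()
        # Grokking appears as train accuracy reaching 1.0 long before test accuracy follows.
        print(f"step {step:6d}  train acc {train_acc:.3f}  test acc {test_acc:.3f}")
```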
Learn more online at: https://www.ipam.ucla.edu/programs/workshops/workshop-ii-theory-and-practice-of-deep-learning/?tab=overview