Dan Yamins - Climbing the Ladder of Causation Toward Mid-Level Vision - IPAM at UCLA
Presenter: Dan Yamins (Stanford University)
September 23, 2024
Recorded 23 September 2024. Dan Yamins of Stanford University presents "Climbing the Ladder of Causation Toward Mid-Level Vision" at IPAM's Analyzing High-dimensional Traces of Intelligent Behavior Workshop.
Abstract: Humans and (almost certainly) many animals possess extremely rich low-to-mid-level concepts for visual scene understanding, including (inter alia) the ability to estimate contours and border ownership, optical flow and self-induced motion, a "2.5D sketch" of monocular depth and surface normals, various kinds of segmentation, 3D shape, and materials. But where do these concepts come from, especially given that real biological organisms are definitely not getting detailed supervision for all these rich visual properties? I will present a working theory of how they arise, based on recent work from my lab on Counterfactual World Models (CWMs). In particular, I will describe a form of masked prediction that enables the training of large-scale predictive models that organically possess high-quality, causally informative tokens. I will then show how a wide variety of mid-level visual concepts arise by performing simple, generic interventions on these tokens and computing the resulting counterfactual effects. The resulting model takes a step towards a generic unsupervised algorithm for visual scene understanding in machines, and helps us formulate novel and intriguing hypotheses about the origins of biological vision.
Learn more online at: https://www.ipam.ucla.edu/programs/workshops/workshop-i-analyzing-high-dimensional-traces-of-intelligent-behavior/?tab=overview