Aishni Parab - Extracting Structured Data from Multi-Modal Input - IPAM at UCLA

Presenter: Aishni Parab (University of California, Los Angeles)
Date: November 7, 2024
Abstract
Recorded November 7, 2024. Aishni Parab of the University of California, Los Angeles, presents "Extracting Structured Data from Multi-Modal Input" at IPAM's Naturalistic Approaches to Artificial Intelligence Workshop.

In many real-world images, text and visual elements coexist seamlessly, appearing in tables, charts, road signs, and maps. These multi-modal images tightly integrate vision and language, requiring extraction methods precise enough to preserve the semantic richness of both modalities. For example, extracting a table demands that both its layout and its content survive intact.

Programs serve as a powerful, interpretable representation for extracting information from such images. They can be executed to accurately reproduce the image digitally and can integrate with software tools like spreadsheets for downstream workflows. Programs also provide disentangled representations that isolate individual components and the relations between them, enabling precise manipulation of one part without affecting the whole. Finally, programs generalize well by abstracting patterns independently of content, supporting reusable templates and scalable operations across datasets and domains.

In this talk, I will explore key techniques for translating multi-modal data into code, with a focus on structured data extraction from tables. I will highlight purely neural methods, neuro-symbolic approaches, and modern LLM-based techniques, discussing their strengths, limitations, and the challenges involved.

Learn more online at: https://www.ipam.ucla.edu/programs/workshops/workshop-iii-naturalistic-approaches-to-artificial-intelligence/?tab=overview
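
To make the program-as-representation idea concrete, the following is a minimal Python sketch, not the speaker's actual method: the Cell and Table classes and the render_csv method are hypothetical names chosen for illustration. It represents an extracted table as a small program that can be executed to reproduce the table digitally, edited one cell at a time without disturbing the rest, and exported for spreadsheet tools.

    # Hypothetical sketch of a table "program"; executing it reproduces the table.
    from dataclasses import dataclass, field

    @dataclass
    class Cell:
        row: int   # 0-based row index
        col: int   # 0-based column index
        text: str  # extracted cell content

    @dataclass
    class Table:
        n_rows: int
        n_cols: int
        cells: list = field(default_factory=list)

        def set_cell(self, row: int, col: int, text: str) -> None:
            # Disentangled edit: change one cell without touching the others.
            for c in self.cells:
                if c.row == row and c.col == col:
                    c.text = text
                    return
            self.cells.append(Cell(row, col, text))

        def render_csv(self) -> str:
            # "Executing" the program reproduces the table as CSV,
            # ready for downstream spreadsheet workflows.
            grid = [["" for _ in range(self.n_cols)] for _ in range(self.n_rows)]
            for c in self.cells:
                grid[c.row][c.col] = c.text
            return "\n".join(",".join(r) for r in grid)

    # Pretend these cells were extracted from an image of a 2x2 table.
    t = Table(n_rows=2, n_cols=2)
    t.set_cell(0, 0, "Item"); t.set_cell(0, 1, "Price")
    t.set_cell(1, 0, "Apple"); t.set_cell(1, 1, "1.20")
    t.set_cell(1, 1, "1.35")  # precise manipulation of a single cell
    print(t.render_csv())

A real system would infer such a program from pixels using the neural, neuro-symbolic, or LLM-based techniques the talk surveys; the sketch only illustrates why the resulting representation is executable, editable, and reusable.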