Towards Spatial Intelligence for Behaviors and Environments - Robotics Institute Carnegie Mellon University
Faculty Events

Laszlo A. Jeni, Assistant Research Professor, Robotics Institute, Carnegie Mellon University
Friday, February 21
12:00 pm to 1:00 pm
Newell-Simon Hall 4305
Towards Spatial Intelligence for Behaviors and Environments
Abstract: We are in an era of foundation models and spatial intelligence (AR/VR). Despite significant advancements in natural language processing for reasoning, other modalities such as vision lag behind and contribute little to reasoning: current video-language models (VLMs) struggle even with basic spatial reasoning tasks. The challenge lies in the disparate training needs of different modalities. To enhance spatial reasoning, we must elevate vision to a higher semantic level (e.g., geometry) and align it with language to achieve multimodal reasoning. Developing models that can reason about dynamic environments, behaviors, and interactions via multimodal inputs requires three key innovations: 1) universal 3D lifting to semantic representations for reasoning, 2) more efficient Vision Transformers (ViTs) for spatial tasks, and 3) robust data collection and benchmarking frameworks. In this presentation, I will discuss ongoing projects in my lab aimed at these innovations.