PhD Thesis Proposal
Bridging Generative and Discriminative Learning with Diffusion Models
Abstract: Generative models have advanced significantly, synthesizing photorealistic images, videos, and text. Building on this progress, our work explores the potential of diffusion models to bridge generative and discriminative learning, uncovering new pathways for leveraging their strengths in visual perception tasks. In the first part, we propose Diff-2-in-1, a unified framework for multi-modal data generation [...]
Bring Hand to The Air: Towards Universal Aerial Manipulation
Abstract: Uncrewed Aerial Vehicles (UAVs) have attracted the interest of researchers, industry, and the general public in many applications. Because high-altitude tasks sometimes require active interaction with the environment, a growing body of recent work focuses on aerial manipulation. Each of them has demonstrated the ability to use a specific aerial manipulator [...]
Spatial Reasoning and Semantic Representations for Intelligent Multi-Robot Exploration and Navigation
Abstract: Autonomous robot exploration is widely applied in areas such as search and rescue, environmental monitoring, and structural inspection. Multi-robot exploration has garnered significant attention in the robotics research community, as it enables faster task completion and greater coverage than a single robot can achieve. However, it presents unique challenges: behavior coordination is complex, communication [...]
Leveraging Sense of Agency to Improve the Experience of Control Over Assistive Robots
Abstract: In an age of autonomous driving and robotics, we are increasingly engaging with robots that deploy autonomous assistance. Cognitive science and human-computer interaction literature tells us that, when we apply autonomy in assistive settings, we are often augmenting the user's sense of agency over the system. Sense of agency is a phenomenon from cognitive [...]
Efficient Synthetic Data Generation and Utilization for Action Recognition and Universal Avatar Generation
Abstract: Human-centered computer vision technology relies heavily on large, diverse datasets, but collecting data from human subjects is time-consuming and labor-intensive, and it raises privacy concerns. To address these challenges, researchers are increasingly using synthetic data to augment real-world datasets. This thesis explores efficient methods for generating and utilizing synthetic data to train human-centered computer vision models. [...]