Vision-based Human Motion Modeling and Analysis - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Jinkun Cao
PhD Student, Robotics Institute, Carnegie Mellon University
Monday, September 23
9:00 am to 10:30 am
NSH 4305
Vision-based Human Motion Modeling and Analysis

Abstract:
Modern computer vision has achieved remarkable success in tasks such as detecting, segmenting, and estimating the pose of humans in images and videos, reaching or even surpassing human-level performance. However, these systems still face significant challenges in predicting and analyzing future human motion. This thesis explores how vision-based solutions can enhance the fidelity and accuracy of human motion modeling and analysis.

We first studied multi-object tracking, which links static human localization results across time. We investigated correlations between human detections over time using either motion or appearance matching. While learning-based methods have dominated appearance matching, we found that classic linear filtering methods excel at motion-based matching. Our proposed methods offer new insights into human motion tracking and establish strong baselines for future research, highlighting the value of filtering-based methods alongside modern learning-based approaches. Because they condition on pixel observations from later time steps, tracking solutions remain deterministic; humans, by contrast, can anticipate the future positions of moving objects under multiple hypotheses.
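To make the filtering-based matching idea concrete, here is a minimal sketch of motion-based association with a constant-velocity Kalman filter, in the spirit of classic filtering trackers. The class and function names, the 2-D point state, and the greedy nearest-neighbor matching are illustrative assumptions, not the thesis method.

```python
import numpy as np

class ConstantVelocityKF:
    """Tracks a 2-D point with state [x, y, vx, vy] (illustrative)."""
    def __init__(self, x, y, dt=1.0):
        self.state = np.array([x, y, 0.0, 0.0])
        self.P = np.eye(4) * 10.0      # state covariance
        self.F = np.eye(4)             # constant-velocity transition
        self.F[0, 2] = self.F[1, 3] = dt
        self.H = np.eye(2, 4)          # observe position only
        self.Q = np.eye(4) * 0.01      # process noise
        self.R = np.eye(2)             # measurement noise

    def predict(self):
        """Propagate the state one step; return predicted position."""
        self.state = self.F @ self.state
        self.P = self.F @ self.P @ self.F.T + self.Q
        return self.state[:2]

    def update(self, z):
        """Correct the state with an observed position z."""
        y = np.asarray(z, dtype=float) - self.H @ self.state
        S = self.H @ self.P @ self.H.T + self.R
        K = self.P @ self.H.T @ np.linalg.inv(S)   # Kalman gain
        self.state = self.state + K @ y
        self.P = (np.eye(4) - K @ self.H) @ self.P

def match_by_motion(tracks, detections, max_dist=5.0):
    """Greedily match predicted track positions to new detections."""
    dets = [np.asarray(d, dtype=float) for d in detections]
    pairs, used = [], set()
    for i, t in enumerate(tracks):
        p = t.predict()
        dists = [np.linalg.norm(p - d) if j not in used else np.inf
                 for j, d in enumerate(dets)]
        if dists:
            j = int(np.argmin(dists))
            if dists[j] < max_dist:
                pairs.append((i, j))
                used.add(j)
                t.update(dets[j])
    return pairs
```

The appeal of this family of methods is that association needs no training data: prediction and correction follow directly from the linear motion model.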

Following the research on tracking, we examine human motion from a probabilistic perspective. We developed an effective method for reversible distribution transformation in human trajectory forecasting. Confronted with asymmetric and imbalanced trajectory distributions, we challenged the common practice of deriving target trajectories from a symmetric, unimodal Gaussian as the source distribution. While this assumption is theoretically sound, it limits how well an asymmetric, multi-modal target distribution can be derived from finite training data. By adaptively constructing a mixture of Gaussians instead, we achieved significant improvements in controllability, diversity, and accuracy for future trajectory modeling.
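The core idea above can be sketched as follows: replace a single standard Gaussian base distribution with a mixture of Gaussians whose modes can be placed adaptively, so that sampling a component selects a trajectory mode. The class name, the fixed 2-D latent, and the hand-set mixture parameters are assumptions for illustration, not the thesis implementation.

```python
import numpy as np

class GaussianMixtureBase:
    """Mixture-of-Gaussians base distribution over 2-D latents (sketch)."""
    def __init__(self, means, scales, weights, seed=0):
        self.means = np.asarray(means, dtype=float)    # (K, 2) mode centers
        self.scales = np.asarray(scales, dtype=float)  # (K,) per-mode std
        w = np.asarray(weights, dtype=float)
        self.weights = w / w.sum()                     # normalized mode weights
        self.rng = np.random.default_rng(seed)

    def sample(self, n):
        """Draw n latents; the chosen component index controls the mode."""
        comps = self.rng.choice(len(self.weights), size=n, p=self.weights)
        eps = self.rng.standard_normal((n, 2))
        return self.means[comps] + self.scales[comps, None] * eps, comps
```

Because the sampled component determines which region of latent space the sample falls in, conditioning on the component gives a handle on which trajectory mode is decoded; this is what a unimodal Gaussian base cannot offer.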

Finally, moving beyond coarse-grained representations of human position, we investigate fine-grained body articulation and deformation through two proposed tasks. First, we aim to understand how humans interact with objects and with other individuals, yielding a prior distribution over plausible and natural locomotion and articulation. Second, we seek to leverage generative human motion priors to constrain vision-based motion estimation, enhancing accuracy, robustness against occlusion and blurring, and temporal consistency.

Thesis Committee Members:
Kris Kitani, Chair
Deva Ramanan
Shubham Tulsiani
Siyu Tang, ETH Zurich