Abstract:
In this talk, I will focus on the problem of multi-object tracking in crowded scenes. Tracking within crowds is particularly challenging due to heavy occlusion and frequent crossover between targets. The problem becomes more difficult when the only available bounding boxes are noisy, contaminated by background and neighboring objects. Existing tracking methods approach the problem from two angles: motion modeling and appearance matching. However, motion modeling is usually limited to simple motion assumptions, and appearance matching is unreliable when occlusion is severe. To study multi-object tracking in scenarios with complex motion and heavy occlusion, we propose the DanceTrack dataset. In DanceTrack, people with similar or even uniform appearances move along trajectories with heavy occlusion and frequent crossover, making it an important and more difficult platform on which to evaluate multi-object tracking. Building on this dataset, we discuss ways to improve multi-object tracking within crowds through both motion modeling and appearance matching. We find that even a simple Kalman filter can achieve state-of-the-art performance if proper care is given to handling occlusion. Further, we propose a hierarchical representation that more robustly distinguishes an object from others, so that it can be tracked consistently across video frames.
Committee:
Kris Kitani
Deva Ramanan
David Held
Rawal Khirodkar