Activity Detection in Untrimmed Surveillance Videos

Seunghwan Cha
Master's Thesis, Tech. Report, CMU-RI-TR-21-61, Robotics Institute, Carnegie Mellon University, August, 2021

Abstract

Accurately detecting activities in untrimmed videos is challenging because systems must handle variance in object scale, multiple viewpoints, and multiple types of activities. Furthermore, in real-world scenarios, activity detectors are often required to detect novel kinds of activities as end-user needs arise. To address these issues, we first build an activity classifier on the known activities using detection-based proposals. Then, we propose a retrieval-based solution that utilizes both visual and textual queries for detecting novel types of activities.

For known activity classification, object detection, optical flow, and hierarchical clustering are run in sequence to obtain spatiotemporal proposals. Then, the TSM model is optimized with a multilabel loss. Our trained action classifier demonstrates classification and scene generalization capability by performing competitively on the public MEVA test set and the Known Activity Leaderboards of the ActEV Challenge.
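To make the multilabel objective concrete, the following is a minimal sketch, assuming the loss is a per-class binary cross-entropy over independent sigmoid outputs. The abstract does not state the exact form of the loss, so this choice and the function name `multilabel_bce` are illustrative assumptions rather than the thesis implementation.

```python
import math


def multilabel_bce(logits, targets):
    """Mean binary cross-entropy over independent per-class sigmoid outputs.

    Assumption: each activity class is scored independently, so a single
    proposal may carry several positive labels at once.
    """
    if len(logits) != len(targets):
        raise ValueError("logits and targets must have the same length")
    total = 0.0
    for z, y in zip(logits, targets):
        # Numerically stable form of -[y*log(s(z)) + (1-y)*log(1-s(z))],
        # where s is the sigmoid function.
        total += max(z, 0.0) - z * y + math.log1p(math.exp(-abs(z)))
    return total / len(logits)


# Hypothetical proposal labelled with two of four activity classes.
loss = multilabel_bce([2.0, -1.5, 0.3, -3.0], [1, 0, 1, 0])
```

Unlike a softmax cross-entropy, this formulation does not force the classes to compete, which suits proposals in which more than one activity occurs at the same time.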

For the vision-based retrieval, the penultimate features from the trained TSM are extracted on both query and gallery proposals. The averaged features from the query proposals are compared against the pool of gallery proposals to select the top-ranked proposals as detected activity instances. We also explore a language-based retrieval system that can utilize the textual descriptions of the unseen activities. An image-text model called CLIP is used to extract textual and visual features from the given examples. The same retrieval technique from the vision-based approach is applied for final predictions. Our proposed system ranked 1st on the Surprise Activity Leaderboard of the ActEV Challenge. We hope that the proposed system can help facilitate the successful deployment of activity detection in the real world.
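The retrieval step described above can be sketched as follows. This is a minimal illustration, assuming cosine similarity as the comparison measure and plain Python lists as feature vectors; the abstract does not name the similarity function, and the helper names (`mean_vector`, `cosine`, `retrieve_top_k`) are hypothetical.

```python
import math


def mean_vector(vectors):
    """Element-wise average of equally sized feature vectors."""
    n = len(vectors)
    return [sum(column) / n for column in zip(*vectors)]


def cosine(a, b):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def retrieve_top_k(query_feats, gallery_feats, k):
    """Rank gallery proposals against the averaged query feature.

    Returns the indices of the k most similar gallery proposals,
    most similar first.
    """
    query = mean_vector(query_feats)
    scored = [(cosine(query, g), i) for i, g in enumerate(gallery_feats)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return [index for _, index in scored[:k]]


# Toy 2-D features standing in for extracted proposal features.
query_feats = [[1.0, 0.0], [0.8, 0.2]]
gallery_feats = [[0.0, 1.0], [1.0, 0.0], [1.0, 1.0]]
top = retrieve_top_k(query_feats, gallery_feats, k=2)
```

Under the same assumptions, the language-based variant would reuse `retrieve_top_k` unchanged, passing a text feature as the single query vector and image features as the gallery, provided both come from the same embedding space.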

BibTeX

@mastersthesis{Cha-2021-129144,
author = {Seunghwan Cha},
title = {Activity Detection in Untrimmed Surveillance Videos},
year = {2021},
month = {August},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-21-61},
}