Unsupervised Temporal Commonality Discovery
Abstract
Unsupervised discovery of commonalities in images has recently attracted much interest due to the need to find correspondences in large amounts of visual data. A natural extension, and a relatively unexplored problem, is how to discover common semantic temporal patterns in videos. That is, given two or more videos, find the subsequences that contain similar visual content in an unsupervised manner. We call this problem Temporal Commonality Discovery (TCD). The naive exhaustive search approach to the TCD problem has a computational complexity quadratic in the length of each sequence, making it impractical for regular-length sequences. This paper proposes an efficient branch and bound (B&B) algorithm to tackle the TCD problem. We derive tight bounds for classical distances between the temporal bag-of-words histograms of two segments, including ℓ1, intersection and χ2. Using these bounds, the B&B algorithm can efficiently find the globally optimal solution. Our algorithm is general, and it can be applied to any feature that has been quantized into histograms. Experiments on finding common facial actions in video and human actions in motion capture data demonstrate the benefits of our approach. To the best of our knowledge, this is the first work that addresses unsupervised discovery of common events in videos.
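To make the cost of the naive exhaustive search concrete, the following is a minimal sketch of that baseline, not the B&B algorithm proposed in the paper. It assumes each frame has already been quantized to a single integer visual-word label, uses unnormalized ℓ1 distance between segment histograms, and adds hypothetical `min_len` and `max_len` segment-length limits; the function names are illustrative and do not come from the paper.

```python
def prefix_histograms(labels, k):
    """prefix[t][w] = number of times word w occurs in labels[:t]."""
    prefix = [[0] * k]
    for w in labels:
        row = prefix[-1][:]
        row[w] += 1
        prefix.append(row)
    return prefix


def segment_histogram(prefix, b, e):
    """Temporal bag-of-words histogram of the half-open segment labels[b:e]."""
    return [hi - lo for hi, lo in zip(prefix[e], prefix[b])]


def l1_distance(h1, h2):
    """l1 distance between two histograms of equal size."""
    return sum(abs(a - b) for a, b in zip(h1, h2))


def naive_tcd(x, y, k, min_len, max_len):
    """Exhaustively search all segment pairs (one segment per sequence).

    x, y: lists of integer word labels in range(k), one per frame (assumed).
    Returns (distance, (b1, e1), (b2, e2)) for the closest pair of segments.
    The four nested loops are what makes this baseline quadratic in the
    length of each sequence when the segment length is unconstrained.
    """
    px, py = prefix_histograms(x, k), prefix_histograms(y, k)
    best = (float("inf"), None, None)
    for b1 in range(len(x)):
        for e1 in range(b1 + min_len, min(b1 + max_len, len(x)) + 1):
            h1 = segment_histogram(px, b1, e1)
            for b2 in range(len(y)):
                for e2 in range(b2 + min_len, min(b2 + max_len, len(y)) + 1):
                    d = l1_distance(h1, segment_histogram(py, b2, e2))
                    if d < best[0]:
                        best = (d, (b1, e1), (b2, e2))
    return best


# Toy example: both sequences contain the pattern 1, 2, 1, 2.
x = [0, 0, 1, 2, 1, 2, 0]
y = [3, 3, 1, 2, 1, 2, 3, 3]
print(naive_tcd(x, y, k=4, min_len=4, max_len=4))  # (0, (2, 6), (2, 6))
```

The prefix-sum histograms make each segment histogram cheap to compute, but the number of candidate segment pairs still grows with the product of the squared sequence lengths, which is the cost the bounding scheme in the paper is designed to avoid.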
BibTeX
@conference{Chu-2012-7631,
author = {Wen-Sheng Chu and Feng Zhou and Fernando De la Torre Frade},
title = {Unsupervised Temporal Commonality Discovery},
booktitle = {Proceedings of (ECCV) European Conference on Computer Vision},
year = {2012},
month = {October},
pages = {373--387},
keywords = {Unsupervised commonality discovery, time series analysis, temporal bag of words, branch and bound algorithms},
}