Exploiting multiple modalities for interactive video retrieval

Michael G. Christel, Chang Huang, Neema Moraveji, and Norman Papernick
Conference Paper, Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04), pp. 1032 - 1035, May, 2004

Abstract

Aural and visual cues can be automatically extracted from video and used to index its contents. The paper explores the relative merits of the cues extracted from the different modalities for locating relevant shots in video, specifically reporting on the indexing and interface strategies used to retrieve information from the Video TREC 2002 and 2003 data sets, and the evaluation of the interactive search runs. For the documentary and news material in these sets, automated speech recognition produces rich textual descriptions derived from the narrative, with visual descriptions and depictions offering additional browsing functionality. Through speech and visual processing, storyboard interfaces with query-based filtering provide an effective interactive retrieval interface. Examples drawn from the Video TREC 2002 and 2003 search topics and results using these topics illustrate the utility of multiple-document storyboards and other interfaces incorporating the results of multimodal processing.
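The abstract describes query-based filtering of shots into multiple-document storyboards. Below is a minimal Python sketch of that general idea, assuming a hypothetical Shot record with an ASR transcript and a keyframe path; it is an illustration only, not the authors' Informedia implementation.

# Hypothetical sketch: filter shots by ASR transcript text and group the
# matching keyframes into one storyboard per source video (document).
from dataclasses import dataclass
from collections import defaultdict

@dataclass
class Shot:
    video_id: str      # source video (document) identifier
    start_sec: float   # shot start time within the video
    keyframe: str      # path to the representative keyframe image
    transcript: str    # ASR text aligned to this shot

def filter_shots(shots, query):
    """Keep shots whose ASR transcript contains any query term."""
    terms = {t.lower() for t in query.split()}
    return [s for s in shots if terms & set(s.transcript.lower().split())]

def build_storyboards(shots, query):
    """Group matching shots by source video, keyframes ordered by time,
    yielding one storyboard per document for visual browsing."""
    boards = defaultdict(list)
    for shot in sorted(filter_shots(shots, query), key=lambda s: s.start_sec):
        boards[shot.video_id].append(shot.keyframe)
    return dict(boards)

if __name__ == "__main__":
    shots = [
        Shot("news_01", 12.0, "news_01_f0012.jpg", "the rover landed on mars"),
        Shot("news_01", 40.5, "news_01_f0040.jpg", "weather for the weekend"),
        Shot("doc_07", 3.2, "doc_07_f0003.jpg", "images from the mars orbiter"),
    ]
    print(build_storyboards(shots, "mars"))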

BibTeX

@conference{Christel-2004-126429,
author = {Michael G. Christel and Chang Huang and Neema Moraveji and Norman Papernick},
title = {Exploiting multiple modalities for interactive video retrieval},
booktitle = {Proceedings of IEEE International Conference on Acoustics, Speech, and Signal Processing (ICASSP '04)},
year = {2004},
month = {May},
pages = {1032--1035},
}