Employing signed TV broadcasts for automated learning and recognition of British Sign Language - Robotics Institute Carnegie Mellon University

VASC Seminar

Patrick Buehler, Ph.D. Candidate, University of Oxford
Wednesday, July 21
11:00 pm to 12:00 pm

Event Location: NSH 1507
Bio: Patrick Buehler is a Ph.D. Candidate at the University of Oxford, in the group of Prof. Andrew Zisserman. Before moving to England, he worked from 2004 to 2005 as a Research Engineer in Japan. In 2004, he received an M.Sc. degree in Computer Science from the University of Mannheim, Germany. During his studies, he spent a year on a Fulbright Scholarship at the University of Massachusetts, Amherst. His interests and passion are in the fields of Computer Vision and Machine Learning. He has worked on several projects in these fields, including body pose estimation, gesture recognition, and human drowsiness estimation.

Abstract: In this talk I will present our work on British Sign Language (BSL) recognition. Specifically, I will (i) show how we detect the pose of a signer (arms, head and body) to find the position of the hands; and (ii) demonstrate that BSL signs can be learned automatically from signing footage and the subtitles broadcast simultaneously on TV.

Detecting the pose of a signer is cast as inference in a generative model of the image. Under this model, limb detection is expensive due to the very large number of possible configurations each part can assume. We make two contributions to reduce this cost: (i) using efficient sampling from a pictorial structure proposal distribution to obtain plausible configurations; and (ii) identifying a large set of frames where correct configurations can be inferred, and using temporal tracking elsewhere.
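The sampling idea above can be illustrated with a minimal sketch: draw candidate configurations from a cheap proposal distribution, then rescore them under the full model and keep the best. Everything here is a toy stand-in — the part names, the one-dimensional "configurations", and both scoring functions are invented for illustration, not taken from the actual system.

```python
import random

random.seed(0)

# Hypothetical 1-D stand-in for a limb configuration: each "part" is a
# single coordinate; a real system would sample full 2-D limb poses.
PARTS = ["upper_arm", "lower_arm", "hand"]

def proposal_sample():
    """Draw one configuration from a cheap proposal distribution
    (here: an independent Gaussian per part; a pictorial structure
    would condition each part on its parent in the kinematic tree)."""
    return {p: random.gauss(0.0, 1.0) for p in PARTS}

def model_score(config):
    """Score a configuration under a toy generative image model:
    unary terms (how well each part matches the image evidence) plus
    pairwise terms tying kinematically connected parts together."""
    unary = -sum(c * c for c in config.values())  # toy: prefer parts near 0
    pairwise = -abs(config["upper_arm"] - config["lower_arm"])
    pairwise -= abs(config["lower_arm"] - config["hand"])
    return unary + pairwise

def best_of_n(n=1000):
    """Sample n configurations from the proposal and keep the one the
    full model scores highest — expensive exhaustive search avoided."""
    samples = (proposal_sample() for _ in range(n))
    return max(samples, key=model_score)

pose = best_of_n()
print({p: round(v, 2) for p, v in pose.items()})
```

The design point is the split of labour: the proposal distribution is cheap to sample but approximate, while the full model is accurate but too expensive to evaluate over every possible configuration.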

Learning BSL signs automatically from TV broadcasts is achieved using the supervisory information available from subtitles broadcast simultaneously with the signing. We (i) propose a distance function for matching signing sequences which incorporates the trajectory of both hands together with hand shape and orientation; (ii) show that by optimizing a scoring function based on multiple instance learning, the proposed method can extract the sign of interest from hours of signing footage, despite the very weak and noisy supervision; and (iii) use these automatically extracted signing examples to train discriminative signer-independent sign classifiers.
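The multiple-instance-learning setup can be sketched in miniature: subtitles only tell us a sign occurs *somewhere* in a sequence (a positive "bag"), so we score candidate windows by how close they come to every positive bag and how far they stay from every negative bag. The integer "frames", the target pattern, and the distance function below are toy assumptions standing in for the real hand-trajectory and shape features.

```python
import random

random.seed(1)

# Each "sequence" is a list of hand-feature frames (here: integers 0-5).
# Positive bags: the subtitle contains the target word, so the sign occurs
# somewhere in the sequence. Negative bags: the subtitle lacks the word.
SIGN = [7, 8, 9]  # the (unknown) sign pattern we hope to recover

def make_bag(contains_sign):
    seq = [random.randint(0, 5) for _ in range(12)]
    if contains_sign:
        i = random.randrange(len(seq) - len(SIGN))
        seq[i:i + len(SIGN)] = SIGN  # plant the sign at a random position
    return seq

positives = [make_bag(True) for _ in range(8)]
negatives = [make_bag(False) for _ in range(8)]

def distance(a, b):
    """Toy stand-in for a distance between signing windows (the real one
    combines hand trajectory, shape and orientation)."""
    return sum(abs(x - y) for x, y in zip(a, b))

def windows(seq, w):
    """All contiguous temporal windows of length w."""
    return [seq[i:i + w] for i in range(len(seq) - w + 1)]

def mil_score(candidate):
    """MIL-style score: a good candidate is close to its best-matching
    window in every positive bag and far from every negative bag."""
    pos = sum(min(distance(candidate, win) for win in windows(b, len(candidate)))
              for b in positives)
    neg = sum(min(distance(candidate, win) for win in windows(b, len(candidate)))
              for b in negatives)
    return neg - pos  # higher is better

# Candidate windows are drawn from the positive bags themselves, since
# the true sign must appear in each of them.
candidates = [w for b in positives for w in windows(b, 3)]
best = max(candidates, key=mil_score)
print(best)
```

Even with this crude score, the planted pattern wins because it is the only window shared by all positive bags and absent from the negatives — the same intuition that lets very weak subtitle supervision pick out a sign from hours of footage.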