
PhD Thesis Proposal

Eakta Jain, Carnegie Mellon University
Wednesday, July 20
1:30 pm
Attention-guided Augmentation of Animations, Stills, and Videos

Event Location: NSH 1305

Abstract: Artists who work with visual media consider what a viewer will look at and how the viewer's attention will flow across the scene. They understand that attention is a limited resource and train themselves to use this resource well. In this thesis, our key insight is that algorithms that digitally manipulate artist-created visual media should be aware of how attention is allocated. We present algorithms that leverage this insight to augment visual media with digitally created effects. In particular, we demonstrate results on animations, stills, and videos.


Animations: Two widely practiced forms of animation are two-dimensional (2D) hand-drawn animation and three-dimensional computer-generated (3D CG) animation. Techniques developed for one medium do not immediately carry over to the other. To apply the techniques of the 3D medium to 2D animation, researchers have attempted to compute 3D reconstructions of the shape and motion of the hand-drawn character, which act as the character's 'proxy' in the 3D environment. The most significant challenge in computing a 3D reconstruction of a 2D shape is inferring the missing third dimension. A second challenge is that, while a CG character is a geometrically consistent 3D object, 2D artists routinely introduce geometric inconsistencies to add expressiveness to their drawings. In this thesis, we argue that a perfect reconstruction is excessive because it does not leverage the fact that attention is a limited resource. We propose a 3D proxy with different levels of detail, such that at each level the error terms account for the quantities that will attract viewer attention. We show that the different levels of detail can be generated by reinterpreting three basic error terms.
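
To make the idea concrete, here is a minimal sketch of how re-weighting a small set of error terms can produce proxies at different levels of detail. The abstract does not name the three error terms, so the drawing-plane prior, temporal smoothness, and bone-length consistency terms below, along with the function name and weights, are illustrative assumptions rather than the proposal's formulation (Python, using NumPy and SciPy):

```python
import numpy as np
from scipy.optimize import least_squares

def reconstruct_depths(pts2d, bones, lengths, w_prior, w_smooth, w_len):
    """Estimate per-joint depths z for a hand-drawn character.

    pts2d   : (F, J, 2) array of 2D joint positions traced from F frames
    bones   : list of (i, j) joint-index pairs
    lengths : assumed 3D length for each bone
    The three residual groups below stand in for three basic error terms;
    re-weighting them yields proxies at different levels of detail.
    """
    F, J, _ = pts2d.shape

    def residuals(z_flat):
        z = z_flat.reshape(F, J)
        res = [np.sqrt(w_prior) * z.ravel(),                    # stay near the drawing plane
               np.sqrt(w_smooth) * np.diff(z, axis=0).ravel()]  # move smoothly over time
        if w_len > 0:
            for (i, j), L in zip(bones, lengths):
                d = np.sqrt(((pts2d[:, i] - pts2d[:, j]) ** 2).sum(axis=1)
                            + (z[:, i] - z[:, j]) ** 2)
                res.append(np.sqrt(w_len) * (d - L))            # keep bone lengths constant
        return np.concatenate(res)

    # Small random init: the bone-length term has zero gradient at z = 0.
    z0 = 1e-3 * np.random.default_rng(0).standard_normal(F * J)
    return least_squares(residuals, z0).x.reshape(F, J)

# A coarse proxy tolerates the artist's inconsistencies by ignoring bone
# lengths; a finer proxy enforces them (weights here are illustrative):
#   coarse = reconstruct_depths(pts2d, bones, lengths, 1.0, 10.0, 0.0)
#   fine   = reconstruct_depths(pts2d, bones, lengths, 1.0, 10.0, 5.0)
```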


Stills: Moves-on-stills is a technique for engaging the viewer while presenting still pictures, such as paintings or photographs, on television or in movies. The effect is widely used to create 'motion comics' from comic book material. State-of-the-art software, such as iMovie, lets a user specify the parameters of a camera move, but it does not address how those parameters should be chosen. Because artists compose their pictures to direct the viewer's attention along a deliberate path, we argue that a good camera move respects this visual route. Though no algorithm can compute artistic intent directly, recording the gaze of viewers as they look at a picture lets us reconstruct the artist's intention (assuming the artist succeeded in directing viewer attention). In this thesis, we propose an algorithm that predicts the parameters of camera moves-on-stills from aggregate statistics derived from the visual attention records of multiple viewers. We find increased agreement in the gaze data of people looking at composed sequential art versus random photographs. We show 'motion comics' created by our algorithm from sequential (comic) art and compare our results against a professionally created DVD.
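
As a sketch of what "aggregate statistics" might drive such a camera move, the snippet below pans along the cross-viewer gaze centroid and zooms according to how tightly viewers' gaze clusters. The specific statistics, smoothing, and scaling constants are assumptions for illustration, not the algorithm proposed in the thesis:

```python
import numpy as np

def camera_move_from_gaze(gaze, image_wh, smooth=15, min_window=0.3):
    """Derive pan/zoom keyframes for a move-on-still from aggregate gaze.

    gaze     : (V, T, 2) normalized gaze samples from V viewers over T steps
               (assumes T >= smooth)
    image_wh : (width, height) of the still in pixels
    The pan path follows the cross-viewer gaze centroid; the zoom window
    tightens where viewers agree and widens where they disagree.
    """
    centroid = gaze.mean(axis=0)              # (T, 2) aggregate scanpath
    spread = gaze.std(axis=0).mean(axis=1)    # (T,) cross-viewer disagreement
    kernel = np.ones(smooth) / smooth         # moving-average smoothing
    pan = np.stack([np.convolve(centroid[:, k], kernel, mode='same')
                    for k in (0, 1)], axis=1)
    window = np.clip(4.0 * np.convolve(spread, kernel, mode='same'),
                     min_window, 1.0)         # window size as a frame fraction
    w, h = image_wh
    return pan * np.array([w, h]), np.stack([window * w, window * h], axis=1)
```

Keyframing the returned pan path and window sizes in a tool like iMovie would then reproduce the move: low gaze dispersion invites a tight close-up, high dispersion a wide framing.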

Video: Video data is increasingly manipulated digitally before it reaches consumers. Two manipulations are particularly useful in making video content available to a global audience: retargeting to different display devices and subtitling in different languages. However, both manipulations potentially alter the flow of viewer attention through the video by adding, removing, or distorting content. We propose that algorithms that manipulate video incorporate viewer gaze data so that they minimally distort the visual route viewers take through the original footage.


The problem of video retargeting is to alter the original video to fit a new display size while best preserving content and minimizing artifacts. State-of-the-art retargeting techniques define content in terms of color, edges, faces, and other image-based saliency features. We suggest that content is, in fact, what people look at: eye-tracking data tells us directly what to preserve in a video during retargeting.

Subtitling is a popular way to present foreign movies or TV productions in the viewer's native language. However, viewers often miss the richness of the scene (the actors' expressions, secondary movement) because they spend their visual attention on the black subtitle bar. We propose that closed captions instead be placed on screen close to where the viewer would be looking in the original video; gaze recordings provide the information needed to make this possible. Such a placement might let viewers enjoy the original soundtrack without straining their attentional resources.
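
As a concrete illustration of both ideas, the sketch below uses per-frame gaze points to drive a crop-window retargeting and to anchor captions near the gaze point. Cropping is only one possible retargeting operator, and the clamping and offset heuristics here are assumptions for illustration, not the thesis's method:

```python
import numpy as np

def gaze_crop_and_caption(gaze_xy, src_wh, dst_wh, offset=0.05):
    """Per-frame crop window and caption anchor derived from recorded gaze.

    gaze_xy : (T, 2) per-frame gaze points in source-pixel coordinates
              (e.g., the cross-viewer median for each frame)
    src_wh  : (W, H) source frame size; dst_wh : (w, h) target display size
    Returns (T, 4) crop rectangles [x, y, width, height] centered on gaze,
    and (T, 2) caption anchors placed just below the gaze point.
    """
    W, H = src_wh
    w, h = dst_wh
    s = min(W / w, H / h)                 # largest target-shaped crop that fits
    cw, ch = w * s, h * s
    cx = np.clip(gaze_xy[:, 0], cw / 2, W - cw / 2)   # clamp crop inside frame
    cy = np.clip(gaze_xy[:, 1], ch / 2, H - ch / 2)
    crops = np.stack([cx - cw / 2, cy - ch / 2,
                      np.full_like(cx, cw), np.full_like(cy, ch)], axis=1)
    # Caption sits a small offset below gaze, clamped to stay inside the crop.
    cap_y = np.clip(gaze_xy[:, 1] + offset * H, cy - ch / 2, cy + ch / 2)
    captions = np.stack([cx, cap_y], axis=1)
    return crops, captions
```

In practice the crop centers and caption anchors would also be smoothed over time so that neither the virtual camera nor the captions jitter with every gaze sample.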

Committee: Jessica Hodgins, Co-chair

Yaser Sheikh, Co-chair

Nancy Pollard

Adam Finkelstein, Princeton University