Active Vision: Autonomous Aerial Cinematography with Learned Artistic Decision-Making - Robotics Institute Carnegie Mellon University

PhD Thesis Proposal

Rogerio Bonatti
Robotics Institute, Carnegie Mellon University

Tuesday, May 5
3:00 pm to 4:00 pm
Active Vision: Autonomous Aerial Cinematography with Learned Artistic Decision-Making


Abstract:
Aerial cinematography is revolutionizing industries that require live and dynamic camera viewpoints such as entertainment, sports, and security. Fundamentally, it is a tool with immense potential to improve human creativity, expressiveness, and the sharing of experiences. However, safely piloting a drone while filming a moving target in the presence of obstacles is immensely taxing, often requiring multiple highly trained human operators to control a single vehicle. Our research focus is to build autonomous systems that can empower any individual with the full artistic capabilities of aerial cameras. We develop a system for active vision: one that does not merely passively process the incoming sensor feed, but actively reasons about the cinematographic quality of viewpoints and safely generates sequences of shots. The theory and systems developed in this work can impact video generation in both real-world and simulated environments, such as professional and amateur movie-making, video games, and virtual reality.

First, we formalize the theory behind aerial filming by incorporating cinematography guidelines into robot motion planning. We describe the problem in terms of its principal cost functions, and develop an efficient trajectory optimization framework for executing arbitrary types of shots while avoiding collisions and occlusions with obstacles.
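As a rough illustration of the kind of objective involved, the minimal sketch below combines smoothness, shot-quality, and obstacle-clearance terms over a discretized camera trajectory and minimizes their weighted sum by numerical gradient descent. The specific cost terms, weights, and toy scene are illustrative assumptions, not the formulation used in the thesis.

```python
# Minimal sketch of a weighted trajectory-optimization objective for aerial
# filming. Cost terms and weights are illustrative placeholders, not the
# thesis's exact formulation.
import numpy as np

def smoothness_cost(traj):
    """Penalize large accelerations along the discretized camera path."""
    acc = traj[2:] - 2 * traj[1:-1] + traj[:-2]
    return np.sum(acc ** 2)

def shot_cost(traj, actor_pos, desired_dist=5.0):
    """Penalize deviation from the desired camera-to-actor distance."""
    dist = np.linalg.norm(traj - actor_pos, axis=1)
    return np.sum((dist - desired_dist) ** 2)

def obstacle_cost(traj, obstacle_pos, safe_radius=2.0):
    """Penalize waypoints closer than a safety radius to an obstacle."""
    dist = np.linalg.norm(traj - obstacle_pos, axis=1)
    return np.sum(np.maximum(0.0, safe_radius - dist) ** 2)

def total_cost(traj, actor_pos, obstacle_pos, w=(1.0, 0.5, 10.0)):
    return (w[0] * smoothness_cost(traj)
            + w[1] * shot_cost(traj, actor_pos)
            + w[2] * obstacle_cost(traj, obstacle_pos))

# Toy usage: refine 20 waypoints with finite-difference gradient descent.
actor = np.array([10.0, 0.0, 1.5])
obstacle = np.array([5.0, 1.0, 2.0])
traj = np.linspace([0.0, 0.0, 3.0], [10.0, 0.0, 3.0], 20)
eps, step = 1e-4, 1e-3
for _ in range(200):
    grad = np.zeros_like(traj)
    for i in range(traj.shape[0]):
        for j in range(3):
            t_plus, t_minus = traj.copy(), traj.copy()
            t_plus[i, j] += eps
            t_minus[i, j] -= eps
            grad[i, j] = (total_cost(t_plus, actor, obstacle)
                          - total_cost(t_minus, actor, obstacle)) / (2 * eps)
    traj -= step * grad
print("final cost:", total_cost(traj, actor, obstacle))
```

In practice such an objective would be minimized with an efficient analytic-gradient solver rather than finite differences; the sketch only shows how the competing costs trade off in a single scalar objective.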

Second, we propose and develop a system for aerial cinematography in the wild. We combine several components into a real-time framework: vision-based target estimation, 3D signed-distance mapping for collision and occlusion avoidance, and trajectory optimization for camera motion. We extensively evaluate our system both in simulation and in field experiments by filming dynamic targets moving through unstructured environments.
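To make the occlusion-avoidance component concrete, the sketch below checks visibility by sampling the segment between the camera and the actor in a voxelized signed-distance map and penalizing samples that come too close to obstacles. The grid, resolution, and penalty shaping are illustrative assumptions, not the exact mapping pipeline used in the system.

```python
# Minimal sketch of an occlusion check against a 3D signed-distance map.
import numpy as np

def occlusion_cost(sdf, origin, resolution, camera, actor, n_samples=50, margin=0.5):
    """Sample the camera-to-actor segment in the SDF and accumulate a penalty
    wherever the signed distance drops below a clearance margin."""
    cost = 0.0
    for t in np.linspace(0.0, 1.0, n_samples):
        point = (1 - t) * camera + t * actor
        idx = np.floor((point - origin) / resolution).astype(int)
        idx = np.clip(idx, 0, np.array(sdf.shape) - 1)  # stay inside the grid
        d = sdf[tuple(idx)]
        cost += max(0.0, margin - d)  # penalty grows as the ray nears obstacles
    return cost / n_samples

# Toy usage: a 20 m cube mapped at 0.5 m resolution, free space everywhere
# except a slab of occupied cells between camera and actor.
resolution = 0.5
origin = np.zeros(3)
sdf = np.full((40, 40, 40), 5.0)   # large positive distance = far from obstacles
sdf[18:22, 18:22, :] = -0.5        # negative distance = inside an obstacle
camera = np.array([2.0, 10.0, 3.0])
actor = np.array([18.0, 10.0, 1.0])
print("occlusion cost:", occlusion_cost(sdf, origin, resolution, camera, actor))
```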

Third, we take a step towards learning the intangible art of cinematography. We know a good clip when we see it, but we cannot yet specify an objective formula for one. We propose the use of deep reinforcement learning with a human evaluator in the loop to guide the selection of artistic shots. Our user studies show that the learned policies can translate intuitive concepts of human aesthetics into the motion planning process.
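A minimal sketch of the human-in-the-loop idea, under strongly simplified assumptions: a softmax policy over a handful of discrete shot types is updated with a REINFORCE-style gradient, using a scalar aesthetic score as the reward. The scene features, shot vocabulary, and simulated evaluator below are placeholders for the real learned policy and user study.

```python
# Minimal sketch of reinforcement learning with a human evaluator in the loop.
import numpy as np

SHOTS = ["left", "right", "front", "back"]
rng = np.random.default_rng(0)
theta = np.zeros((3, len(SHOTS)))  # linear policy: 3 scene features -> shot logits

def policy(features):
    """Softmax distribution over shot types for the current scene context."""
    logits = features @ theta
    p = np.exp(logits - logits.max())
    return p / p.sum()

def human_score(features, shot_idx):
    """Stand-in for a real human rating; this toy evaluator prefers back shots
    whenever the first scene feature (e.g. actor speed) is high."""
    return 1.0 if (features[0] > 0.5) == (SHOTS[shot_idx] == "back") else 0.0

lr = 0.5
for episode in range(2000):
    features = rng.random(3)                  # current scene context
    probs = policy(features)
    shot = rng.choice(len(SHOTS), p=probs)    # sample a shot to execute
    reward = human_score(features, shot)      # human rates the resulting clip
    grad_log = -probs                         # gradient of log pi w.r.t. logits
    grad_log[shot] += 1.0
    theta += lr * reward * np.outer(features, grad_log)

print("shot probabilities for a fast actor:",
      np.round(policy(np.array([0.9, 0.2, 0.2])), 2))
```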

Lastly, the proposed work extends the current system in two important directions. The first is learning specific styles for directing movies. A style goes beyond the binary selection of good or bad shots: even when presented with the same scene context, different directors often use distinct filming techniques. We propose to use machine learning to cluster movie styles based on the dominant features observed in demonstrations from expert directors. Then, using a generative model, we will output camera motions conditioned on the desired director style and scene context. The second proposed research direction is multi-camera collaboration. When capturing scenes such as sports or social events, it is difficult to capture the optimal viewpoint at all times using a single aerial camera. In addition, unlike in a movie studio, most real-world events cannot be reenacted for additional takes. Here we plan to design motion planning algorithms for multi-camera cinematography, maximizing the artistic quality of multiple viewpoints simultaneously. Using limited resources, our goal is to film a scene only once while maximizing coverage.
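For the multi-camera direction, one simple baseline for the coverage part of the problem is greedy viewpoint selection, sketched below. The candidate viewpoints, visibility model, and objective are illustrative assumptions; the proposed planner would additionally reason over full trajectories and artistic quality, not just static coverage.

```python
# Minimal sketch of greedy viewpoint assignment for multi-camera coverage.
import numpy as np

rng = np.random.default_rng(1)
n_targets = 30      # scene elements we would like at least one camera to see
n_viewpoints = 12   # candidate camera viewpoints
n_cameras = 3       # available aerial cameras

# visibility[v, t] = True if candidate viewpoint v sees scene element t.
visibility = rng.random((n_viewpoints, n_targets)) > 0.7

def greedy_camera_placement(visibility, n_cameras):
    """Pick viewpoints one at a time, each maximizing newly covered elements."""
    covered = np.zeros(visibility.shape[1], dtype=bool)
    chosen = []
    for _ in range(n_cameras):
        gains = (visibility & ~covered).sum(axis=1)
        best = int(np.argmax(gains))
        chosen.append(best)
        covered |= visibility[best]
    return chosen, covered

chosen, covered = greedy_camera_placement(visibility, n_cameras)
print("chosen viewpoints:", chosen)
print("coverage: %d / %d scene elements" % (covered.sum(), n_targets))
```

Greedy selection is a natural starting point here because coverage objectives of this form are submodular, where the greedy heuristic carries well-known approximation guarantees.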


Thesis Committee Members:
Sebastian Scherer, Chair
Jessica Hodgins
Oliver Kroemer
Ashish Kapoor, Microsoft Research
Nathan Ratliff, Nvidia