Generative Robotics: Self-Supervised Learning for Human-Robot Collaborative Creation

PhD Thesis Proposal

Peter Schaldenbrand
PhD Student, Robotics Institute, Carnegie Mellon University
Wednesday, September 18
12:00 pm to 1:30 pm
NSH 4305

Abstract:
While Generative AI has achieved breakthroughs in recent years in generating new digital content, such as images or 3D models, from high-level goal inputs like text, robotics technologies have not, remaining focused instead on low-level goal inputs. We propose Generative Robotics as a new field of robotics that combines the high-level goal interpretation abilities of large models pretrained on huge datasets with real-world robotic capabilities. We show that existing Generative Robotics systems either simplify the problem with engineered solutions or lack a proper connection between real-world constraints and the foundation model used to generate what the robot will create.

Towards a highly intelligent Generative Robotics system, we posit that existing robot datasets are too small to support a supervised, end-to-end approach. Training text-to-image generative AI models requires hundreds of millions of text-image pairs; training a robot to paint from text inputs end-to-end may require a comparable number of text-painting pairs. We hypothesize, however, that existing large datasets, along with a limited amount of additional robot data, can be used to couple real-world constraints and challenges into the robot's planning of both what it will create and how it will create it. We introduce the Framework and Robotics Initiative for Developing Arts (FRIDA) to test this hypothesis in the domain of 2D painting and drawing.

For generalized action planning with a variety of tools and materials, FRIDA leverages a small dataset of self-generated robot data to create a differentiable, low-level action-planning environment using a Real2Sim2Real methodology. This low-level planner can create paintings and drawings from input images, matching the semantics of the image rather than its pixel-level details. For high-level planning of what to make from high-level goal inputs, FRIDA leverages large models pretrained on huge, non-robotic datasets. To ground these pretrained models in real-world constraints, such as the robot's abilities and materials, the low-level planner is used to quickly generate many simulated text-painting pairs for adapting the pretrained model, a process we call Self-Supervised Fine-Tuning.
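
As a concrete illustration of this coupling, the following is a minimal PyTorch sketch, not FRIDA's actual implementation: every name in it (StrokeRenderer, render_plan, semantic_loss) is a hypothetical placeholder. It shows a differentiable stroke renderer standing in for one learned from real robot stroke data, low-level planning by gradient descent on stroke parameters against a semantic rather than pixel-level loss, and how such a planner can cheaply emit simulated text-painting pairs of the kind used for Self-Supervised Fine-Tuning.

import torch
import torch.nn as nn

CANVAS = 64  # toy canvas resolution

class StrokeRenderer(nn.Module):
    # Stand-in for a differentiable renderer learned from a small dataset of
    # real robot strokes (Real2Sim): maps stroke parameters to a stroke image.
    def __init__(self, n_params=8):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(n_params, 256), nn.ReLU(),
            nn.Linear(256, CANVAS * CANVAS), nn.Sigmoid(),
        )

    def forward(self, params):              # params: (num_strokes, n_params)
        return self.net(params).view(-1, CANVAS, CANVAS)

def render_plan(renderer, strokes):
    # Composite all strokes onto one canvas; max() is a crude but
    # differentiable stand-in for paint layering.
    return renderer(strokes).max(dim=0).values

def semantic_loss(canvas, goal):
    # Placeholder semantic loss: compare images in a fixed random feature
    # space instead of pixel space. A real system would compare features
    # from a pretrained vision-language model here.
    torch.manual_seed(0)  # same projection on every call
    proj = torch.randn(CANVAS * CANVAS, 32)
    return ((canvas.flatten() @ proj) - (goal.flatten() @ proj)).pow(2).mean()

renderer = StrokeRenderer()
for p in renderer.parameters():
    p.requires_grad_(False)  # the (pretend-pretrained) renderer stays fixed

goal = torch.rand(CANVAS, CANVAS)                 # toy goal image
strokes = torch.randn(20, 8, requires_grad=True)  # a plan of 20 strokes

# Low-level planning: optimize stroke parameters by gradient descent, which
# is possible because the renderer and loss are both differentiable.
opt = torch.optim.Adam([strokes], lr=0.05)
for step in range(100):
    opt.zero_grad()
    loss = semantic_loss(render_plan(renderer, strokes), goal)
    loss.backward()
    opt.step()

# Self-Supervised Fine-Tuning data: the planner can cheaply produce many
# simulated (caption, painting) pairs to adapt a pretrained generative model
# to what the robot can physically achieve.
simulated_pair = ("a toy caption", render_plan(renderer, strokes.detach()))

In a real system, the renderer would be trained to reproduce strokes executed by the physical robot and the loss would use a pretrained vision-language model; here both are toy stand-ins so the sketch runs on its own.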

The completed work demonstrates that FRIDA can perform 2D Generative Robotics: abstractly representing the content of an input photograph with only a few brush strokes, painting from high-level inputs like text and sound, and collaboratively drawing with a human user by understanding what they drew and adding complementary content. We propose to validate that FRIDA is a highly intelligent Generative Robotics system by adapting it to the 3D domains of wood relief carving, pumpkin carving, and clay sculpture.

Thesis Committee Members:
Jean Oh, Chair
James McCann
Manuela Veloso
Ken Goldberg, UC Berkeley
