Touch provides a direct window into robot-object interaction, free from the occlusion and aliasing that visual sensing faces. Aggregated over time, tactile perception can facilitate contact-rich tasks such as in-hand manipulation, sliding, and grasping. Here, online estimates of object geometry and pose are crucial for downstream planning and control. With significant advances in tactile sensing, such as vision-based touch, a general technique for the underlying inference is highly desirable.
Tactile inference is analogous to the simultaneous localization and mapping (SLAM) problem: just as mobile robots use odometry and cameras, manipulators have access to end-effector poses and vision-based touch. However, the domain also presents its own unique challenges and opportunities. First, tactile signals are local; they provide situated but detailed information that must be fused over long time horizons. Second, interactive perception is intrusive; the act of sensing itself perturbs the object. Finally, how should objects be represented so that they can be reconstructed efficiently from touch at a sufficient level of detail, while remaining compatible with downstream applications?
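To make the analogy concrete, the inference can be viewed as the standard full SLAM posterior adapted to touch; this is a sketch, and the notation below is chosen for illustration rather than drawn from the thesis. Writing S for the object surface, x_{0:t} for the sensor-object poses, u_{1:t} for end-effector motions, and z_{1:t} for tactile images,

p(x_{0:t}, S \mid z_{1:t}, u_{1:t}) \;\propto\; p(x_0) \prod_{k=1}^{t} p(x_k \mid x_{k-1}, u_k)\, p(z_k \mid x_k, S),

where the motion model p(x_k \mid x_{k-1}, u_k) must absorb the perturbations that sensing itself introduces, and the measurement model p(z_k \mid x_k, S) ties each local tactile image to the chosen shape representation.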
In this thesis, we propose a tactile SLAM framework to reconstruct objects from interaction. First, in completed work, we introduce ShapeMap 3D for visuo-tactile mapping with implicit surfaces. Then, we present MidasTouch, a framework for particle filtering from vision-based touch. Finally, we discuss shape and pose estimation for planar manipulation, a precursor to generalized 3D SLAM.
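As a rough illustration of the particle-filtering idea behind MidasTouch (this is not its implementation; the planar three-degree-of-freedom state, the similarity stand-in, and all names below are assumptions made for this sketch), a minimal filter alternates a motion update from end-effector odometry, reweighting by a tactile-similarity score, and resampling:

# Conceptual sketch of particle filtering for touch-based pose tracking.
# Not the MidasTouch implementation; the planar (x, y, theta) state and
# the similarity stand-in below are illustrative assumptions.
import numpy as np

rng = np.random.default_rng(0)

def motion_update(particles, delta, noise_std=0.005):
    """Propagate particle poses by the end-effector motion plus noise."""
    return particles + delta + rng.normal(0.0, noise_std, particles.shape)

def measurement_update(weights, similarity):
    """Reweight particles by how well predicted touch matches the observation."""
    weights = weights * similarity
    weights += 1e-300  # guard against numerical underflow
    return weights / weights.sum()

def resample(particles, weights):
    """Multinomial resampling when the effective sample size drops."""
    n = len(weights)
    if 1.0 / np.sum(weights ** 2) < n / 2:
        idx = rng.choice(n, size=n, p=weights)
        particles, weights = particles[idx], np.full(n, 1.0 / n)
    return particles, weights

# Toy usage: localize against a hypothetical "true" contact pose.
true_pose = np.array([0.05, 0.02, 0.10])
particles = rng.uniform(-0.10, 0.10, size=(500, 3))
weights = np.full(500, 1.0 / 500)

for _ in range(50):
    particles = motion_update(particles, delta=np.zeros(3))
    # Stand-in for a tactile similarity score; higher when touch "matches".
    similarity = np.exp(-np.linalg.norm(particles - true_pose, axis=1) ** 2 / 1e-3)
    weights = measurement_update(weights, similarity)
    particles, weights = resample(particles, weights)

print("pose estimate:", np.average(particles, axis=0, weights=weights))

In the full problem, the similarity score would instead come from comparing the observed tactile signal against predictions from the object model, and the state would be a six-degree-of-freedom sensor-object pose.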
In proposed research, we present online neural mapping with vision-based touch. Then, building on completed work, we explore its extension to dexterous 3D reconstruction. Finally, we discuss the applications and evaluation of the proposed work for multi-finger interactive perception.
Thesis Committee Members:
Michael Kaess, Chair
Nancy Pollard
Shubham Tulsiani
Mustafa Mukadam, Meta AI (FAIR)
Alberto Rodriguez, Massachusetts Institute of Technology