Learning to Represent and Accurately Arrange Food Items

Master's Thesis, Tech. Report CMU-RI-TR-21-05, Robotics Institute, Carnegie Mellon University, May 2021

Abstract

Arrangements of objects are commonplace in a myriad of everyday scenarios. Collages of photos at one's home, displays at museums, and plates of food at restaurants are just a few examples. An efficient personal robot should be able to learn how an arrangement is constructed using only a few examples and recreate it robustly and accurately given similar objects. Small variations due to differences in object sizes and minor misplacements should also be taken into account and adapted to in the overall arrangement. Furthermore, the placement error should be small relative to the objects being placed; tasks involving small objects, such as food plating, therefore demand greater accuracy. However, robotic food manipulation has its own challenges, especially when modeling the material properties of diverse and deformable food items. To address these issues, we propose a framework for learning how to produce arrangements of food items. We evaluate our overall approach on a real-world arrangement task that requires a robot to plate variations of Caprese salads.

In the first part of this thesis, we propose using a multimodal sensory approach to interacting with food that aids in learning embeddings that capture distinguishing properties across food items. These embeddings are learned in a self-supervised manner using a triplet loss formulation and a combination of proprioceptive, audio, and visual data. The information encoded in these embeddings can be advantageous for various tasks, such as determining which food items are appropriate for a particular plating design. Additionally, we present a rich dataset of 21 unique food items with varying slice types and properties, which is collected autonomously using a robotic arm and an assortment of sensors. We perform additional evaluations that show how this dataset can be used to learn embeddings that can successfully increase performance in a wide range of material and shape classification tasks by incorporating interactive data.
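To make the triplet loss formulation mentioned above concrete, the sketch below shows a standard triplet margin loss over fused embeddings. The feature values, embedding dimension, and margin are illustrative assumptions, not the thesis's actual architecture or hyperparameters.

```python
import numpy as np

def triplet_loss(anchor, positive, negative, margin=0.2):
    """Standard triplet margin loss: pull the anchor embedding toward the
    positive (same food item) and push it away from the negative (a
    different item) by at least `margin`."""
    d_pos = np.linalg.norm(anchor - positive)  # anchor-positive distance
    d_neg = np.linalg.norm(anchor - negative)  # anchor-negative distance
    return max(d_pos - d_neg + margin, 0.0)

# Toy 4-D embeddings standing in for fused proprioceptive, audio,
# and visual features (values are made up for illustration).
anchor   = np.array([0.9, 0.1, 0.0, 0.2])
positive = np.array([0.8, 0.2, 0.1, 0.2])   # same item, another interaction
negative = np.array([0.1, 0.9, 0.7, 0.5])   # a different food item
loss = triplet_loss(anchor, positive, negative)
```

Because the negative is already far from the anchor relative to the positive, this well-separated triplet incurs zero loss; the self-supervision comes from treating interactions with the same item as positives, so no manual labels are needed.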

In the second part of this thesis, we propose a data-efficient local regression model that can learn the underlying pattern of an arrangement using visual inputs, is robust to errors, and is trained on only a few demonstrations. To reduce the amount of error this regression model will encounter at execution time, a complementary neural network is trained on depth images to predict whether a given placement will be stable and accurate. We demonstrate how our overall framework can be used to successfully produce arrangements of Caprese salads.
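The stability predictor described above can be illustrated with a minimal stand-in: a single logistic unit scoring a candidate placement from a depth patch. The thesis trains a neural network on full depth images; the features (mean height, surface roughness), weights, and patch size here are all made-up assumptions for the sketch.

```python
import numpy as np

def stability_score(depth_patch, w, b):
    """Hedged sketch of a placement-stability predictor: map a depth
    patch under the candidate placement to a probability of a stable,
    accurate placement. A single logistic unit over two hand-picked
    features stands in for the learned network."""
    mean_height = depth_patch.mean()
    roughness = depth_patch.std()            # flatter surface -> more stable
    x = np.array([mean_height, roughness])
    return 1.0 / (1.0 + np.exp(-(w @ x + b)))

# Toy 8x8 depth patches for a flat vs. an uneven candidate surface.
rng = np.random.default_rng(0)
flat   = np.full((8, 8), 0.10)
uneven = 0.10 + 0.05 * rng.standard_normal((8, 8))
w = np.array([0.0, -50.0])   # penalize roughness; weights are illustrative
b = 2.0
p_flat = stability_score(flat, w, b)
p_uneven = stability_score(uneven, w, b)
```

At execution time, candidate placements scored below a threshold would be rejected before the regression model's target pose is executed, which is how such a predictor can filter out placements likely to fail.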

BibTeX

@mastersthesis{Lee-2021-127358,
author = {Steven Lee},
title = {Learning to Represent and Accurately Arrange Food Items},
year = {2021},
month = {May},
school = {Carnegie Mellon University},
address = {Pittsburgh, PA},
number = {CMU-RI-TR-21-05},
keywords = {Robotics, Manipulation, Machine Learning, Deep Learning, Food Manipulation, Simulation},
}