Continual Personalization of Human Actions with Prompt Tuning
Abstract:
In interactive computing devices (e.g., VR/XR headsets), users interact with the virtual world using hand gestures and body actions. Typically, models deployed on such XR devices are static and limited to their default set of action classes. The goal of our research is to provide users and developers with the capability to personalize their experience by continually adding new action classes to their device models. Importantly, a user should be able to add new classes in a low-shot and efficient manner, and this process should not require storing or replaying any of the user's sensitive training data. We formalize this problem as privacy-aware few-shot continual action recognition.
Towards this end, we propose POET: Prompt-Offset Tuning. While existing prompt tuning approaches have shown great promise for continual learning of image, text, and video modalities, they demand access to extensively pre-trained transformers. Breaking away from this assumption, POET demonstrates the efficacy of prompt tuning a significantly more lightweight backbone, pre-trained exclusively on the base class data. We propose a spatio-temporal learnable prompt tuning approach and apply additive prompts to Graph Neural Networks. We demonstrate our method on two new benchmarks for 3D skeleton human activity recognition and hand gesture recognition.
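To make the core idea concrete, here is a minimal, hypothetical sketch of additive prompt tuning with a frozen backbone. All names, shapes, and the training objective are assumptions for illustration (a fixed random linear map stands in for the pre-trained GNN; the real method operates on skeleton graphs): only the spatio-temporal prompt offsets and a small classifier head are updated, so no base training data needs to be stored or replayed.

```python
import numpy as np

# Hypothetical sketch: additive prompt offsets on a frozen backbone.
# Names and shapes are assumed, not the authors' actual implementation.
rng = np.random.default_rng(0)

T, V, C = 4, 5, 3          # frames, skeleton joints, channels per joint
D = T * V * C              # flattened feature dimension
K = 2                      # number of new (personalized) classes

# Frozen "backbone": a fixed random linear map standing in for a pre-trained GNN.
W_backbone = rng.normal(size=(D, 8))

# Learnable parameters: additive spatio-temporal prompts + a small classifier head.
prompt = np.zeros((T, V, C))
W_head = rng.normal(scale=0.1, size=(8, K))

def forward(x):
    """x: (T, V, C) skeleton sequence. Prompt offsets are ADDED to the input."""
    h = (x + prompt).reshape(-1) @ W_backbone   # features from the frozen backbone
    return h @ W_head                           # class logits for the new classes

def train_step(x, y, lr=0.001):
    """One gradient step on squared error; only prompt and head are updated."""
    global prompt, W_head
    h = (x + prompt).reshape(-1) @ W_backbone
    err = h @ W_head - y                        # (K,) prediction error
    grad_head = np.outer(h, err)                # gradient w.r.t. classifier head
    grad_h = W_head @ err                       # backprop through the head
    grad_prompt = (W_backbone @ grad_h).reshape(T, V, C)
    W_head -= lr * grad_head
    prompt -= lr * grad_prompt                  # backbone weights stay frozen

x = rng.normal(size=(T, V, C))                  # one few-shot example
y = np.array([1.0, 0.0])                        # one-hot target for a new class
losses = []
for _ in range(50):
    losses.append(float(np.sum((forward(x) - y) ** 2)))
    train_step(x, y)
```

After a few steps the loss on the new-class example decreases while the backbone remains untouched, which is the property that makes the approach both parameter-efficient and privacy-friendly.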
Committee:
Prof. Fernando De La Torre (Advisor)
Prof. Deva Ramanan
Prof. Kris Kitani
Russell Mendonca