Mind Your Language: Learning Visually Grounded Dialog in a Multi-Agent Setting

Akshat Agarwal, Swaminathan Gurumurthy, Vasu Sharma, and Katia Sycara

Workshop Paper, FAIM '18 Adaptive Learning Agents Workshop, August, 2018

Abstract

AI is becoming an increasingly important part of our daily lives, whether in the household, the workplace, or public places. For humans to interact with and understand an AI system, it needs to learn to communicate with us about our environment in the languages we speak. This requires the AI system to visually interpret the world and communicate descriptions of it. While such a task would have been considered impossible a few years ago, recent progress in Computer Vision and Natural Language Processing, which are important building blocks for this task, has reinvigorated interest in the community. Several problems, such as image captioning ([10],[33],[27],[9],[17],[34]), image classification ([12],[26],[7],[31]), object detection ([14],[20],[21]), image segmentation ([15],[8],[19]), dialog ([25],[28],[5]), and question answering ([35],[24],[32]), have received immense attention from the research community. Reinforcement learning has also shown promising results on several problems, including learning to play Go [23] and Atari games [18] at superhuman levels. Capitalizing on the growth in all these domains, it now seems plausible to build more advanced dialog systems capable of reasoning over multiple modalities while also learning from one another. Such systems will allow humans to have a meaningful dialog, containing visual as well as textual content, with intelligent systems.

Notes
Oral Presentation

BibTeX

@workshop{Agarwal-2018-126659,
author = {Akshat Agarwal and Swaminathan Gurumurthy and Vasu Sharma and Katia Sycara},
title = {Mind Your Language: Learning Visually Grounded Dialog in a Multi-Agent Setting},
booktitle = {Proceedings of FAIM '18 Adaptive Learning Agents Workshop},
year = {2018},
month = {August},
}

Copyright notice: This material is presented to ensure timely dissemination of scholarly and technical work. Copyright and all rights therein are retained by authors or by other copyright holders. All persons copying this information are expected to adhere to the terms and constraints invoked by each author's copyright. These works may not be reposted without the explicit permission of the copyright holder.