Mind Your Language: Learning Visually Grounded Dialog in a Multi-Agent Setting
Abstract
AI is increasingly becoming an important part of our daily lives, be it in the household, the workplace, or public spaces. For humans to interact with and understand an AI system, it needs to learn to communicate with us about our environment in the languages we speak. This requires the AI system to visually interpret the world and communicate descriptions of the physical world. While such a task would have been considered impossible a few years ago, recent progress in Computer Vision and Natural Language Processing, which are important building blocks for this task, has reinvigorated interest in the community. Problems such as image captioning [10, 33, 27, 9, 17, 34], image classification [12, 26, 7, 31], object detection [14, 20, 21], image segmentation [15, 8, 19], dialog [25, 28, 5], and question answering [35, 24, 32] have received immense attention from the research community. The paradigm of reinforcement learning has also shown promising results on several problems, including learning to play Go [23] and Atari games [18] at superhuman levels. Capitalizing on the growth in all of these domains, it now seems plausible to build more advanced dialog systems capable of reasoning over multiple modalities while also learning from one another. Such systems will allow humans to hold meaningful dialog with intelligent agents, grounded in both visual and textual content.
Oral Presentation
BibTeX
@workshop{Agarwal-2018-126659,
author = {Akshat Agarwal and Swaminathan Gurumurthy and Vasu Sharma and Katia Sycara},
title = {Mind Your Language: Learning Visually Grounded Dialog in a Multi-Agent Setting},
booktitle = {Proceedings of FAIM '18 Adaptive Learning Agents Workshop},
year = {2018},
month = {August},
}