Attention-based multimodal neural machine translation

P.-Y. Huang, F. Liu, S.-R. Shiang, J. Oh, and C. Dyer
Conference Paper, Proceedings of 1st Conference on Machine Translation (WMT '16), Vol. 2, pp. 639 - 645, August, 2016


We present a novel neural machine translation (NMT) architecture that associates visual and textual features for translation tasks with multiple modalities. Transformed global and regional visual features are concatenated with text to form attendable sequences, which are dissipated over parallel long short-term memory (LSTM) threads to assist the encoder in generating a representation for attention-based decoding. Experiments show that the proposed NMT outperforms the text-only baseline.
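
The idea of an "attendable sequence" of text plus visual features can be illustrated with a small sketch. It is not the authors' implementation. The dimensions, the random projection matrix `W`, the choice to append (rather than prepend) the visual vectors, and the plain dot-product attention are all assumptions made for illustration. The parallel LSTM threads are not modeled here.

```python
import math
import random


def project(vec, weight):
    """Linear map; `weight` is a list of rows, one per output dimension."""
    return [sum(w * x for w, x in zip(row, vec)) for row in weight]


def build_attendable_sequence(word_embs, visual_feats, weight):
    """Project each visual feature to the word-embedding size and
    append it to the text sequence (ordering is an assumption)."""
    return word_embs + [project(v, weight) for v in visual_feats]


def attention_weights(query, sequence):
    """Softmax over dot-product scores between a query and each element.
    Dot-product scoring is a simplification chosen for this sketch."""
    scores = [sum(q * s for q, s in zip(query, item)) for item in sequence]
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


random.seed(0)
EMB, VIS = 4, 6  # hypothetical embedding and visual-feature sizes

# Three word embeddings and two visual vectors (say, one global, one regional).
words = [[random.gauss(0, 1) for _ in range(EMB)] for _ in range(3)]
visual = [[random.gauss(0, 1) for _ in range(VIS)] for _ in range(2)]
W = [[random.gauss(0, 0.1) for _ in range(VIS)] for _ in range(EMB)]

seq = build_attendable_sequence(words, visual, W)
weights = attention_weights(words[0], seq)  # stand-in for a decoder query
print(len(seq), weights)
```

Once the visual vectors share the embedding space of the words, a decoder can place attention mass on an image feature in the same way it does on a source word, which is the property the abstract relies on.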


author = {P.-Y. Huang and F. Liu and S.-R. Shiang and J. Oh and C. Dyer},
title = {Attention-based multimodal neural machine translation},
booktitle = {Proceedings of 1st Conference on Machine Translation (WMT '16)},
year = {2016},
month = {August},
volume = {2},
pages = {639--645},