Neural machine translation for multimodal interaction

Dutta Chowdhury, Koel (2019) Neural machine translation for multimodal interaction. Master of Science thesis, Dublin City University.

Abstract

Multimodal neural machine translation (MNMT) systems trained on a combination of visual and textual inputs typically produce better translations than systems trained on textual inputs alone. The task of such systems can be decomposed into two sub-tasks: learning visually grounded representations from images, and translating the textual counterparts using those representations. In a multi-task learning framework, translations are generated by an attention-based encoder-decoder framework together with grounded representations learned from convolutional neural networks (CNNs) pretrained for image classification. In this thesis, I study different computational techniques to translate the meaning of sentences from one language into another, considering the visual modality as a naturally occurring meaning representation that bridges languages. We examine the behaviour of state-of-the-art MNMT systems from the data perspective in order to understand the role of both textual and visual inputs in such systems. We evaluate our models on Multi30k, a large-scale multilingual multimodal dataset publicly available for machine learning research. Our results in the optimal and sparse data settings show that the differences in translation system performance are proportional to the amount of both visual and linguistic information, whereas in the adversarial condition the effect of the visual modality is small or negligible. The chapters of the thesis follow a progression: starting with different state-of-the-art MMT models for incorporating images in optimal data settings, moving to the creation of synthetic image data in the low-resource scenario, and extending to the addition of adversarial perturbations to the textual input in order to evaluate the real contribution of images.

Metadata

Item Type:	Thesis (Master of Science)
Date of Award:	November 2019
Refereed:	No
Supervisor(s):	Graham, Yvette and Smeaton, Alan F.
Subjects:	Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders:	Science Foundation of Ireland for the research grant (Grant 13/RC/2106)
ID Code:	23747
Deposited On:	19 Nov 2019 12:44 by Yvette Graham. Last Modified 12 Aug 2020 16:55

Documents

Full text available as:

Final_version_MSc_Thesis_2019.pdf (PDF, 1MB)

DORAS | DCU Research Repository
