Dutta Chowdhury, Koel (2019) Neural machine translation for multimodal interaction. Master of Science thesis, Dublin City University.
Abstract
Multimodal neural machine translation (MNMT) systems trained on a combination of visual and textual inputs typically produce better translations than systems trained on textual inputs alone. The task of such systems can be decomposed into two sub-tasks: learning visually grounded representations from images, and translating the textual counterparts using those representations. In a multi-task learning framework, translations are generated by an attention-based encoder-decoder, and the grounded representations are learned from convolutional neural networks (CNNs) pretrained for image classification.
In this thesis, I study different computational techniques for translating the meaning of sentences from one language into another, treating the visual modality as a naturally occurring meaning representation that bridges languages. I examine the behaviour of state-of-the-art MNMT systems from the data perspective in order to understand the role of both textual and visual inputs in such systems. The models are evaluated on Multi30k, a large-scale multilingual multimodal dataset publicly available for machine learning research. The results in the optimal and sparse data settings show that differences in translation system performance are proportional to the amount of both visual and linguistic information, whereas in the adversarial condition the effect of the visual modality is small or negligible. The chapters of the thesis follow a progression: they start with using different state-of-the-art MMT models to incorporate images in optimal data settings, move on to creating synthetic image data in the low-resource scenario, and extend to adding adversarial perturbations to the textual input in order to evaluate the real contribution of images.
Metadata
Item Type: | Thesis (Master of Science) |
---|---|
Date of Award: | November 2019 |
Refereed: | No |
Supervisor(s): | Graham, Yvette and Smeaton, Alan F. |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
Funders: | Science Foundation of Ireland for the research grant (Grant 13/RC/2106) |
ID Code: | 23747 |
Deposited On: | 19 Nov 2019 12:44 by Yvette Graham. Last Modified 12 Aug 2020 16:55 |
Documents
Full text available as:
PDF (1MB)