Calixto, Iacer (2017) Incorporating visual information into neural machine translation. PhD thesis, Dublin City University.
Abstract
In this work, we study different ways to enrich Machine Translation (MT) models using information obtained from images. Specifically, we propose different models to incorporate images into MT by transferring learning from pre-trained convolutional neural networks (CNN) trained for classifying images. We use these pre-trained CNNs for image feature extraction, and use two different types of visual features: global visual features, that encode an entire image into one single real-valued feature vector; and local visual features, that encode different areas of an image into separate real-valued vectors, therefore also encoding spatial information. We first study how to train embeddings that are both multilingual and multi-modal, and use global visual features and multilingual sentences for training. Second, we propose different models to incorporate global visual features into state-of-the-art Neural Machine Translation (NMT): (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. Finally, we put forward one model to incorporate local visual features into NMT: (i) a NMT model with an independent visual attention mechanism integrated into the same decoder Recurrent Neural Network (RNN) as the source-language attention mechanism. We evaluate our models on the Multi30k, a publicly available, general domain data set, and also on a proprietary data set of product listings and images built by eBay Inc., which was made available for the purpose of this research. We report state-of-the-art results on the publicly available Multi30k data set. Our best models also significantly improve on comparable phrase-based Statistical MT (PBSMT) models trained on the same data set, according to widely adopted MT metrics.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | November 2017 |
Refereed: | No |
Supervisor(s): | Liu, Qun and Campbell, Nick |
Subjects: | Computer Science > Machine translating Computer Science > Computational linguistics Computer Science > Machine learning Computer Science > Artificial intelligence |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Initiatives and Centres > ADAPT |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License |
ID Code: | 21942 |
Deposited On: | 10 Nov 2017 12:38 by Qun Liu . Last Modified 25 Oct 2018 09:21 |
Documents
Full text available as:
Preview |
PDF (PhD thesis )
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
4MB |
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record