DORAS | DCU Research Repository

Incorporating visual information into neural machine translation

Calixto, Iacer (2017) Incorporating visual information into neural machine translation. PhD thesis, Dublin City University.

Abstract
In this work, we study different ways to enrich Machine Translation (MT) models using information obtained from images. Specifically, we propose different models that incorporate images into MT by transferring learning from convolutional neural networks (CNNs) pre-trained for image classification. We use these pre-trained CNNs for image feature extraction and work with two types of visual features: global visual features, which encode an entire image into a single real-valued feature vector; and local visual features, which encode different areas of an image into separate real-valued vectors and therefore also encode spatial information. We first study how to train embeddings that are both multilingual and multi-modal, using global visual features and multilingual sentences for training. Second, we propose different models to incorporate global visual features into state-of-the-art Neural Machine Translation (NMT): (i) as words in the source sentence, (ii) to initialise the encoder hidden state, and (iii) as additional data to initialise the decoder hidden state. Finally, we put forward one model to incorporate local visual features into NMT: an NMT model with an independent visual attention mechanism integrated into the same decoder Recurrent Neural Network (RNN) as the source-language attention mechanism. We evaluate our models on Multi30k, a publicly available, general-domain data set, and on a proprietary data set of product listings and images built by eBay Inc., which was made available for the purpose of this research. We report state-of-the-art results on the publicly available Multi30k data set. Our best models also significantly improve on comparable phrase-based Statistical MT (PBSMT) models trained on the same data set, according to widely adopted MT metrics.
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2017
Refereed: No
Supervisor(s): Liu, Qun and Campbell, Nick
Subjects: Computer Science > Machine translating; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Artificial intelligence
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
ID Code: 21942
Deposited On: 10 Nov 2017 12:38 by Qun Liu. Last Modified: 25 Oct 2018 09:21
Documents

Full text available as:

PDF (PhD thesis), 4MB. Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader.