Longyue, Wang ORCID: 0000-0002-9062-6183 (2019) Discourse-aware neural machine translation. PhD thesis, Dublin City University.
Abstract
Machine translation (MT) models usually translate a text by considering isolated sentences
based on a strict assumption that the sentences in a text are independent of one another.
However, it is a truism that texts have properties of connectedness that go beyond those of
their individual sentences. Disregarding dependencies across sentences harms translation quality, especially in terms of coherence, cohesion, and consistency. Previously,
some discourse-aware approaches have been investigated for conventional statistical machine translation (SMT). However, disregarding cross-sentence dependencies remains a serious obstacle for state-of-the-art neural
machine translation (NMT), which has recently surpassed the performance of SMT.
In this thesis, we try to incorporate useful discourse information to enhance NMT
models. More specifically, our research has two main parts: 1) exploiting a novel
document-level NMT architecture; and 2) dealing with a specific discourse phenomenon
for translation models.
Firstly, we investigate the influence of historical contextual information on the performance
of NMT models. A cross-sentence context-aware NMT model is proposed to consider the influence of previous sentences in the same document. Specifically, this history
is summarized using an additional hierarchical encoder. The historical representations are
then integrated into the standard NMT model using different strategies. Experimental results
on a Chinese–English document-level translation task show that the approach significantly
improves upon a strong attention-based NMT system by up to +2.1 BLEU points. In addition, our analysis and comparisons yield insights and conclusions for this
research direction.
Secondly, we explore the impact of discourse phenomena on the performance of MT.
In this thesis, we focus on the phenomenon of pronoun-dropping (pro-drop), where, in pro-drop languages, pronouns can be omitted when it is possible to infer the referent from the
context. As the data for training a dropped pronoun (DP) generator is scarce, we propose to
automatically annotate DPs using alignment information from a large parallel corpus. We
then introduce a hybrid approach: building a neural-based DP generator and integrating it
into the SMT model. Experimental results on both Chinese–English and Japanese–English
translation tasks demonstrate that our approach achieves a significant improvement of up to
+1.58 BLEU points, with a DP generation F-score of 66%.
Motivated by this promising result, we further exploit the DP translation approach for
advanced NMT models. A novel reconstruction-based model is proposed to reconstruct the
DP-annotated source sentence from the hidden states of the encoder, the decoder, or both.
Experimental results on the same translation tasks show that the proposed approach significantly and consistently improves translation performance over a strong NMT
baseline, which is trained on DP-annotated parallel data.
To avoid the errors propagated from an external DP prediction model, we finally investigate an end-to-end DP translation model. Specifically, we improve the reconstruction-based
model from three perspectives. First, we employ a shared reconstructor to better exploit encoder and decoder representations. Second, we propose to jointly learn to translate and
predict DPs. Finally, to capture discourse information for DP prediction, we combine the hierarchical encoder with the DP translation model. Experimental results on the
same translation tasks show that our approach significantly improves both translation performance and DP prediction accuracy.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | March 2019 |
Refereed: | No |
Supervisor(s): | Way, Andy and Qun, Liu |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating; Humanities > Linguistics |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
Funders: | Science Foundation Ireland, Research Centres Programme (Grant 13/RC/2106), European Regional Development Fund and the European Union Horizon 2020 research and innovation programme under grant agreement 645452 (QT21), DCU-Huawei Joint Projects: 2015-2016 (201504032-A/YB2015090061) and 2017-2018 (YBN2017080040) |
ID Code: | 22903 |
Deposited On: | 01 Apr 2019 15:36 by Andrew Way. Last Modified: 30 Sep 2022 15:06 |
Documents
Full text available as: PDF (7MB)