DORAS | DCU Research Repository

Terminology-aware sentence mining for NMT domain adaptation: ADAPT’s submission to the Adap-MT 2020 English-to-Hindi AI translation shared task

Haque, Rejwanul (ORCID: 0000-0003-1680-0099), Moslem, Yasmin (ORCID: 0000-0003-4595-6877) and Way, Andy (ORCID: 0000-0001-5736-5930) (2020) Terminology-aware sentence mining for NMT domain adaptation: ADAPT’s submission to the Adap-MT 2020 English-to-Hindi AI translation shared task. In: Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020), 18-21 Dec 2020, Patna, India (Online).

Abstract
This paper describes the ADAPT Centre’s submission to the Adap-MT 2020 AI Translation Shared Task for English-to-Hindi. The neural machine translation (NMT) systems that we built to translate AI domain texts are state-of-the-art Transformer models. In order to improve the translation quality of our NMT systems, we made use of both in-domain and out-of-domain data for training and employed different fine-tuning techniques for adapting our NMT systems to this task, e.g. mixed fine-tuning and on-the-fly self-training. For this, we mined parallel sentence pairs and monolingual sentences from large out-of-domain datasets, and the mining process was facilitated through automatic extraction of terminology from the in-domain data. This paper outlines the experiments we carried out for this task and reports the performance of our NMT systems on the evaluation test set.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Additional Information: Part of ICON 2020: 17th International Conference on Natural Language Processing
Subjects: Computer Science > Artificial intelligence
Computer Science > Computational linguistics
Computer Science > Machine learning
Computer Science > Machine translating
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > ADAPT
Published in: Proceedings of Workshop on Low Resource Domain Adaptation for Indic Machine Translation (Adap-MT 2020). NLP Association of India (NLPAI).
Publisher: NLP Association of India (NLPAI)
Copyright Information: © 2020 The Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland (SFI) Research Centres Programme (Grant No. 13/RC/2106), co-funded under the European Regional Development Fund; Science Foundation Ireland (SFI) Grant Nos. 13/RC/2077 and 18/CRT/6224
ID Code: 25446
Deposited On: 28 Jan 2021 14:11 by Thomas Murtagh. Last Modified: 14 Feb 2022 15:49
Documents

Full text available as:

Terminology-Aware_Sentence_Mining_for_NMT_Domain_Adaptation.pdf
PDF, 115kB
Creative Commons: Attribution-Share Alike 4.0