An investigation of English-Irish machine translation and associated resources

Dowling, Meghan ORCID: 0000-0003-1637-4923 (2022) An investigation of English-Irish machine translation and associated resources. PhD thesis, Dublin City University.

Abstract
Metadata
Downloads
Documents

[+][-]

Abstract

As an official language in both Ireland and the European Union (EU), there is a high demand for English-Irish (EN-GA) translation in public administration. The difficulty that translators currently face in meeting this demand leads to the need for reliable domain-specific user-driven EN-GA machine translation (MT). This landscape provides a timely opportunity to address some research questions surrounding MT for the EN-GA language pair. To this end, we assess the corpora available for training data-driven MT systems, including publicly-available data, data collected through EU-supported data collection efforts and web-crawling, showing that though Irish is a low-resource language it is possible to increase the corpora available through concerted data collection efforts. We investigate how increased corpora affect domain-specific (public administration) statistical MT (SMT) and neural MT (NMT) systems using automatic metrics. The effect that different SMT and NMT parameters have on these automatic values is also explored, using sentence-level metrics to identify specific areas where output differs greatly between MT systems and providing a linguistic analysis of each. With EN-GA SMT and NMT automatic evaluation scores showing inconclusive results, we investigate the usefulness of EN-GA hybrid MT through the use of monolingual data as a source of artificial data creation via backtranslation. We evaluate these results using automatic metrics and linguistic analysis. Although results indicate that the addition of artificial data did not have a positive impact on EN-GA MT, repeated experiments involving Scottish Gaelic show that the method holds promise, given suitable conditions. Finally, given that the intended use-case of EN-GA MT is in the workflow of a professional translator, we conduct an in-depth human evaluation study for EN-GA SMT and NMT, providing a human-derived assessment of EN-GA MT quality and comparison of EN-GA SMT and NMT. We include a survey of translator opinions and recommendations surrounding EN-GA SMT and NMT as well as an analysis of data gathered through the post-editing of MT output. We compare these results to those generated automatically and provide recommendations for future work on EN-GA MT, in particular with regards to its use in a professional translation workflow within public administration.

Metadata

Item Type:	Thesis (PhD)
Date of Award:	February 2022
Refereed:	No
Supervisor(s):	Way, Andy and Lynn, Teresa
Subjects:	Computer Science > Artificial intelligence Computer Science > Computational linguistics Computer Science > Machine learning Computer Science > Machine translating Humanities > Linguistics Humanities > Translating and interpreting
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Initiatives and Centres > ADAPT
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License. View License
Funders:	Science Foundation Ireland
ID Code:	26574
Deposited On:	15 Feb 2022 12:25 by Andrew Way . Last Modified 03 Nov 2023 10:38

Documents

Full text available as:

[thumbnail of Dowling_thesis_POSTexamination_copy_DORAS.pdf]

Preview

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0
2MB

Downloads

Downloads per month over past year

Archive Staff Only: edit this record

DORAS | DCU Research Repository

An investigation of English-Irish machine translation and associated resources

Downloads