Han, Lifeng ORCID: 0000-0002-3221-2185 (2022) An investigation into multi-word expressions in machine translation. PhD thesis, Dublin City University.
Abstract
Multi-word Expressions (MWEs) present challenges in natural language processing and computational linguistics due to their popular usage, richness in variety, idiomaticity, and non-decompositionality, which are present in the text content in which they are used. This is a typical level of expectation in the machine translation (MT) field where we require algorithms to perform a translation from one human language to another automatically while requiring high-quality output including features such as adequacy, fluency, and keeping the same or making creative and correct style decisions in that output.
In this thesis, we carry out an extensive investigation into MWEs in Neural MT.
Firstly, we carry out a review of relevant literature which includes experimental work on re-examining state-of-the-art models that combine knowledge of MWEs into MT systems, but with new language pairs setting to see what gaps might exist in the published literature.
Secondly, we propose our new models on how to address MWE translations. This includes a design where we treat MWEs as low-frequency words and phrases translation issues, by integrating language-specific features such as strokes and radicals representation of Chinese characters into the learning model, expecting that this will facilitate improved accuracy.
Thirdly, to properly examine different MT models' performances in the context of MWEs, we need to carry out a new evaluation methodology, and in light of this, we create a multilingual parallel corpus with MWE annotations (AlphaMWE). During the creation of this corpus, we classify the MT issues on MWE-related content into several categories with the expectation that this will help future MT researchers to focus on one or some of these in order to achieve a new state of the art in MT performance, ultimately moving towards human parity.
Finally, we propose a new methodology for human in the loop MT evaluation with MWE considerations (HiLMeMe).
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | February 2022 |
Refereed: | No |
Supervisor(s): | Smeaton, Alan F. and Jones, Gareth J.F. |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing Research Initiatives and Centres > ADAPT |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. View License |
Funders: | Science Foundation Ireland |
ID Code: | 26559 |
Deposited On: | 16 Feb 2022 11:52 by Alan Smeaton . Last Modified 16 Feb 2022 11:52 |
Documents
Full text available as:
Preview |
PDF
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0 4MB |
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record