DORAS | DCU Research Repository

Automated text simplification as a preprocessing step for machine translation into an under-resourced language

Štajner, Sanja (ORCID: 0000-0002-7780-7035) and Popović, Maja (ORCID: 0000-0001-8234-8745) (2019) Automated text simplification as a preprocessing step for machine translation into an under-resourced language. In: Recent Advances in Natural Language Processing (RANLP 2019), 2-4 Sept 2019, Varna, Bulgaria.

Abstract
In this work, we investigate whether applying a fully automatic text simplification system to the English source text can improve machine translation (MT) into an under-resourced language. We use a state-of-the-art automatic text simplification (ATS) system to lexically and syntactically simplify the source sentences. The simplified sentences are then translated with two state-of-the-art English-to-Serbian MT systems: a phrase-based MT (PBMT) system and a neural MT (NMT) system. We explore three scenarios for using ATS in MT: (1) using the raw ATS output; (2) automatically filtering out sentences with low grammaticality and meaning preservation scores; and (3) performing minimal manual correction of the ATS output. Our results show that translation fluency improves regardless of the chosen scenario. They also show that the relative success of the three scenarios, in terms of translation fluency and post-editing effort, depends on the MT approach used (PBMT or NMT).
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Subjects:Computer Science > Machine translating
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > ADAPT
Published in: Mitkov, Ruslan and Angelova, Galia (eds.) Proceedings of the International Conference on Recent Advances in Natural Language Processing (RANLP 2019). INCOMA Ltd.
Publisher:INCOMA Ltd.
Official URL:http://dx.doi.org/10.26615/978-954-452-056-4_131
Copyright Information:© 2019 The Authors
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:Science Foundation Ireland Research Centres Programme (Grant 13/RC/2106), European Regional Development Fund.
ID Code:24479
Deposited On:25 May 2020 15:49 by Maja Popovic. Last Modified 20 Jan 2021 16:39
Documents

Full text available as: PDF (RANLP19Maja.pdf, 228kB)