Castilho, Sheila ORCID: 0000-0002-8416-6555, Cavalheiro Camargo, João Lucas ORCID: 0000-0003-3746-1225, Menezes, Miguel and Way, Andy ORCID: 0000-0001-5736-5930 (2021) DELA Corpus - A Document-Level Corpus Annotated with Context-Related Issues. In: Sixth Conference on Machine Translation (WMT21), 10-11 Nov 2021, Punta Cana, Dominican Republic (Online). ISBN 978-1-954085-94-7
Abstract
Recently, the Machine Translation (MT) community has become more interested in document-level evaluation, especially in light of reactions to claims of "human parity", since examining quality at the level of the document rather than at the sentence level allows for the assessment of suprasentential context, providing a more reliable evaluation.
This paper presents a document-level corpus annotated in English with context-aware issues that arise when translating from English into Brazilian Portuguese, namely ellipsis, gender, lexical ambiguity, number, reference, and terminology, across six different domains. The corpus can be used as a challenge test set for evaluation and as a training/testing corpus for MT as well as for deep linguistic analysis of context issues. To the best of our knowledge, this is the first corpus of its kind.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Additional Information: | pp 556-577 |
Uncontrolled Keywords: | machine translation evaluation; document-level MT; corpus, annotation |
Subjects: | Computer Science > Computational linguistics; Computer Science > Machine translating; Humanities > Language; Humanities > Translating and interpreting |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Published in: | Proceedings of the Sixth Conference on Machine Translation. Association for Computational Linguistics (ACL). ISBN 978-1-954085-94-7 |
Publisher: | Association for Computational Linguistics (ACL) |
Official URL: | https://aclanthology.org/2021.wmt-1.63 |
Copyright Information: | © 2021 The Authors |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Irish Research Council (GOIPD/2020/69), Science Foundation Ireland through the SFI Research Centres Programme (Grant 13/RC/2106_P2) |
ID Code: | 26256 |
Deposited On: | 14 Sep 2021 12:44 by Dr Sheila Castilho M de Sousa. Last Modified 27 Apr 2022 11:01 |
Documents
Full text available as:
PDF (367kB)