Walsh, Abigail (2023) The automatic processing of multiword expressions in Irish. PhD thesis, Dublin City University.
Abstract
It is well documented that Multiword Expressions (MWEs) pose a unique challenge
to a variety of NLP tasks, such as machine translation, parsing, and information retrieval.
For low-resource languages such as Irish, these challenges can be exacerbated by the scarcity of data and a lack of research on this topic. In order to
improve the handling of MWEs in various NLP tasks for Irish, this thesis addresses
the lack of resources specifically targeting MWEs in Irish and examines how
these resources can be applied to those tasks.
We report on the creation and analysis of a number of lexical resources as part
of this PhD research. Ilfhocail, a lexicon of Irish MWEs, is created through extracting
MWEs from other lexical resources such as dictionaries. A corpus annotated
with verbal MWEs in Irish is created for the inclusion of Irish in the PARSEME
Shared Task 1.2. Additionally, MWEs are tagged in a bilingual EN-GA corpus
for inclusion in machine translation experiments. For the purposes of annotation, a categorisation scheme for nine categories of MWEs in Irish is created, based
on a combination of linguistic analysis of these types of constructions and cross-lingual
frameworks for defining MWEs.
A case study in applying MWEs to NLP tasks is undertaken, exploring the incorporation of MWE information into the training of Neural Machine Translation
systems. Finally, the topic of automatic identification of Irish MWEs is explored,
documenting the training of a system capable of automatically identifying Irish
MWEs from a variety of categories, and the challenges associated with developing
such a system.
This research contributes towards a greater understanding of Irish MWEs and
their applications in NLP, and provides a foundation for future work in exploring
other methods for the automatic discovery and identification of Irish MWEs, and
further developing the MWE resources described above.
Metadata
Item Type: | Thesis (PhD) |
---|---|
Date of Award: | March 2023 |
Refereed: | No |
Supervisor(s): | Foster, Jennifer and Lynn, Teresa |
Uncontrolled Keywords: | Natural Language Processing; Multiword Expressions; Irish Language; Technology |
Subjects: | Computer Science > Artificial intelligence; Computer Science > Computational linguistics; Computer Science > Machine learning; Computer Science > Machine translating; Humanities > Irish language |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Funders: | Department of Tourism, Culture, Arts, Gaeltacht, Sport and Media |
ID Code: | 27997 |
Deposited On: | 31 Mar 2023 09:17 by Jennifer Foster. Last Modified 31 Mar 2023 09:17 |
Documents
Full text available as:
PDF
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0 (3MB)