Uí Dhonnchadha, Elaine (2002) An analyser and generator for Irish inflectional morphology using finite-state transducers. Master of Science thesis, Dublin City University.
Abstract
Computational morphology is an important step in natural language processing. Finite-state techniques have been applied successfully in computational phonology and morphology to many of the world’s major languages. Celtic languages, such as Modern Irish, present unique and challenging morphological features that to date have not been addressed using finite-state technology. This thesis presents a finite-state morphology of Irish developed using Xerox Finite-State Tools. To the best of our knowledge, no such resource previously existed.
The computational model, implemented as a finite-state transducer, encodes the inflectional morphology of nouns, adjectives, and verbs. Other parts of speech are also included in the interests of language coverage. The implementation is a strictly lexicalised design: the morphotactics of stems and affixes are encoded in the lexicon using replace rule triggers. Word mutations are then implemented as a series of replace rules written as regular expressions. Both components are compiled into finite-state transducers and then combined to produce a single two-level morphological transducer for the language.
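The two-component design described above can be illustrated with a minimal Python sketch. It is not the thesis's Xerox implementation: the trigger symbols (`^Len`, `^Ecl`), the tag names, and the toy lexicon entries are all assumptions invented here, and only two initial mutations (lenition and eclipsis of consonants) are modelled.

```python
import re

# Hypothetical trigger symbols and tags, invented for this sketch;
# the abstract does not give the symbols used in the thesis.
ECLIPSIS_MAP = {"b": "mb", "c": "gc", "d": "nd", "f": "bhf",
                "g": "ng", "p": "bp", "t": "dt"}

# Toy lexicon: lexical form (lemma + tags) -> intermediate form,
# where a trigger marks that a mutation rule should fire.
LEXICON = {
    "bord+Noun+Sg": "bord",
    "bord+Noun+Sg+Len": "^Lenbord",
    "bord+Noun+Sg+Ecl": "^Eclbord",
    "úll+Noun+Sg+Len": "^Lenúll",
}

def apply_mutation_rules(intermediate: str) -> str:
    """Apply the replace rules in sequence to an intermediate form."""
    # Rule 1 (lenition): insert "h" after a lenitable initial consonant.
    # "s" is omitted because its lenition depends on the following letter.
    form = re.sub(r"^\^Len([bcdfgmpt])", r"\1h", intermediate)
    # Rule 2 (eclipsis): replace the initial consonant with its eclipsed
    # spelling. Eclipsis of vowel-initial words is not modelled here.
    form = re.sub(r"^\^Ecl([bcdfgpt])",
                  lambda m: ECLIPSIS_MAP[m.group(1)], form)
    # Rule 3: delete any trigger that did not fire (e.g. before a vowel).
    return re.sub(r"^\^(Len|Ecl)", "", form)

def generate(lexical: str) -> str:
    """Look up the lexicon, then run the mutation rules over the result."""
    return apply_mutation_rules(LEXICON[lexical])
```

In this sketch `generate("bord+Noun+Sg+Len")` yields `bhord` and `generate("bord+Noun+Sg+Ecl")` yields `mbord`. Chaining a dictionary lookup with a sequence of regular-expression rewrites only loosely mirrors composing a lexicon transducer with rule transducers; a real finite-state toolkit compiles both into a single network.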
A major advantage of finite-state implementations of morphology is their inherent bi-directionality; the same system is used for both analysis and generation of word forms in the language.
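Bi-directionality can be sketched by treating the morphology as a relation between lexical and surface strings and reading it in either direction. The Python below is an illustration under assumptions, not the thesis's transducer: the tag names are invented, and the relation is a small hand-written list rather than a compiled network.

```python
from collections import defaultdict

# Toy relation between lexical forms (lemma + hypothetical tags) and
# surface forms. "boird" is both genitive singular and nominative
# plural of "bord", so analysis of it is ambiguous.
PAIRS = [
    ("bord+Noun+Nom+Sg", "bord"),
    ("bord+Noun+Gen+Sg", "boird"),
    ("bord+Noun+Nom+Pl", "boird"),
]

# Both lookup tables are derived from the same list of pairs, which is
# the point of the sketch: one description, two directions.
_lexical_to_surface = defaultdict(list)
_surface_to_lexical = defaultdict(list)
for lexical, surface in PAIRS:
    _lexical_to_surface[lexical].append(surface)
    _surface_to_lexical[surface].append(lexical)

def generate(lexical: str) -> list[str]:
    """Lexical side to surface side (generation)."""
    return list(_lexical_to_surface.get(lexical, []))

def analyse(surface: str) -> list[str]:
    """Surface side to lexical side (analysis)."""
    return list(_surface_to_lexical.get(surface, []))
```

Here `analyse("boird")` returns both the genitive singular and the nominative plural reading, while `generate("bord+Noun+Gen+Sg")` returns `["boird"]`. A finite-state transducer provides the same inversion without enumerating the pairs.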
This resource can be used as a component part in parsing and generation in natural language processing (NLP) applications, such as spelling checkers/correctors, stemmers and text-to-speech synthesisers. It can also be used for tokenising text, lemmatising, and as an input to automatic part-of-speech tagging of a corpus.
The system is designed for broad coverage of the language and this is evaluated by comparing it with a list of the 1000 most frequently found word forms in a corpus of contemporary Irish texts.
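A coverage evaluation of this kind could be computed along the following lines. This is a sketch under assumptions: the function name `coverage`, the stand-in analyser, and the sample word list are hypothetical, and the abstract does not state how the thesis scored coverage or what figure it obtained.

```python
from typing import Callable, Iterable

def coverage(word_forms: Iterable[str],
             analyse: Callable[[str], list]) -> tuple[float, list[str]]:
    """Return the fraction of word forms that receive at least one
    analysis, together with the forms that received none."""
    forms = list(word_forms)
    if not forms:
        return 0.0, []
    missed = [w for w in forms if not analyse(w)]
    return (len(forms) - len(missed)) / len(forms), missed

# Stand-in analyser for demonstration only: it "recognises" a fixed set
# of forms and returns a placeholder analysis for each of them.
_KNOWN = {"agus", "an", "bord"}

def toy_analyse(word: str) -> list[str]:
    return [word + "+Known"] if word in _KNOWN else []

score, missed = coverage(["agus", "an", "bord", "xyz"], toy_analyse)
```

With the invented sample above, three of the four forms are recognised, so `score` is `0.75` and `missed` is `["xyz"]`. In practice the word list would be the frequency list drawn from the corpus and `analyse` would be the morphological transducer's lookup.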
Finally, maintainability of the system is discussed and possible extensions to the system are suggested, such as derivational morphology and the inclusion of dialectal or historical word-forms.
Metadata
Item Type: | Thesis (Master of Science) |
---|---|
Date of Award: | 2002 |
Refereed: | No |
Supervisor(s): | van Genabith, Josef and Nic Pháidín, Caoilfhionn |
Uncontrolled Keywords: | Inflection; Morphology; Natural language processing |
Subjects: | Computer Science > Computational linguistics; Humanities > Irish language |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License. |
ID Code: | 18253 |
Deposited On: | 27 May 2013 13:32 by Celine Campbell. Last modified 03 Nov 2023 10:40 |
Documents
Full text available as: PDF (4MB)