DORAS | DCU Research Repository

Handling unknown words in statistical latent-variable parsing models for Arabic, English and French

Attia, Mohammed, Foster, Jennifer (ORCID: 0000-0002-7789-4853), Hogan, Deirdre, Le Roux, Joseph, Tounsi, Lamia and van Genabith, Josef (ORCID: 0000-0003-1322-7944) (2010) Handling unknown words in statistical latent-variable parsing models for Arabic, English and French. In: SPMRL 2010 - 1st Workshop on Statistical Parsing of Morphologically-Rich Languages at NAACL HLT 2010, 5 June 2010, Los Angeles, CA, USA.

Abstract
This paper presents a study of the impact of using simple and complex morphological clues to improve the classification of rare and unknown words for parsing. We compare this approach to a language-independent technique often used in parsers, which is based solely on word frequencies. This study is applied to three languages that exhibit different levels of morphological expressiveness: Arabic, French and English. We integrate information about Arabic affixes and morphotactics into a PCFG-LA parser and obtain state-of-the-art accuracy. We also show that these morphological clues can be learnt automatically from an annotated corpus.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Association for Computational Linguistics
Official URL: http://www.aclweb.org/anthology/W/W10/
Copyright Information: © 2010 Association for Computational Linguistics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, Irish Research Council for Science Engineering and Technology
ID Code: 15980
Deposited On: 08 Dec 2010 14:32 by Shane Harper. Last Modified: 21 Jan 2022 16:27
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
98kB