Foster, Jennifer ORCID: 0000-0002-7789-4853, Wagner, Joachim ORCID: 0000-0002-8290-3849 and van Genabith, Josef (2008) Adapting a WSJ-trained parser to grammatically noisy text. In: ACL-08:HLT - 46th Annual Meeting of the Association for Computational Linguistics: Human Language Technologies, 15-20 June 2008, Columbus, USA.
Abstract
We present a robust parser which is trained on a treebank of ungrammatical sentences. The treebank is created automatically by modifying Penn treebank sentences so that they contain one or more syntactic errors. We evaluate an existing Penn-treebank-trained parser on the ungrammatical treebank to see how it reacts to noise in the form of grammatical errors. We re-train this parser on the training section of the ungrammatical treebank, leading to significantly improved performance on the ungrammatical test sets. We show how a classifier can be used to prevent performance degradation on the original grammatical data.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | parser |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Initiatives and Centres > National Centre for Language Technology (NCLT) |
Publisher: | Association for Computational Linguistics |
Official URL: | http://www.aclweb.org/anthology/P/P08/ |
Copyright Information: | © 2008 Association for Computational Linguistics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Irish Research Council for Science Engineering and Technology, IRCSET P/04/232 |
ID Code: | 15192 |
Deposited On: | 16 Feb 2010 14:25 by DORAS Administrator. Last Modified 10 Oct 2018 15:16 |
Documents
Full text available as: PDF (38kB)