Foster, Jennifer ORCID: 0000-0002-7789-4853, Wagner, Joachim ORCID: 0000-0002-8290-3849, Seddah, Djamé and van Genabith, Josef (2007) Adapting WSJ-trained parsers to the British national corpus using in-domain self-training. In: IWPT 2007 - 10th International Conference of Parsing Technology, 23-24 June 2007, Prague, Czech Republic.
Abstract
We introduce a set of 1,000 gold standard parse trees for the British National Corpus (BNC) and perform a series of self-training experiments with Charniak and Johnson’s reranking parser and BNC sentences. We show that retraining this parser with a combination of one million BNC parse trees (produced by the same parser) and the original WSJ training data yields improvements of 0.4% on WSJ Section 23 and 1.7% on the new BNC gold standard set.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | parsers |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Initiatives and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Publisher: | Association for Computational Linguistics |
Official URL: | http://aclweb.org/anthology/W/W07/ |
Copyright Information: | © 2007 Association for Computational Linguistics |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Irish Research Council for Science Engineering and Technology, IRCSET SC/02/298, IRCSET P/04/232, Science Foundation Ireland, SFI 04/IN.3/I527 |
ID Code: | 15209 |
Deposited On: | 17 Feb 2010 16:06 by DORAS Administrator. Last Modified 10 Oct 2018 15:17 |
Documents
Full text available as: PDF (29kB)