Burke, Michael, Cahill, Aoife ORCID: 0000-0002-3519-7726, van Genabith, Josef and Way, Andy ORCID: 0000-0001-5736-5930 (2005) Evaluating automatically acquired f-structures against PropBank. In: LFG05 - 10th International Lexical Functional Grammar Conference, 18-20 July 2005, Bergen, Norway. ISBN 1098-6782
Abstract
An automatic method for annotating the Penn-II Treebank (Marcus et al., 1994) with high-level Lexical Functional Grammar (Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structure representations is presented by Burke et al. (2004b). The annotation algorithm is the basis for the automatic acquisition of wide-coverage and robust probabilistic approximations of LFG grammars (Cahill et al., 2004) and for the induction of subcategorisation frames (O’Donovan et al., 2004; O’Donovan et al., 2005). Annotation quality is, therefore, extremely important and to date has been measured against the DCU 105 and the PARC 700 Dependency Bank (King et al., 2003). The annotation algorithm achieves f-scores of 96.73% for complete f-structures and 94.28% for preds-only f-structures against the DCU 105 and 87.07% against the PARC 700 using the feature set of Kaplan et al. (2004). Burke et al. (2004a) provides detailed analysis of these results.
This paper presents an evaluation of the annotation algorithm against PropBank (Kingsbury and Palmer,
2002). PropBank identifies the semantic arguments of each predicate in the Penn-II treebank and annotates their semantic roles. As PropBank was developed independently of any grammar formalism it provides a platform for making more meaningful comparisons between parsing technologies than was previously possible. PropBank also allows a much larger scale evaluation than the smaller DCU 105 and PARC 700 gold standards. In order to perform the evaluation, first, we automatically converted the PropBank annotations
into a dependency format. Second, we developed conversion software to produce PropBank-style semantic annotations in dependency format from the f-structures automatically acquired by the annotation algorithm from Penn-II. The evaluation was performed using the evaluation software of Crouch et al. (2002) and Riezler et al. (2002). Using the Penn-II Wall Street Journal Section 24 as the development set, currently we achieve an f-score of 76.58% against PropBank for the Section 23 test set.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | lexical functional grammar; |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | Research Initiatives and Centres > National Centre for Language Technology (NCLT) |
Published in: | The Proceedings of the LFG '05 Conference. . CSLI Publications. ISBN 1098-6782 |
Publisher: | CSLI Publications |
Official URL: | http://csli-publications.stanford.edu/LFG/10/lfg05... |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License |
ID Code: | 15290 |
Deposited On: | 12 Mar 2010 11:12 by DORAS Administrator . Last Modified 25 Jan 2019 11:47 |
Documents
Full text available as:
Preview |
PDF
- Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
109kB |
Downloads
Downloads
Downloads per month over past year
Archive Staff Only: edit this record