DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Evaluating automatic LFG f-structure annotation for the Penn-II treebank

Burke, Michael, Cahill, Aoife (ORCID: 0000-0002-3519-7726), McCarthy, Mairéad, O'Donovan, Ruth, van Genabith, Josef and Way, Andy (ORCID: 0000-0001-5736-5930) (2004) Evaluating automatic LFG f-structure annotation for the Penn-II treebank. Research on Language and Computation, 2 (4). pp. 523-547. ISSN 1570-7075

Abstract
Lexical-Functional Grammar (LFG: Kaplan and Bresnan, 1982; Bresnan, 2001; Dalrymple, 2001) f-structures represent abstract syntactic information approximating to basic predicate-argument-modifier (dependency) structure or simple logical form (van Genabith and Crouch, 1996; Cahill et al., 2003a). A number of methods have been developed (van Genabith et al., 1999a,b, 2001; Frank, 2000; Sadler et al., 2000; Frank et al., 2003) for automatically annotating treebank resources with LFG f-structure information. Until recently, however, most of this work on automatic f-structure annotation has been applied only to limited data sets, so while it may have shown ‘proof of concept’, it has not yet demonstrated that the techniques developed scale up to much larger data sets. More recent work (Cahill et al., 2002a,b) has presented efforts in evolving and scaling techniques established in these previous papers to the full Penn-II Treebank (Marcus et al., 1994). In this paper, we present a number of quantitative and qualitative evaluation experiments which provide insights into the effectiveness of the techniques developed to automatically derive a set of f-structures for the more than 1,000,000 words and 49,000 sentences of Penn-II. Currently we obtain 94.85% Precision, 95.4% Recall and 95.09% F-Score for preds-only f-structures against a manually encoded gold standard.
Metadata
Item Type: Article (Published)
Refereed: Yes
Additional Information: The original publication is available at www.springerlink.com
Uncontrolled Keywords: automatic annotation; corpora; evaluation; Lexical Functional Grammar; treebanks; unification grammar
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Publisher: Springer Netherlands
Official URL: http://dx.doi.org/10.1007/s11168-004-7428-y
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Enterprise Ireland, EI SC/2001/186
ID Code: 15363
Deposited On: 20 Apr 2010 13:36 by DORAS Administrator. Last Modified 25 Jan 2019 11:44
Documents

Full text available as:

burke_et_al_04d.pdf (PDF, 295kB)