DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks?

Schluter, Natalie and van Genabith, Josef (2007) Preparing, restructuring, and augmenting a French treebank: lexicalised parsers or coherent treebanks? In: PACLING 2007 - 10th Conference of the Pacific Association for Computational Linguistics, 19-21 September 2007, Melbourne, Australia.

Abstract
We present the Modified French Treebank (MFT), a completely revamped French Treebank, derived from the Paris 7 Treebank (P7T), which is cleaner, more coherent, has several transformed structures, and introduces new linguistic analyses. To determine the effect of these changes, we investigate how the MFT fares in statistical parsing. Probabilistic parsers trained on the MFT training set (currently 3800 trees) already perform better than their counterparts trained on five times the P7T data (18,548 trees), providing an extreme example of the importance of data quality over quantity in statistical parsing. Moreover, regression analysis on the learning curve of parsers trained on the MFT leads to the prediction that parsers trained on the full projected 18,548-tree MFT training set will far outscore their counterparts trained on the full P7T. These analyses also show how problematic data can lead to problematic conclusions: in particular, we find that lexicalisation in the probabilistic parsing of French is probably not as crucial as was once thought (Arun and Keller, 2005).
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Modified French Treebank (MFT)
Subjects: Computer Science > Machine translating
DCU Faculties and Centres: Research Initiatives and Centres > National Centre for Language Technology (NCLT)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL: http://mandrake.csse.unimelb.edu.au/pacling2007/
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland, SFI 04/IN/I527
ID Code: 15265
Deposited On: 09 Mar 2010 16:29 by DORAS Administrator. Last Modified: 19 Jul 2018 14:50
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
157kB