Wagner, Joachim ORCID: 0000-0002-8290-3849 and Foster, Jennifer ORCID: 0000-0002-7789-4853 (2015) DCU-ADAPT: Learning edit operations for microblog normalisation with the generalised perceptron. In: ACL 2015 Workshop on Noisy User-generated Text (W-NUT), 31 July 2015, Beijing, China.
Abstract
We describe the work carried out by the DCU-ADAPT team on the Lexical Normalisation shared task at W-NUT 2015. We train a generalised perceptron to annotate noisy text with edit operations that normalise the text when executed. The features are character n-grams, hidden-layer activations of a recurrent neural network language model, character class, and eligibility for editing according to the task rules. We combine the predictions of 25 models, each trained on a subset of the training data, by selecting the normalisation that a character language model rates most likely. We compare the generalised perceptron with conditional random fields, which had to be trained on smaller amounts of training data due to memory constraints. Furthermore, we make a first attempt to verify the hypothesis of Chrupała (2014) that the noisy channel model would not be useful because only a limited amount of training data is available for the source language model, i.e. the language model over normalised text.
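To make "normalising text by executing edit operations" and "selecting the most likely candidate with a character language model" concrete, here is a minimal Python sketch. It is not the authors' implementation. The label scheme (K = keep, D = delete, R:xyz = replace the character with xyz), the helper names apply_edits, CharBigramLM and select_best, and the add-one smoothed character bigram model are all hypothetical choices made for illustration. The abstract does not specify the label inventory or the order and smoothing of the character language model, and the perceptron tagger that would predict the labels is not shown.

```python
import math
from collections import Counter

# HYPOTHETICAL label scheme (illustrative assumption, not from the paper):
#   "K"      keep the input character
#   "D"      delete the input character
#   "R:xyz"  replace the input character with the string "xyz"
def apply_edits(text, labels):
    """Execute one edit label per input character and return the normalised text."""
    if len(text) != len(labels):
        raise ValueError("expected exactly one label per character")
    out = []
    for ch, label in zip(text, labels):
        if label == "K":
            out.append(ch)
        elif label == "D":
            continue  # drop the character
        elif label.startswith("R:"):
            out.append(label[2:])  # replacement may be empty, one or several characters
        else:
            raise ValueError(f"unknown edit label: {label!r}")
    return "".join(out)


class CharBigramLM:
    """Add-one smoothed character bigram model: an ASSUMED stand-in for the
    character language model mentioned in the abstract."""

    def __init__(self, corpus):
        self.bigrams = Counter()
        self.contexts = Counter()
        vocab = set()
        for line in corpus:
            chars = ["<s>"] + list(line) + ["</s>"]
            vocab.update(chars)
            for prev, cur in zip(chars, chars[1:]):
                self.bigrams[(prev, cur)] += 1
                self.contexts[prev] += 1
        if not vocab:
            raise ValueError("empty training corpus")
        self.vocab_size = len(vocab)

    def logprob(self, text):
        """Natural-log probability of text, including start and end transitions."""
        chars = ["<s>"] + list(text) + ["</s>"]
        total = 0.0
        for prev, cur in zip(chars, chars[1:]):
            numerator = self.bigrams[(prev, cur)] + 1
            denominator = self.contexts[prev] + self.vocab_size
            total += math.log(numerator / denominator)
        return total


def select_best(candidates, lm):
    """Return the candidate normalisation (e.g. one per ensemble member)
    that the language model scores highest."""
    return max(candidates, key=lm.logprob)


if __name__ == "__main__":
    print(apply_edits("u r gr8", ["R:you", "K", "R:are", "K", "K", "K", "R:eat"]))
    lm = CharBigramLM(["you are great", "see you later", "are you there"])
    print(select_best(["yuo are great", "you are great"], lm))
```

Here apply_edits("u r gr8", ...) yields "you are great", and the toy model prefers "you are great" over "yuo are great" because the bigrams "yo" and "ou" occur in its training lines. Raw log-probabilities penalise every extra character, so a selector comparing candidates of different lengths might length-normalise the score. That refinement is likewise an assumption, not something the abstract describes.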
Metadata
Item Type: | Conference or Workshop Item (Poster) |
---|---|
Event Type: | Workshop |
Refereed: | Yes |
Uncontrolled Keywords: | Normalisation; Pre-processing; Spelling correction; Non-standard spellings; Informal abbreviations; Contractions; Twitter; User-generated content |
Subjects: | Computer Science > Computational linguistics |
DCU Faculties and Centres: | Research Initiatives and Centres > ADAPT; Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL) |
Published in: | Proceedings of the ACL 2015 Workshop on Noisy User-generated Text (W-NUT 2015). Association for Computational Linguistics. |
Publisher: | Association for Computational Linguistics |
Official URL: | http://dx.doi.org/10.18653/v1/W15-4314 |
Copyright Information: | ©2015 The Authors |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-ShareAlike 3.0 License. |
Funders: | Science Foundation Ireland |
ID Code: | 20694 |
Deposited On: | 04 Aug 2015 11:04 by Joachim Wagner. Last modified 22 Jul 2019 13:56 |
Documents
Full text available as: PDF (176 kB)