DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

DCU-ADAPT: Learning edit operations for microblog normalisation with the generalised perceptron

Wagner, Joachim (ORCID: 0000-0002-8290-3849) and Foster, Jennifer (ORCID: 0000-0002-7789-4853) (2015) DCU-ADAPT: Learning edit operations for microblog normalisation with the generalised perceptron. In: ACL 2015 Workshop on Noisy User-generated Text (W-NUT), 31 July 2015, Beijing, China.

Abstract
We describe the work carried out by the DCU-ADAPT team on the Lexical Normalisation shared task at W-NUT 2015. We train a generalised perceptron to annotate noisy text with edit operations that normalise the text when executed. Features are character n-grams, recurrent neural network language model hidden layer activations, character class and eligibility for editing according to the task rules. We combine predictions from 25 models trained on subsets of the training data by selecting the most-likely normalisation according to a character language model. We compare the use of a generalised perceptron to the use of conditional random fields restricted to smaller amounts of training data due to memory constraints. Furthermore, we make a first attempt to verify Chrupała (2014)’s hypothesis that the noisy channel model would not be useful due to the limited amount of training data for the source language model, i.e. the language model on normalised text.
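As a rough illustration of two ideas named in the abstract, the sketch below executes per-character edit operations on a noisy token and then picks, from several candidate normalisations, the one a character language model scores highest. The operation names (`keep`, `del`, `sub`, `ins`), the add-one smoothed bigram model, and the toy corpus are assumptions made for illustration only; the abstract does not specify the paper's operation inventory or its language model.

```python
import math
from collections import Counter

def apply_edits(text, ops):
    """Execute one edit operation per input character.

    Hypothetical operation inventory (not taken from the paper):
    ("keep",) copies the character, ("del",) drops it,
    ("sub", s) replaces it with string s,
    ("ins", s) keeps it and appends string s after it.
    """
    if len(text) != len(ops):
        raise ValueError("need exactly one operation per character")
    out = []
    for ch, op in zip(text, ops):
        kind = op[0]
        if kind == "keep":
            out.append(ch)
        elif kind == "del":
            continue
        elif kind == "sub":
            out.append(op[1])
        elif kind == "ins":
            out.append(ch + op[1])
        else:
            raise ValueError(f"unknown operation: {kind}")
    return "".join(out)

class CharBigramLM:
    """Add-one smoothed character bigram model (an illustrative stand-in)."""

    def __init__(self, corpus):
        self.bigrams = Counter()
        self.contexts = Counter()
        self.vocab = {"</s>"}
        for line in corpus:
            chars = ["<s>"] + list(line) + ["</s>"]
            self.vocab.update(line)
            for a, b in zip(chars, chars[1:]):
                self.bigrams[(a, b)] += 1
                self.contexts[a] += 1

    def logprob(self, text):
        chars = ["<s>"] + list(text) + ["</s>"]
        v = len(self.vocab) + 1  # +1 reserves mass for unseen characters
        return sum(
            math.log((self.bigrams[(a, b)] + 1) / (self.contexts[a] + v))
            for a, b in zip(chars, chars[1:])
        )

def select_best(candidates, lm):
    """Return the candidate with the highest total log-probability."""
    return max(candidates, key=lm.logprob)

# Toy data, assumed for the example only.
lm = CharBigramLM(["see you tomorrow", "you are here"])
ops = [("ins", "o"), ("keep",), ("sub", "orr"), ("sub", "ow")]
normalised = apply_edits("tmrw", ops)
best = select_best(["yuo", "you"], lm)
```

Comparing total log-probabilities favours shorter strings, so a practical version would likely need some form of length normalisation when candidates differ in length; the two candidates here are the same length to sidestep that issue.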
Metadata
Item Type: Conference or Workshop Item (Poster)
Event Type: Workshop
Refereed: Yes
Uncontrolled Keywords: Normalisation; Pre-processing; Spelling correction; Non-standard spellings; Informal abbreviations; Contractions; Twitter; User-generated content
Subjects: Computer Science > Computational linguistics
DCU Faculties and Centres: Research Initiatives and Centres > ADAPT; Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
Published in: Proceedings of the ACL 2015 Workshop on Noisy User-generated Text (W-NUT 2015). Association for Computational Linguistics.
Publisher: Association for Computational Linguistics
Official URL: http://dx.doi.org/10.18653/v1/W15-4314
Copyright Information: ©2015 The Authors
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders: Science Foundation Ireland
ID Code: 20694
Deposited On: 04 Aug 2015 11:04 by Joachim Wagner. Last Modified 22 Jul 2019 13:56
Documents

Full text available as:

W_NUT_2015_Lex_Norm-vA.pdf (PDF, 176kB)