Barman, Utsab, Wagner, Joachim ORCID: 0000-0002-8290-3849 and Foster, Jennifer ORCID: 0000-0002-7789-4853 (2016) Part-of-speech tagging of code-mixed social media content: pipeline, stacking and joint modelling. In: Second Workshop on Computational Approaches to Code Switching, 2 Nov 2016, Austin, Texas, USA.
Abstract
Multilingual users of social media sometimes use multiple languages during conversation. Mixing multiple languages in content is known as code-mixing. We annotate a subset of a trilingual code-mixed corpus (Barman et al., 2014) with part-of-speech (POS) tags. We investigate two state-of-the-art POS tagging techniques for code-mixed content and combine the features of the two systems to build a better POS tagger. Furthermore, we investigate the use of a joint model which performs language identification (LID) and part-of-speech (POS) tagging simultaneously.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Subjects: | Computer Science > Machine translating |
DCU Faculties and Centres: | DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > ADAPT |
Published in: | Proceedings of the Second Workshop on Computational Approaches to Code Switching. Association for Computational Linguistics (ACL). |
Publisher: | Association for Computational Linguistics (ACL) |
Official URL: | https://doi.org/10.18653/v1/W16-5804 |
Copyright Information: | © 2016 Association for Computational Linguistics (ACL) |
Funders: | Science Foundation Ireland (Grant 12/CE/I2267) as part of CNGL (www.cngl.ie) at Dublin City University. |
ID Code: | 23067 |
Deposited On: | 11 Mar 2019 11:56 by Thomas Murtagh. Last Modified 27 Apr 2023 14:02 |
Documents
Full text available as: PDF (237kB), Creative Commons: Attribution 4.0