DORAS | DCU Research Repository

Retrievability of code mixed microblogs

Ganguly, Debasis (ORCID: 0000-0003-0050-7138), Bandyopadhyay, Ayan, Mitra, Mandar and Jones, Gareth J.F. (ORCID: 0000-0003-2923-8365) (2016) Retrievability of code mixed microblogs. In: 39th International ACM SIGIR conference on Research and Development in Information Retrieval, 17-21 July 2016, Pisa, Italy. ISBN 978-1-4503-4069-4

Abstract
Mixing multiple languages within the same document, a phenomenon called (linguistic) code mixing or code switching, is a frequent trend among multilingual users of social media. In the context of information retrieval (IR), code mixing may affect retrieval effectiveness because different vocabularies with different collection statistics are mixed within a single collection of documents. In this paper, we investigate indexing and retrieval strategies for a mixed collection of documents comprising code-mixed and monolingual documents. In particular, we address three alternative modes of indexing, namely (a) a single index for the two sub-collections; (b) a separate index for each sub-collection; and (c) a clustered index with two individual sub-collection statistics coupled with the overall one. We make use of the expected retrievability scores of the two classes of documents to empirically show that indexing strategies (a) and (b) mostly retrieve the monolingual documents at top ranks with standard retrieval approaches. Our experiments show that, by contrast, the clustered index (c) is able to alleviate this problem by improving the retrievability of the code-mixed documents.
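The abstract compares indexing strategies through the expected retrievability of each document class. As an illustration only, the sketch below uses the standard cumulative retrievability measure (the number of queries for which a document appears within a rank cutoff) and averages it per class. The abstract does not state the exact measure, cutoff, or query set used in the paper, so the function names, the cutoff of 2, and the toy rankings here are all hypothetical.

```python
from collections import defaultdict
from typing import Dict, List


def retrievability(rankings: Dict[str, List[str]], cutoff: int) -> Dict[str, int]:
    """Count, per document, how many queries rank it within the top `cutoff`."""
    scores: Dict[str, int] = defaultdict(int)
    for ranked_docs in rankings.values():
        for doc_id in ranked_docs[:cutoff]:
            scores[doc_id] += 1
    return dict(scores)


def mean_retrievability_by_class(
    scores: Dict[str, int], doc_class: Dict[str, str]
) -> Dict[str, float]:
    """Average the retrievability score over all documents of each class.

    Documents that were never retrieved count as zero.
    """
    totals: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for doc_id, cls in doc_class.items():
        totals[cls] += scores.get(doc_id, 0)
        counts[cls] += 1
    return {cls: totals[cls] / counts[cls] for cls in counts}


# Hypothetical toy data: ranked result lists for three queries.
rankings = {
    "q1": ["m1", "m2", "c1"],
    "q2": ["m1", "c1", "m2"],
    "q3": ["m2", "m1", "c2"],
}
# Hypothetical class labels: "mono" = monolingual, "mixed" = code-mixed.
doc_class = {"m1": "mono", "m2": "mono", "c1": "mixed", "c2": "mixed"}

scores = retrievability(rankings, cutoff=2)
class_means = mean_retrievability_by_class(scores, doc_class)
print(class_means)
```

In this toy setup the monolingual documents dominate the top ranks, so their mean retrievability is higher than that of the code-mixed documents, which is the kind of imbalance the abstract reports for strategies (a) and (b).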
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Uncontrolled Keywords:Microblog Retrieval; Code Mixing; Retrievability; Fusion
Subjects:Computer Science > Information retrieval
DCU Faculties and Centres:DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Research Initiatives and Centres > ADAPT
Published in: SIGIR '16 Proceedings of the 39th International ACM SIGIR conference on Research and Development in Information Retrieval. Association for Computing Machinery (ACM). ISBN 978-1-4503-4069-4
Publisher:Association for Computing Machinery (ACM)
Official URL:http://dx.doi.org/10.1145/2911451.2914727
Copyright Information:© 2016 ACM
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:Science Foundation Ireland (SFI) as a part of the ADAPT Centre at DCU (Grant No: 13/RC/2106) and a grant under the SFI ISCA India consortium.
ID Code:23399
Deposited On:04 Jun 2019 16:12 by Thomas Murtagh. Last Modified 04 Jun 2019 16:12
Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
854kB