Login (DCU Staff Only)
Login (DCU Staff Only)

DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU

Advanced Search

DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation

Leveling, Johannes orcid logoORCID: 0000-0003-0603-4191, Ganguly, Debasis orcid logoORCID: 0000-0003-0050-7138 and Jones, Gareth J.F. orcid logoORCID: 0000-0003-2923-8365 (2010) DCU@FIRE2010: term conflation, blind relevance feedback, and cross-language IR with manual and automatic query translation. In: FIRE 2010 - Forum for Information Retrieval Evaluation, 19-21 February 2010, Gandhinagar, India.

Abstract
For the first participation of Dublin City University (DCU) in the FIRE 2010 evaluation campaign, information retrieval (IR) experiments on English, Bengali, Hindi, and Marathi documents were performed to investigate term conation (different stemming approaches and indexing word prefixes), blind relevance feedback, and manual and automatic query translation. The experiments are based on BM25 and on language modeling (LM) for IR. Results show that term conation always improves mean average precision (MAP) compared to indexing unprocessed word forms, but different approaches seem to work best for different languages. For example, in monolingual Marathi experiments indexing 5-prefixes outperforms our corpus-based stemmer; in Hindi, the corpus-based stemmer achieves a higher MAP. For Bengali, the LM retrieval model achieves a much higher MAP than BM25 (0.4944 vs. 0.4526). In all experiments using BM25, blind relevance feedback yields considerably higher MAP in comparison to experiments without it. Bilingual IR experiments (English!Bengali and English!Hindi) are based on query translations obtained from native speakers and the Google translate web service. For the automatically translated queries, MAP is slightly (but not significantly) lower compared to experiments with manual query translations. The bilingual English!Bengali (English!Hindi) experiments achieve 81.7%-83.3% (78.0%-80.6%) of the best corresponding monolingual experiments.
Metadata
Item Type:Conference or Workshop Item (Paper)
Event Type:Conference
Refereed:Yes
Subjects:Computer Science > Information retrieval
DCU Faculties and Centres:Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL)
DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Official URL:http://www.isical.ac.in/~fire/2010/working_notes.h...
Use License:This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. View License
Funders:Science Foundation Ireland
ID Code:15842
Deposited On:22 Nov 2010 13:47 by Shane Harper . Last Modified 25 Oct 2018 10:59
Documents

Full text available as:

[thumbnail of DCU@FIRE2010_Term_Conflation,_Blind_Relevance.pdf]
Preview
PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
179kB
Downloads

Downloads

Downloads per month over past year

Archive Staff Only: edit this record