DORAS | DCU Research Repository

Topical relevance models

Ganguly, Debasis (ORCID: 0000-0003-0050-7138) (2013) Topical relevance models. PhD thesis, Dublin City University.

Abstract
An inherent characteristic of information retrieval (IR) is that the query expressing a user's information need is often multi-faceted, that is, it encapsulates more than one specific potential sub-information need. This multifacetedness of queries manifests itself as a topic distribution in the retrieved set of documents, where each document can be considered as a mixture of topics, one or more of which may correspond to the sub-information needs expressed in the query. In some specific domains of IR, such as patent prior art search, where the queries are full patent articles and the objective is to (in)validate the claims contained therein, the queries themselves are multi-topical in addition to the retrieved set of documents. The overall objective of the research described in this thesis is to investigate techniques to recognize and exploit these multi-topical characteristics of the retrieved documents and the queries in IR and in relevance feedback.
First, we hypothesize that segments of documents in close proximity to the query terms are topically related to those terms. An intuitive choice for the unit of such segments is the sentence, which characteristically represents a collection of semantically related terms. This way of utilizing term proximity through the use of sentences is empirically shown to select potentially relevant topics from among those present in a retrieved document set and thus improve relevance feedback in IR. Second, to handle the very long queries of patent prior art search, which are essentially multi-topical in nature, we hypothesize that segmenting these queries into topically focused segments and then using these segments as separate queries for retrieval can retrieve potentially relevant documents for each of them. The results for each of these segments then need to be merged to obtain a final retrieval result set for the whole query.
These two conceptual approaches for utilizing the topical relatedness of terms in both the retrieved documents and the queries are then integrated more formally within a single statistical generative model, called the topical relevance model (TRLM). This model utilizes the underlying multi-topical nature of both retrieved documents and the query. Moreover, the model is used as the basis for construction of a novel search interface, called TopicVis, which lets the user visualize the topic distributions in the retrieved set of documents and the query. This visualization of the topics is beneficial to the user in the following ways. Firstly, through visualization of the ranked retrieval list, TopicVis helps the user choose one or more facets of interest from the query in a feedback step, after which it retrieves documents primarily composed of the selected facets at top ranks. Secondly, the system provides an access link to the first segment within a document focusing on the selected topic and also supports navigation links to subsequent segments on the same topic in other documents.
The methods proposed in this thesis are evaluated on datasets from the TREC IR benchmarking workshop series, and the CLEF-IP 2010 data, a patent prior art search data set. Experimental results show that relevance feedback using sentences and segmented retrieval for patent prior art search queries significantly improve IR effectiveness for the standard ad-hoc IR and patent prior art search tasks. Moreover, the topical relevance model (TRLM), designed to encapsulate these two complementary approaches within a single framework, significantly improves IR effectiveness for both standard ad-hoc IR and patent prior art search. Furthermore, a task-based user study experiment shows that the novel features of topic visualization, topic-based feedback and topic-based navigation, implemented in the TopicVis interface, lead to effective and efficient task completion with good user satisfaction.
Metadata
Item Type: Thesis (PhD)
Date of Award: November 2013
Refereed: No
Supervisor(s): Jones, Gareth J.F.
Subjects: Computer Science > Interactive computer systems
Computer Science > Visualization
Computer Science > Information storage and retrieval systems
Computer Science > Information retrieval
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 3.0 License.
Funders: Science Foundation Ireland
ID Code: 19406
Deposited On: 26 Nov 2013 16:11 by Gareth Jones. Last Modified 25 Oct 2018 14:24
Documents

Full text available as:

DGthesis.pdf (PDF, 3MB)