Leveling, Johannes ORCID: 0000-0003-0603-4191 (2010) A Comparative analysis: QA evaluation questions versus real-world queries. In: LREC 2010 Workshop on Web Logs and Question Answering (WLQA 2010), 22 May 2010, Valletta, Malta.
Abstract
This paper presents a comparative analysis of user queries to a web search engine, questions to a Q&A service (answers.com), and questions employed in question answering (QA) evaluations at TREC and CLEF. The analysis shows that user queries to search engines contain mostly content words (i.e. keywords) but lack structure words (i.e. stopwords) and capitalization. Thus, they resemble natural language input after case folding and stopword removal. In contrast, topics for QA evaluation and questions to answers.com mainly consist of fully capitalized and syntactically well-formed questions. Classification experiments using a naïve Bayes classifier show that stopwords play an important role in determining the expected answer type. A classification based on stopwords is considerably more accurate (47.5% accuracy) than a classification based on all query words (40.1% accuracy) or on content words (33.9% accuracy). To simulate user input, questions are preprocessed by case folding and stopword removal. Additional classification experiments aim at reconstructing the syntactic wh-word frame of a question, i.e. the embedding of the interrogative word. Results indicate that this part of a question can be reconstructed with moderate accuracy (25.7%), but for a classification problem with a much larger number of classes than classifying queries by expected answer type (2096 classes vs. 130 classes). Furthermore, eliminating stopwords can lead to multiple reconstructed questions with a different or even the opposite meaning (e.g. if negations or temporal restrictions are included). In conclusion, question reconstruction from short user queries can be seen as a new, realistic evaluation challenge for QA systems.
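The stopword-based answer-type classification described in the abstract can be sketched with a tiny multinomial naïve Bayes model. Everything below is an illustrative assumption — the stopword list, the answer-type labels, and the training questions are invented for the sketch and are not the paper's actual data or class inventory:

```python
from collections import Counter, defaultdict
import math

# Small illustrative stopword list (assumption: the paper's actual list is not reproduced here).
STOPWORDS = {"what", "who", "when", "where", "which", "how", "is", "the",
             "a", "of", "in", "was", "did", "does", "many", "much"}

# Toy training data of (question, expected answer type); labels are illustrative only.
TRAIN = [
    ("who invented the telephone", "PERSON"),
    ("who wrote the novel", "PERSON"),
    ("when was the treaty signed", "DATE"),
    ("when did the war end", "DATE"),
    ("how many people live in malta", "NUMBER"),
    ("how much does it cost", "NUMBER"),
]

def stopword_features(question):
    # Keep only the stopwords, discarding content words entirely.
    return [w for w in question.lower().split() if w in STOPWORDS]

def train_nb(data, featurize):
    # Collect class priors and per-class feature counts.
    class_counts = Counter()
    word_counts = defaultdict(Counter)
    vocab = set()
    for text, label in data:
        class_counts[label] += 1
        for w in featurize(text):
            word_counts[label][w] += 1
            vocab.add(w)
    return class_counts, word_counts, vocab

def classify(question, model, featurize):
    # Multinomial naive Bayes with add-one (Laplace) smoothing.
    class_counts, word_counts, vocab = model
    total = sum(class_counts.values())
    scores = {}
    for label in class_counts:
        logp = math.log(class_counts[label] / total)
        denom = sum(word_counts[label].values()) + len(vocab)
        for w in featurize(question):
            logp += math.log((word_counts[label][w] + 1) / denom)
        scores[label] = logp
    return max(scores, key=scores.get)

model = train_nb(TRAIN, stopword_features)
print(classify("who discovered penicillin", model, stopword_features))  # PERSON
```

The point of the sketch is that even after all content words are thrown away, the surviving stopwords ("who", "when did", "how many") still carry a strong signal for the expected answer type — which is the effect the paper's classification experiments quantify.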
Metadata
| Item Type: | Conference or Workshop Item (Paper) |
| --- | --- |
| Event Type: | Workshop |
| Refereed: | Yes |
| Uncontrolled Keywords: | Question answering; Queries |
| Subjects: | Computer Science > Information retrieval |
| DCU Faculties and Centres: | Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL); Research Initiatives and Centres > National Centre for Language Technology (NCLT); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
| Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
| ID Code: | 16035 |
| Deposited On: | 01 Jun 2011 14:36 by Shane Harper. Last Modified 26 Oct 2018 11:24 |
Documents
Full text available as: PDF (126kB)