Eskevich, Maria ORCID: 0000-0002-1242-0753, Jones, Gareth J.F. ORCID: 0000-0003-2923-8365, Larson, Martha and Ordelman, Roeland (2012) Creating a data collection for evaluating rich speech retrieval. In: The Eighth international conference on Language Resources and Evaluation (LREC) 2012, 21-27 May 2012, Istanbul, Turkey.
Abstract
We describe the development of a test collection for the investigation of speech retrieval beyond identification of relevant content. This collection focuses on satisfying user information needs for queries associated with specific types of speech acts. The collection is based on an archive of Internet video from an Internet video sharing platform (blip.tv), and was provided by the MediaEval benchmarking initiative. A crowdsourcing approach was used to identify segments in the video data which contain speech acts, to create a description of the video containing the act, and to generate search queries designed to refind this speech act. We describe and reflect on our experiences with crowdsourcing this test collection using the Amazon Mechanical Turk platform. We highlight the challenges of constructing this dataset, including the selection of the data source, the design of the crowdsourcing task, and the specification of queries and relevant items.
Metadata
Item Type: | Conference or Workshop Item (Paper) |
---|---|
Event Type: | Conference |
Refereed: | Yes |
Uncontrolled Keywords: | Speech Search; Speech Collection Creation; Speech Retrieval; Crowdsourcing |
Subjects: | Computer Science > Multimedia systems; Computer Science > Information retrieval |
DCU Faculties and Centres: | Research Initiatives and Centres > Centre for Digital Video Processing (CDVP); Research Initiatives and Centres > Centre for Next Generation Localisation (CNGL); DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing |
Published in: | Proceedings of LREC 2012. |
Official URL: | http://www.lrec-conf.org/lrec2012/ |
Use License: | This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License. |
Funders: | Science Foundation Ireland, European Framework Programme 7 |
ID Code: | 16901 |
Deposited On: | 15 Jun 2012 08:55 by Gareth Jones. Last Modified 10 Oct 2018 09:20 |