Semantic indexing of wearable camera images: kids’cam concepts

Smeaton, Alan F. ORCID: 0000-0003-1028-8389, McGuinness, Kevin ORCID: 0000-0003-1336-6477, Gurrin, Cathal ORCID: 0000-0003-2903-3968, Zhou, Jiang ORCID: 0000-0002-3067-8512, O'Connor, Noel E. ORCID: 0000-0002-4033-9135, Wang, Peng, Davis, Brian ORCID: 0000-0003-1336-6477, Azevedo, Lucas, Freitas, Andre, Signal, Louise N. ORCID: 0000-0003-4715-6718, Smith, Moira ORCID: 0000-0003-2085-7522, Stanley, James, Barr, Michelle, Chambers, Tim and Ní Mhurchú, Cliona ORCID: 0000-0002-1144-9167 (2016) Semantic indexing of wearable camera images: kids’cam concepts. In: Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion, 16 Oct 2016, Amsterdam, the Netherlands. ISBN 978-1-4503-4519-4/16/10


Abstract

To support content-based search, image media, including images and video, are typically accessed through manually or automatically assigned concepts or tags, or sometimes through image-to-image similarity, depending on the use case. While great progress has been made in very recent years in automatic concept detection using machine learning, we are still left with a mismatch between the semantics of the concepts we can automatically detect and the semantics of the words used in a user’s query. In this paper we report on a large collection of images from wearable cameras gathered as part of the Kids’Cam project, which have been both manually annotated from a vocabulary of 83 concepts and automatically annotated from a vocabulary of 1,000 concepts. This collection allows us to explore how language, in the form of two distinct concept vocabularies or spaces, one of them manually assigned and thus forming a ground truth, is used to represent images, in our case images taken using wearable cameras. It also allows us to discuss, in general terms, the mismatch of concepts in visual media that derives from language mismatches. We report the data processing we have completed on this collection and some of our initial experimentation in mapping across the two language vocabularies.

Metadata

Item Type:	Conference or Workshop Item (Paper)
Event Type:	Workshop
Refereed:	Yes
Subjects:	Computer Science > Lifelog; Computer Science > Machine learning; Business > Marketing; Computer Science > Image processing; Computer Science > Information retrieval
DCU Faculties and Centres:	DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Initiatives and Centres > INSIGHT Centre for Data Analytics; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing
Published in:	iV&L-MM '16 Proceedings of the 2016 ACM workshop on Vision and Language Integration Meets Multimedia Fusion. Association for Computing Machinery. ISBN 978-1-4503-4519-4/16/10
Publisher:	Association for Computing Machinery
Official URL:	http://dx.doi.org/10.1145/2983563.2983566
Copyright Information:	© ACM, 2016. This is the author's version of the work. It is posted here by permission of ACM for your personal use. Not for redistribution. The definitive version was published in http://dx.doi.org/10.1145/2983563.2983566
Use License:	This item is licensed under a Creative Commons Attribution-NonCommercial-Share Alike 3.0 License.
Funders:	Science Foundation Ireland, National Natural Science Foundation of China, Beijing Key Laboratory of Networked Multimedia, Health Research Council of New Zealand Programme Grant
ID Code:	21434
Deposited On:	18 Oct 2016 09:23 by Alan Smeaton. Last modified 07 Apr 2021 13:02

Documents

Full text available as:

PDF - Requires a PDF viewer such as GSview, Xpdf or Adobe Acrobat Reader
1MB

DORAS | DCU Research Repository
