
DORAS | DCU Research Repository

Explore open access research and scholarly works from DCU


Temporal bilinear encoding network of audio-visual features at low sampling rates

Hu, Feiyan (ORCID: 0000-0001-7451-6438), Mohedano, Eva, O'Connor, Noel E. (ORCID: 0000-0002-4033-9135) and McGuinness, Kevin (ORCID: 0000-0003-1336-6477) (2021) Temporal bilinear encoding network of audio-visual features at low sampling rates. In: 16th International Conference on Computer Vision Theory and Applications - VISAPP 2021, 8-10 Feb 2021, Vienna, Austria (Online). ISBN 978-989-758-488-6

Abstract
Current deep-learning-based video classification architectures are typically trained end-to-end on large volumes of data and require extensive computational resources. This paper aims to exploit audio-visual information in video classification at a sampling rate of 1 frame per second. We propose Temporal Bilinear Encoding Networks (TBEN) for encoding both audio and visual long-range temporal information using bilinear pooling, and demonstrate that bilinear pooling is better than average pooling on the temporal dimension for videos with a low sampling rate. We also embed the label hierarchy in TBEN to further improve the robustness of the classifier. Experiments on the FGA240 fine-grained classification dataset using TBEN achieve a new state of the art (hit@1=47.95%). We also explore the possibility of combining TBEN with multiple decoupled modalities such as visual semantic and motion features: experiments on UCF101 sampled at 1 FPS achieve close to state-of-the-art accuracy (hit@1=91.03%) while requiring significantly fewer computational resources than competing approaches for both training and prediction.
Metadata
Item Type: Conference or Workshop Item (Paper)
Event Type: Conference
Refereed: Yes
Uncontrolled Keywords: Video classification; bilinear pooling; Action classification; Deep learning; Audio-visual; Compact Bilinear Pooling
Subjects: Computer Science > Artificial intelligence; Computer Science > Image processing; Computer Science > Machine learning; Computer Science > Digital video; Computer Science > Video compression
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; DCU Faculties and Schools > Faculty of Engineering and Computing > School of Electronic Engineering; Research Initiatives and Centres > INSIGHT Centre for Data Analytics
Published in: Proceedings of the 16th International Joint Conference on Computer Vision, Imaging and Computer Graphics Theory and Applications: VISAPP, 5. SciTePress. ISBN 978-989-758-488-6
Publisher: SciTePress
Official URL: http://dx.doi.org/10.5220/0010337306370644
Copyright Information: © 2021 The Authors (CC BY-NC-ND 4.0)
Funders: Science Foundation Ireland (SFI) under grant numbers SFI/15/SIRG/3283 and SFI/12/RC/2289_P2.
ID Code: 26253
Deposited On: 13 Sep 2021 10:16 by Feiyan Hu. Last Modified 13 Sep 2021 10:16
Documents

Full text available as:

VISAPP_2021-2.pdf (PDF, 217kB)