
DORAS | DCU Research Repository


Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles

Munirathnam, Venkatesh Gurram (ORCID: 0000-0002-4393-9267) (2023) Multimodal spatio-temporal deep learning framework for 3D object detection in instrumented vehicles. PhD thesis, Dublin City University.

Abstract
This thesis presents the use of multiple modalities, such as image and lidar, to incorporate spatio-temporal information from sequence data into deep learning architectures for 3D object detection in instrumented vehicles. The race to autonomy in instrumented vehicles, or self-driving cars, has stimulated significant research into advanced driver assistance systems (ADAS), particularly perception systems. Object detection plays a crucial role in perception systems by providing spatial information to subsequent modules; accurate detection is therefore a key task supporting autonomous driving. The advent of deep learning in computer vision applications and the availability of multiple sensing modalities, such as 360° imaging, lidar, and radar, have led to state-of-the-art 2D and 3D object detection architectures. Most current state-of-the-art 3D object detection frameworks consider only a single frame of reference and do not exploit the temporal information associated with objects or scenes in sequence data. The present research therefore hypothesizes that multimodal temporal information can help bridge the gap between 2D and 3D metric space by improving the accuracy of deep learning frameworks for 3D object estimation. First, the thesis examines multimodal data representations and hyper-parameter selection using public datasets such as KITTI and nuScenes, with Frustum-ConvNet as the baseline architecture. Secondly, an attention mechanism is employed alongside a convolutional LSTM to extract spatio-temporal information from sequence data, improving 3D estimates and helping the architecture focus on salient lidar point cloud features. Finally, various fusion strategies are applied to combine the modalities and temporal information, and their effect on detection performance and computational complexity is assessed.
Overall, this thesis has established the importance and utility of multimodal systems for refined 3D object detection and has proposed a pipeline incorporating spatial, temporal, and attention mechanisms to improve both class-specific and overall accuracy, demonstrated on key autonomous driving datasets.
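As an illustration of the spatio-temporal mechanism described in the abstract, the following is a minimal NumPy sketch of a single-channel ConvLSTM cell combined with soft attention over the per-frame hidden states. It is a toy sketch only: the thesis's actual architecture builds on Frustum-ConvNet with multi-channel lidar and image features, and the kernel shapes, single-channel maps, and mean-activation attention score here are illustrative assumptions, not the author's implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv2d_same(x, k):
    """Naive same-padded 2D cross-correlation for a single-channel map."""
    kh, kw = k.shape
    ph, pw = kh // 2, kw // 2
    xp = np.pad(x, ((ph, ph), (pw, pw)))
    out = np.zeros_like(x)
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            out[i, j] = np.sum(xp[i:i + kh, j:j + kw] * k)
    return out

def convlstm_step(x, h, c, kernels):
    """One ConvLSTM step: LSTM gates computed with convolutions over the
    input frame x and the previous hidden state h (biases omitted)."""
    i = sigmoid(conv2d_same(x, kernels['wxi']) + conv2d_same(h, kernels['whi']))
    f = sigmoid(conv2d_same(x, kernels['wxf']) + conv2d_same(h, kernels['whf']))
    o = sigmoid(conv2d_same(x, kernels['wxo']) + conv2d_same(h, kernels['who']))
    g = np.tanh(conv2d_same(x, kernels['wxg']) + conv2d_same(h, kernels['whg']))
    c_next = f * c + i * g
    h_next = o * np.tanh(c_next)
    return h_next, c_next

def attention_pool(seq_h):
    """Soft attention over per-frame hidden states: score each frame
    (here, by mean activation - an assumed scoring function), softmax
    the scores, and return the attention-weighted sum."""
    scores = np.array([h.mean() for h in seq_h])
    w = np.exp(scores - scores.max())
    w /= w.sum()
    return sum(wi * h for wi, h in zip(w, seq_h)), w

rng = np.random.default_rng(0)
kernels = {name: rng.normal(scale=0.1, size=(3, 3))
           for name in ['wxi', 'whi', 'wxf', 'whf',
                        'wxo', 'who', 'wxg', 'whg']}

# A toy 4-frame sequence of 8x8 feature maps standing in for
# per-frame lidar/image features.
frames = [rng.normal(size=(8, 8)) for _ in range(4)]
h = c = np.zeros((8, 8))
hidden = []
for x in frames:
    h, c = convlstm_step(x, h, c, kernels)
    hidden.append(h)

fused, weights = attention_pool(hidden)
print(fused.shape, weights.round(3))
```

The recurrence carries spatial structure across frames (convolutions instead of the dense matrix products of a standard LSTM), and the attention weights let the pooled representation emphasize the most salient frames, mirroring the role attention plays over lidar features in the thesis.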
Metadata
Item Type: Thesis (PhD)
Date of Award: March 2023
Refereed: No
Supervisor(s): Little, Suzanne and O'Connor, Noel E.
Subjects: Computer Science > Image processing; Computer Science > Digital video
DCU Faculties and Centres: DCU Faculties and Schools > Faculty of Engineering and Computing > School of Computing; Research Initiatives and Centres > INSIGHT Centre for Data Analytics
Use License: This item is licensed under a Creative Commons Attribution-NonCommercial-No Derivative Works 4.0 License.
Funders: Science Foundation Ireland via Insight Research Centre for Data Analytics, DCU
ID Code: 27984
Deposited On: 31 Mar 2023 09:04 by Suzanne Little. Last Modified: 08 Dec 2023 15:13
Documents

Full text available as:

PDF (VGM_Thesis_18213945_04_01_2023.pdf) - Requires a PDF viewer; 23MB
Creative Commons: Attribution-Noncommercial-No Derivative Works 4.0