Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal (2019) Viewpoint invariant semantic object and scene categorization with RGB-D sensors. Autonomous Robots, 43 (4). pp. 1005-1022. ISSN 0929-5593 E-ISSN 1573-7527 (In Press)
Abstract
Understanding the semantics of objects and scenes using multi-modal RGB-D sensors serves many robotics applications. Key challenges for accurate RGB-D image recognition are the scarcity of training data, variations due to viewpoint changes and the heterogeneous nature of the data. We address these problems and propose a generic deep learning framework based on a pre-trained convolutional neural network used as a feature extractor for both the colour and depth channels. We propose a rich multi-scale feature representation, referred to as convolutional hypercube pyramid (HP-CNN), that is able to encode discriminative information from the convolutional tensors at different levels of detail. We also present a technique to fuse the proposed HP-CNN with the activations of fully connected neurons, based on an extreme learning machine classifier in a late fusion scheme, which leads to a highly discriminative and compact representation. To further improve performance, we devise HP-CNN-T, a view-invariant descriptor extracted from a multi-view 3D object pose (M3DOP) model. M3DOP is learned from over 140,000 RGB-D images that are synthetically generated by rendering CAD models from different viewpoints. Extensive evaluations on four RGB-D object and scene recognition datasets demonstrate that our HP-CNN and HP-CNN-T consistently outperform state-of-the-art methods for several recognition tasks by a significant margin.
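As an illustration of the classifier family named in the abstract, the following is a minimal sketch of an extreme learning machine (random, fixed hidden layer; output weights solved in closed form by ridge regression) used in a simple late fusion of two feature streams. It is not the authors' implementation: the names `ELMClassifier` and `late_fusion_fit_predict`, the sigmoid activation, the regularisation value, and the two-stage score concatenation are all assumptions made for this example, and the input arrays merely stand in for convolutional and fully connected features.

```python
import numpy as np


class ELMClassifier:
    """Hypothetical minimal extreme learning machine (not the paper's code)."""

    def __init__(self, n_hidden=64, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg  # ridge regularisation strength (assumed value)
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # Sigmoid activations of the random, untrained hidden layer.
        return 1.0 / (1.0 + np.exp(-(X @ self.W + self.b)))

    def fit(self, X, y):
        self.classes_ = np.unique(y)
        # Hidden weights are drawn once and never updated.
        self.W = self.rng.standard_normal((X.shape[1], self.n_hidden))
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # One-hot targets, one column per class.
        T = (y[:, None] == self.classes_[None, :]).astype(float)
        # Closed-form ridge solution for the output weights.
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def decision_function(self, X):
        return self._hidden(X) @ self.beta

    def predict(self, X):
        return self.classes_[np.argmax(self.decision_function(X), axis=1)]


def late_fusion_fit_predict(streams_train, y_train, streams_test, n_hidden=64):
    """Assumed fusion scheme: one ELM per stream, then an ELM on stacked scores."""
    first_stage = [
        ELMClassifier(n_hidden, seed=i).fit(X, y_train)
        for i, X in enumerate(streams_train)
    ]
    fused_train = np.hstack(
        [m.decision_function(X) for m, X in zip(first_stage, streams_train)]
    )
    fused_test = np.hstack(
        [m.decision_function(X) for m, X in zip(first_stage, streams_test)]
    )
    final = ELMClassifier(n_hidden, seed=len(first_stage)).fit(fused_train, y_train)
    return final.predict(fused_test)
```

Calling `late_fusion_fit_predict([feats_a, feats_b], labels, [test_a, test_b])` with two feature matrices per split returns one predicted label per test row. Because only the output weights are learned, training reduces to a single linear solve per classifier.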
Item Type: | Article (Journal) |
---|---|
Additional Information: | 8293/64696 |
Uncontrolled Keywords: | Object categorization; Scene recognition; RGB-D image; Multi-modal deep learning |
Subjects: | Q Science > Q Science (General) > Q300 Cybernetics; Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Kulliyyahs/Centres/Divisions/Institutes: | Kulliyyah of Engineering > Department of Mechatronics Engineering |
Depositing User: | Dr. Hasan Firdaus Mohd Zaki |
Date Deposited: | 24 Jul 2018 12:17 |
Last Modified: | 01 Aug 2019 10:25 |
URI: | http://irep.iium.edu.my/id/eprint/64696 |