Mohd Zaki, Hasan Firdaus and Shafait, Faisal and Mian, Ajmal (2016) Convolutional hypercube pyramid for accurate RGB-D object category and instance recognition. In: 2016 IEEE International Conference on Robotics and Automation (ICRA), 16-21 May 2016, Stockholm, Sweden.
Abstract
Deep learning based methods have achieved unprecedented success in solving several computer vision problems involving RGB images. However, this level of success is yet to be seen on RGB-D images owing to two major challenges in this domain: training data deficiency and multi-modality input dissimilarity. We present an RGB-D object recognition framework that addresses these two key challenges by effectively embedding depth and point cloud data into the RGB domain. We employ a convolutional neural network (CNN) pre-trained on RGB data as a feature extractor for both color and depth channels and propose a rich coarse-to-fine feature representation scheme, coined Hypercube Pyramid, that is able to capture discriminatory information at different levels of detail. Finally, we present a novel fusion scheme to combine the Hypercube Pyramid features with the activations of fully connected neurons to construct a compact representation prior to classification. By employing Extreme Learning Machines (ELM) as non-linear classifiers, we show that the proposed method outperforms ten state-of-the-art algorithms for several tasks in terms of recognition accuracy on the benchmark Washington RGB-D and 2D3D object datasets by a large margin (up to 50% reduction in error rate).
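Two generic building blocks named in the abstract, coarse-to-fine pooling of convolutional activations and an Extreme Learning Machine classifier, can be illustrated with a short NumPy sketch. This is not the authors' implementation: `pyramid_max_pool` and `ELMClassifier` are hypothetical names, and the pyramid levels (1, 2, 4), per-cell max pooling, tanh hidden activation, ridge parameter and synthetic feature maps are all assumptions. The paper's actual Hypercube Pyramid construction and its fusion with fully connected activations are not reproduced here.

```python
import numpy as np


def pyramid_max_pool(fmap, levels=(1, 2, 4)):
    """Coarse-to-fine max pooling of an (H, W, C) activation volume (assumed scheme).

    For each level n the spatial grid is split into n x n cells and the
    per-channel maximum of every cell is kept. The result has length
    C * sum(n * n for n in levels).
    """
    h, w, _ = fmap.shape
    pooled = []
    for n in levels:
        # Integer cell boundaries; every cell is non-empty when H, W >= n.
        rows = np.linspace(0, h, n + 1).astype(int)
        cols = np.linspace(0, w, n + 1).astype(int)
        for i in range(n):
            for j in range(n):
                cell = fmap[rows[i]:rows[i + 1], cols[j]:cols[j + 1], :]
                pooled.append(cell.max(axis=(0, 1)))
    return np.concatenate(pooled)


class ELMClassifier:
    """Single-hidden-layer Extreme Learning Machine (hypothetical sketch).

    Hidden weights are random and never trained; only the output weights
    are solved in closed form by ridge-regularised least squares.
    """

    def __init__(self, n_hidden=512, reg=1e-3, seed=0):
        self.n_hidden = n_hidden
        self.reg = reg
        self.rng = np.random.default_rng(seed)

    def _hidden(self, X):
        # tanh of a fixed random projection (activation choice is an assumption).
        return np.tanh(X @ self.W + self.b)

    def fit(self, X, y):
        y = np.asarray(y)
        self.classes_ = np.unique(y)
        T = (y[:, None] == self.classes_[None, :]).astype(float)  # one-hot targets
        d = X.shape[1]
        # Scale by 1/sqrt(d) so the projection does not saturate tanh everywhere.
        self.W = self.rng.standard_normal((d, self.n_hidden)) / np.sqrt(d)
        self.b = self.rng.standard_normal(self.n_hidden)
        H = self._hidden(X)
        # beta = (H^T H + reg * I)^-1 H^T T
        A = H.T @ H + self.reg * np.eye(self.n_hidden)
        self.beta = np.linalg.solve(A, H.T @ T)
        return self

    def predict(self, X):
        scores = self._hidden(X) @ self.beta
        return self.classes_[np.argmax(scores, axis=1)]


if __name__ == "__main__":
    rng = np.random.default_rng(1)
    # Synthetic stand-ins for CNN activations: 40 samples of 8x8x16 maps.
    maps = rng.random((40, 8, 8, 16))
    labels = np.repeat([0, 1], 20)
    maps[labels == 1, :4, :4, :] += 1.0  # class 1 responds strongly top-left
    feats = np.stack([pyramid_max_pool(m) for m in maps])
    clf = ELMClassifier(n_hidden=64).fit(feats, labels)
    print("training accuracy:", np.mean(clf.predict(feats) == labels))
```

The closed-form output solve is what makes ELMs cheap to train compared with back-propagated classifiers; in a setting like the one described, the input would be the concatenated color and depth descriptors rather than the synthetic maps used here.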
Item Type: | Conference or Workshop Item (Plenary Papers) |
---|---|
Additional Information: | 8293/60177 |
Uncontrolled Keywords: | category theory;computer vision;convolution;image classification;image colour analysis;image fusion;image representation;learning (artificial intelligence);neural nets;object recognition;CNN;ELM;RGB-D images;RGB-D object category;RGB-D object recognition;classification;coarse-to-fine feature representation;convolutional hypercube pyramid;convolutional neural network;deep learning;extreme learning machines;fusion scheme;instance recognition;multimodality input dissimilarity;nonlinear classifiers;point cloud data;training data deficiency;Feature extraction;Hypercubes;Image color analysis;Object recognition;Robots;Three-dimensional displays;Training |
Subjects: | Q Science > QA Mathematics > QA75 Electronic computers. Computer science |
Kulliyyahs/Centres/Divisions/Institutes: | Kulliyyah of Engineering > Department of Mechatronics Engineering |
Depositing User: | Dr. Hasan Firdaus Mohd Zaki |
Date Deposited: | 06 Aug 2018 16:13 |
Last Modified: | 06 Aug 2018 16:13 |
URI: | http://irep.iium.edu.my/id/eprint/60177 |