Combining Features for RGB-D Object Recognition
Wasif Khan
Department of Electrical Engineering
Kasetsart University, Bangkhen Campus,
Bangkok, Thailand
Ekachai Phaisangittisagul
Department of Electrical Engineering
Faculty of Engineering, Kasetsart University
Bangkok, Thailand
Luqman Ali
Department of Electrical Engineering
Kasetsart University, Bangkhen Campus,
Bangkok, Thailand
Duangrat Gansawat
Human Computer Communication Research Unit,
NECTEC, Pathumthani, Thailand
Itsuo Kumazawa
Imaging Science and Engineering Laboratory,
Tokyo Institute of Technology, Japan
Abstract — Object category and instance recognition have
received much attention in this era of modern technologies.
Advanced image sensing technologies, such as RGB-D (Kinect
style) cameras, provide synchronized high-resolution color
and depth video. Various feature extraction schemes have been
introduced to improve classification performance, and
extracting useful features from both color and depth images
has recently gained much attention in this research area. In
this paper, we propose an approach that creates new features
for RGB-D data by combining several feature extraction
techniques: color autocorrelogram, wavelet moments, local
binary pattern (LBP) and principal component analysis (PCA).
Experiments on a benchmark dataset show that the new features
obtained by the proposed method provide promising
classification results with a k-nearest neighbor (k-NN)
classifier.
Keywords— k-Nearest Neighbors (k-NN), Local Binary
Pattern (LBP), Principal Component Analysis (PCA), RGB-D
Object Recognition.
I. Introduction
Object recognition plays a vital role in machine learning,
computer vision, and robotics. It can be described as the
task of assigning an object in a digital image or video to a
pre-defined class by using the features or appearance of the
object. Although effortless for the human eye, this task is
difficult and challenging for a machine because of changes
in viewpoint, light intensity, etc. The main goal is to
improve the recognition performance of the system. To this
end, researchers have tried to exploit additional features
of the objects, such as spatial geometry information and
shape [7]. Despite the resulting gains, limitations remain:
2D RGB images carry only a limited amount of information and
lack cues such as the spatial geometry and shape of the
object, so spatial geometry and shape features estimated
from them may not be reliable because of the many variations
in RGB images.
978-1-5090-4666-9/17/$31.00 ©2017 IEEE
To overcome this problem, depth cameras such as the
Microsoft Kinect have recently been introduced. Such a camera
provides both a high-quality depth image and an RGB image of
the object: an infrared (IR) sensor captures the depth
information while, at the same time, an RGB camera captures
the RGB image. RGB-D images provide much more information
than RGB images alone, since depth images provide
sufficient information about the spatial geometry and shape
of the objects in a scene [7]. As the depth and RGB images
contain different information about the object, the
challenging and interesting part is how to combine them so
that better features can be created. A major problem in
machine learning and computer vision applications is
improving object recognition and classification performance,
and several works [1-7] have addressed it.
A high-performance object recognition system can be built
by combining feature extraction schemes with different
classification algorithms. Feature extraction is important in
object recognition because a raw image is too large to be
processed directly by a classifier and contains redundant
information; useful features are therefore extracted from
both the depth and RGB images. Various feature extraction
schemes are used for RGB and depth images, such as HMP [1],
kernel descriptors [2], histograms of oriented gradients
(HOG) [11], PCA [8], and convolutional neural networks
(CNN) [10]. The extracted features are then passed to a
classifier to measure the classification performance of the
object recognition system. Various classifiers can be used,
for example linear support vector machines (SVM) [1], the
deep Regularized Reconstruction Independent Component
Analysis network (R2ICA) [3], convolutional-recursive neural
networks (CNN-RNN) [7], and k-NN [12].
This paper aims to reduce feature extraction and
classification time while also improving object recognition
performance by using different feature extraction schemes,
namely PCA, wavelet transforms, color autocorrelogram and
LBP, for depth and RGB images. The accuracy of object
recognition depends on the quality of the training data as
well as on the learning algorithm. The algorithms presented
in previous papers [1-6] are more time-consuming and more
complex than our approach. The remainder of the paper is
organized as follows. Section II reviews related work on
object recognition systems. Section III describes the
proposed methodology, followed by the experimental results
in Section IV. Conclusions and discussion are given in
Section V.
II. Related work
The authors of [4] introduced a large hierarchical
multi-view object dataset captured with an RGB-D camera, and
reported object recognition performance using
state-of-the-art features such as spin images and SIFT [4].
Three state-of-the-art classifiers, linear support vector
machine (LinSVM), random forest (RF) and Gaussian kernel
support vector machine (KSVM), were used for category and
instance recognition. In [5] the authors used five depth
kernel descriptors and their combination for depth maps and
3D point clouds; before [5], kernel descriptors had been
used only for RGB images. They reported better results than
previous papers. In [6] the authors proposed hierarchical
matching pursuit (HMP), which builds a feature hierarchy
layer by layer using an efficient matching pursuit
encoder [6]. HMP has three main parts: spatial pyramid
pooling, contrast normalization and batch (tree) orthogonal
matching pursuit. Although it outperformed the previous
papers, it still has two main limitations. To overcome
them, [1] uses sparse coding to learn hierarchical feature
representations from raw RGB-D data in an unsupervised
way [1], with two innovations that make the approach
suitable for RGB-D based object recognition. Firstly,
feature learning is applied to both color and depth images,
because the previous work [6] used only grayscale images,
which is not effective for object recognition; including
cues such as color, appearance details and depth channels
can improve the robustness of a recognition system.
Secondly, features are extracted not only from the top of
the feature hierarchy but also from lower layers [1]. In
[7], the authors proposed a semi-supervised learning method
that uses unsupervised convolutional-recursive neural
networks (CNN-RNN) to learn a set of basis features for RGB
and depth images respectively. CNN-RNN is faster than
HMP [1] and does not need additional features such as
surface normals [7].
III. Methodology
The proposed algorithm consists of the following steps, as
shown in Fig. 1.
• Data preprocessing
• Feature extraction
• Classification
A. Data preprocessing
The database used in this paper is the Washington RGB-D
object dataset [4]. It consists of approximately 45,000
RGB-D images of 300 different objects (instances), which are
divided into 51 different categories. This dataset was
chosen because it contains both textured and texture-less
objects and because its RGB-D frames show large changes due
to variations in lighting. Examples of textured objects are
soda cans and food boxes, while texture-less objects include
fruits and bowls, as shown in Fig. 2(a) and 2(b).
In this paper, results are evaluated on two levels:
category level and instance level. For example, recognizing
an object as a ball is category recognition, whereas
recognizing it as a specific "football" or "basketball" is
instance-level recognition. First, all images in the dataset
are resized to 384×256 pixels. Filters such as the bilateral
filter are then applied to denoise the images.
B. Feature extraction
Feature extraction starts by applying the color
autocorrelogram to the RGB images. The color autocorrelogram
is used in this paper because its features are easy to
extract and small in size compared to other features, and
because it captures the spatial correlation of colors,
describing the global distribution of this local spatial
correlation [9]. A color correlogram expresses how the
spatial correlation of pairs of colors varies with
distance [9]. After this, the image is converted to
grayscale and a wavelet transform is applied in order to
compute wavelet moments.
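The autocorrelogram idea described above can be illustrated with a minimal sketch. It assumes the image has already been quantized to a small number of color indices and that distance is measured in the chessboard (L-infinity) sense; the function name, the quantization and the choice of distances are illustrative assumptions, not details given in this paper.

```python
def autocorrelogram(image, n_colors, distances):
    """Return, for each distance d and color c, the fraction of pixel pairs
    at chessboard distance d from a pixel of color c that also have color c.

    image: 2D list of color indices in range(n_colors) (assumed quantized).
    """
    h, w = len(image), len(image[0])
    features = []
    for d in distances:
        same = [0] * n_colors   # neighbors at distance d sharing the color
        total = [0] * n_colors  # all in-bounds neighbors at distance d
        # Offsets on the square ring at chessboard distance exactly d.
        ring = [(dy, dx)
                for dy in range(-d, d + 1)
                for dx in range(-d, d + 1)
                if max(abs(dy), abs(dx)) == d]
        for y in range(h):
            for x in range(w):
                c = image[y][x]
                for dy, dx in ring:
                    ny, nx = y + dy, x + dx
                    if 0 <= ny < h and 0 <= nx < w:
                        total[c] += 1
                        if image[ny][nx] == c:
                            same[c] += 1
        features.extend(s / t if t else 0.0 for s, t in zip(same, total))
    return features
```

The resulting vector has n_colors × len(distances) entries, which is why this descriptor stays small compared to the raw image.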
The features extracted above are not sufficient for object
classification, so additional features are required. For
this purpose, LBP features are extracted.
LBP features have recently received a lot of attention in
applications such as image retrieval and image segmentation
because they capture the fine details present in an image.
In LBP feature extraction, the operator labels the pixels
of an image by thresholding the 3×3 neighborhood of each
pixel against the value of the central pixel and reading the
result as a binary number, as shown in Fig. 3. The LBP
operator can also be used with other neighborhood sizes,
such as 4 or 16 neighbors. Among the best qualities of these
features are their tolerance to monotonic illumination
variation and their computational simplicity.
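As a small worked example of the thresholding step, the sketch below computes the basic 3×3 LBP code for the grey values that accompany Fig. 3 in this paper. The neighbor ordering (clockwise from the top-left, first neighbor as the least significant bit) and the use of >= for the comparison are assumptions; the paper does not state them, and a different ordering gives a different code for the same patch.

```python
# Neighbor positions, clockwise from the top-left corner (assumed ordering).
NEIGHBOR_ORDER = [(0, 0), (0, 1), (0, 2), (1, 2),
                  (2, 2), (2, 1), (2, 0), (1, 0)]


def lbp_code(patch):
    """Basic LBP code of a 3x3 patch given as a list of three rows."""
    center = patch[1][1]
    code = 0
    for bit, (r, c) in enumerate(NEIGHBOR_ORDER):
        if patch[r][c] >= center:      # threshold against the central pixel
            code |= 1 << bit
    return code


def lbp_histogram(image):
    """256-bin histogram of LBP codes over all interior pixels."""
    hist = [0] * 256
    for y in range(1, len(image) - 1):
        for x in range(1, len(image[0]) - 1):
            patch = [row[x - 1:x + 2] for row in image[y - 1:y + 2]]
            hist[lbp_code(patch)] += 1
    return hist


example = [[196, 192, 189],
           [193, 191, 186],
           [189, 189, 188]]
print(lbp_code(example))  # 131 with the bit ordering assumed above
```

Only the neighbors 196, 192 and 193 are at least as large as the center value 191, so bits 0, 1 and 7 are set (1 + 2 + 128 = 131). Adding a constant to every pixel leaves the code unchanged, which illustrates the tolerance to monotonic illumination changes mentioned above.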
All the features extracted from the RGB images are
concatenated and stored in vectors. A MAT file is then
generated by assigning labels to the instances and
categories present in the dataset. This file is used later
in the classification process.
PCA is used to extract features from the depth images. The
step-by-step procedure is shown in Fig. 4. First, all the
images are resized to an equal dimension of 384×256. After
resizing, each image is stored as a column vector M of 98304
elements (384 × 256 = 98304). All of the depth images are
arranged in a data matrix (D) in which each column
represents a single depth image. The purpose of forming the
data matrix is to reduce the dimension of each observation
from 98304 to L = 99 variables. First, the mean of the data
matrix is computed and stored in a vector. The centered
matrix (CM) is then formed by subtracting the mean from the
data matrix. A covariance matrix (V) is formed from the
centered matrix as shown in equation (1), followed by the
eigenfaces, the feature vector and the final vector in
equations (2), (3) and (4) respectively.
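$$V = \mathrm{Cov}(CM) = CM \cdot CM^{T} \tag{1}$$
$$\text{Eigenfaces} = CM \times V \tag{2}$$
$$\text{Feature vector} = (eig_1, eig_2, \ldots, eig_L) \tag{3}$$
$$\text{Final vector} = \text{Feature vector}^{T} \times CM \tag{4}$$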
Fig. 1. The proposed method.
Fig. 2(a). Textured objects.
Fig. 2(b). Texture-less objects.
Fig. 3. LBP feature extraction operation (example 3×3 neighborhood: 196 192 189 / 193 191 186 / 189 189 188).
The feature vector obtained from depth images is used for the
classification of depth images.
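The PCA steps described for the depth images (mean removal, covariance, leading eigenvectors, projection) can be sketched on toy data as follows. This is a simplified illustration under stated assumptions rather than the implementation used in this paper: samples are stored as rows instead of columns, the covariance is normalized by n - 1, and the leading eigenvectors are found by power iteration with deflation. The function name and parameters are hypothetical, and building the covariance explicitly is only practical for small dimensions, not for 98304-element image vectors.

```python
import math
import random


def pca_project(samples, n_components, n_iter=100, seed=0):
    """Project row-vector samples onto their leading principal components."""
    n, dim = len(samples), len(samples[0])
    # Step 1: mean of the data and centered matrix.
    mean = [sum(s[j] for s in samples) / n for j in range(dim)]
    centered = [[s[j] - mean[j] for j in range(dim)] for s in samples]
    # Step 2: covariance matrix (dim x dim), sample normalization assumed.
    cov = [[sum(r[i] * r[j] for r in centered) / (n - 1)
            for j in range(dim)] for i in range(dim)]
    # Step 3: leading eigenvectors by power iteration with deflation.
    rng = random.Random(seed)
    components = []
    for _ in range(n_components):
        v = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
        for _step in range(n_iter):
            w = [sum(cov[i][j] * v[j] for j in range(dim))
                 for i in range(dim)]
            for c in components:       # remove directions already found
                dot = sum(a * b for a, b in zip(w, c))
                w = [a - dot * b for a, b in zip(w, c)]
            norm = math.sqrt(sum(a * a for a in w))
            if norm < 1e-12:           # no variance left in this direction
                v = None
                break
            v = [a / norm for a in w]
        if v is None:
            break
        components.append(v)
    # Step 4: project the centered data onto the components.
    projected = [[sum(a * b for a, b in zip(row, c)) for c in components]
                 for row in centered]
    return projected, components
```

Calling pca_project(depth_vectors, 99) would correspond to keeping L = 99 variables per depth image as described above, where depth_vectors is a hypothetical list of flattened depth images.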
Finally, the feature vectors of the RGB and depth images are
concatenated and stored in a MAT file to evaluate the
classification accuracy on the RGB-D object dataset.
C. Classification
In this paper, a k-nearest neighbor (k-NN) classifier is
used for classification. The k-NN algorithm assigns a class
label to the query sample based on the majority vote among
its k nearest neighbors. The MAT file obtained from feature
extraction, in which labels are assigned to each category
and instance, is given to the classifier. Three separate
files, prepared from the RGB, depth and RGB-D images, are
given to the classifier, which reports results for both
category and instance recognition. Besides the accuracy of
the classifier, the classification time and processing speed
are the two criteria observed.
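The majority-vote rule described above can be sketched in a few lines. The Euclidean distance and the default k = 3 are assumptions made for illustration; the paper does not state which distance metric or value of k was used.

```python
import math
from collections import Counter


def knn_predict(train_x, train_y, query, k=3):
    """Label a query vector by majority vote among its k nearest neighbors."""
    # Sort training samples by Euclidean distance to the query (assumed metric).
    ranked = sorted(zip(train_x, train_y),
                    key=lambda pair: math.dist(pair[0], query))
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]


# Toy feature vectors with hypothetical category labels.
features = [(0, 0), (0, 1), (1, 0), (5, 5), (5, 6), (6, 5)]
labels = ["bowl", "bowl", "bowl", "soda can", "soda can", "soda can"]
print(knn_predict(features, labels, (0.2, 0.1)))
```

Because k-NN has no training phase beyond storing the labelled feature vectors, its cost is dominated by the distance computations at test time, which is why compact feature vectors matter for the timing criteria mentioned above.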
IV. Experimental results
The test results for the k-NN classifier are shown in
Tables 1-4, while Table 5 compares our approach with the
state-of-the-art HMP method [1]. Each table shows results
for a different fold of the data, that is, a different
division into training and testing sets: a 90-10 fold means
that 90% of the data is used for classifier training and 10%
for classifier testing, and similarly for the 80-20, 70-30
and 50-50 folds.
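The training/testing divisions described above can be produced with a simple hold-out split such as the sketch below. The paper does not say whether the division was random, stratified by class, or sequential, so the seeded random shuffle here is an assumption, as is the function name.

```python
import random


def split_dataset(samples, train_fraction, seed=0):
    """Randomly divide samples into training and testing lists."""
    indices = list(range(len(samples)))
    random.Random(seed).shuffle(indices)   # reproducible shuffle
    cut = round(train_fraction * len(samples))
    train = [samples[i] for i in indices[:cut]]
    test = [samples[i] for i in indices[cut:]]
    return train, test


data = list(range(100))                    # stand-in for feature vectors
for fraction in (0.9, 0.8, 0.7, 0.5):      # the 90-10, 80-20, 70-30, 50-50 folds
    train, test = split_dataset(data, fraction)
    print(fraction, len(train), len(test))
```

For instance-level recognition, a stratified split that keeps every object represented in both sets may be preferable; that variant is not shown here.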
Table 1: k-NN classifier experimental results for 90-10 folds of training and
testing data (category-level and instance-level recognition).
Table 2: k-NN classifier experimental results for 80-20 folds of training and
testing data (category-level and instance-level recognition).
Table 3: k-NN classifier experimental results for 70-30 folds of training and
testing data (category-level and instance-level recognition).
Table 4: k-NN classifier experimental results for 50-50 folds of training and
testing data (category-level and instance-level recognition).
Fig. 4. Block diagram of PCA
Table 5 presents the comparison between our approach and
HMP [1] on the 70-30 fold. The results show that our
approach outperforms the state-of-the-art HMP on this
dataset division for both RGB-D category and RGB-D instance
recognition, especially in the case of depth images.
The accuracy of our system in RGB-D instance recognition
is 84.5%, while the accuracy of HMP is 81.2%; our accuracy
is also better in RGB-D category recognition, as shown in
Table 5. The table also shows that the test time for one
image with HMP [1] was 0.51 s, while the test time in this
work is 0.37 s.
Table 5: Comparison of our approach and HMP on 70-30
training and testing data (category level and instance level).
V. Conclusion and discussion
In this paper we proposed a combination of different
features for RGB-D object classification. From the above
results we conclude that combining such features provides
better results in less time than the state-of-the-art
approach HMP. The processing time is lower than that of
[1-5], which makes this system practical for low-cost
service robots. Because classifiers such as CNNs are very
time consuming, a simple classifier, k-NN, is used for
classification. The results also reveal that this approach
gives about 8% better results in depth category recognition
and almost 2% in depth instance recognition compared to HMP
on the 70-30 fold of training and testing data. In the case
of category recognition, the proposed system also provides
promising results.
Acknowledgment
This research is financially supported by the Thailand
Advanced Institute of Science and Technology (TAIST), the
National Science and Technology Development Agency (NSTDA),
Tokyo Institute of Technology, and Kasetsart University
(KU) under the TAIST Tokyo Tech program.
References
[1] Bo, Liefeng, Xiaofeng Ren, and Dieter Fox. "Unsupervised feature
learning for RGB-D based object recognition." In Experimental
Robotics, pp. 387-402. Springer International Publishing, 2013.
[2] Bo, Liefeng, Kevin Lai, Xiaofeng Ren, and Dieter Fox. "Object
recognition with hierarchical kernel descriptors." In Computer Vision
and Pattern Recognition (CVPR), 2011 IEEE Conference on, pp.
1729-1736. IEEE, 2011.
[3] Jhuo, I-Hong, Shenghua Gao, Liansheng Zhuang, D. T. Lee, and Yi
Ma. "Unsupervised feature learning for RGB-D image classification."
In Asian Conference on Computer Vision, pp. 276-289. Springer
International Publishing, 2014.
[4] Lai, Kevin, Liefeng Bo, Xiaofeng Ren, and Dieter Fox. "A large-scale
hierarchical multi-view RGB-D object dataset." In Robotics and
Automation (ICRA), 2011 IEEE International Conference on, pp.
1817-1824. IEEE, 2011.
[5] Bo, Liefeng, Xiaofeng Ren, and Dieter Fox. "Depth kernel descriptors
for object recognition." In 2011 IEEE/RSJ International Conference on
Intelligent Robots and Systems, pp. 821-826. IEEE, 2011.
[6] Bo, Liefeng, Xiaofeng Ren, and Dieter Fox. "Hierarchical matching
pursuit for image classification: Architecture and fast algorithms." In
Advances in Neural Information Processing Systems, pp. 2115-2123.
[7] Cheng, Y., X. Zhao, K. Huang, and T. Tan. "Semi-supervised learning
for RGB-D object recognition." In 2014 22nd International Conference
on Pattern Recognition, Stockholm, pp. 2377-2382. 2014.
[8] Ali, L., A. Sarwar, A. Jan, M. I. Khattak, and M. Shafi. "Identification
of seamless connection in merged images using evolutionary
artificial neural network (EANN)." University of Engineering and
Technology Taxila. Technical Journal 20, no. 2 (2015): 34.
[9] Huang, Jing, S. Ravi Kumar, Mandar Mitra, Wei-Jing Zhu, and
Ramin Zabih. "Image indexing using color correlograms." In Computer
Vision and Pattern Recognition, 1997. Proceedings., 1997 IEEE
Computer Society Conference on, pp. 762-768. IEEE, 1997.
[10] Gupta, Saurabh, Ross Girshick, Pablo Arbeláez, and Jitendra Malik.
"Learning rich features from RGB-D images for object detection and
segmentation." In European Conference on Computer Vision, pp.
345-360. Springer International Publishing, 2014.
[11] Dalal, Navneet, and Bill Triggs. "Histograms of oriented gradients for
human detection." In 2005 IEEE Computer Society Conference on
Computer Vision and Pattern Recognition (CVPR'05), vol. 1, pp.
886-893. IEEE, 2005.
[12] Sarode, Nivedita S., and A. M. Patil. "Iris recognition using LBP with
classifiers-KNN and NB."
2017, ieecon, 8075877