Convolutional neural network classifier for distinguishing Barrett’s
esophagus and neoplasia endomicroscopy images
Jisu Hong, Bo-yong Park, and Hyunjin Park*

*Research supported by the Institute for Basic Science (grant number IBS-R015-D1) and the National Research Foundation of Korea (grant numbers NRF-2016H1A2A1907833 and NRF-2016R1A2B4008545).
H. Park is with the School of Electronic and Electrical Engineering, Sungkyunkwan University, and the Center for Neuroscience Imaging Research (CNIR), Institute for Basic Science, Suwon, Korea (corresponding author; phone: 82-31-299-4956; e-mail: hyunjinp@skku.edu).
J. Hong and B. Y. Park are with the Department of Electronic, Electrical and Computer Engineering, Sungkyunkwan University, and the Center for Neuroscience Imaging Research (CNIR), Institute for Basic Science, Suwon, Korea (phone: 82-31-299-4107; e-mail: bal25ne@skku.edu, by6860@skku.edu).

Abstract— Barrett’s esophagus is a diseased condition with abnormal changes of the cells in the esophagus. Intestinal metaplasia (IM) and gastric metaplasia (GM) are two sub-classes of Barrett’s esophagus. As IM can progress to esophageal cancer, i.e., neoplasia (NPL), developing methods for classifying IM and GM is an important issue in clinical practice. We adopted a deep learning (DL) algorithm to classify the three conditions of IM, GM, and NPL based on endomicroscopy images. We constructed a convolutional neural network (CNN) architecture to distinguish among the three classes. A total of 262 endomicroscopy images of Barrett’s esophagus were obtained from the international symposium on biomedical imaging (ISBI) 2016 challenge. 155 IM, 26 GM, and 55 NPL cases were used to train the architecture. We applied image distortion to augment the sample size of the training data. We tested the proposed architecture using the 26 test images, which include 17 IM, 4 GM, and 5 NPL cases. The classification accuracy was 80.77%. Our results suggest that a CNN architecture could serve as a good classifier for distinguishing endomicroscopy imaging data of Barrett’s esophagus.
I. INTRODUCTION
Barrett’s esophagus is a complication of gastroesophageal reflux disease in which normal tissue in the esophagus is replaced with tissue that resembles intestinal tissue [1], [2]. Intestinal metaplasia (IM) and gastric metaplasia (GM) are the sub-classes of Barrett’s esophagus, and IM is the sub-class that can progress to neoplasia (NPL), i.e., esophageal cancer [1], [2]. As distinguishing between IM and GM is a difficult task, developing effective machine learning algorithms for classifying IM and GM is critical.
Many previous studies adopted simple classifiers such as the support vector machine (SVM) to distinguish sub-classes within a given disease state [3]–[5]. Park and colleagues distinguished the sub-types of attention deficit hyperactivity disorder (ADHD) patients using a linear SVM framework with high accuracy and proved its usefulness [3]. However, such simple classifiers have the disadvantage that users must extract features manually. Additionally, they might not yield high performance when complex images are given as inputs.
In recent classification research, “deep learning” (DL) algorithms have been widely adopted because they can yield high performance [6], [7]. DL algorithms consist of two representative steps: feed-forward and back-propagation [6]–[8]. In the feed-forward step, a DL algorithm extracts numerous features using various filters, computed through a series of convolution and pooling layers [6]–[8]. In the final layer, errors are calculated by comparing the output labels from the architecture with the ground truth labels. The back-propagation step is then performed to update the weights of the filters in each layer; its objective is to minimize the final error rates [6]–[8]. The convolutional neural network (CNN) is one of the most commonly used architectures in DL studies [6]–[10]. It uses hierarchical convolutional layers to extract features and has shown high performance on image classification [6]–[8].
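As a minimal illustration of these two steps, one training iteration might look as follows. TensorFlow is our choice for illustration, and `model`, `optimizer`, `images`, and `labels` are hypothetical placeholders; this is a sketch of the general feed-forward/back-propagation loop, not the authors' implementation.

```python
import tensorflow as tf

def train_step(model, optimizer, images, labels):
    """One training iteration: a feed-forward pass followed by back-propagation."""
    with tf.GradientTape() as tape:
        # Feed-forward: convolution/pooling layers extract features and
        # produce output class scores (logits).
        logits = model(images, training=True)
        # The error is computed by comparing outputs with ground-truth labels.
        loss = tf.reduce_mean(
            tf.keras.losses.sparse_categorical_crossentropy(
                labels, logits, from_logits=True))
    # Back-propagation: gradients of the error are used to update the
    # filter weights in each layer so as to minimize the final error rate.
    grads = tape.gradient(loss, model.trainable_variables)
    optimizer.apply_gradients(zip(grads, model.trainable_variables))
    return loss
```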
In addition to the classification problems, DL algorithms
could be used for object recognition and image segmentation
problems [9]–[12]. Previous studies adopted the basic
architecture and hyper-parameters of the CNN and used it for
face detection [9], [10]. Other studies adopted CNN architectures to segment objects, animals, humans, and brain structures [11], [12]. All of these DL studies showed high performance on object detection and segmentation [9]–[12].
In this study, we proposed an architecture based on the 2-D CNN and classified esophageal endomicroscopy images into the IM, GM, and NPL sub-classes. As a CNN requires a large amount of input data for training, we applied random image distortions to enlarge the number of training esophageal images. The performance of the trained CNN architecture was tested using an independent data set.
II. METHODS
A. Imaging data
A total of 262 endomicroscopy images of Barrett’s esophagus were obtained from the international symposium on biomedical imaging (ISBI) 2016 challenge database. Among the 262 images, 236 were assigned to the training set and the remaining 26 were
assigned to the test set. The training data consisted of 155 IM, 26 GM, and 55 NPL images, and the test data consisted of 17 IM, 4 GM, and 5 NPL images. Representative images of each sub-class are shown in Fig. 1.

Figure 1. The sample images of the (A) IM, (B) GM, and (C) NPL sub-classes.
B. Data augmentation
Due to the insufficient number of training images, five different types of image distortion were applied to increase the number of training data. The image distortion methods include random scaling, random flip, random rotation, random brightness, and random contrast (Fig. 2). In the random scaling step, Gaussian random numbers with mean 1.0 and standard deviation 0.2 were used as the scaling factors. The scaled images were randomly flipped vertically or horizontally, and then rotated in the counter-clockwise direction by 0º, 90º, 180º, or 270º. In the brightness adjustment step, random float numbers under 30 were added to all pixels of the images. Finally, the image contrasts were adjusted with a contrast factor c between 0.8 and 1.2. In this step, each image pixel value x_i was adjusted using (1):

x_i' = (x_i − E(X)) × c + E(X),   (1)

where X denotes the image matrix, E(·) denotes the expectation operator, and x_i is the i-th pixel value of X.

Figure 2. The five image distortion methods: (A) random scaling, (B) random flip, (C) random rotation, (D) random brightness, and (E) random contrast.
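The five distortions might be implemented as follows. The parameters come from the text above; the composition order within one pass, the use of tf.image, and the assumption of 0–255 pixel values are ours.

```python
import tensorflow as tf

def distort(image):
    """Apply the five random distortions of Section II-B to one 128x128x3 image."""
    # (A) Random scaling: Gaussian scale factor (mean 1.0, std 0.2),
    # then crop/pad back to the original 128x128 size.
    scale = tf.random.normal([], mean=1.0, stddev=0.2)
    size = tf.maximum(tf.cast(scale * 128.0, tf.int32), 1)
    image = tf.image.resize(image, tf.stack([size, size]))
    image = tf.image.resize_with_crop_or_pad(image, 128, 128)
    # (B) Random vertical or horizontal flip.
    image = tf.image.random_flip_left_right(image)
    image = tf.image.random_flip_up_down(image)
    # (C) Random counter-clockwise rotation by 0, 90, 180, or 270 degrees.
    image = tf.image.rot90(image, k=tf.random.uniform([], 0, 4, dtype=tf.int32))
    # (D) Random brightness: add a random value below 30 to all pixels
    # (assumes pixel values in [0, 255]).
    image = tf.image.random_brightness(image, max_delta=30)
    # (E) Random contrast with factor c in [0.8, 1.2]; tf.image.random_contrast
    # computes (x_i - E(X)) * c + E(X) per channel, i.e., Eq. (1).
    image = tf.image.random_contrast(image, lower=0.8, upper=1.2)
    return image
```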
C. Proposed CNN architecture
We designed the CNN architecture as presented in Fig. 3. The architecture was composed of four convolutional layers, two max-pooling layers, and two fully-connected (FC) layers (Fig. 3). The size of the input image was 128 × 128 × 3, where the third dimension denotes the three color channels of red, green, and blue. The size of the kernels in the convolutional layers was 3 × 3. The convolutions were performed after zero-padding the images to preserve the original image size. The rectified linear unit (ReLU) activation function was applied only in the second and fourth convolutional layers; the ReLU function is more effective at avoiding overfitting than the sigmoid function [6], [7]. Max-pooling with a stride of 2 × 2 was applied after the second and fourth convolutional layers to obtain spatially invariant outputs and to reduce the computational load. The two FC layers were attached after the fourth convolutional layer, and the output labels were determined by applying the softmax function.
The detailed hyper-parameters of our proposed architecture are described in Table I.

TABLE I. DETAILED INFORMATION OF THE PROPOSED CNN ARCHITECTURE

Layer    Type          Filter size   Stride   # of filters   FC units   Input size
Layer 1  Conv.         3×3           1×1      16             -          128×128×3
Layer 2  Conv.         3×3           1×1      32             -          128×128×16
-        ReLU, Max-p.  2×2           2×2      -              -          128×128×32
Layer 3  Conv.         3×3           1×1      48             -          64×64×32
Layer 4  Conv.         3×3           1×1      64             -          64×64×48
-        ReLU, Max-p.  2×2           2×2      -              -          64×64×64
FC 1     FC            -             -        -              1024       65536
FC 2     FC            -             -        -              1024       1024
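For concreteness, a minimal sketch of the Table I architecture is given below. The tf.keras framing is our assumption; the paper mentions using dropout (Section II-E) but does not report the rate or placement, and it specifies ReLU only after the second and fourth convolutional layers, so those details are marked as assumptions in the comments.

```python
import tensorflow as tf

def build_model(num_classes=3):
    """Sketch of the Table I architecture: four 3x3 convolutions with
    zero-padding, ReLU and 2x2 max-pooling after the second and fourth,
    two 1024-unit FC layers, and a softmax output over three classes."""
    return tf.keras.Sequential([
        tf.keras.layers.Input(shape=(128, 128, 3)),
        tf.keras.layers.Conv2D(16, 3, padding="same"),              # Layer 1
        tf.keras.layers.Conv2D(32, 3, padding="same",
                               activation="relu"),                  # Layer 2 + ReLU
        tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),       # 128 -> 64
        tf.keras.layers.Conv2D(48, 3, padding="same"),              # Layer 3
        tf.keras.layers.Conv2D(64, 3, padding="same",
                               activation="relu"),                  # Layer 4 + ReLU
        tf.keras.layers.MaxPooling2D(pool_size=2, strides=2),       # 64 -> 32
        tf.keras.layers.Flatten(),                                  # 32*32*64 = 65536
        tf.keras.layers.Dense(1024),                                # FC 1 (activation unspecified)
        tf.keras.layers.Dropout(0.5),                               # dropout rate assumed
        tf.keras.layers.Dense(1024),                                # FC 2 (activation unspecified)
        tf.keras.layers.Dense(num_classes, activation="softmax"),   # output labels
    ])
```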
D. Training the CNN architecture
The augmented images were used to train our proposed CNN architecture. The training step was repeated 15,000 times, and 20 images were processed simultaneously in each step. Cross-entropy was used as the cost function for computing error rates, and the adaptive subgradient method (Adagrad) was used as the optimizer to reduce the mean cross-entropy. The initial learning rate was set to 1e-05 and was decreased at a rate of 1/10 at every step.

Figure 3. The overview of the proposed CNN architecture.
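The training configuration above might be expressed as follows, reusing the sketches from Sections II-B and II-C. Here `train_images` and `train_labels` are hypothetical placeholders for the training set, and the per-step learning-rate decay described in the text is omitted for simplicity.

```python
import tensorflow as tf

model = build_model()  # architecture sketch from Section II-C

# Adaptive subgradient method (Adagrad) with the stated initial learning
# rate of 1e-05; the paper's per-step decay schedule is not reproduced here.
model.compile(optimizer=tf.keras.optimizers.Adagrad(learning_rate=1e-5),
              loss="sparse_categorical_crossentropy",  # mean cross-entropy cost
              metrics=["accuracy"])

# 15,000 training steps with 20 images processed per step, applying the
# random distortions of Section II-B on the fly.
dataset = (tf.data.Dataset.from_tensor_slices((train_images, train_labels))
           .map(lambda img, lbl: (distort(img), lbl))
           .shuffle(236).repeat().batch(20))
model.fit(dataset, steps_per_epoch=15000, epochs=1)
```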
E. Validation of the trained CNN architecture
The trained CNN architecture was validated using the test data set. In the validation step, all the dropout rates (keep probabilities) were set to 1.0, so that no units were dropped at test time. The performance of the proposed CNN architecture on the test data set was calculated by comparing the output labels with the ground truth labels.
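A corresponding evaluation sketch, with `test_images` and `test_labels` as hypothetical placeholders for the 26 held-out test cases:

```python
# Compare predicted labels with the ground truth on the test set;
# tf.keras disables dropout automatically at inference time.
test_loss, test_acc = model.evaluate(test_images, test_labels)
print(f"Test accuracy: {test_acc:.2%}")  # the paper reports 80.77%
```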
III. RESULTS
A. Classification performance
The trained CNN architecture was tested on the 26 independent test images. The output labels of the IM cases from the proposed CNN matched the ground truth labels exactly, and those of the NPL cases were almost all correct (Table II). However, the output labels of the GM cases were all incorrect (Table II). Our proposed CNN architecture yielded a total classification accuracy of 80.77% (Table II). The details of the classification results are given in Table II.
TABLE II. CLASSIFICATION RESULTS AMONG IM, GM, AND NPL USING THE PROPOSED CNN ARCHITECTURE

                          True labels
Predicted labels     IM     GM     NPL
IM                   17     4      1
GM                   0      0      0
NPL                  0      0      4
Accuracy (%)         100    0      80
Total accuracy (%)   21/26 = 80.77

B. Processing speed
The whole DL process in this study was performed using four GTX 1080 GPUs with 8 GB of memory each (32 GB in total). The training procedure took approximately 40 minutes.

IV. DISCUSSION
In this study, we proposed a simple CNN architecture to distinguish among the IM, GM, and NPL sub-classes. As the number of input images was insufficient, data augmentation was performed by applying distortions to all the imaging data. Our proposed CNN architecture was composed of four convolutional, two max-pooling, and two FC layers. ReLU was adopted as the activation function, and the softmax classifier was used to predict the labels of the input images. Our proposed CNN architecture yielded a classification accuracy of 80.77% among the IM, GM, and NPL sub-classes. As described in Table II, our CNN architecture distinguished IM and NPL with high accuracy but could not distinguish GM. The small number of test images belonging to the GM sub-class might be the reason for this low accuracy.

Our study has a few limitations. First, we used small-scale imaging data. The CNN algorithm requires a large number of input data, but only 236 images were used for training the architecture. We performed data augmentation to increase the number of input data, but it could still be insufficient. We will collect large-scale data for training the architecture in future studies. Second, we adjusted the order of the layers to find the optimal configuration of the architecture. However, the classification results did not change substantially, and the highest accuracy was 80.77% for distinguishing IM, GM, and NPL. We also tried adding layers, but we could not adequately test all possible combinations of layers due to the expensive computational costs. Combining additional layers and arranging them in various ways might yield better classification results.

V. CONCLUSION
We classified the IM, GM, and NPL sub-classes using a CNN architecture. Our proposed CNN architecture could not distinguish GM, but IM and NPL were well distinguished. This study suggests that a CNN architecture could be used to classify endomicroscopy imaging data of the esophagus.
ACKNOWLEDGMENT
This work was supported by the Institute for Basic Science (grant number IBS-R015-D1) and by the National Research Foundation of Korea (grant numbers NRF-2016H1A2A1907833 and NRF-2016R1A2B4008545). Data were provided by the ISBI 2016 challenge.
REFERENCES
[1] E. Veronese, E. Grisan, G. Diamantis, G. Battaglia, C. Crosta, and C. Trovato, “Hybrid patch-based and image-wide classification of confocal laser endomicroscopy images in Barrett’s esophagus surveillance,” Proc. Int. Symp. Biomed. Imaging (ISBI), pp. 362–365, 2013.
[2] K. K. Wang and R. E. Sampliner, “Updated guidelines 2008 for the diagnosis, surveillance and therapy of Barrett’s esophagus,” Am. J. Gastroenterol., vol. 103, no. 3, pp. 788–797, 2008.
[3] B. Park, M. Kim, J. Lee, and H. Park, “Connectivity analysis and feature classification in attention deficit hyperactivity disorder sub-types: a task functional magnetic resonance imaging study,” Brain Topogr., vol. 29, no. 3, pp. 429–439, 2016.
[4] V. N. Vapnik, “An overview of statistical learning theory,” IEEE Trans. Neural Networks, vol. 10, no. 5, pp. 988–999, 1999.
[5] C. Cortes and V. Vapnik, “Support-vector networks,” Mach. Learn., vol. 20, no. 3, pp. 273–297, 1995.
[6] A. Krizhevsky, I. Sutskever, and G. E. Hinton, “ImageNet classification with deep convolutional neural networks,” Adv. Neural Inf. Process. Syst., pp. 1106–1114, 2012.
[7] R. Socher and B. Huval, “Convolutional-recursive deep learning for 3D object classification,” Adv. Neural Inf. Process. Syst., pp. 665–673, 2012.
[8] Y. LeCun, Y. Bengio, and G. Hinton, “Deep learning,” Nature, vol. 521, no. 7553, pp. 436–444, 2015.
[9] Y. Wen, Z. Li, and Y. Qiao, “Latent factor guided convolutional neural networks for age-invariant face recognition,” Proc. IEEE CVPR, pp. 4893–4901, 2016.
[10] H. Qin, J. Yan, X. Li, and X. Hu, “Joint training of cascaded CNN for face detection,” Proc. IEEE CVPR, pp. 3456–3465, 2016.
[11] R. Girshick, J. Donahue, T. Darrell, and J. Malik, “Region-based convolutional networks for accurate object detection and segmentation,” IEEE Trans. Pattern Anal. Mach. Intell., vol. 38, no. 1, pp. 142–158, 2016.
[12] A. de Brébisson and G. Montana, “Deep neural networks for anatomical brain segmentation,” Proc. IEEE CVPR, pp. 20–28, 2015.