INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS
Int. J. Circ. Theor. Appl., 27, 135-151 (1999)
SUBBAND CODING AND IMAGE COMPRESSION USING CNN
OSCAR MOREIRA-TAMAYO† AND JOSÉ PINEDA DE GYVEZ*
Department of Electrical Engineering, Texas A&M University, College Station, TX 77843, U.S.A.
SUMMARY
The cellular neural network paradigm has found many applications in image processing. However, algorithms for image
compression using CNN have scarcely been explored. CNN programmability relies on a new algorithmic style that exploits
the spatio-temporal properties of the array. By exploiting the massive parallelism provided by CNN and convolution as
the key basic instruction, a fast and efficient compression process can be achieved. This paper presents new
templates and low-complexity algorithms to perform both the linear and non-linear operations needed for image
compression. In this work we address all the transformation steps needed in image compression, i.e. decorrelation,
bit allocation, quantization and bit extraction. Among the possible compression techniques, wavelet subband coding was
chosen because it is considered one of the most successful techniques for lossy compression: it allows a high
compression ratio while preserving image quality. All these advantages are realized in the algorithms presented
here. Copyright 1999 John Wiley & Sons, Ltd.
KEY WORDS: cellular neural networks; image compression; subband coding
1. INTRODUCTION
A CNN cell can be considered an analog processor capable of solving a partial differential equation in real
time. Besides this computing power, an attractive characteristic of CNN is the local cell interconnectivity,
which has opened avenues to practical VLSI implementations and to a vast number of image processing
applications. For the following discussion consider a conventional two-dimensional CNN array containing
M×N locally connected cells. Any cell on the ith row and jth column, C(i, j), is connected only to its
neighbouring cells. The CNN dynamics are
$$C\,\frac{dx_{ij}(t)}{dt} = -\frac{x_{ij}(t)}{R} + \sum_{C_{kl}\in N_{ij}} A_{ij;kl}\,y_{kl}(t) + \sum_{C_{kl}\in N_{ij}} B_{ij;kl}\,u_{kl} + I \tag{1a}$$

$$y_{ij}(t) = \tfrac{1}{2}\left(\,|x_{ij}(t)+1| - |x_{ij}(t)-1|\,\right) \tag{1b}$$
where RC is the integration time constant of the system, and N_{ij} is the set that contains the cells C(k, l) in the
neighbourhood of cell C(i, j). Each cell has a state x, a constant external input u, and an output y, which is also
known as the activation function. There is also a constant offset signal I. The CNN is programmed through
two templates, namely the A template, which operates on the outputs of the cells C(k, l) ∈ N(i, j), and the
B template, which operates on the external inputs in the neighbourhood of C(i, j). Several plausible VLSI
architectures and high-level systems have been proposed based on this dynamical system.
*Correspondence to: J. Pineda de Gyvez, Department of Electrical Engineering, Texas A&M University, College Station, TX 77843,
USA
†O. Moreira-Tamayo is now with Texas Instruments, 8505 Forest Lane, Dallas, TX 75243, USA
Contract grant sponsor: CONACyT, the National Council of Science and Technology Mexico; The Office of Naval Research;
contract/grant no: N00014-91-1-051
Received 15 May 1997
Revised 19 November 1997
In spite of all the progress in CNN, algorithms for image compression using CNN have scarcely been
explored. The main goal of image compression is to represent an image with a reduced number of bits while
preserving its quality and intelligibility for a given application. Images for medical applications require a high
degree of quality and fidelity because of the liability and legal responsibility involved. On the other hand, the
quality of images for entertainment depends on the costs involved. Two types of approaches have been developed to
cope with the above-mentioned situations: lossless and lossy compression. In the first approach, the image is
recovered faithfully, but its compression ratio for gray-scale images is small. The second approach provides
higher compression ratios at the expense of some loss in the image.
The algorithms presented in this paper are hardware-oriented and exploit the massive parallelism
provided by CNN. The ideal hardware platform is a DTCNN; however, the algorithms will also work if there
is discrete control of the running mode of a continuous-time CNN. Hence, without loss of generality, we
refer to the hardware architecture simply as CNN. As a one-to-one mapping between image pixels and cells is
virtually impossible for large images, we consider a time-multiplexing operation. This approach
divides the image into tiles whose size corresponds to the physical size of the CNN. Most image
compression schemes rely on linear operations, which we demonstrate can be implemented
using CNN. To implement these linear operations we often avoid the use of template A. Even though we are
not using all the resources of CNN, we are nevertheless using it as a general-purpose image processor.
Moreover, the full compression procedure makes use only of a CNN and a standby memory. We have
carefully avoided operations that would require an additional digital processor, as this would lessen the
effectiveness of CNN. Also, all the algorithms are based on the assumption that we have a hardware
implementation of a CNN with a neighbourhood of radius 1. We pay special attention to this fact and avoid
hypothetical situations for which the hardware does not exist. Lastly, in Section 9, we present a detailed study
of the computation times required by our algorithms assuming a 64×64 CNN array size operating on
a 256×256 image. The results indicate the feasibility of performing our full compression procedure for
real-time video applications.
2. IMAGE COMPRESSION
The main steps for image compression based on a transform scheme are shown in the block diagram of
Figure 1. The first step is the transformation of either the full image or subdivisions of it. In the case of
the popular JPEG standard, the image is subdivided into blocks of 8×8 and each block is processed
using the cosine transform. For high rates of compression this subdivision is evident by simple visual
inspection of the decompressed image. Subband coding using wavelets (that is, tree-structured filter
banks) dramatically reduces this 'blocking' problem because of the variable length of its basis functions. Long
basis functions are used to represent smooth and background areas, while texture is represented with short
functions. After the image is transformed, it is in an adequate form to be quantized so that it can be represented
with a reduced number of bits. This can be a scalar quantization, if each pixel is quantized independently, or
a vector quantization, if a group of pixels is quantized jointly. In either case, the most critical part is the bit
allocation. More bits should be assigned to the portions of the transformed image that contain most of
its energy in order to reduce the quantization noise. Those portions of the image are estimated based on
the spectral distribution. In the case of JPEG, the intensities of the first coefficients (upper left corner of
the discrete cosine transform) of each 8 by 8 block are the largest. In the case of the wavelet transform, the
distribution of the components filtered through a high pass is close to a Laplacian distribution. The
low-frequency components have a distribution closer to that of the original image. Therefore, more bits
are assigned to this portion. Finally, to remove any redundancy left in the bit stream, each quantization level
is represented with a variable-width binary code. Those levels that have a high statistical probability of
occurrence are given the shortest code words, while levels with low probability are given the longest code
words. The technique used for this process is known as Huffman coding; a small sketch is given below. The main
processing portion of lossless compression algorithms is based on entropy coding techniques such as this one.
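As an editorial illustration (not part of the original paper), the following minimal Python sketch shows how a Huffman table assigns shorter code words to more probable quantization levels; the level probabilities here are invented for the example.

import heapq

def huffman_codes(probs):
    # Build a Huffman code table from {symbol: probability}.
    # Heap entries: (probability, tie-breaker, {symbol: partial code word}).
    heap = [(p, i, {s: ''}) for i, (s, p) in enumerate(probs.items())]
    heapq.heapify(heap)
    while len(heap) > 1:
        p0, _, c0 = heapq.heappop(heap)   # two least probable subtrees
        p1, i, c1 = heapq.heappop(heap)
        merged = {s: '0' + w for s, w in c0.items()}
        merged.update({s: '1' + w for s, w in c1.items()})
        heapq.heappush(heap, (p0 + p1, i, merged))
    return heap[0][2]

# Four quantization levels; the most probable level gets the shortest word.
print(huffman_codes({'q0': 0.55, 'q1': 0.25, 'q2': 0.12, 'q3': 0.08}))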
Figure 1. Steps of a transform-based image coder
Figure 2. Wavelet decomposition scheme. Original image and single-stage image decomposition
3. IMAGE TRANSFORMATION
Several transformations have been reported to decorrelate an image, e.g. the wavelet transform, the cosine transform,
subband decomposition using quadrature mirror filters (QMF), etc. Although this presentation
concentrates mainly on the algorithms for subband decomposition using wavelet transforms, the same
algorithms can easily be adapted for either QMF or the cosine transform. For a detailed presentation on wavelet
transforms, the reader is encouraged to examine the underlying theory in Reference 13.
The wavelet decomposition shown in Figure 2 can be obtained by convolving each row and then each
column of the image with the scaling and wavelet functions. In Figure 3, G and H represent a low-pass and
a high-pass filter, respectively, that can be implemented through QMF or wavelet basis. In the latter case,
G and H are the corresponding scaling (φ) and wavelet (ψ) functions, respectively. The results of the
transformation are denoted as LL, LH, HL, and HH, where the first letter represents the operation
performed in the horizontal direction and the second one represents the operation performed in the vertical
direction. This two-pass transformation can also be accomplished in one pass through the use of transformation matrices (two-dimensional transformation). The transformation matrices are obtained by combining the
scaling and wavelet function as
M**"
2
, M*&"
2t, M&*"t2
, M&&"t2t
(2)
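As a small editorial sketch (assuming the 2-point Haar basis that Section 4 introduces), equation (2) amounts to four outer products:

import numpy as np

phi = np.array([1.0, 1.0]) / np.sqrt(2)    # Haar scaling function
psi = np.array([1.0, -1.0]) / np.sqrt(2)   # Haar wavelet function

M_LL = np.outer(phi, phi)                  # phi^T phi
M_LH = np.outer(phi, psi)                  # phi^T psi
M_HL = np.outer(psi, phi)                  # psi^T phi
M_HH = np.outer(psi, psi)                  # psi^T psi
print(M_LL)                                # 0.5 * [[1, 1], [1, 1]]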
In practice, images are finite sequences, since they have a discrete number of pixels. Linear convolution has
some practical problems on finite sequences. If an image of size m×n is convolved with a sequence of size
p×q, the size of the result will be (m+p−1)×(n+q−1). This is larger than the image and contains
redundant information. Fortunately, solutions for this problem are well known in the literature: the
so-called overlap-add and overlap-save methods. They basically consist of using circular convolution, for
which periodic boundary conditions (wrap-around effect) are assumed. For compression this is a minor
inconvenience, since the pixels on one edge of the image are usually not correlated with the pixels on the
opposite edge. Moreover, this effect is reduced by using small-size transformations.
4. CNN TEMPLATES FOR WAVELET DECOMPOSITION
For the following discussion assume that we are dealing with a CNN with a neighbourhood of radius 1, i.e. the
template matrix is of size 3×3, typical of today's VLSI implementations. The order of the wavelet
transform is an important factor to consider in the development of the algorithm. For discrete-time signals
the order is given by the number of coefficients, or points, it contains. Wavelet bases that allow perfect
reconstruction are dyadic, which means that the number of points is a power of 2. Let us start with the
simplest case, i.e. a 2-point transformation. The most popular 2-point wavelet is the Haar wavelet shown in
equation (3):

$$\varphi_H = \frac{1}{\sqrt{2}}\,[1 \;\; 1], \qquad \psi_H = \frac{1}{\sqrt{2}}\,[1 \;\; -1] \tag{3}$$
The corresponding two-dimensional transformations, 2×2 matrices, are obtained using equation (2).
Notice that M^{(·,·)} is an operator that is convolved with the image much as the B template is in the case
of CNN. Basically, when M is applied we say that the image has been transformed. Since B is a convolutional
operator as well, we can define a map f : M → B that gives us the equivalent matrix transformation, but now in
terms of the CNN environment. For the Haar wavelet transform, each transformed element needs the input of
4 adjacent pixels (called a block). This transformation can be implemented in CNN by successively using the
following templates:
A"0, I"0,
0 0 0
1
B**" 0 1 1 ,
2
0 1 1
0 0
0
1
B*&" 0 1 !1 ,
2
0 1 !1
0
0
0
1
B&*" 0
1
1 ,
2
0 !1 !1
(4)
0
0
0
1
B&*" 0
1 !1
2
0 !1
1
Notice that by having zeros in a row and a column we are reducing the effective neighbourhood of a cell
due to the characteristics of the wavelet. Therefore, for any cell C_{ij} and input image P = [p_{ij}] the
transformation is given by

$$\tilde p_{ij}^{(\tilde N)} = \sum_{C_{kl}\in N_{ij}} B^{(\tilde N)}_{ij;kl}\, p_{kl} \tag{5}$$

where p̃ denotes the transformation over p, and the superscript Ñ ranges over {LL, LH, HL, HH}. For the case of a
higher-dimensional biorthogonal wavelet (n points), the scaling and wavelet functions are given by

$$\varphi = [c_0 \;\; c_1 \;\; \cdots \;\; c_{n-1}], \qquad \psi = [-c_{n-1} \;\; c_{n-2} \;\; \cdots \;\; -c_1 \;\; c_0] \tag{6}$$
Observe that if the size of the CNN templates is larger than the order of the transform, the transformation
matrix fits into the template matrix, as was the case with the Haar wavelet. Otherwise, the transformation
matrix needs to be broken into smaller partitions such that each partition fits into the template matrix. For
example, the 4-point Daubechies wavelet has the following coefficients:
c "(1#(3)/4(2, c "(3#(3)/4(2
(7)
c "(3!(3)/4(2, c "(1!(3)/4(2
By applying equation (2) to an n-point wavelet one can see that the size of the transformation matrices is
n;n, as shown in below:
$$M^{LL} = \begin{bmatrix}
c_0 c_0 & c_0 c_1 & \cdots & c_0 c_{n-1}\\
c_1 c_0 & c_1 c_1 & \cdots & c_1 c_{n-1}\\
\vdots & \vdots & \ddots & \vdots\\
c_{n-1} c_0 & c_{n-1} c_1 & \cdots & c_{n-1} c_{n-1}
\end{bmatrix} \tag{8a}$$

$$M^{LH} = \begin{bmatrix}
-c_0 c_{n-1} & c_0 c_{n-2} & \cdots & c_0 c_0\\
-c_1 c_{n-1} & c_1 c_{n-2} & \cdots & c_1 c_0\\
\vdots & \vdots & \ddots & \vdots\\
-c_{n-1} c_{n-1} & c_{n-1} c_{n-2} & \cdots & c_{n-1} c_0
\end{bmatrix} \tag{8b}$$

$$M^{HL} = \begin{bmatrix}
-c_{n-1} c_0 & -c_{n-1} c_1 & \cdots & -c_{n-1} c_{n-1}\\
c_{n-2} c_0 & c_{n-2} c_1 & \cdots & c_{n-2} c_{n-1}\\
\vdots & \vdots & \ddots & \vdots\\
c_0 c_0 & c_0 c_1 & \cdots & c_0 c_{n-1}
\end{bmatrix} \tag{8c}$$

$$M^{HH} = \begin{bmatrix}
c_{n-1} c_{n-1} & -c_{n-1} c_{n-2} & \cdots & -c_{n-1} c_0\\
-c_{n-2} c_{n-1} & c_{n-2} c_{n-2} & \cdots & c_{n-2} c_0\\
\vdots & \vdots & \ddots & \vdots\\
-c_0 c_{n-1} & c_0 c_{n-2} & \cdots & c_0 c_0
\end{bmatrix} \tag{8d}$$
In order to implement this transformation we need to take into account that the CNN templates are equal
for all cells, and recall that our CNN array has a neighbourhood of radius 1. As such, the problem we are dealing
with is essentially that of how to implement large-dimensional transformations with 3×3 templates. A way
to implement the B templates at no extra hardware cost is described next. First, divide the transformation matrix
(n×n) into submatrices of dimensions equal to or smaller than the template size, e.g. 3×3, 2×2, etc., such that
n mod 3 or n mod 2 is zero. Let us further assume that this partitioning created m submatrices B_{ij}. Then,
order these submatrices in lexicographic order from left to right and from top to bottom, i.e. with
i = j = 0, …, √m − 1. If the partition yielded 3×3 submatrices, then they are the actual 'partial' B templates.
In the case of 2×2 submatrices, create 'partial' B templates as follows:
$$B^{LL}_{ij} = \begin{bmatrix} 0 & 0 & 0\\ 0 & c_{2i}\,c_{2j} & c_{2i}\,c_{2j+1}\\ 0 & c_{2i+1}\,c_{2j} & c_{2i+1}\,c_{2j+1} \end{bmatrix} \tag{9a}$$

$$B^{LH}_{ij} = \begin{bmatrix} 0 & 0 & 0\\ 0 & -c_{2i}\,c_{n-1-2j} & c_{2i}\,c_{n-2-2j}\\ 0 & -c_{2i+1}\,c_{n-1-2j} & c_{2i+1}\,c_{n-2-2j} \end{bmatrix} \tag{9b}$$

$$B^{HL}_{ij} = \begin{bmatrix} 0 & 0 & 0\\ 0 & -c_{n-1-2i}\,c_{2j} & -c_{n-1-2i}\,c_{2j+1}\\ 0 & c_{n-2-2i}\,c_{2j} & c_{n-2-2i}\,c_{2j+1} \end{bmatrix} \tag{9c}$$

$$B^{HH}_{ij} = \begin{bmatrix} 0 & 0 & 0\\ 0 & c_{n-1-2i}\,c_{n-1-2j} & -c_{n-1-2i}\,c_{n-2-2j}\\ 0 & -c_{n-2-2i}\,c_{n-1-2j} & c_{n-2-2i}\,c_{n-2-2j} \end{bmatrix} \tag{9d}$$

$$i = 0, \ldots, n/2 - 1; \qquad j = 0, \ldots, n/2 - 1$$
By using this last partitioning scheme the transformation matrix is divided into n²/4 parts, creating the
same number of partial templates. Without loss of generality, the algorithms discussed in the rest of the paper
assume that the transformation matrix was partitioned as shown in equation (9).
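As a sanity check of the partitioning just described, this short editorial sketch (assuming the 4-point Daubechies coefficients of equation (7); the helper name is ours) splits an n×n transformation matrix into its n²/4 partial 3×3 templates:

import numpy as np

s3 = np.sqrt(3.0)
c = np.array([1 + s3, 3 + s3, 3 - s3, 1 - s3]) / (4 * np.sqrt(2.0))  # equation (7)

M_LL = np.outer(c, c)                      # n x n transformation matrix, n = 4 here

def partial_templates(M):
    n = M.shape[0]
    templates = []
    for i in range(n // 2):                # i, j = 0, ..., n/2 - 1
        for j in range(n // 2):
            B = np.zeros((3, 3))
            B[1:, 1:] = M[2*i:2*i + 2, 2*j:2*j + 2]   # zero top row / left column
            templates.append(B)
    return templates

print(len(partial_templates(M_LL)))        # n^2/4 = 4 partial templates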
5. SUBBAND DECOMPOSITION ALGORITHM
The decomposition algorithm consists of performing a circular convolution of the transformation matrices
(equation (8)) with the input image. These convolutions yield four new images, commonly tagged as LL, LH,
HL and HH. For the case of CNN, each set of partial templates (equation (9)) is sequentially applied to the
input image, for which a corresponding output array denoted as P̃^{(Ñ)} with Ñ ∈ {LL, LH, HL, HH} is
obtained; see Figure 2. The CNN templates for the decomposition algorithm are A = 0, B from equation (9), and
I = 0, with the initial conditions in the CNN array set to zero: X(0) = 0. As the template contains a row and
a column of zeros, the contribution from the corresponding pixels would be zero. Therefore, it is
necessary to shift the image so that all pixels are considered in the subband decomposition. To accomplish
the decomposition, each partial template B_{ij}^{(Ñ)} in equation (9) is used as follows. Take the first partial template
and process the input image by running the CNN for one unit of time. Keep the processed state of each cell
such that it appears as the initial condition for the next step. Shift the input image in the horizontal and then the
vertical direction by two positions, wrapping around the edges. Repeat this process with the remaining
templates. Observe that each CNN cell must hold its value in each iteration in order to accumulate, after
processing with all partial templates, the summation of all terms in the convolution. Decimation is performed
after the signal is filtered by extracting from the CNN state only the elements with odd indices, e.g. the
elements in positions (1,1), (1,3), (3,1), etc. Notice that after the decimation process is complete each of the four
output images is one-fourth the size of the original image. The complexity of this algorithm is O(m+k), where
k is the time needed to shift the image and m is the number of partial templates.
Figure 3 shows simulation results for a Daubechies wavelet decomposition. The simulation results were
obtained with a CNN simulator written as a Matlab program; a simplified sketch of the computation is given below.
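The following minimal Python sketch (ours, not the authors' Matlab simulator) reproduces the net effect the CNN accumulates for one subband: circular 2-D correlation with a 2×2 transformation matrix followed by decimation by two.

import numpy as np

def subband(image, M):
    # Circular 2-D correlation with the 2x2 matrix M, then decimation by 2.
    rows, cols = image.shape
    out = np.zeros((rows, cols))
    for i in range(rows):
        for j in range(cols):
            block = image[np.ix_([i, (i + 1) % rows], [j, (j + 1) % cols])]
            out[i, j] = np.sum(block * M)          # wrap-around boundary
    return out[0::2, 0::2]   # keep positions (1,1), (1,3), ... in 1-based terms

rng = np.random.default_rng(0)
img = rng.random((8, 8))
phi = np.array([1.0, 1.0]) / np.sqrt(2)
psi = np.array([1.0, -1.0]) / np.sqrt(2)
LL = subband(img, np.outer(phi, phi))              # one of the four subbands
print(LL.shape)                                     # (4, 4): one-fourth the pixels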
Figure 3. Results of decomposition using Daubechies wavelet. Clockwise: LL, LH, HL, and HH

5.1. Subband reconstruction algorithm

The reconstruction process is similar to the algorithm just described. In the case of 2-point orthogonal
wavelets, such as the Haar wavelet, the reconstruction basis is the same as the decomposition basis. However,
the reconstruction and decomposition bases of real-valued biorthogonal wavelets, e.g. Daubechies wavelets, are
different. The reconstruction basis (also called the dual) is obtained by reversing the decomposition sequence and
by alternating the signs of the coefficients, i.e.

$$[\tilde\varphi_D]_i = (-1)^i\,[\varphi_D]_{n-1-i}, \qquad [\tilde\psi_D]_i = (-1)^i\,[\psi_D]_{n-1-i} \tag{10}$$
Using equation (6), a new set of transformation matrices, denoted as M̃, is obtained based on the principle
indicated in equation (10). These matrices are similar to the ones shown in equation (8). The image
reconstruction is implemented by adding the results of the convolution of the different components of M̃ with
the up-sampled pixels P̃ of the decomposed image. The equations that govern these operations are the
following:

$$p_{i,j} = \tilde p^{LL}_{ij}\cdot\tilde M^{LL} + \tilde p^{LH}_{ij}\cdot\tilde M^{LL} + \tilde p^{HL}_{ij}\cdot\tilde M^{LL} + \tilde p^{HH}_{ij}\cdot\tilde M^{LL} \tag{11a}$$

$$p_{i+1,j} = \tilde p^{LL}_{ij}\cdot\tilde M^{LH} + \tilde p^{LH}_{ij}\cdot\tilde M^{LH} + \tilde p^{HL}_{ij}\cdot\tilde M^{LH} + \tilde p^{HH}_{ij}\cdot\tilde M^{LH} \tag{11b}$$

$$p_{i,j+1} = \tilde p^{LL}_{ij}\cdot\tilde M^{HL} + \tilde p^{LH}_{ij}\cdot\tilde M^{HL} + \tilde p^{HL}_{ij}\cdot\tilde M^{HL} + \tilde p^{HH}_{ij}\cdot\tilde M^{HL} \tag{11c}$$

$$p_{i+1,j+1} = \tilde p^{LL}_{ij}\cdot\tilde M^{HH} + \tilde p^{LH}_{ij}\cdot\tilde M^{HH} + \tilde p^{HL}_{ij}\cdot\tilde M^{HH} + \tilde p^{HH}_{ij}\cdot\tilde M^{HH} \tag{11d}$$

∀ i = 0, 2, …, M−2 and j = 0, 2, …, N−2, where M and N are the dimensions of the image.
Each of these equations is implemented in four steps, e.g. one step for each of the inverse transforms of LL,
LH, HL and HH. Since the image was decimated by 2 in both the horizontal and vertical directions, we need to
do an up-sampling so that every step processes one transformed pixel, p̃, from a 2×2 block that contains
the corresponding decomposed pixel elements LL, LH, HL, and HH. As in the decomposition algorithm,
circular convolution is also used in this case. By way of illustration consider, for instance, equation (11a) for
the case of i = 0, 2, …, M−2 and j = 0, 2, …, N−2. This equation makes use of the input data P expanded as
shown in Figure 4. Notice that each p̃^{(·,·)} in equation (11a) corresponds to a 2×2 block in Figure 4 that is
multiplied by M̃^{LL}. The reconstructed pixels p_{ij} with i = 0, 2, …, M−2 and j = 0, 2, …, N−2 are taken
from the cells having odd indices, e.g. the elements in positions (1,1), (1,3), (3,1), etc.
Figure 4. Input arrangement for reconstruction

The process is repeated for the pixels p_{i,j+1}, p_{i+1,j}, and p_{i+1,j+1}.
This reconstruction algorithm was simulated with a Matlab program as well, using a one-to-one pixel-to-cell
mapping. The mean-square error between the input image and the reconstructed one is negligibly small and is
due only to round-off error.
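For the orthogonal Haar case, where the dual basis equals the decomposition basis, the interleaved-block reading of equation (11) can be checked numerically. This small verification is ours (with the row/column pairing adjusted to a row-major array), not from the paper:

import numpy as np

phi = np.array([1.0, 1.0]) / np.sqrt(2)
psi = np.array([1.0, -1.0]) / np.sqrt(2)
M = {'LL': np.outer(phi, phi), 'LH': np.outer(phi, psi),
     'HL': np.outer(psi, phi), 'HH': np.outer(psi, psi)}

p = np.array([[3.0, 1.0], [4.0, 2.0]])           # one 2x2 image block
coeff = {k: np.sum(p * M[k]) for k in M}         # forward transform of the block

block = np.array([[coeff['LL'], coeff['LH']],    # interleaved arrangement (Figure 4)
                  [coeff['HL'], coeff['HH']]])
rec = np.array([[np.sum(block * M['LL']), np.sum(block * M['LH'])],
                [np.sum(block * M['HL']), np.sum(block * M['HH'])]])
print(np.allclose(rec, p))                        # True: round-off error only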
6. COMPUTATION OF STATISTICAL PARAMETERS FOR QUANTIZATION AND
BIT ALLOCATION
The bit allocation problem consists of finding the minimum number of bits needed to quantize each of the
subbands with minimum error. Bit allocation schemes make use of statistical parameters such as the mean
and standard deviation of all pixels in the image. The mean squared error (MSE) of a quantized image can be
reduced by assigning non-linearly spaced decision levels in such a way that all levels have the same
probability. This type of quantizer is known as the Lloyd-Max quantizer. After decomposing an image,
the statistical distribution of pixels in the elements LH, HL, HH can be approximated by a Laplacian
distribution, which is described by its parameters: mean (μ) and variance (σ²). Therefore, it is important
to calculate these parameters in order to have an effective quantizer. The algorithm for computing the mean
of an image consists of several iterations, depending upon the image's size. The templates to compute the
average of a block of four adjacent pixels are the following:

$$A = [0], \quad I = 0, \quad B = \begin{bmatrix} 0 & 0 & 0\\ 0 & 1/4 & 1/4\\ 0 & 1/4 & 1/4 \end{bmatrix} \tag{12}$$
In our approach only one processed cell is needed for each block, i.e. the output of the upper left cell in
each 4-pixel block; the remaining outputs are discarded. These local averages are then fed back as inputs and
the process is repeated until one cell accumulates the final mean value. Notice that in each iteration the size of
the input array is one-fourth that of the previous iteration, since we are taking only one output for
each block. The fact that we are considering a time-multiplexing system and a standby memory allows us to
choose which set of pixels is used as input data for the CNN. Decimation is performed by simply discarding the
pixels that we do not want and presenting to the CNN the ones of interest. Now, if the size of the image is
2^M by 2^M, then the complexity of this algorithm is O(M), i.e. the number of iterations is M.
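A minimal editorial sketch of this iterative averaging, assuming a square power-of-two image:

import numpy as np

def cnn_mean(image):
    # Each pass applies the template of equation (12): the upper-left cell of
    # every 2x2 block receives the block average; the other outputs are discarded.
    x = image.astype(float)
    while x.size > 1:
        x = 0.25 * (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2])
    return float(x[0, 0])

img = np.arange(16.0).reshape(4, 4)   # a 2^M x 2^M image needs M passes
print(cnn_mean(img), img.mean())      # both 7.5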
The process of calculating the variance involves obtaining the expected value of the squared value of the
pixel minus the mean, i.e. σ² = E[(u−μ)²] = E[u²] − μ². Implementing the squared value is not simple.
However, the non-linearity provided at the output of the CNN array can be exploited to compute the 1-norm
of the image (‖u−μ‖₁ = E[|u−μ|]). This is done by calculating the mean of the absolute value of the image.
For the Laplacian distribution, σ can be estimated from this norm as follows:

$$E[|u(x)-\mu|] = \int_{-\infty}^{\infty} |u(x)-\mu|\, f_u(x)\, dx = \int_{-\infty}^{\infty} |u(x)-\mu|\,\frac{\sqrt{2}}{2\sigma}\, e^{-\sqrt{2}\,|u(x)-\mu|/\sigma}\, dx = \frac{\sigma}{\sqrt{2}} \tag{13}$$

Therefore,

$$\sigma^2 = 2\,E[|u(x)-\mu|]^2 \tag{14}$$
Using this equation the quantization tables for the Lloyd-Max quantizer can be based on the 1-norm.
Table I shows the decision and reconstruction levels for a 2-bit quantizer based on the Laplacian distribution. The reconstruction level is the value given to the pixels that belong to a quantization level so that the
MSE can be minimized.
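Equation (14) can be checked numerically with a synthetic Laplacian source; the parameters below are illustrative only:

import numpy as np

rng = np.random.default_rng(1)
sigma = 43.8136                                  # assumed spread, for illustration
u = rng.laplace(loc=99.1179, scale=sigma / np.sqrt(2), size=1_000_000)
mean_abs = np.abs(u - u.mean()).mean()           # E[|u - mu|], the 1-norm
print(2 * mean_abs**2, sigma**2)                 # both close to sigma^2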
The 1-norm can be computed using CNN by separating the positive and negative parts of the absolute
value function. The positive part of the absolute value function is obtained by performing the following steps.
First, initialize the CNN array at the saturation limit, i.e. X(0) = −1, where 1 denotes a unity vector.
Then, set the CNN parameters as

$$I = -\mu, \quad A = [0], \quad B = \begin{bmatrix} 0 & 0 & 0\\ 0 & 1 & 0\\ 0 & 0 & 0 \end{bmatrix} \tag{15}$$

and run the network for one time unit Δt. Observe from (1) that the pixel values will be displaced from their
[−1, 1] normalized range to [−1−μ, 1−μ] because of the constant input bias value μ. At the output of
the CNN's activation function, the input image is rectified, since negative terms in u_{ij} − μ are truncated, as
illustrated in Figure 5. The range of pixel values is represented by the hot-colour bar in this figure. Without
any bias, and because of the linear slope of the activation function, the input range of pixel values (hot-colour
bar) is the same as the output one. When a bias is applied, the output range will be displaced and truncated
towards one of the saturation levels. In Figure 5, point F represents the limit value for the image's mean
μ (bias term) and E corresponds to values at the positive saturation level which were displaced downwards because
of the bias. One can see that due to this displacement values below the mean will become −1 because of the
limiter function. To eliminate the biasing introduced by the initialization of the CNN (X(0) = −1), the
templates I = 1, A = 0, B = 0 are used for a time unit Δt. This effectively adds 1 to the pixel
values. The range of the pixels now goes from 0 to its maximum value. To separate the negative terms in
U − μ1 a similar operation is performed. The network is now initialized at the positive saturation limit, i.e.
X(0) = 1, and the same templates of equation (15) are used. This will leave only the elements below the mean,
since those above the mean are clipped by the saturated output function of the CNN. The bias is
compensated again by running the array with I = −1. Finally, the first image (positive values) is subtracted
from the second image (negative values) to produce the absolute value of the image. This operation can be
performed by initializing the array with the first image, providing the second image as input, and using the
templates given by equation (15) with the centre value of B equal to −1. The 1-norm E[|u−μ|] is just the
mean of this absolute-value image and can be obtained with the previously explained procedure.
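An idealized editorial model of this procedure, which replaces the CNN dynamics with a unit-time displacement followed by the limiter of equation (1b) (an assumption: cell settling and timing are not modelled):

import numpy as np

def limiter(x):
    # CNN activation function, equation (1b): a hard limiter on [-1, 1].
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def cnn_abs(u, mu):
    pos = limiter(-1.0 + (u - mu)) + 1.0   # X(0) = -1, I = -mu, then add 1
    neg = limiter(+1.0 + (u - mu)) - 1.0   # X(0) = +1, I = -mu, then subtract 1
    return pos - neg                        # |u - mu| within the linear range

u = np.array([-0.6, -0.1, 0.2, 0.7])
print(cnn_abs(u, u.mean()))                 # matches np.abs(u - u.mean())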
Table I. Placement of decision and reconstruction levels for a 2-bit Lloyd-Max quantizer, for the Laplacian distribution
and σ = 1

Bits: 2
Decision levels d_i: −∞, −1.1269, 0.0000, 1.1269
Reconstruction levels r_i: −1.8340, −0.4198, 0.4198, 1.8340
Figure 5. Computation of the absolute value using the non-linearity of the CNN output
Figure 6. Absolute value of input image
Figure 6 shows a rectified image processed using the algorithm just described. Its denormalized
parameters are the following: the mean is μ = 99.1179 and the variance is σ² = 2·(43.8136)².
7. QUANTIZATION, BIT EXTRACTION AND ENTROPY CODING
The purpose of quantization is to represent the image with a lower number of bits. Considering that the CNN is
an analog processor, we will implement with the CNN the computational operations needed to convert the information
processed by the CNN into digital form. We will quantize the image and classify the pixels that belong to
a quantization level in order to encode them. Encoding can be linear, e.g. for 2 bits: 00, 01, 10, 11, or can use
entropy coding: 0, 10, 110, 111. For entropy coding we need to calculate the probability that a pixel belongs to
any of the quantization levels. This is image dependent, and therefore it needs to be calculated for each
particular image.
7.1. Quantization
The objective of a Lloyd-Max quantizer is to divide the signal range into a discrete number of levels in such
a way that each level has equal probability (this minimizes the MSE). This section addresses the
problem of assigning pixels to the levels they belong to. If one were to represent an image with one bit per pixel,
we would have two levels. Positive values are represented with logic 1 and negative values with logic 0. The image
can simply be thresholded with respect to its mean to obtain the two output levels, 1 and −1, of the CNN array. This
can be done by making I = μ and A = B, with B as in (15), and taking the output from Y. The network is
allowed to run until the output values reach their limits. Figure 7 shows an illustration of a 2-bit quantizer. It
shows four levels with their respective decision levels and the reconstruction value that minimizes the MSE.
The algorithm for quantization described in this section provides a binary output that identifies the pixels
belonging to each quantization level so that they can be encoded.
Consider a quantizer consisting of n levels and n decision boundaries d_m. A pixel p_{ij} belongs to a level m if
d_m < p_{ij} < d_{m+1}. The pixels that belong to a particular level can be extracted using the following recursive
process. For the first level, start with the highest decision boundary d_m. For example, from Table I for 2 bits,
d_m = 1.1269; see also Figure 7. Set I = d_m, and again A = B, with B as in (15). The bit extraction process
consists of keeping those cells whose outputs y_{ij} are 1. These pixels belong to the first level (p_{ij} > d_m).
Consider now the next decision boundary d_{m−1} (for 2 bits, d_{m−1} = 0.0). The image is now thresholded by
setting I = d_{m−1}, and again A = B, with B as in (15). All pixels with values above d_{m−1} will be set to 1. This
includes the pixels that belong to the first and second levels. The next step consists of eliminating the pixels that
belong to the first level, i.e. the pixels that also presented output 1 for d_m. To do this we subtract the output of
the current iteration from that of the previous iteration. This is a recursive operation for subsequent bits, i.e. the
results of all previous iterations are subtracted from the current iteration. Repeat this process for d_{m−2}, d_{m−3},
etc. Notice that any pixel will have an output of 1 only for the level it belongs to and will be −1 for all other
levels.
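A compact editorial sketch of this recursive extraction (boolean masks stand in for the +1/−1 CNN outputs):

import numpy as np

def extract_levels(p, boundaries):
    # boundaries sorted from the highest decision level downwards.
    levels, above_prev = [], np.zeros(p.shape, dtype=bool)
    for d in boundaries:
        above = p > d                        # thresholded (+1) output
        levels.append(above & ~above_prev)   # subtract previously extracted levels
        above_prev = above
    levels.append(~above_prev)               # lowest level: whatever remains
    return levels

p = np.array([1.5, 0.3, -0.2, -2.0])
for mask in extract_levels(p, [1.1269, 0.0, -1.1269]):
    print(mask.astype(int))                  # one-hot membership per level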
Figure 7. Illustration of 2-bit quantization levels

Figure 8. Bit allocation. Number of bits assigned to each of the subbands

7.2. Entropy coding

Entropy coding consists of assigning a code word of variable length to a pixel or set of pixels according to
their probability. The shortest code word is assigned to the pixel value that has the highest probability of
occurrence. Pixels can be coded separately or jointly. Higher compression can be achieved by compressing
bits jointly. If an image is quantized with one bit, it is convenient to quantize several pixels jointly. For
example, if four pixels are quantized jointly, there are 16 possible combinations. Thus, we need to calculate
the probability of occurrence of each of those combinations.
The probability of a quantization level can be obtained by computing the mean of the image containing
only the pixels corresponding to that level. Recall that pixels that belong to a given level have a value of 1,
and the rest are −1. Computing the mean gives us a measure of the probability. For instance, if the mean
is −1, there are no pixels belonging to this level; if the mean is 1, all pixels belong to this level.
To compute the joint probability of two or more pixels we can compute the logic AND of all
their possible combinations, as sketched below. Consider two consecutive pixels p_{ij} and p_{i+1,j} quantized with 1 bit and
classified with the algorithm described before. The possible combinations for the 2 levels are [1,1], [1,−1],
[−1,1], [−1,−1].
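A small editorial sketch of this joint-probability estimate (random data, invented for illustration):

import numpy as np

rng = np.random.default_rng(2)
bits = rng.choice([-1.0, 1.0], size=(8, 8))     # a 1-bit quantized image

a, b = bits[:, 0::2], bits[:, 1::2]             # pairs of consecutive pixels
for pa in (1.0, -1.0):
    for pb in (1.0, -1.0):
        match = (a == pa) & (b == pb)           # logic AND of both memberships
        print((pa, pb), match.mean())           # relative frequency of the pattern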
As an illustration of quantization and bit allocation, the original image from Figure 2 was decomposed
using the 4-point Daubechies wavelet. The number of bits allocated to each of the subbands is shown in
Figure 8. The reconstruction of this image will be shown in the next section.
8. RECONSTRUCTION
Once the bits are extracted we can reconstruct the image from the binary data. This is an interesting
feature of CNN, since it implies that input data can be provided in digital form as well as in the
usual analog form. Suppose that we have the reconstruction levels r_i associated with the Lloyd-Max
quantizer and that we have n bits. To reconstruct the image we use the following process. Feed the first bit
of each pixel of the whole image as the input array, then set the template as in (16). By running the CNN for
one time unit, the value of each weighted bit is loaded into the CNN. This process is repeated for all
remaining bits, and the partial results are accumulated, so that at the end the image is loaded in the CNN array.
Notice that the D/A conversion is actually performed by multiplying each digital bit by its analog value
r_i = 2^{−i}:

$$I = 0, \quad A = 0, \quad B = \begin{bmatrix} 0 & 0 & 0\\ 0 & r_i & 0\\ 0 & 0 & 0 \end{bmatrix} \tag{16}$$
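A minimal editorial sketch of this bit-plane accumulation, assuming the weights start at r_1 = 2⁻¹ for the most significant bit (the paper only states r_i = 2^{−i}):

import numpy as np

def reconstruct(bit_planes):
    # bit_planes[0] is the most significant bit of every pixel, values in {0, 1}.
    acc = np.zeros(bit_planes[0].shape)
    for i, plane in enumerate(bit_planes, start=1):
        acc += plane * 2.0**(-i)                # one CNN pass with B centre r_i
    return acc

planes = [np.array([[1, 0], [1, 1]]), np.array([[0, 1], [1, 0]])]
print(reconstruct(planes))                       # [[0.5, 0.25], [0.75, 0.5]]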
Figure 9. Reconstructed image
9. ESTIMATE OF CNN IMAGE COMPRESSION PROCESSING TIME
In this section we consider computation time estimates, taking into account the processing times of
reported hardware implementations. It is assumed that the CNN is operated under a time-multiplexing
scheme. Under this scheme, the image is divided into tiles of size equal to the CNN array size, and the
processing is done tile by tile in lexicographical order. To avoid blocking effects it is necessary to have
a slight overlap between consecutive blocks and to provide neighbour inputs to each tile. Our main concern
here is to estimate whether real-time video processing is possible with a CNN array feasible for a near-future
hardware implementation. The calculations that follow do not consider the possible non-idealities of a CNN
system. Although this is a simplification, these times represent an upper bound on the achievable speed. One
must be aware that there are many factors that need to be considered with analog hardware, e.g. cell
offsets, time retention of the capacitors, mismatch among cells and so on. In fact, it is well known that a very
crude analog implementation yields 7-8 bits of resolution (accuracy: 2⁻⁷ = 7.81e-3) and that an analog
memory suffers from retention problems. Yet, we are assuming for our algorithms a hardware platform with an
accuracy better than 1e-3 (about 10 bits of resolution). Additionally, to avoid analog memory retention
problems it is also possible to use a digital memory with high-precision D/As and A/Ds coupled to the CNN,
although this would cause a further overhead in the actual I/O times.
Computation times are based on the algorithms described in the previous sections and concern only
the CNN processing and access times. Thus, we avoid relying on a high-level system for which the
instruction-level parallelism is variable and depends on compiler optimization. For our time estimations
we suppose that, if we have access to a RAM, data can be stored at will and that there is a finite-state
machine controlling the CNN, the standby memory and its memory manager. This finite-state machine
should take care of scheduling image tiles for the time-multiplexing approach, of providing the right
templates and data sequencing for each algorithm, and of storing and retrieving data from the standby
memory. For readability purposes we briefly summarize each algorithm and the corresponding data flow in
pseudo-code.
To simplify the calculations, and without loss of generality, we assume that the overlap between
consecutive blocks in the time-multiplexing scheme is zero.
Step 1. Image transformation
Consider an n-point filter, an N×N image, an M×M CNN array, and a decomposition of l subbands. For
each subband we need to process n²/4 partial templates, and in each subband (N/(2^{l−1}M))² tiles. The total
image transform time can be estimated as follows:

$$t_t = (\tau + \delta_i + \delta_o)\,\frac{n^2}{4}\,\sum_{l}\left(\frac{N/2^{\,l-1}}{M}\right)^2 \tag{17}$$

where τ is the CNN processing time, and δ_i and δ_o represent the CNN input and output times, respectively. The
corresponding algorithm is

function transform( )
  N×N = image; M×M = array; n = filter order;
  for subband = 1 to 4
    for tiles = 1 to (N/2^(subband−1))²/M²
      for partial_templates = 1 to n²/4
        load(input image)
        run(cnn, 1τ)
        output(processed image)
      end
    end
  end
Step 2. Computation of statistical parameters
This step involves computing the mean and standard deviation of the pixel distribution. In this case we
need to process N²/M² tiles, and for each tile we need to iterate log₂ M times to find the mean. Hence, the
total time to compute the mean is

$$t_\mu = (\tau + \delta_i + \delta_o)\,\frac{N^2}{M^2}\,\log_2 M \tag{18}$$

The data flow algorithm is as follows:

function mean_computation( )
  N×N = image; M×M = array;
  for tiles = 1 to N²/M²
    for block = 1 to log2(M)
      load(input image)
      run(cnn, 1τ)
      output(processed image)
    end
  end
Similarly, the time to compute the standard deviation is calculated for N²/M² tiles and a constant processing
time k as follows:

$$t_\sigma = k\,(\tau + \delta_i + \delta_o)\,\frac{N^2}{M^2} \tag{19}$$
The specifics of the processing are shown in the next algorithm:

function std_computation( )
  N×N = image; M×M = array;
  for tiles = 1 to N²/M²
    load(input image)
    run(cnn, 5τ)    /* compute positive image */
    output(positive image)
    load(input image)
    run(cnn, 5τ)    /* compute negative image */
    load(positive image)
    run(cnn, 5τ)    /* subtract positive from negative image */
    output(processed image)
  end
Step 3. Quantization and bit extraction
This step consists of two parts; the first part identifies the elements of the image that belong to each of the
quantization levels, while the second part assigns a binary code word to them (bit extraction). The first part is
accomplished by thresholding the image in such a way that the elements that belong to a quantization level
present an output of 1, and all other elements are zero. The bit extraction operation converts the
elements that belong to a quantization level into their code word representation, e.g. for a two-bit binary code
the code words are 00, 01, 10, 11. Each of these two bits is extracted from the quantization level
in one iteration. In practice, a multiresolution decomposition is carried out. That is, typically the LL subband
is further decomposed into LL, HL, LH, and HH subbands. In turn, the new LL subband is further
decomposed, and so on. Therefore, we have m multiresolution levels, and since the CNN acts on each of the
subbands of the transform, we need to process ⌊N/(2^m M)⌋² tiles for each subband. Then, for each of the
n subbands we need to assign a variable number of code words c. The total computation time is

$$t_q = k\,(\tau + \delta_i + \delta_o)\,\sum_{i=1}^{m}\sum_{j=1}^{n}\left(\frac{N}{2^i M}\right)^2 c(i, j) \tag{20}$$

where k is the processing time for one quantization level, and c(i, j) is the number of codes for subband j in
multiresolution level i. The corresponding algorithm is shown next:
function quantization( )
  N×N = image; M×M = array;
  m = levels for multiresolution analysis
  c(decomposition, subband) = number of codewords in subband
  for decomposition = 1 to m
    for subband = 1 to n
      for codewords = 1 to c(decomposition, subband)
        /* computation for one quantization level */
        for tiles = 1 to N²/(4^decomposition · M²)
          load(input image)
          run(cnn, 5τ)    /* compute (+) part, e.g. elements above level */
          load(input image)
          run(cnn, 5τ)    /* subtract input from (+) image */
          output(processed image)
        end
      end
    end
  end
Step 4. Entropy coding
In this step the binary information obtained in the previous step is coded to reduce the redundancy left in
the data. Section 7.2 described this algorithm, based on a two-dimensional Huffman coding scheme. The
binary information obtained in the previous step is compared with templates that provide a code depending
on the frequency of occurrence of the template in the image. The computation consists of processing
N²/M² tiles and applying 2^p template combinations to each tile, where p stands for the number of pixels
that are quantized jointly:

$$t_e = (\tau + \delta_i + \delta_o)\,2^p\,\frac{N^2}{M^2} \tag{21}$$

The corresponding algorithm is shown next:

function entropy_coding( )
  N×N = image; M×M = array; pixels = number of pixels jointly quantized;
  for tiles = 1 to N²/M²
    for combinations = 1 to 2^pixels
      load(input image)
      run(cnn, 5τ)    /* compare with template */
      output(processed image)
    end
  end
9.1. Total processing time

Our main concern here is to estimate whether real-time video processing is possible with a CNN array feasible for
a near-future hardware implementation. By way of example, and without loss of generality, consider the case of
a 64×64 CNN array and an image of 256×256 pixels, and assume the following processing times: convergence:
τ = 0.5 μs, input access: δ_i = 12 μs, and output access: δ_o = 8 μs. We consider a 4-point Daubechies
wavelet transform for the subband coding, a multiresolution analysis of three levels, a worst-case allocation
scheme of 2 bits for two subbands in the first decomposition, 1 bit for three subbands in the second decomposition,
2 bits for only one subband in the third decomposition, and a joint quantization of four pixels.
Evaluation of formulae (17)-(21) yields a total processing time obtained by summing each
processing time t_i: image transformation (t_t) plus statistical parameters (t_μ + t_σ) plus quantization (t_q) plus
entropy coding (t_e), i.e. 1742.5 μs + 1328.0 μs + 656.0 μs + 5760.0 μs = 9486.5 μs.
The relevance of this result is that it suggests that the construction of CNN hardware for real-time
processing is feasible. In commercial television the number of frames per second is 30, interlacing the
odd-numbered lines with the even-numbered lines. Each interlaced field has a period of 1/60 s ≈ 16.7 ms. The
estimated computation time for image compression is 9.49 ms. Therefore, the implementation of a real-time
64×64 CNN image compressor is feasible. An important observation can be made from the time estimates:
most of the processing time for the compression operation is determined by the time required for
the input and output of data to and from the CNN.
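The frame-budget arithmetic can be checked directly (values copied from the totals above):

t_t = 1742.5        # image transformation, equation (17), microseconds
t_stats = 1328.0    # t_mu + t_sigma, equations (18)-(19)
t_q = 656.0         # quantization and bit extraction, equation (20)
t_e = 5760.0        # entropy coding, equation (21)

total_ms = (t_t + t_stats + t_q + t_e) / 1000.0
field_ms = 1000.0 / 60.0                 # one interlaced field at 30 frames/s
print(total_ms, field_ms)                # 9.4865 ms, well under about 16.7 ms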
10. CONCLUSIONS
Algorithms for image compression based on a CNN image processor have been developed. They include all
the operations needed in standard image compression. It was demonstrated that the CNN is capable of
performing image compression without requiring an external digital processor to support the algorithms or
those operations not implemented with CNN. This implies that an image processor can be built with a CNN
as the only central processing unit.
These algorithms exploit the massive parallelism that CNN provides; therefore, they take advantage of the
fast processing characteristics of CNN with respect to a digital processor. As estimated in Section 9,
a real-time image compressor for commercial video can be implemented using the proposed algorithms. The
compression algorithms given in this paper are also suitable for multiplexing CNN arrays.
REFERENCES
1. L. O. Chua and L. Yang, 'Cellular neural networks: theory', IEEE Trans. Circuits Systems, 35 (10), (1988).
2. P. Kinget and M. S. J. Steyaert, 'A programmable analog cellular neural network CMOS chip for high speed image processing', IEEE Trans. Solid State Circuits, 235-243 (1995).
3. H. Harrer, J. A. Nossek and R. Steltz, 'An analog implementation of discrete time cellular neural networks', IEEE Trans. Neural Networks, 3, 466-476 (1992).
4. T. Roska and L. O. Chua, 'The CNN universal machine: an analogic array computer', IEEE Trans. Circuits Systems, 40 (3), 163-173 (1993).
5. P. L. Venetianer and T. Roska, 'Image compression by CNN', Proc. IEEE Int. Conf. on Neural Networks, June 1996, pp. 1510-1515.
6. O. Moreira-Tamayo and J. Pineda de Gyvez, 'Preprocessing operators for image compression using cellular neural networks', Proc. IEEE Int. Conf. on Neural Networks, June 1996, pp. 1500-1505.
7. C. C. Lee and J. Pineda de Gyvez, 'Time-multiplexing CNN simulator', Proc. Int. Symp. on Circuits and Systems '94, December 1994, pp. 407-410.
8. A. Q. Fong, A. Kanji and J. Pineda de Gyvez, 'Time multiplexing scheme for cellular neural networks based image processing', J. Real-Time Imaging, 2 (3), 231-239 (1996).
9. S. J. Lim, Two-Dimensional Signal and Image Processing, PTR Prentice-Hall, Englewood Cliffs, NJ, 1990.
10. G. K. Wallace, 'The JPEG still picture compression standard', Commun. ACM, 34 (4), 30-44 (1991).
11. R. A. DeVore, B. Jawerth and B. J. Lucier, 'Image compression through wavelet transform coding', IEEE Trans. Inf. Theory, 38 (2), 719-746 (1992).
12. S. G. Mallat, 'Multifrequency channel decomposition of images and wavelet models', IEEE Trans. Acoust. Speech, Signal Process., 37 (12), 2091-2110 (1989).
13. M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice-Hall PTR, Englewood Cliffs, NJ 07632, 1995.
14. O. Rioul and P. Duhamel, 'Fast algorithms for discrete and continuous wavelet transforms', IEEE Trans. Inform. Theory, 38, 569-586 (1992).
15. G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996.
16. R. Domínguez-Castro, S. Espejo, A. Rodríguez-Vázquez, R. Espejo and R. Carmona, 'A CNN universal chip in CMOS technology', Proc. IEEE Workshop on CNN and Apps., CNNA-94, December 1994, pp. 91-96.
17. Matlab version 4.2c with Image Processing Toolbox version 1.0b, The MathWorks Inc., Natick, Massachusetts, 1994.
18. O. Moreira-Tamayo and J. Pineda de Gyvez, 'Wavelet transform coding using cellular neural networks', IEEE NOLTA '95 Proc., December 1995, pp. 541-544.
19. S. P. Lloyd, 'Least squares quantization in PCM', IEEE Trans. Inform. Theory, IT-28, 129-137 (1982).
20. J. Max, 'Quantizing for minimum distortion', IEEE Trans. Inform. Theory, IT-6, 7-12 (1960).