INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS Int. J. Circ. Theor. Appl., 27, 135-151 (1999)

SUBBAND CODING AND IMAGE COMPRESSION USING CNN

OSCAR MOREIRA-TAMAYO AND JOSÉ PINEDA DE GYVEZ* Department of Electrical Engineering, Texas A&M University, College Station, TX 77843, U.S.A.

SUMMARY The cellular neural network paradigm has found many applications in image processing. However, algorithms for image compression using CNN have scarcely been explored. CNN programmability relies on a new algorithmic style based on the spatio-temporal properties of the array. By exploiting the massive parallelism provided by CNN, with convolution as the key basic instruction, a fast and efficient compression process can be achieved. This paper presents new templates and low-complexity algorithms to perform both the linear and non-linear operations needed for image compression. In this work we address all the transformation steps needed in image compression, i.e. decorrelation, bit allocation, quantization and bit extraction. From all possible compression techniques, wavelet subband coding was chosen because it is considered one of the most successful techniques for lossy compression: it allows a high compression ratio while preserving the image quality. All these advantages are implemented in the algorithms presented here. Copyright 1999 John Wiley & Sons, Ltd.

KEY WORDS: cellular neural networks; image compression; subband coding

1. INTRODUCTION

A CNN cell can be considered an analog processor capable of solving a partial differential equation in real time. Besides this computing power, an attractive characteristic of CNN is the local cell interconnectivity, which has opened avenues to practical VLSI implementations and to a vast number of image processing applications. For the following discussion consider a conventional two-dimensional CNN array containing M×N locally connected cells.
Any cell on the ith row and jth column, C(i, j), is connected only to its neighbour cells. The CNN dynamics are

$$C\,\frac{dx_{ij}(t)}{dt} = -\frac{x_{ij}(t)}{R} + \sum_{C(k,l)\in N_{ij}} A(ij;kl)\,y_{kl}(t) + \sum_{C(k,l)\in N_{ij}} B(ij;kl)\,u_{kl} + I \qquad (1a)$$

$$y_{ij}(t) = \tfrac{1}{2}\left(\,|x_{ij}(t)+1| - |x_{ij}(t)-1|\,\right) \qquad (1b)$$

where RC is the integration time constant of the system, and $N_{ij}$ is the set that contains the cells C(k, l) in the neighbourhood of cell C(i, j). Each cell has a state x, a constant external input u, and an output y, which is also known as the activation function. There is also a constant offset signal I. The CNN is programmed through two templates: the A template, which operates on the outputs of the cells C(k, l) ∈ N(i, j), and the B template, which operates on the external inputs in the neighbourhood of C(i, j). Several plausible VLSI architectures and high-level systems have been proposed based on this dynamical system.

*Correspondence to: J. Pineda de Gyvez, Department of Electrical Engineering, Texas A&M University, College Station, TX 77843, USA. O. Moreira-Tamayo is now with Texas Instruments, 8505 Forest Lane, Dallas, TX 75243, USA. Contract grant sponsor: CONACyT, the National Council of Science and Technology, Mexico; The Office of Naval Research; contract/grant no: N00014-91-1-051. Received 15 May 1997; Revised 19 November 1997.

In spite of all the progress in CNN, algorithms for image compression using CNN have scarcely been explored. The main goal of image compression is to represent an image with a reduced number of bits while preserving its quality and intelligibility for a given application. Images for medical applications require a high degree of quality and fidelity because of liability and legal responsibility. On the other hand, the quality of images for entertainment depends on the costs involved.
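The behaviour described by equations (1a) and (1b) can be illustrated with a small numerical sketch (our own, not from the paper): a forward-Euler integration of the state equation, with the neighbourhood sums written as correlations with 3×3 templates, zero boundary conditions, and R = C = 1 assumed.

```python
import numpy as np

def activation(x):
    # Piecewise-linear output of equation (1b): y = 0.5*(|x+1| - |x-1|)
    return 0.5 * (np.abs(x + 1.0) - np.abs(x - 1.0))

def conv3(z, T):
    # Correlate a 2-D array with a 3x3 template (zero boundary conditions)
    zp = np.pad(z, 1)
    out = np.zeros_like(z, dtype=float)
    for di in range(3):
        for dj in range(3):
            out += T[di, dj] * zp[di:di + z.shape[0], dj:dj + z.shape[1]]
    return out

def run_cnn(x0, u, A, B, I, dt=0.01, steps=1000):
    # Forward-Euler integration of equation (1a), assuming R = C = 1
    x = x0.astype(float).copy()
    for _ in range(steps):
        dx = -x + conv3(activation(x), A) + conv3(u, B) + I
        x += dt * dx
    return activation(x)
```

With A = 0 and B a centre-only identity template, each cell settles to its own input value, which is the behaviour the linear templates presented later rely on.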
Two types of approach have been developed to cope with the above-mentioned situations: lossless and lossy compression. In the first approach the image is recovered faithfully, but its compression ratio for gray-scale images is small. The second approach provides higher compression ratios at the expense of some loss in the image. The algorithms presented in this paper are hardware-oriented and exploit the massive parallelism provided by CNN. The ideal hardware platform is a DTCNN; however, the algorithms will also work if there is discrete control of the running mode of a continuous-time CNN. Hence, without loss of generality, we refer to the hardware architecture simply as CNN. As a one-to-one mapping between image pixels and cells is virtually impossible for large images, we consider a time-multiplexing operation. This approach divides the image into tiles, where each tile's size corresponds to the physical size of the CNN. Most image compression schemes rely on linear operations which, as we demonstrate, can be implemented using CNN. To implement these linear operations we often avoid the use of template A. Even though we are not using all the resources of CNN, we are nevertheless using it as a general-purpose image processor. Moreover, the full compression procedure makes use only of a CNN and a standby memory. We have carefully avoided operations that would require an additional digital processor, as this would lessen the effectiveness of CNN. Also, all the algorithms are based on the assumption that we have a hardware implementation of a CNN with a neighbourhood of radius 1. We pay special attention to this fact and avoid hypothetical situations for which the hardware does not exist. Lastly, in Section 9, we present a detailed study of the computation times required by our algorithms assuming a 64×64 CNN array operating on a 256×256 image.
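The tiling step of the time-multiplexing operation can be sketched as follows (our own minimal illustration; `split_into_tiles` is a hypothetical helper name, and the slight overlap between neighbouring tiles used in practice to avoid blocking effects is omitted):

```python
import numpy as np

def split_into_tiles(image, tile):
    # Divide the image into non-overlapping tiles of the CNN array size;
    # the tiles are then scheduled on the CNN one at a time.
    h, w = image.shape
    return [image[r:r + tile, c:c + tile]
            for r in range(0, h, tile)
            for c in range(0, w, tile)]

img = np.arange(256 * 256, dtype=float).reshape(256, 256)
tiles = split_into_tiles(img, 64)  # a 256x256 image on a 64x64 CNN
```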
The results indicate the feasibility of performing our full compression procedure for real-time video applications.

2. IMAGE COMPRESSION

The main steps of image compression based on a transform scheme are shown in the block diagram of Figure 1. The first step is the transformation of either the full image or subdivisions of it. In the case of the popular JPEG standard, the image is subdivided into blocks of 8×8 and each block is then processed using the cosine transform. For high compression rates this subdivision is evident by simple visual inspection of the decompressed image. Subband coding using wavelets (that is, tree-structured filter banks) dramatically reduces this 'blocking' problem because of the variable length of its basis functions. Long basis functions are used to represent smooth and background areas, while texture is represented with short functions. After the image is transformed, it is in an adequate form to be quantized so that it can be represented with a reduced number of bits. This can be a scalar quantization, if each pixel is quantized independently, or a vector quantization, if a group of pixels is quantized jointly. In either case, the most critical part is the bit allocation. More bits should be assigned to the portions of the transformed image that contain most of its energy in order to reduce the quantization noise. Those portions of the image are estimated from the spectral distribution. In the case of JPEG, the intensities of the first coefficients (upper left corner of the discrete cosine transform) in each 8×8 block are the largest. In the case of the wavelet transform, the distribution of the high-pass-filtered components is close to a Laplacian distribution. The low-frequency components have a distribution closer to that of the original image; therefore, more bits are assigned to this portion.
Finally, to remove any redundancy left in the bit stream, each quantization level is represented with a variable-width binary code. Levels that have a high statistical probability of occurrence are given the shortest code words, while levels with low probability are given the longest code words. The technique used for this process is known as Huffman coding. The main processing portion of lossless compression algorithms is based on entropy coding techniques such as this one.

Figure 1. Steps of a transform-based image coder

Figure 2. Wavelet decomposition scheme. Original image and single-stage image decomposition

3. IMAGE TRANSFORMATION

Several transformations have been reported to decorrelate an image, e.g. the wavelet transform, the cosine transform, subband decomposition using Quadrature Mirror Filters (QMF), etc. Although this presentation concentrates mainly on the algorithms for subband decomposition using wavelet transforms, the same algorithms can easily be adapted for either QMF or the cosine transform. For a detailed presentation on wavelet transforms, the reader is encouraged to examine the underlying theory in Reference 13. The wavelet decomposition shown in Figure 2 can be obtained by convolving each row and then each column of the image with the scaling and wavelet functions. In Figure 2, G and H represent a low-pass and a high-pass filter, respectively, which can be implemented through QMF or wavelet bases. In the latter case, G and H are the corresponding scaling ($\varphi$) and wavelet ($\psi$) functions, respectively. The results of the transformation are denoted as LL, LH, HL, and HH, where the first letter represents the operation performed in the horizontal direction and the second one the operation performed in the vertical direction.
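As an illustration of entropy coding, a compact Huffman construction over quantizer-level symbols might look as follows (our own sketch, not the paper's implementation):

```python
import heapq
from collections import Counter

def huffman_code(symbols):
    # Build a Huffman code: repeatedly merge the two least frequent
    # subtrees, prefixing '0' to one side and '1' to the other.
    freq = Counter(symbols)
    if len(freq) == 1:
        return {next(iter(freq)): "0"}
    # Each heap entry: (frequency, tie-breaker, {symbol: code-so-far})
    heap = [(f, i, {s: ""}) for i, (s, f) in enumerate(freq.items())]
    heapq.heapify(heap)
    count = len(heap)
    while len(heap) > 1:
        f1, _, c1 = heapq.heappop(heap)
        f2, _, c2 = heapq.heappop(heap)
        merged = {s: "0" + c for s, c in c1.items()}
        merged.update({s: "1" + c for s, c in c2.items()})
        heapq.heappush(heap, (f1 + f2, count, merged))
        count += 1
    return heap[0][2]
```

The most probable symbol receives the shortest code word, exactly the behaviour described above.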
This two-pass transformation can also be accomplished in one pass through the use of transformation matrices (a two-dimensional transformation). The transformation matrices are obtained by combining the scaling and wavelet functions as

$$M^{LL} = \varphi^{T}\varphi, \qquad M^{LH} = \varphi^{T}\psi, \qquad M^{HL} = \psi^{T}\varphi, \qquad M^{HH} = \psi^{T}\psi \qquad (2)$$

In practice, images are finite sequences since they have a discrete number of pixels, and linear convolution has some practical problems on finite sequences. If an image of size m×n is convolved with a sequence of size p×q, the size of the result will be (m+p−1)×(n+q−1). This is larger than the image and contains redundant information. Fortunately, solutions to this problem are well known in the literature: the so-called overlap-add and overlap-save methods. They basically consist of using circular convolution, for which periodic boundary conditions (wrap-around effect) are assumed. For compression this is a minor inconvenience, since the pixels at one edge of the image are usually not correlated with the pixels at the opposite edge. Moreover, this effect is reduced by using small-size transformations.

4. CNN TEMPLATES FOR WAVELET DECOMPOSITION

For the following discussion assume that we are dealing with a CNN with a neighbourhood of radius 1, i.e. the template matrix is of size 3×3, typical of today's VLSI implementations. The order of the wavelet transform is an important factor to consider in the development of the algorithm. For discrete-time signals the order is given by the number of coefficients, or points, it contains. Wavelet bases that allow perfect reconstruction are dyadic, which means that the number of points is a power of 2. Let us start with the simplest case, i.e. a 2-point transformation.
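Circular convolution with wrap-around boundary conditions can be transcribed directly (our own naive sketch; in practice an FFT-based version computes the same thing, which is how the test below cross-checks it):

```python
import numpy as np

def circular_conv2d(image, kernel):
    # 2-D circular convolution: periodic (wrap-around) boundary conditions
    # keep the output the same size as the input.
    H, W = image.shape
    h, w = kernel.shape
    out = np.zeros((H, W))
    for i in range(H):
        for j in range(W):
            for a in range(h):
                for b in range(w):
                    out[i, j] += kernel[a, b] * image[(i - a) % H, (j - b) % W]
    return out
```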
The most popular 2-point wavelet is the Haar wavelet shown in equation (3):

$$\varphi_{H} = \frac{1}{\sqrt{2}}\,[1,\; 1], \qquad \psi_{H} = \frac{1}{\sqrt{2}}\,[1,\; -1] \qquad (3)$$

The corresponding two-dimensional transformations, 2×2 matrices, are obtained using equation (2). Notice that $M^{(\cdot)}$ is an operator which is convolved with the image much as the B template is in the case of CNN. Basically, when M is applied we say that the image has been transformed. Since B is a convolutional operator as well, we can define a map f : M → B that gives us the equivalent matrix transformation, but now in terms of the CNN environment. For the Haar wavelet transform, each compressed element needs the input of 4 adjacent pixels (called a block). This transformation can be implemented in CNN by successively using the following templates:

$$A = 0, \qquad I = 0,$$

$$B^{LL} = \frac{1}{2}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & 1 & 1 \end{bmatrix}, \quad B^{LH} = \frac{1}{2}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & 1 & -1 \end{bmatrix}, \quad B^{HL} = \frac{1}{2}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 1 \\ 0 & -1 & -1 \end{bmatrix}, \quad B^{HH} = \frac{1}{2}\begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & -1 \\ 0 & -1 & 1 \end{bmatrix} \qquad (4)$$

Notice that by having zeros in one row and one column we reduce the effective neighbourhood of a cell, owing to the characteristics of the wavelet. Therefore, for any cell $C_{ij}$ and input image $P = [p_{ij}]$ the transformation is given by

$$\tilde{p}_{ij}(N) = \sum_{C(k,l)\in N_{ij}} B_{ij;kl}(N)\, p_{kl} \qquad (5)$$

where $\tilde{p}$ denotes the transformation over p, and N ∈ {LL, LH, HL, HH}. For the case of a higher-order biorthogonal wavelet (n points), the scaling and wavelet functions are given by

$$\varphi = [c_{0}\; c_{1}\; \cdots\; c_{n-1}], \qquad \psi = [-c_{n-1}\; c_{n-2}\; \cdots\; -c_{1}\; c_{0}], \quad \text{i.e. } \psi_{k} = (-1)^{k+1}c_{n-1-k} \qquad (6)$$

Observe that if the size of the CNN templates is larger than the order of the transform, the transformation matrix fits into the template matrix, as was the case with the Haar wavelet. Otherwise, the transformation matrix needs to be broken into smaller partitions such that each partition fits into the template matrix.
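Applied to every 2×2 block, the four Haar templates of equation (4) amount to the following sketch (our own NumPy illustration, without CNN hardware; decimation is folded into the stride-2 slicing):

```python
import numpy as np

def haar_decompose(img):
    # One-level 2-D Haar decomposition: each output coefficient comes from
    # a 2x2 block of adjacent pixels, matching the templates of (4).
    # The 1/2 factor is the product of the two 1/sqrt(2) normalisations.
    a = img[0::2, 0::2]; b = img[0::2, 1::2]
    c = img[1::2, 0::2]; d = img[1::2, 1::2]
    LL = (a + b + c + d) / 2
    LH = (a - b + c - d) / 2
    HL = (a + b - c - d) / 2
    HH = (a - b - c + d) / 2
    return LL, LH, HL, HH
```

Summing the four subbands and halving recovers the upper-left pixel of each block, a simple instance of the perfect-reconstruction property used later in Section 5.1.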
For example, the 4-point Daubechies wavelet has the following coefficients:

$$c_{0} = \frac{1+\sqrt{3}}{4\sqrt{2}}, \quad c_{1} = \frac{3+\sqrt{3}}{4\sqrt{2}}, \quad c_{2} = \frac{3-\sqrt{3}}{4\sqrt{2}}, \quad c_{3} = \frac{1-\sqrt{3}}{4\sqrt{2}} \qquad (7)$$

By applying equation (2) to an n-point wavelet one can see that the transformation matrices are of size n×n; their elements are the pairwise products of the filter coefficients:

$$(M^{LL})_{ij} = c_{i}c_{j}, \quad (M^{LH})_{ij} = c_{i}\psi_{j}, \quad (M^{HL})_{ij} = \psi_{i}c_{j}, \quad (M^{HH})_{ij} = \psi_{i}\psi_{j}, \qquad i, j = 0, \ldots, n-1 \qquad \text{(8a-d)}$$

In order to implement this transformation we need to take into account that the CNN templates are equal for all cells, and recall that our CNN array has a neighbourhood of 1. As such, the problem we are dealing with is essentially that of how to implement large-dimensional transformations with 3×3 templates. A way to implement the B templates at no extra hardware cost is described next. First, divide the transformation matrix (n×n) into submatrices of dimensions equal to or smaller than the template size, e.g. 3×3, 2×2, etc., such that n mod 3 or n mod 2 is zero. Let us further assume that this partitioning created m submatrices $B_{ij}$. Then order these submatrices in lexicographic order from left to right and from top to bottom, i.e. with $i, j = 0, \ldots, \sqrt{m}-1$. If the partition yielded 3×3 submatrices then they are the actual 'partial' B templates. In the case of 2×2 submatrices, create 'partial' B templates as follows, where each $B^{N}_{ij}$ embeds one 2×2 block of the corresponding transformation matrix $M^{N}$:

$$B^{LL}_{ij} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & c_{2i}c_{2j} & c_{2i}c_{2j+1} \\ 0 & c_{2i+1}c_{2j} & c_{2i+1}c_{2j+1} \end{bmatrix} \qquad (9a)$$
$$B^{LH}_{ij} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & c_{2i}\psi_{2j} & c_{2i}\psi_{2j+1} \\ 0 & c_{2i+1}\psi_{2j} & c_{2i+1}\psi_{2j+1} \end{bmatrix} \qquad (9b)$$

$$B^{HL}_{ij} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \psi_{2i}c_{2j} & \psi_{2i}c_{2j+1} \\ 0 & \psi_{2i+1}c_{2j} & \psi_{2i+1}c_{2j+1} \end{bmatrix} \qquad (9c)$$

$$B^{HH}_{ij} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & \psi_{2i}\psi_{2j} & \psi_{2i}\psi_{2j+1} \\ 0 & \psi_{2i+1}\psi_{2j} & \psi_{2i+1}\psi_{2j+1} \end{bmatrix} \qquad (9d)$$

$$i, j = 0, 1, \ldots, n/2 - 1$$

By using this last partitioning scheme the transformation matrix is divided into $n^{2}/4$ parts, creating the same number of partial templates. Without loss of generality, the algorithms discussed in the rest of the paper assume that the transformation matrix was partitioned as shown in equation (9).

5. SUBBAND DECOMPOSITION ALGORITHM

The decomposition algorithm consists of performing a circular convolution of the transformation matrices (equation (8)) with the input image. These convolutions yield four new images, commonly tagged as LL, LH, HL and HH. In the case of CNN, each set of partial templates (equation (9)) is sequentially applied to the input image, for which a corresponding output array denoted as $\tilde{P}(N)$, with N ∈ {LL, LH, HL, HH}, is obtained; see Figure 2. The CNN templates for the decomposition algorithm are A = 0, B from equation (9), I = 0, with the initial conditions in the CNN array set to zero: X(0) = 0. As each template contains a row and a column of zeros, the information from the corresponding pixels would also be zero. Therefore, it is necessary to shift the image so that all pixels are considered in the subband decomposition. To accomplish the decomposition, each partial template $B_{ij}(N)$ in equation (9) is used as follows. Take the first partial template and process the input image by running the CNN for one unit of time. Keep the processed state of each cell so that it appears as the initial condition for the next step. Shift the input image in the horizontal and then the vertical direction by two positions, wrapping around the edges. Repeat this process with the remaining templates.
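The partitioning of equation (9) can be sketched as follows (our own illustration; `partial_templates` is a hypothetical helper name):

```python
import numpy as np

def partial_templates(M):
    # Partition an n x n transformation matrix into 2x2 submatrices and
    # embed each one in a 3x3 template with a leading zero row and column,
    # following the scheme of equation (9). Assumes n is even.
    n = M.shape[0]
    templates = []
    for i in range(0, n, 2):          # lexicographic order: top to bottom,
        for j in range(0, n, 2):      # left to right
            T = np.zeros((3, 3))
            T[1:, 1:] = M[i:i + 2, j:j + 2]
            templates.append(T)
    return templates
```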
Observe that each CNN cell must hold its value in each iteration in order to accumulate, after processing with all partial templates, the summation of all terms in the convolution. Decimation is performed after the signal is filtered by extracting from the CNN state only the elements with odd indices, e.g. the elements in positions (1,1), (1,3), (3,1), etc. Notice that after the decimation process is complete each of the four output images is one-fourth the size of the original image. The complexity of this algorithm is O(m+k), where k is the time needed to shift the image and m is the number of partial templates. Figure 3 shows simulation results for a Daubechies wavelet decomposition. The simulation results were obtained by building a CNN simulator as a Matlab program.

5.1. Subband reconstruction algorithm

The reconstruction process is similar to the algorithm just described. In the case of 2-point orthogonal wavelets, such as the Haar wavelet, the reconstruction basis is the same as the decomposition basis. However, the reconstruction and decomposition bases of longer real-valued wavelets, e.g. the 4-point Daubechies wavelet, differ.

Figure 3. Results of decomposition using Daubechies wavelet. Clockwise: LL, LH, HL, and HH

The reconstruction basis (also called the dual) is obtained by reversing the decomposition sequence and alternating the signs of the coefficients, i.e.

$$[\tilde{\varphi}]_{i} = (-1)^{i}\,[\varphi]_{n-1-i}, \qquad [\tilde{\psi}]_{i} = (-1)^{i}\,[\psi]_{n-1-i} \qquad (10)$$

Using equation (6), a new set of transformation matrices, denoted as $\tilde{M}$, is obtained based on the principle indicated in equation (10). These matrices are similar to the ones shown in equation (8). The image reconstruction is implemented by adding the results of the convolution of the different components of $\tilde{M}$ with the up-sampled pixels $\tilde{P}$ of the decomposed image.
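Functionally, the decomposition is circular filtering followed by decimation by two. The sketch below (our own; it uses FFT-based circular convolution instead of the CNN shift-and-accumulate schedule) produces the four subbands for the 4-point Daubechies basis of equation (7); since the basis is orthonormal, the subbands carry exactly the energy of the input, which the test checks.

```python
import numpy as np

def subband(img, row_f, col_f):
    # Circular convolution with the separable kernel col_f^T row_f,
    # then decimation by 2 in both directions.
    H, W = img.shape
    kernel = np.outer(col_f, row_f)
    kpad = np.zeros((H, W))
    kpad[:kernel.shape[0], :kernel.shape[1]] = kernel
    filtered = np.real(np.fft.ifft2(np.fft.fft2(img) * np.fft.fft2(kpad)))
    return filtered[0::2, 0::2]

# 4-point Daubechies coefficients of equation (7)
c = np.array([1 + 3**0.5, 3 + 3**0.5, 3 - 3**0.5, 1 - 3**0.5]) / (4 * 2**0.5)
phi = c
psi = np.array([-c[3], c[2], -c[1], c[0]])  # equation (6)

img = np.random.default_rng(0).standard_normal((8, 8))
parts = [subband(img, rf, cf) for rf in (phi, psi) for cf in (phi, psi)]
```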
The equations that govern these operations are the following:

$$p_{i,j} = \tilde{p}^{LL}_{ij}\,\tilde{M}^{LL}_{1,1} + \tilde{p}^{LH}_{ij}\,\tilde{M}^{LL}_{1,2} + \tilde{p}^{HL}_{ij}\,\tilde{M}^{LL}_{2,1} + \tilde{p}^{HH}_{ij}\,\tilde{M}^{LL}_{2,2} \qquad (11a)$$
$$p_{i+1,j} = \tilde{p}^{LL}_{ij}\,\tilde{M}^{LH}_{1,1} + \tilde{p}^{LH}_{ij}\,\tilde{M}^{LH}_{1,2} + \tilde{p}^{HL}_{ij}\,\tilde{M}^{LH}_{2,1} + \tilde{p}^{HH}_{ij}\,\tilde{M}^{LH}_{2,2} \qquad (11b)$$
$$p_{i,j+1} = \tilde{p}^{LL}_{ij}\,\tilde{M}^{HL}_{1,1} + \tilde{p}^{LH}_{ij}\,\tilde{M}^{HL}_{1,2} + \tilde{p}^{HL}_{ij}\,\tilde{M}^{HL}_{2,1} + \tilde{p}^{HH}_{ij}\,\tilde{M}^{HL}_{2,2} \qquad (11c)$$
$$p_{i+1,j+1} = \tilde{p}^{LL}_{ij}\,\tilde{M}^{HH}_{1,1} + \tilde{p}^{LH}_{ij}\,\tilde{M}^{HH}_{1,2} + \tilde{p}^{HL}_{ij}\,\tilde{M}^{HH}_{2,1} + \tilde{p}^{HH}_{ij}\,\tilde{M}^{HH}_{2,2} \qquad (11d)$$

∀ i = 0, 2, …, M−2 and j = 0, 2, …, N−2, where M and N are the dimensions of the image. Each of these equations is implemented in four steps, i.e. one step for each of the inverse transforms of LL, LH, HL and HH. Since the image was decimated by 2 in both the horizontal and vertical directions, we need to do an up-sampling so that every step processes one transformed pixel, $\tilde{p}_{ij}$, from a 2×2 block that contains the corresponding decomposed pixel elements LL, LH, HL, and HH. As in the decomposition algorithm, circular convolution is also used in this case. By way of illustration, consider equation (11a) for the case i = 0, 2, …, M−2 and j = 0, 2, …, N−2. This equation makes use of the input data $\tilde{P}$ expanded as shown in Figure 4. Notice that each $\tilde{p}^{(\cdot)}_{ij}$ in equation (11a) corresponds to a 2×2 block in Figure 4 that is multiplied by $\tilde{M}^{LL}$.

Figure 4. Input arrangement for reconstruction

The reconstructed pixels $p_{ij}$, with i = 0, 2, …, M−2 and j = 0, 2, …, N−2, are taken from the cells having odd indices, e.g. the elements in positions (1,1), (1,3), (3,1), etc. The process is repeated for the pixels $p_{i,j+1}$, $p_{i+1,j}$ and $p_{i+1,j+1}$. This reconstruction algorithm was also simulated with a Matlab program, using a one-to-one pixel-to-cell mapping. The mean-square error between the input image and the reconstructed one is at the level of round-off error.
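For an orthonormal basis, the dual filtering of equation (10) is the adjoint of the analysis operator, so reconstruction can be sketched as follows (our own FFT-based illustration, not the CNN schedule; the zero-insertion in `synthesize` is the up-sampling of Figure 4):

```python
import numpy as np

def _circ(img, kernel, adjoint=False):
    # Circular (wrap-around) filtering; the adjoint applies the
    # time-reversed dual filter, cf. equation (10).
    H, W = img.shape
    kpad = np.zeros((H, W))
    kpad[:kernel.shape[0], :kernel.shape[1]] = kernel
    K = np.fft.fft2(kpad)
    if adjoint:
        K = np.conj(K)
    return np.real(np.fft.ifft2(np.fft.fft2(img) * K))

def analyze(img, filters):
    # filters: list of (row_filter, column_filter) pairs for LL, LH, HL, HH
    return [_circ(img, np.outer(cf, rf))[0::2, 0::2] for rf, cf in filters]

def synthesize(subbands, filters, shape):
    out = np.zeros(shape)
    for s, (rf, cf) in zip(subbands, filters):
        up = np.zeros(shape)
        up[0::2, 0::2] = s          # up-sampling, as in Figure 4
        out += _circ(up, np.outer(cf, rf), adjoint=True)
    return out

c = np.array([1 + 3**0.5, 3 + 3**0.5, 3 - 3**0.5, 1 - 3**0.5]) / (4 * 2**0.5)
phi, psi = c, np.array([-c[3], c[2], -c[1], c[0]])
F = [(rf, cf) for rf in (phi, psi) for cf in (phi, psi)]
```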
6. COMPUTATION OF STATISTICAL PARAMETERS FOR QUANTIZATION AND BIT ALLOCATION

The bit allocation problem consists of finding the minimum number of bits needed to quantize each of the subbands with minimum error. Bit allocation schemes make use of statistical parameters such as the mean and standard deviation of all pixels in the image. The mean squared error (MSE) of a quantized image can be reduced by assigning non-linearly spaced decision levels in such a way that all levels have the same probability. This type of quantizer is known as the Lloyd-Max quantizer. After decomposing an image, the statistical distribution of the pixels in the elements LH, HL, HH can be approximated with a Laplacian distribution, which is described by its parameters mean (μ) and variance (σ²). Therefore, it is important to calculate these parameters in order to have an effective quantizer. The algorithm for computing the mean of an image consists of several iterations, their number depending upon the image's size. The templates to compute the average of a block of four adjacent pixels are the following:

$$A = [0], \quad I = 0, \quad B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1/4 & 1/4 \\ 0 & 1/4 & 1/4 \end{bmatrix} \qquad (12)$$

In our approach only one processed cell is needed for each block, i.e. the output of the upper left cell in each 4-pixel block; the remaining outputs are discarded. These local averages are then fed back as inputs and the process is repeated until one cell accumulates the final mean value. Notice that in each iteration the size of the input array is one-fourth of that in the previous iteration, since we take only one output per block. The fact that we are considering a time-multiplexing system and a standby memory allows us to choose which set of pixels is used as input data for the CNN. Decimation is performed by simply discarding the pixels that we do not want and presenting to the CNN the ones of interest. Now, if the size of the image is $2^{M}$ by $2^{M}$ then the complexity of this algorithm is O(M), i.e. the number of iterations is M.
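The iterative averaging just described can be sketched as follows (our own illustration; each pass of the loop plays the role of one CNN iteration with the template of equation (12)):

```python
import numpy as np

def cnn_mean(img):
    # Average each 2x2 block, keep only the upper-left output per block,
    # and feed the shrunken array back until one cell holds the mean.
    # Assumes a square image of size 2^M x 2^M.
    x = img.astype(float)
    while x.size > 1:
        x = (x[0::2, 0::2] + x[0::2, 1::2] + x[1::2, 0::2] + x[1::2, 1::2]) / 4
    return float(x[0, 0])
```

For a $2^M \times 2^M$ image the loop runs M times, matching the stated O(M) complexity.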
The process of calculating the variance involves obtaining the expected value of the squared difference between the pixel and the mean, i.e. $\sigma^{2} = E[(u-\mu)^{2}] = E[u^{2}] - \mu^{2}$. Implementing the squared value is not simple. However, the non-linearity provided at the output of the CNN array can be exploited to compute the 1-norm of the image ($\|u-\mu\|_{1} = E[|u-\mu|]$). This is done by calculating the mean of the absolute value of the image.

For the Laplacian distribution, σ can be estimated from this norm as follows:

$$E[|u(x)-\mu|] = \int_{-\infty}^{\infty} |u(x)-\mu|\, f_{u}(x)\, dx = \int_{-\infty}^{\infty} |u(x)-\mu|\, \frac{\sqrt{2}}{2\sigma}\, e^{-(\sqrt{2}\,|u(x)-\mu|)/\sigma}\, dx = \frac{\sigma}{\sqrt{2}} \qquad (13)$$

Therefore,

$$\sigma^{2} = 2\,\big(E[|u(x)-\mu|]\big)^{2} \qquad (14)$$

Using this equation, the quantization tables for the Lloyd-Max quantizer can be based on the 1-norm. Table I shows the decision and reconstruction levels for a 2-bit quantizer based on the Laplacian distribution. The reconstruction level is the value given to the pixels that belong to a quantization level so that the MSE is minimized. The 1-norm can be computed using CNN by separating the positive and negative parts of the absolute value function. The positive part is obtained by performing the following steps. First, initialize the CNN array at the negative saturation limit, i.e. X(0) = −1, where 1 denotes the unity vector. Then set the CNN parameters as

$$I = -\mu, \quad A = [0], \quad B = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix} \qquad (15)$$

and run the network for one time unit Δt. Observe from (1) that the pixel values will be displaced from their [−1, 1] normalized range to [−1−μ, 1−μ] because of the constant input bias value μ. At the output of the CNN's activation function, the input image is rectified, since negative terms in $u_{ij} - \mu$ are truncated, as illustrated in Figure 5. The range of pixel values is represented by the hot-colour bar in this figure.
Without any bias, and because of the linear slope of the activation function, the input range of pixel values (hot-colour bar) is the same as the output one. When a bias is applied, the output range is displaced and truncated towards one of the saturation levels. In Figure 5, point F represents the limit value for the image's mean μ (bias term) and E corresponds to values at the positive saturation level which were displaced downwards because of the bias. One can see that, owing to this displacement, values below the mean become −1 because of the limiter function. To eliminate the bias introduced by the initialization of the CNN (X(0) = −1), the templates I = 1, A = 0, B = 0 are used for a time unit Δt. This effectively adds 1 to the pixel values. The range of the pixels now goes from 0 to its maximum value. To separate the negative terms in U − μ1 a similar operation is performed. The network is now initialized at the positive saturation limit, i.e. X(0) = 1, and the same templates of equation (15) are used. This leaves only the elements below the mean, since those above the mean are clipped by the saturated output function of the CNN. The bias is compensated again by running the array with I = −1. Finally, the second image (negative values) is subtracted from the first image (positive values) to produce the absolute value of the image. This operation can be performed by initializing the array with the first image, providing the second image as input, and using the templates given by equation (15) with the centre value of B equal to −1. The 1-norm (E[|u−μ|]) is just the mean of this absolute-value image and can be obtained with the previously explained procedure.

Table I. Placement of decision and reconstruction levels for a 2-bit Lloyd-Max quantizer, for a Laplacian distribution and σ = 1

Bits | Decision levels d_i | Reconstruction levels r_i
2 | −∞, −1.1269, 0.0000, 1.1269 | −1.8340, −0.4198, 0.4198, 1.8340
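The rectification trick can be modelled numerically as follows (our own sketch; `sat` stands for the piecewise-linear output (1b), and the ±1 compensation acts on the state rather than the clipped output):

```python
import numpy as np

def sat(x):
    # Piecewise-linear CNN output: unity gain between the +/-1 saturation levels
    return np.clip(x, -1.0, 1.0)

def positive_part(u, mu):
    # X(0) = -1, bias I = -mu, run one time unit, then add 1 to undo the
    # initialisation: values below the mean were clipped by the saturation.
    return sat(-1.0 + (u - mu)) + 1.0

def negative_part(u, mu):
    # Mirror image: X(0) = +1, same bias, then subtract 1.
    return sat(1.0 + (u - mu)) - 1.0

def cnn_abs(u, mu):
    # |u - mu| as the difference of the two rectified images
    return positive_part(u, mu) - negative_part(u, mu)
```

For pixels normalized to [−1, 1] and a mean in the same range, `positive_part` equals max(u−μ, 0), `negative_part` equals min(u−μ, 0), and their difference is exactly |u−μ|.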
Figure 5. Computation of the absolute value using the non-linearity of the CNN output

Figure 6. Absolute value of input image

Figure 6 shows a rectified image processed with the algorithm just described. Its denormalized parameters are the following: the mean is μ = 99.1179 and the variance is σ² = 2·(43.8136)².

7. QUANTIZATION, BIT EXTRACTION AND ENTROPY CODING

The purpose of quantization is to represent the image with a lower number of bits. Considering that the CNN is an analog processor, we will use the CNN itself to implement the computational operations that convert the information it processes into digital form. We will quantize the image and classify the pixels that belong to each quantization level in order to encode them. Encoding can be linear, e.g. for 2 bits 00, 01, 10, 11, or can use entropy coding, e.g. 0, 10, 110, 111. For entropy coding we need to calculate the probability that a pixel belongs to any of the quantization levels. This is image dependent and therefore needs to be calculated for each particular image.

7.1. Quantization

The objective of a Lloyd-Max quantizer is to divide the signal range into a discrete number of levels in such a way that each level has equal probability (this minimizes the MSE). This section addresses the problem of assigning pixels to the levels they belong to. If one were to represent an image with one bit per pixel, we would have two levels: positive values represented with logic 1 and negative values with logic 0. The image can simply be thresholded with respect to its mean to obtain two output levels, 1 and −1, in the CNN array. This can be done by setting I = −μ and A = B, with B as in equation (15), and taking the output from Y. The network is allowed to run until the output values reach their limits.
Figure 7 shows an illustration of a 2-bit quantizer: four levels with their respective decision boundaries and the reconstruction value that minimizes the MSE. The quantization algorithm described in this section provides a binary output: it determines the pixels belonging to each quantization level in order to encode them. Consider a quantizer consisting of n levels and n decision boundaries $d_{m}$. A pixel $p_{ij}$ belongs to level m if $d_{m} < p_{ij} < d_{m+1}$. The pixels that belong to a particular level can be extracted using the following recursive process. For the first level, start with the highest decision boundary $d_{m}$. For example, from Table I for 2 bits, $d_{m} = 1.1269$; see also Figure 7. Set $I = -d_{m}$, and again A = B with B as in equation (15). The bit extraction consists of keeping those cells whose outputs $y_{ij}$ are 1. These pixels belong to the first level ($p_{ij} > d_{m}$). Consider now the next decision boundary $d_{m-1}$ (for 2 bits, $d_{m-1} = 0.0$). The image is now thresholded by setting $I = -d_{m-1}$, again with A = B and B as in equation (15). All pixels with values above $d_{m-1}$ will be set to 1. This includes the pixels that belong to the first and second levels. The next step consists of eliminating the pixels that belong to the first level, i.e. the pixels that also produced output 1 for $d_{m}$. To do this we subtract the output of the previous iteration from that of the current one. This is a recursive operation for subsequent bits, i.e. the results of all previous iterations are subtracted from the current iteration. Repeat this process for $d_{m-2}$, $d_{m-3}$, etc. Notice that any pixel will have an output of 1 only for the level it belongs to and will be −1 for all other levels.

7.2. Entropy coding

Entropy coding consists of assigning a code word of variable length to a pixel or set of pixels according to their probability. The shortest code word is assigned to the pixel value that has the highest probability of occurrence.
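The successive-thresholding classification of Section 7.1 can be sketched as follows (our own illustration; a plain comparison stands in for running the bistable CNN to saturation, and the subtraction of previous results is expressed as a set difference):

```python
import numpy as np

def extract_levels(img, boundaries):
    # Classify each pixel into a quantization level by successive
    # thresholding. boundaries are sorted in descending order; level k
    # collects pixels with boundaries[k] < p <= boundaries[k-1].
    # Returns one +/-1 map per level (+1 marks membership).
    maps = []
    above_prev = np.full(img.shape, False)
    for d in boundaries:
        above = img > d
        member = above & ~above_prev   # remove pixels of previous levels
        maps.append(np.where(member, 1.0, -1.0))
        above_prev = above
    # last level: everything below the lowest boundary
    maps.append(np.where(~above_prev, 1.0, -1.0))
    return maps
```

Every pixel is marked +1 in exactly one map, which is the property the encoding step relies on.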
Figure 7. Illustration of the 2-bit quantization levels

Figure 8. Bit allocation: number of bits assigned to each of the subbands

Pixels can be coded separately or jointly; higher compression can be achieved by coding bits jointly. If an image is quantized with one bit it is convenient to quantize several pixels jointly. For example, if four pixels are quantized jointly, there are 16 possible combinations, and we need to calculate the probability of occurrence of each of those combinations. The probability of a quantization level can be obtained by computing the mean of the image containing only the pixels corresponding to that level. Recall that pixels belonging to a given level have a value of 1, and the rest are −1. Computing the mean therefore gives a measure of the probability: if the mean is −1, no pixels belong to this level; if the mean is 1, all pixels belong to it. To compute the joint probability of two or more pixels we can compute the logic AND of all their possible combinations. Consider two consecutive pixels $p_{i,j}$ and $p_{i+1,j}$ quantized with 1 bit and classified with the algorithm described before. The possible combinations for the two pixels are [1, 1], [1, −1], [−1, 1], [−1, −1]. As an illustration of quantization and bit allocation, the original image from Figure 2 was decomposed using the 4-point Daubechies wavelet. The number of bits allocated to each of the subbands is shown in Figure 8. The reconstruction of this image is shown in the next section.

8. RECONSTRUCTION

Once the bits are extracted we can reconstruct the image from the binary data. This is an interesting feature for CNN, since it implies that input data can be provided in digital form as well as in the existing analog form.
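Estimating level and joint probabilities from the ±1 membership maps can be sketched as follows (our own illustration; the linear map p = (mean + 1)/2 makes explicit how a mean of −1 corresponds to probability 0 and a mean of +1 to probability 1):

```python
import numpy as np

def level_probability(level_map):
    # level_map holds +1 for pixels in the level and -1 elsewhere; its mean
    # m maps linearly to the occurrence probability p = (m + 1) / 2.
    return (level_map.mean() + 1.0) / 2.0

def joint_probability(map_a, map_b):
    # Joint occurrence of two 1-bit pixel maps via a logic AND on +/-1 values
    both = np.where((map_a > 0) & (map_b > 0), 1.0, -1.0)
    return (both.mean() + 1.0) / 2.0
```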
Suppose that we have the reconstruction levels r_i associated with the Lloyd-Max quantizer and that we have n bits. To reconstruct the image we use the following process. Feed the first bit of each pixel of the whole image as the input array, then set the template as in (15). By running the CNN for one time unit the value of each weighted bit is loaded into the CNN. This process is repeated for all remaining bits, and the partial results are accumulated, leaving at the end the image loaded in the CNN array. Notice that the D/A conversion is actually performed by multiplying the digital bit by its analog value r_i = 2^{-i}:

I = 0,  A = 0,  and  B = [0 0 0; 0 r_i 0; 0 0 0]    (16)

Figure 9. Reconstructed image

9. ESTIMATE OF CNN IMAGE COMPRESSION PROCESSING TIME

In this section we consider computation-time estimates that take into account the processing times of reported hardware implementations. It is assumed that the CNN is operated under a time-multiplexing scheme. Under this scheme, the image is divided into tiles of size equal to the CNN array size and the processing is done tile by tile in lexicographical order. To avoid blocking effects it is necessary to have a slight overlap between consecutive blocks and to provide neighbour inputs to each tile. Our main concern here is to estimate whether real-time video processing is possible with a CNN array feasible for a near-future hardware implementation. The calculations to follow do not consider the possible non-idealities of a CNN system. Although this is a simplification, these times represent an upper bound on the achievable speed. One must be aware that there are many factors that need to be considered with analog hardware, e.g. cell offsets, retention time of the capacitors, mismatch among cells, and so on.
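Returning to the reconstruction step of Section 8, the accumulation of weighted bit planes can be sketched as follows. NumPy stands in for the CNN here: the center coefficient r_i of template (16) becomes a plain multiplication, and the function name is illustrative.

```python
import numpy as np

def reconstruct(bit_planes):
    """Accumulate binary planes (0/1 per pixel, most significant bit
    first), each scaled by its analog weight r_i = 2**-i, mimicking
    the bit-by-bit loading of the CNN state described in Section 8."""
    acc = np.zeros_like(bit_planes[0], dtype=float)
    for i, plane in enumerate(bit_planes, start=1):
        acc += (2.0 ** -i) * plane   # D/A: digital bit times r_i
    return acc
```

For two bit planes the result is 0.5 times the first plane plus 0.25 times the second, i.e. exactly the weighted-bit D/A conversion of (16).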
In fact, it is well known that even a crude analog implementation yields 7-8 bits of resolution (an accuracy of 2^{-7} = 7.81e-3) and that an analog memory suffers from retention problems. Yet, for our algorithms we are assuming a hardware platform with an accuracy better than 1e-3 (about 10 bits of resolution). Additionally, to avoid analog memory-retention problems it is also possible to use a digital memory with high-precision D/As and A/Ds coupled to the CNN, although this would cause a further overhead in the actual I/O times. Computation times are based on the algorithms described in the previous sections and concern only the CNN processing and access times. Thus, we avoid relying on a high-level system for which the instruction-level parallelism is variable and depends on compiler optimization. For our time estimates we suppose that we have access to a RAM in which data can be stored at will, and that there is a finite-state machine controlling the CNN, the standby memory and its memory manager. This finite-state machine takes care of scheduling image tiles for the time-multiplexing approach, of providing the right templates and data sequencing for each algorithm, and of storing and retrieving data from the standby memory. For readability purposes we briefly summarize each algorithm and the corresponding data flow in pseudo-code. To simplify the calculations, and without loss of generality, we assume that the overlap between consecutive blocks in the time-multiplexing scheme is zero.

Step 1. Image transformation

Consider an n-point filter, an N×N image, an M×M CNN array, and a decomposition of l subband levels. For each subband we need to process n/4 partial templates, and in each subband (N/(2^{l-1}M))² tiles.
The total image transform time can be estimated as

t_t = (τ + d_i + d_o) · (n/4) · Σ_l 4 (N/(2^{l-1} M))²    (17)

where τ is the CNN processing time, and d_i and d_o represent the CNN input and output access times, respectively. The corresponding algorithm is

function transform( )
    N×N = image; M×M = array; n = filter order;
    for subband = 1 to 4
        for tiles = 1 to (N/2^(subband-1))²/M²
            for partial_templates = 1 to n²/4
                load(input image)
                run(cnn, 1τ)
                output(processed image)
            end
        end
    end

Step 2. Computation of statistical parameters

This step involves computing the mean and standard deviation of the pixel distribution. In this case we need to process (N/M)² tiles, and for each tile we need to iterate log₂ M times to find the mean. Hence, the total time to compute the mean is

t_m = (τ + d_i + d_o) (N/M)² log₂ M    (18)

The data-flow algorithm is as follows:

function mean_computation( )
    N×N = image; M×M = array;
    for tiles = 1 to N²/M²
        for block = 1 to log2(M)
            load(input image)
            run(cnn, 1τ)
            output(processed image)
        end
    end

Similarly, the time to compute the standard deviation is calculated for (N/M)² tiles and a constant processing time k as follows:

t_s = k (τ + d_i + d_o) (N/M)²    (19)

The specifics of the processing are shown in the next algorithm:

function std_computation( )
    N×N = image; M×M = array;
    for tiles = 1 to N²/M²
        load(input image)
        run(cnn, 5τ)    /* compute positive image */
        output(positive image)
        load(input image)
        run(cnn, 5τ)    /* compute negative image */
        load(positive image)
        run(cnn, 5τ)    /* subtract positive from negative image */
        output(processed image)
    end

Step 3. Quantization and bit extraction

This step consists of two parts: the first part identifies the elements of the image that belong to each of the quantization levels, while the second part assigns a binary code word to them (bit extraction).
The first part is accomplished by thresholding the image in such a way that the elements that belong to a quantization level present an output of 1, and all other elements are zero. The bit extraction operation then converts the elements that belong to a quantization level into their code-word representation; e.g. for a two-bit binary code the code words are 00, 01, 10, 11. Each of these two bits is extracted from the quantization level in one iteration. In practice, a multiresolution decomposition is carried out. That is, typically the LL subband is further decomposed into LL, HL, LH and HH subbands. In turn, the new LL subband is further decomposed, and so on. Therefore, we have m multiresolution levels, and since the CNN acts on each of the subbands of the transform, we need to process (N/(2^i M))² tiles for each subband at multiresolution level i. Then, for each subband n we need to assign a variable number of code words c. The total computation time is

t_q = k (τ + d_i + d_o) Σ_{i=1..m} Σ_{j=1..n} (N/(2^i M))² c(i, j)    (20)

where k is the processing time for one quantization level, and c(i, j) is the number of codes for subband j in multiresolution level i. The corresponding algorithm is shown next:

function quantization( )
    N×N = image; M×M = array; m = levels for multiresolution analysis;
    c(decomposition, subband) = number of codewords in subband;
    for decomposition = 1 to m
        for subband = 1 to n
            for codewords = 1 to c(decomposition, subband)
                /* computation for one quantization level */
                for tiles = 1 to N²/(4^decomposition · M²)
                    load(input image)
                    run(cnn, 5τ)    /* compute (+) part, e.g. elements above level */
                    load(input image)
                    run(cnn, 5τ)    /* subtract input from (+) image */
                    output(processed image)
                end
            end
        end
    end

Step 4. Entropy coding

In this step the binary information obtained in the previous step is coded to reduce the redundancy left in the data.
Section 7.2 described this algorithm, which is based on a two-dimensional Huffman coding scheme. The binary information obtained in the previous step is compared with templates that provide a code depending on the frequency of occurrence of the template in the image. The computation time consists of processing (N/M)² tiles and applying 2^p template combinations to each tile, where p stands for the number of pixels that are quantized jointly:

t_e = (τ + d_i + d_o) 2^p (N/M)²    (21)

The corresponding algorithm is shown next:

function entropy_coding( )
    N×N = image; M×M = array; pixels = number of pixels jointly quantized;
    for tiles = 1 to N²/M²
        for combinations = 1 to 2^pixels
            load(input image)
            run(cnn, 5τ)    /* compare with template */
            output(processed image)
        end
    end

9.1. Total processing time

By way of example, and without loss of generality, consider the case of a 64×64 CNN array and an image of 256×256 pixels, and assume the following processing times: convergence τ = 0.5 µs, input access d_i = 12 µs, and output access d_o = 8 µs. We consider a 4-point Daubechies wavelet transform for the subband coding, a multiresolution analysis of three levels, a worst-case allocation scheme of 2 bits for two subbands in the first decomposition, 1 bit for three subbands in the second decomposition, 2 bits for only one subband in the third decomposition, and a joint quantization of four pixels. Evaluation of formulae (17)-(21) yields a total processing time obtained by summing the individual times t_i: image transformation (t_t) plus statistical parameters (t_m + t_s) plus quantization (t_q) plus entropy coding (t_e), i.e. 1742.5 µs + 1328.0 µs + 656.0 µs + 5760.0 µs = 9486.5 µs. The relevance of this result is that it suggests that the construction of CNN hardware for real-time processing is feasible.
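The bottom line of Section 9.1 can be checked with a few lines of arithmetic. The four subtotals below are the paper's reported values, not re-derived from (17)-(21); the 1/60 s field period corresponds to interlaced commercial television at 30 frames per second.

```python
# Subtotals reported in Section 9.1, in microseconds
t_transform = 1742.5   # image transformation, (17)
t_stats     = 1328.0   # mean + standard deviation, (18) + (19)
t_quant     = 656.0    # quantization and bit extraction, (20)
t_entropy   = 5760.0   # entropy coding, (21)

total_us = t_transform + t_stats + t_quant + t_entropy   # 9486.5 us

# One interlaced field at 60 Hz leaves a budget of about 16.7 ms
field_budget_us = 1e6 / 60.0

print(total_us, total_us < field_budget_us)   # 9486.5 True
```

The estimated compression time of about 9.49 ms thus fits comfortably inside the per-field budget, which is the feasibility claim made in the text.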
In commercial television the number of frames per second is 30, interlacing the odd-numbered lines with the even-numbered lines. Each interlaced field therefore has a period of 1/60 s ≈ 16.7 ms. The estimated computation time for image compression is 9.49 ms. Therefore, the implementation of a real-time 64×64 CNN image compressor is feasible. An important observation follows from the time estimates: most of the processing time of the compression operation is determined by the time required for the input and output of data to and from the CNN.

10. CONCLUSIONS

Algorithms for image compression based on a CNN image processor have been developed. They include all the operations needed in standard image compression. It was demonstrated that the CNN is capable of performing image compression without requiring an external digital processor to support the algorithms or those operations not implemented with CNN. This implies that an image processor can be built with a CNN as the only central processing unit. These algorithms exploit the massive parallelism that CNN provides; therefore, they take advantage of the fast processing characteristics of CNN with respect to a digital processor. As estimated in Section 9, a real-time image compressor for commercial video can be implemented using the proposed algorithms. The compression algorithms given in this paper are also suitable for multiplexing CNN arrays.

REFERENCES

1. L. O. Chua and L. Yang, 'Cellular neural networks: theory', IEEE Trans. Circuits Systems, 35 (10), (1988).
2. P. Kinget and M. S. J. Steyaert, 'A programmable analog cellular neural network CMOS chip for high speed image processing', IEEE Trans. Solid State Circuits, 235-243 (1995).
3. H. Harrer, J. A. Nossek and R. Stelzl, 'An analog implementation of discrete time cellular neural networks', IEEE Trans. Neural Networks, 3, 466-476 (1992).
4. T. Roska and L. O. Chua, 'The CNN universal machine: an analogic array computer', IEEE Trans. Circuits Systems, 40 (3), 163-173 (1993).
5. P. L. Venetianer and T. Roska, 'Image compression by CNN', Proc. IEEE Int. Conf. on Neural Networks, June 1996, pp. 1510-1515.
6. O. Moreira-Tamayo and J. Pineda de Gyvez, 'Preprocessing operators for image compression using cellular neural networks', Proc. IEEE Int. Conf. on Neural Networks, June 1996, pp. 1500-1505.
7. C. C. Lee and J. Pineda de Gyvez, 'Time-multiplexing CNN simulator', Proc. Int. Symp. on Circuits and Systems '94, December 1994, pp. 407-410.
8. A. Q. Fong, A. Kanji and J. Pineda de Gyvez, 'Time multiplexing scheme for cellular neural networks based image processing', Real-Time Imaging, 2 (3), 231-239 (1996).
9. J. S. Lim, Two-Dimensional Signal and Image Processing, PTR Prentice-Hall, Englewood Cliffs, NJ, 1990.
10. G. K. Wallace, 'The JPEG still picture compression standard', Commun. ACM, 34 (4), 30-44 (1991).
11. R. A. DeVore, B. Jawerth and B. J. Lucier, 'Image compression through wavelet transform coding', IEEE Trans. Inf. Theory, 38 (2), 719-746 (1992).
12. S. G. Mallat, 'Multifrequency channel decomposition of images and wavelet models', IEEE Trans. Acoust. Speech, Signal Process., 37 (12), 2091-2110 (1989).
13. M. Vetterli and J. Kovacevic, Wavelets and Subband Coding, Prentice-Hall PTR, Englewood Cliffs, NJ, 1995.
14. O. Rioul and P. Duhamel, 'Fast algorithms for discrete and continuous wavelet transforms', IEEE Trans. Inform. Theory, 38, 569-586 (1992).
15. G. Strang and T. Nguyen, Wavelets and Filter Banks, Wellesley-Cambridge Press, Wellesley, MA, 1996.
16. R. Domínguez-Castro, S. Espejo, A. Rodríguez-Vázquez, R. Espejo and R. Carmona, 'A CNN universal chip in CMOS technology', Proc. IEEE Workshop on CNN and Apps., CNNA-94, December 1994, pp. 91-96.
17. Matlab version 4.2c with Image Processing Toolbox version 1.0b, The MathWorks Inc., Natick, Massachusetts, 1994.
18. O. Moreira-Tamayo and J. Pineda de Gyvez, 'Wavelet transform coding using cellular neural networks', IEEE NOLTA '95 Proc., December 1995, pp. 541-544.
19. S. P. Lloyd, 'Least squares quantization in PCM', IEEE Trans. Inform. Theory, IT-28, 129-137 (1982).
20. J. Max, 'Quantizing for minimum distortion', IEEE Trans. Inform. Theory, IT-6, 7-12 (1960).
