INTERNATIONAL JOURNAL OF CIRCUIT THEORY AND APPLICATIONS, VOL. 24, 111-120 (1996)

EVALUATION OF CNN TEMPLATE ROBUSTNESS TOWARDS VLSI IMPLEMENTATION†

PETER KINGET AND MICHIEL STEYAERT
ESAT-MICAS, Departement Electrotechniek, Katholieke Universiteit Leuven, Kardinaal Mercierlaan 94, B-3001 Leuven, Belgium

SUMMARY

In this paper a method for the evaluation of static robustness towards random variations in cellular neural network (CNN) templates is proposed. From this evaluation, circuit accuracy specifications for a VLSI implementation are derived which allow the designer to optimize the performance. Moreover, from this evaluation method, guidelines for robust template designs are derived and parametric testing templates are developed.

1. INTRODUCTION

For the implementation of cellular neural networks (CNNs)¹ the cell area has to be minimized in order to obtain the maximal cell density. However, using circuit components of small area results in large component variations, so that accuracy specifications have to be available to the hardware designer which guarantee correct operation of the network. Up to now very limited data on the robustness of CNNs towards random variations have been available in the open literature.

In this paper a method for the evaluation of the static robustness of CNN templates is proposed. In Section 2 the technological implementation limitations are discussed in detail. The robustness evaluation method is described in Section 3. First the robustness of a single cell towards variations is calculated and from this the behaviour of the whole network is evaluated using statistical estimations. The validity of the approximations and assumptions is checked by Monte Carlo simulations in Section 4. This evaluation method can be used to generate the specifications for the building blocks of a CNN circuit implementation as illustrated in Section 5.
Furthermore, the evaluation method is also very helpful in the development and performance optimization of templates. Finally, it can be applied to the generation of test templates that can be used to parametrically test CNN chips for their accuracy.

2. HARDWARE IMPLEMENTATION LIMITATIONS AND ERRORS

For the VLSI implementation of a signal processing or computational system, accuracy specifications are of prime importance. They determine the optimal implementation style and the ultimate performance limits for power and area consumption and processing speed. In digital VLSI circuits the signals are discrete in time and amplitude. The transistors are only used as switches. Whereas the advantage of digital circuits is their unlimited attainable accuracy, the main disadvantages are their high power drain, high area consumption and reduced processing speed. In analogue circuits the full signal-processing capabilities of transistors are used, which results in higher speed, low power consumption and small size, but the attainable accuracies are lower. Especially in application fields where a high processing speed is required and only limited accuracy is necessary, analogue VLSI circuits still yield much better performance than digital implementations.³

† Part of this research has been reported in the Proceedings of the 1994 IEEE International Workshop on Cellular Neural Networks and Their Applications held in Rome.
CCC 0098-9886/96/010111-10 © 1996 by John Wiley & Sons, Ltd. Received 15 January 1995. Revised 6 April 1995.

In a continuous time cellular neural network with a self-feedback greater than unity, a cell's output can only be stable in a high or low state, but the cell's state transition is graded.
Two types of errors in the computation can occur: (i) static errors, where the component variations are such that a cell has an incorrect behaviour independently of the evolution of its neighbours; or (ii) dynamical errors (comparable with races in digital circuits), which originate from internal dynamical errors in the cell or from an incorrect interaction between cells due to a large mismatch in the time constants. In this paper a method of evaluating the influence of static errors on the correct operation of a CNN is proposed.

The error causes can be systematic or random. The inherent non-linearity of transistors causes deterministic distortion errors. Systematic errors can be eliminated by correct biasing, good signal amplitude choices or compensation schemes. The random variation in component values and the noise sources in active components cause random errors, which limit the minimal signal that can be processed correctly by a circuit.

For the analogue implementation of computational circuits the matching behaviour of components is the main effect limiting the attainable performance. The component values and properties have a normal distribution and the variance of the component spreading is inversely proportional to the area of the components used and proportional to their mutual distance.² For many circuits the distance effect is negligible and the circuit accuracy is proportional to the square root of its area. The proportionality coefficient is process-dependent and can be related to physical and technological effects. An improvement in the matching is not directly guaranteed by a downscaling of the technology, although the matching tends to improve.

The speed of a circuit will be limited by the capacitive loading of the nodes. The (parasitic) capacitive loading is proportional to the area of the component and the speed can only be increased by increasing the signal or bias current and thus the power consumption.
The maximal attainable speed for a given power will be increased by downscaling the technology. An analogue VLSI circuit designer is thus confronted with important trade-offs: area versus accuracy, power versus speed and speed or power versus accuracy. The overall performance scaling with a downscaling of the technology is not straightforward and will be dependent on the specific processing technologies used.

The main application field of CNNs up to now has been in image processing. These applications require a high operation speed but limited precision. For the design of a CNN the main objective is to minimize the area of a cell and to reduce its power consumption in order to integrate as many cells as possible on a single die. It is clear that these specifications can only be achieved with VLSI circuits in combination with low-accuracy specifications. Moreover, the area and speed can only be fully optimized if the lower bounds on the accuracy are known to the designer.

3. TEMPLATE ROBUSTNESS

CNN templates can be divided into two classes. Non-propagating templates such as the edge detector (see Table I) or noise filtering⁴ have an A-template with all zeros except the self-feedback. Each cell evolves independently of the evolution of the output of its neighbouring cells. For this type of template a correct behaviour of each individual cell will guarantee a correct network behaviour.

For information-propagating templates such as the connected component detector⁵ or holefiller⁶ (see Table I) the cell's evolution depends on the evolution of its neighbours and the dynamics are more complex. However, the activity of the network is restricted to a small number of cells. This implies that if an individual cell behaves correctly and no races or dynamical errors occur, the network should behave correctly. The validity of this assumption will be shown via Monte Carlo simulations.
However, the correct operation of every single cell is a necessary condition for the correct operation of the network.

3.1. Correct operation of a single cell

The behaviour of one cell with its neighbours remaining stable can be studied by drawing its dynamic routes for all possible combinations of its neighbouring states.¹ The cell state ($x_c$) evolution equation can be rewritten as

$$\frac{dx_c}{dt} = -G x_c + A_c f(x_c) + k$$ (1)

where $k$ contains the constant contributions of the I-template, the B-template and the A-template with the self-feedback ($A_c$) excluded. From these diagrams the stable and unstable equilibrium points of the cell can be determined. In Figure 1 the dynamic routes for a single cell in a connected component detector⁵ are displayed.

Table I. CNN templates. Columns: template name, B, A, I. Rows: AND,⁴ CCD,⁵ CCD-LARGE, EDGE,⁴ HOLE,⁶ HOLE-MOD, SHADOW. [Template entries not recoverable from the source.]

Figure 1. Dynamic routes for a cell c in the connected component detector for different states of its two neighbours. The stable equilibria are indicated by full circles, the unstable one by an open circle

In Figure 2 the influence of a deviation in the different template values is shown. The deviation $\Delta k$ in the k-value results in a uniform shift of the dynamic routes. $\Delta k$ can originate from a deviation in the A- (except $A_c$), B- or I-template values or in the output levels of the neighbours. A deviation of the cell conductance value $G$ affects the slope of the dynamic routes in all regions, whereas a change $\Delta A_c$ in the A-template self-feedback coefficient results in a slope change in the linear region and a constant shift in the saturation regions. In Figure 2 the effect of positive values for the $\Delta$'s is shown, but since the $\Delta$'s are random variables, they can have negative or positive values.
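The dynamic routes of equation (1) and the margin at the unit states can be reproduced with a short numerical sketch. It assumes the CCD template A = [1, 2, -1], B = 0, I = 0 (the a = 1 member of the family given in Section 5.1), a cell conductance G = 1 and the standard piecewise-linear output function; the function names are illustrative only and do not come from the paper.

```python
# Sketch of the dynamic route dx/dt = -G*x + A_c*f(x) + k for one CCD cell.
# Assumed, not read from Table I: A = [1, 2, -1], B = 0, I = 0, G = 1.


def f(x):
    """Standard piecewise-linear CNN output function (saturates at -1 and +1)."""
    return max(-1.0, min(1.0, x))


def route(x, k, a_c=2.0, g=1.0):
    """Right-hand side of the cell state equation for a constant k."""
    return -g * x + a_c * f(x) + k


def equilibria(k, a_c=2.0, g=1.0):
    """Equilibrium states of one dynamic route and their stability (a_c > g)."""
    points = []
    x_low = (k - a_c) / g        # candidate in the negative saturation region
    if x_low <= -1.0:
        points.append((x_low, "stable"))
    x_mid = -k / (a_c - g)       # candidate in the linear region
    if -1.0 < x_mid < 1.0:
        points.append((x_mid, "unstable"))
    x_high = (k + a_c) / g       # candidate in the positive saturation region
    if x_high >= 1.0:
        points.append((x_high, "stable"))
    return points


# With saturated neighbours, k = y_left - y_right takes the values -2, 0 or 2.
K_VALUES = (-2.0, 0.0, 2.0)


def template_margin(k_values=K_VALUES):
    """Smallest |dx/dt| at the unit states x = -1 and x = +1 over all routes."""
    return min(abs(route(x, k)) for k in k_values for x in (-1.0, 1.0))


for k in K_VALUES:
    print("k =", k, "equilibria:", equilibria(k))
print("margin at the unit states:", template_margin())
```

Under these assumptions the smallest route value at the unit states is 1, so a uniform shift of the routes smaller than 1 leaves the set of equilibria on each side unchanged.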
Figure 2. Influence of template coefficient variations on the operation of a CCD: (a) the influence of a deviation in the k value, which can originate from an error in the A-, B- or I-template values or the output levels of the neighbours; (b) the influence of a deviation in the cell conductance value; (c) the influence of a deviation in the self-feedback value in the A-template

The effect of these deviations on the correct cell behaviour can be evaluated from the change in the equilibrium points of the cells for the different neighbouring states. From Figures 1 and 2 it is clear that the dynamic routes can shift without a change in the cell behaviour as long as the shift ($\Delta_{route}$) at the unit state 1 or -1 is smaller than the maximal allowed deviation ($\Delta_{template}$), which is indicated by the arrows in Figure 1. For the connected component detector, $\Delta_{template} = 1$.

The variation in the dynamic route is dependent on the variation in the template values, which have a normal distribution since the circuit components are also normally distributed. Moreover, since the different template coefficients are implemented by different circuits and the variation in transistors is uncorrelated,² the deviations will be statistically independent. The total shift of the dynamic route in the unit state is

$$\Delta_{route} = \Delta G + \sum_i \Delta A_i + \sum_i \Delta B_i + \Delta I$$ (2)

The standard deviation $\sigma_{route}$ of the shift of the dynamic route in the unit state is calculated from the standard deviations of the template values:⁷

$$\sigma_{route} = \sqrt{\sigma_G^2 + \sum_i \sigma_{A_i}^2 + \sum_i \sigma_{B_i}^2 + \sigma_I^2}$$ (3)

The probability $P_{correct}$ that a cell is operating correctly with the given deviation in the template coefficients is the probability that the deviation $\Delta_{route}$ of the dynamic route in the unit state is smaller than the maximal allowed deviation $\Delta_{template}$ for the given template. The deviation $\Delta_{route}$ follows a normal distribution with mean zero and standard deviation $\sigma_{route}$, and the probability $P_{correct}$ is calculated from⁷

$$P_{correct} = P(|\Delta_{route}| < \Delta_{template}) = 2\,\mathrm{erf}\left(\frac{\Delta_{template}}{\sigma_{route}}\right)$$ (4)

$$\mathrm{erf}(x) = \frac{1}{\sqrt{2\pi}} \int_0^x \exp\left(-\frac{t^2}{2}\right) dt$$ (5)

3.2. Yield of an N-cell CNN

For a network with $N$ cells and a probability $P_{correct}$ that a single cell is functioning correctly, the probability that the network is correct, or the yield of the network chips, is $P_{correct}^N$, assuming that all cells have to operate correctly for a correct network. In Figure 3 the yield of the network for a template with $\Delta_{template} = 1$ is displayed as a function of the spread $\sigma_{route}$ of the dynamic route for different network sizes. The larger the network, the smaller is the spread allowed for a given yield. From the designer's viewpoint it is important to remark that below a certain threshold in $\sigma_{route}$ the yield of the network becomes very high. This threshold determines the accuracy specifications that guarantee an economic VLSI implementation of the CNN.

Figure 3. Yield of a network as a function of $\sigma_{route}$: full line, 32 x 32 network; dotted line, 128 x 128 network; dash-dotted line, 512 x 512 network

Inversely, for a given desired yield of the network chips, a given size $N$ of the network and a given maximal allowed deviation $\Delta_{template}$ of the dynamic route, the allowed spread $\sigma_{route}$ of the cell's dynamic route can be calculated by inverting equation (5). This spread can be transformed to the allowed standard deviations $\sigma_A$, $\sigma_B$ and $\sigma_I$ of the template values using equation (3). These are used to derive the specifications for the different circuit components of the cell realization as explained in Section 5.2.

4. SIMULATION RESULTS

A deviation in the template coefficients will have an influence on the dynamic behaviour or the state evolution of the cell, since the current injected in the capacitor will deviate from its ideal value. For non-propagating templates this will only result in a change in the convergence time but will have no influence on the correct computations. However, when the information is propagating through the network, dynamical effects can influence the correct computations.
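The static yield estimate of Section 3.2, against which the simulations are compared, can be sketched numerically. The sketch below is an illustration rather than the authors' code: it is written with the standard error function of Python's `math` module, for which the probability that a zero-mean normal variable stays within plus or minus $\Delta_{template}$ is erf($\Delta_{template}/(\sqrt{2}\,\sigma_{route})$), and it performs the inversion with `statistics.NormalDist.inv_cdf`. The function names are hypothetical.

```python
from math import erf, sqrt
from statistics import NormalDist


def p_correct(sigma_route, delta_template=1.0):
    """Probability that one cell's route shift stays within +/- delta_template."""
    return erf(delta_template / (sqrt(2.0) * sigma_route))


def network_yield(sigma_route, n_cells, delta_template=1.0):
    """Yield of an N-cell network when every cell has to operate correctly."""
    return p_correct(sigma_route, delta_template) ** n_cells


def allowed_sigma_route(target_yield, n_cells, delta_template=1.0):
    """Largest route spread that still reaches the target yield."""
    p_cell = target_yield ** (1.0 / n_cells)
    # P = 2*Phi(z) - 1 for a standard normal CDF Phi, so z = Phi^-1((1 + P) / 2).
    z = NormalDist().inv_cdf((1.0 + p_cell) / 2.0)
    return delta_template / z


if __name__ == "__main__":
    for side in (32, 128, 512):
        n = side * side
        print(side, "x", side, "cells, 90% yield: sigma_route <=",
              round(allowed_sigma_route(0.90, n), 3))
```

For a 128 x 128 network, a yield of 90% and $\Delta_{template} = 1$, this sketch gives an allowed $\sigma_{route}$ of about 0.22, in line with the CCD entry of Table III.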
The evaluation method only takes into account the effect of static errors on the correct behaviour of a CNN. To evaluate whether the static effects of the template deviations are dominant over their dynamic effects, Monte Carlo simulations were performed for information-propagating templates. In Table II the desired yield and the corresponding probability $P_{correct}$ for a cell to behave correctly are tabulated. With the resulting calculated allowed $\sigma_{value}$ of the template values, Monte Carlo simulations of the CNN are performed.

For the CCD template (Table I) the simulation is performed with the edges and all cells initialized to -1 except for the first cell. For this test image all cells (except the first and the last one) have to make transitions from -1 to 1 and from 1 to -1, which implies that all dynamic routes are checked. In Figure 4 one correct simulation out of the 150 simulations of the 100 x 1 CCD is displayed, showing the correct evolution of all cells with varying template values for an input image of [1 -1 ... -1]. The variation in the template values over the cells results in a variation in the equilibrium points of the cells and their end states, which can clearly be observed.

Table II. The necessary probability of correct operation of a cell ($P_{correct}$) calculated for a given yield and the corresponding simulated yield performance. The last column ($\sigma_{err}$) is the standard deviation of the simulated yield and the number of simulations is indicated in parentheses. In the third column the allowed absolute deviation of the template values is given. All values are percentages

                                          Calculations                         Monte Carlo simulation
Network                                   Yield        P_correct   sigma_value   Yield   sigma_err
4 x 1 connected component detector        70.0         91.50       29.0          66      2.9 (250)
                                          80.0         94.60       26.0          78      2.5 (250)
                                          90.0         97.40       22.5          87      2.5 (250)
                                          99.0         99.75       16.5          99.6    0.6 (250)
100 x 1 connected component detector      90.0         99.89       15.26         91.7    2.4 (150)
32 x 32 modified holefiller               95.0 (90.0)  99.99       9.73          97.3    1.6 (150)

Figure 4. Simulation of a 100 x 1 CCD with varying template values over the cells for an input image [1 -1 ... -1] (horizontal axis: time)

For the modified holefiller template HOLE-MOD (see Section 5.1 and Table I) the input image is set to a white image with black edges. All cells are initialized to black for this template. If the network operates correctly all cells remain black. If at least one cell becomes white due to errors, the whole image becomes white. With this test image, however, not all dynamic routes are tested. In fact the network will only behave incorrectly if the shift in the dynamic routes is larger than $\Delta_{template}$ (= 1) downwards. This is only the case for half of the incorrect networks, and the yield obtained by the simulations is then 1 - (1 - Yield)/2, or 95% for a calculated yield of 90%.

The accuracy of the simulated yield estimate is given by

$$\sigma_{err} = \sqrt{\frac{(1 - \mathrm{Yield}) \times \mathrm{Yield}}{\text{no. of simulations}}}$$ (7)

Taking into account the accuracy of the simulated yield estimates, all simulation results agree well with the calculations. For the tested templates the influence of the variations in the template values on the static behaviour is dominant over their impact on the dynamic behaviour.

5. APPLICATIONS

5.1. Robust template design

The method is very helpful for evaluating the feasibility of a hardware implementation of a template. At the top of Figure 5 the two critical dynamic routes for the classical holefiller template⁶ are drawn. The upper route is for k = 1, corresponding to the case of a black input pixel and all but one neighbouring states white. The lower route is for k = 0, corresponding to a white input pixel and all neighbouring states black or a black input pixel and all neighbouring states white. If the cells all start with an initial state of black and the input image is presented at the inputs, the network will fill up the holes in the image.
However, for k = 1 the equilibrium is unstable and the slightest variation will result in a wrong behaviour. Such a template cannot be used in a hardware implementation since variations will always exist. If the I-template value is changed from I = -1 to 0, the cell behaviour becomes much more insensitive to non-idealities. Now $\Delta_{template} = 1$ is achieved and the higher robustness makes a hardware implementation feasible (see Section 5.2). Moreover, the holefiller will now work for black images on white backgrounds and for white images on black backgrounds as reported in Reference 8.

Figure 5. (a) The critical dynamic routes for the original holefiller template; stable equilibria are indicated with a filled circle and unstable with an open circle. (b) The critical dynamic routes for the modified template; the maximal allowed deviation for a correct operation is increased from 0 to 1

In this way the robustness evaluation can be used to optimize the template robustness or to (re)design templates. This procedure can be compared with classical design centring approaches.⁹ There the acceptability region for the parameters is first determined. Then a goal function or quality measure is defined and the parameters are redesigned within the acceptability region towards optimal quality. In the presented approach the quality of a template is determined by its robustness, which is related to $\Delta_{template}$. The influence of the different template entries on $\Delta_{template}$ has to be determined as well as the acceptability range for the parameters so that the network function remains intact. With this information the template can be redesigned for high robustness.

We will briefly illustrate this procedure for a connected component function. Templates of the form A = [a, a + 1, -a], B = 0 and I = 0 with a > 0 have dynamic routes that are isomorphic to the standard CCD template and all perform the CCD function. $\Delta_{template}$ is equal to a.
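The value of $\Delta_{template}$ for this template family can be checked directly from the dynamic route. The short derivation below is a sketch under assumptions that the text does not state explicitly for this family: a cell conductance G = 1, the standard piecewise-linear output function with f(1) = 1 and f(-1) = -1, and saturated neighbour outputs $y_l, y_r \in \{-1, +1\}$.

$$\frac{dx_c}{dt} = -x_c + (a + 1) f(x_c) + k, \qquad k = a\,(y_l - y_r) \in \{-2a,\ 0,\ 2a\}$$

$$\left.\frac{dx_c}{dt}\right|_{x_c = +1} = a + k \in \{-a,\ a,\ 3a\}, \qquad \left.\frac{dx_c}{dt}\right|_{x_c = -1} = -a + k \in \{-3a,\ -a,\ a\}$$

$$\Delta_{template} = \min_{k}\ \left|\frac{dx_c}{dt}\right|_{x_c = \pm 1} = a$$

The route closest to zero at a unit state lies at a distance a, so a uniform shift of the routes smaller than a cannot change the sign of any route at the unit states.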
Thus, to obtain optimal robustness, a has to be chosen as large as possible. In practice the a-value will then be limited by a practical limitation such as the limited state swing.

5.2. Generation of accuracy specifications

In Section 3.2 a procedure was introduced to determine the upper limit for $\sigma_{route}$ for a given network size, $\Delta_{template}$ and desired yield. This specification can be translated into accuracy specifications for the different template circuits starting from equation (3). Assuming that all circuits have the same relative accuracy $\sigma_{rel}$, independent of the template value, $\sigma_{rel}$ can be calculated from $\sigma_{route}$ by

$$\sigma_{rel} = \frac{\sigma_{route}}{M_{rel}}, \qquad M_{rel} = \sqrt{G^2 + \sum_i A_i^2 + \sum_i B_i^2 + I^2}$$ (8)

For a template circuit implementation with a constant absolute error $\sigma_{abs}$ independent of the template value, $\sigma_{abs}$ is given by

$$\sigma_{abs} = \frac{\sigma_{route}}{M_{abs}}, \qquad M_{abs} = \sqrt{\text{no. of template values} \neq 0}$$ (9)

In Table III the necessary accuracies for several templates are listed for a network of 128 x 128 cells and with a yield of 90%. Values of $\sigma_{route}$, $M_{rel}$ and the corresponding $\sigma_{rel}$ are calculated. From these results, interesting conclusions can be drawn. The more complex the template, the more sensitive it becomes to mismatches (see e.g. the edge detector). This can be explained intuitively, since a cell must be able to distinguish a smaller signal change from a signal with a larger common mode, which is also reflected in the lower value of $\Delta_{template}$. This is known by analogue designers to be a difficult task. Moreover, by optimizing the template values, a template with a similar behaviour can be obtained with much higher robustness, as illustrated by the CCD-LARGE and HOLE-MOD templates. In the last column a normalized estimate for the area of the synapses in the cell circuit is given.
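The translation from an allowed route spread to circuit accuracies can be illustrated for the connected component detector. The sketch below assumes the template values G = 1, A = [1, 2, -1], B = 0 and I = 0 and takes $\sigma_{route}$ = 0.2217 as an unrounded stand-in for the 0.22 listed for the CCD in Table III; the function names are hypothetical.

```python
from math import sqrt


def m_rel(values):
    """Root of the sum of squared template values (equal relative errors)."""
    return sqrt(sum(v * v for v in values))


def m_abs(values):
    """Root of the number of non-zero template values (equal absolute errors)."""
    return sqrt(sum(1 for v in values if v != 0))


def sigma_rel(sigma_route, values):
    """Allowed relative standard deviation of every template circuit."""
    return sigma_route / m_rel(values)


def sigma_abs(sigma_route, values):
    """Allowed absolute standard deviation of every template circuit."""
    return sigma_route / m_abs(values)


# Assumed CCD values: conductance G followed by the three A-template entries.
CCD = [1.0, 1.0, 2.0, -1.0]

if __name__ == "__main__":
    print("M_rel     =", round(m_rel(CCD), 2))
    print("sigma_rel =", round(100 * sigma_rel(0.2217, CCD), 1), "%")
    print("sigma_abs =", round(100 * sigma_abs(0.2217, CCD), 1), "%")
```

The relative-error model weights each deviation by the size of its template value, whereas the absolute-error model only counts the non-zero entries, so templates with a few large entries are penalized differently by the two models.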
The area of a template or synapse circuit for a fixed function chip implementation is proportional to the number of non-zero entries in the template and inversely proportional to the square of the allowed error $\sigma_{rel}$. For a programmable CNN implementation the worst-case specifications have to be taken into account, which results in large circuit areas. Moreover, the implementation of a programmable synapse or template circuit requires more area than fixed value synapses.

Table III. The allowed $\sigma_{route}$ calculated for a 128 x 128 network and for a yield of 90%. From these specifications the maximal relative error $\sigma_{rel}$ for the template circuits can be calculated from equation (8). In the last column a normalized estimate of the area of the synapses is given

Template     Delta_template   sigma_route   M_rel   sigma_rel (%)   Area
AND          1.5              0.33          2.3     14              2 x 51
CCD          1                0.22          2.6     8.4             3 x 142
CCD-LARGE    2                0.44          4.2     10.5            3 x 91
EDGE         0.25             0.06          3.4     1.9             11 x 2770
HOLE         0                0.00          5       0               infinite
HOLE-MOD     1                0.22          5       4.4             5 x 516
SHADOW       2                0.44          3.6     12.3            3 x 66

5.3. Parametric test templates

In the fabrication process of VLSI chips the testing after fabrication is of extreme importance to guarantee high quality chips. Functional testing with test inputs is performed to eliminate hard errors such as stuck-at faults. In addition, parametric testing is performed to check whether the circuits meet the specification ranges on their inputs and outputs.

Test templates can be designed to test a fabricated network chip parametrically. By using e.g. a CCD function test template of the form A = [a, a + 1, -a], B = 0 and I = 0 with a test image of white pixels and one column of black pixels on the left, the smallest value of a can be measured for which the network still operates correctly. This value is the lower bound on $\Delta_{template}$ that the A-template circuits of the chip can process. Similar test templates can be designed for the B-template circuits starting from e.g. the edge detector. With this strategy a parametric test procedure can be set up for CNN chips.

6. CONCLUSIONS

In this paper a method is proposed for the evaluation of the static robustness of CNN templates towards random variations in the template values, which are inevitable in a VLSI implementation. These robustness specifications are translated into circuit specifications that guarantee a correct network operation with a given yield. With these specifications a circuit designer can optimize VLSI circuit implementations towards minimal area and power consumption and maximal speed.

The evaluation method is also applicable to the optimization of templates. It is used in the optimization and choice of template values towards good implementability in VLSI hardware. Furthermore, test templates can be derived to parametrically test fabricated cellular neural network chips.

ACKNOWLEDGEMENT

The authors are funded by the Belgian National Fund for Scientific Research.

REFERENCES

1. L. O. Chua and L. Yang, 'Cellular neural networks: theory and applications', IEEE Trans. Circuits and Systems, CAS-35, 1257-1290 (1988).
2. M. J. M. Pelgrom, A. C. J. Duinmaijer and A. P. G. Welbers, 'Matching properties of MOS transistors', IEEE J. Solid-State Circuits, SC-24, 1433-1439 (1989).
3. E. A. Vittoz, 'Future of analog in the VLSI environment', Proc. IEEE Int. Symp. on Circuits and Systems, May 1990, pp. 1372-1375.
4. T. Roska and L. Kék (eds), 'Analogic CNN program library', Rep. DNS-5-1994, Analogical and Neural Computing Laboratory, Computer and Automation Institute, Hungarian Academy of Sciences, Budapest, 1994.
5. T. Matsumoto, L. O. Chua and H. Suzuki, 'CNN cloning template: connected component detector', IEEE Trans. Circuits and Systems, CAS-37, 633-635 (1990).
6. T. Matsumoto, L. O. Chua and R. Furukawa, 'CNN cloning template: hole-filler', IEEE Trans. Circuits and Systems, CAS-37, 635-638 (1990).
7. A. Papoulis, Probability, Random Variables and Stochastic Processes, McGraw-Hill, New York, 1991.
8. L. O. Chua and P. Thiran, 'An analytical method for designing simple cellular neural networks', IEEE Trans. Circuits and Systems, CAS-38, 1332-1341 (1991).
9. R. K. Brayton and R. Spence, Sensitivity and Optimization, Elsevier, Amsterdam, 1980.