Dev. Chem. Eng. Mineral Process. 13(3/4), pp. 211-220, 2005.

Stochastic Model Reduction by Maximizing Independence

Hui Zhang* and You-Xian Sun

National Laboratory of Industrial Control Technology, Institute of Modern Control Engineering, Department of Control Science and Engineering, Zhejiang University, Hangzhou 310027, P.R. China

By analysing information descriptions in state space models of linear stochastic systems, this paper proposes two model reduction methods based on the principles of maximizing independence and maximizing conditional independence among the components of the reduced state vector, respectively. These methods are based on state aggregation. Independence and conditional independence are measured by the Kullback-Leibler information distance. It is demonstrated that the maximum conditional independence method is applicable not only to stable systems but also to unstable systems. Simulation results illustrate the efficiency of the proposed methods.

Introduction

Information theoretic methods are attracting more and more attention in the field of control theory [11, 12, 16]. The present paper focuses on the problem of model reduction in the framework of information theory. For stochastic systems, many model reduction approaches are available in the literature, such as state aggregation [2], the covariance equivalent realization method [14], stochastic balanced truncation [6] and the L2 method [13]. There are also approaches based on information theory, such as the method of minimizing the Kullback-Leibler information distance (KLID) between the full- and reduced-order models [11]. In this paper, the concept of KLID [10] is adopted to derive two methods of model reduction; however, KLID is used not as a measure of distance between the full- and reduced-order models, but as a measure of statistical independence among the components of the reduced state vector.
The KLID measure is discussed, and two model reduction methods based on state aggregation, the maximum independence method and the maximum conditional independence method, are derived. Illustrative examples are included.

Measure of Independence

It is frequently of interest to measure statistical independence (or dependence) between two stochastic processes, or among the components of a random vector, in such areas as engineering, econometrics, and medicine. A typical example is the discrimination function of features in the field of pattern recognition [4]. Several measures are available in the literature, such as the feedback measure [7] and information theoretic measures [4, 5]. This section introduces the KLID measure.

The information of a random variable can be measured by entropy. Let v be a random vector with probability density p_v. The entropy of v is H(v) = -E[\ln p_v(v)], where E denotes mathematical expectation. As proposed in the field of independent component analysis (ICA) [1, 4], statistical independence can be measured by the KLID between the joint probability density of a random vector and the product of its marginal densities. Suppose v = [v_1 \; v_2 \; \cdots \; v_n]^T \in R^n is a random vector with probability density p_v, and the densities of the components of v are p_{v_i}, i = 1, 2, \ldots, n. Then the KLID between p_v and \prod_{i=1}^{n} p_{v_i} is defined as:

I(v) = E_v\left[ \ln \frac{p_v(v)}{\prod_{i=1}^{n} p_{v_i}(v_i)} \right]    ...(1)

where E_v denotes expectation with respect to p_v. I(v) can be rewritten in terms of entropy:

I(v) = \sum_{i=1}^{n} H(v_i) - H(v)    ...(2)

The quantity I(v) provides a measure of independence of the components of v, and is the difference between the sum of the information of the components and the information of the vector. It is always non-negative and vanishes if, and only if, p_v = \prod_{i=1}^{n} p_{v_i}, i.e. the components are mutually independent. Little dependence among the components of v implies little redundancy in the information supplied by v.

* Author for correspondence (zhanghui-iipc@zju.edu.cn).
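For a zero-mean Gaussian vector, the measure in Equation (2) has a closed form in the covariance matrix, since each entropy term depends only on a (co)variance determinant. The following is a minimal numerical sketch (the function name is ours, not from the paper):

```python
import numpy as np

def gaussian_independence(Sigma):
    """I(v) of Equation (2) for a zero-mean Gaussian vector with
    covariance Sigma: sum_i H(v_i) - H(v) reduces to
    0.5 * (sum_i ln Sigma_ii - ln det Sigma)."""
    Sigma = np.asarray(Sigma, dtype=float)
    return 0.5 * (np.sum(np.log(np.diag(Sigma))) - np.linalg.slogdet(Sigma)[1])

# Independent components give I(v) = 0; correlation makes I(v) > 0.
print(gaussian_independence(np.eye(3)))                 # 0.0
print(gaussian_independence([[1.0, 0.8], [0.8, 1.0]]))  # about 0.51
```

Note that I(v) vanishes exactly when Sigma is diagonal, matching the independence condition stated above.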
Model Reduction via Maximizing Independence

Problem Statement

Consider a linear time-invariant, stable stochastic system modeled by:

\delta x(t) = A x(t) + B w(t)
y(t) = C x(t) + v(t)    ...(3)

where x(t) \in R^n, w(t) \in R^m, y(t), v(t) \in R^p, and A, B, C are constant matrices with appropriate dimensions. The operator \delta is defined as \delta x(t) = x(t+1), t \in Z (Z denotes the set of integers) for discrete-time systems, or as \delta x(t) = \dot{x}(t), t \in R (R denotes the set of real numbers) for continuous-time systems. w(t) and v(t) are mutually independent zero-mean white Gaussian random vectors with covariance matrices Q and R, respectively, and are uncorrelated with x(0).

To approximate the system described by the full-order model (3), we wish to find a stable reduced-order model:

\delta x_r(t) = A_r x_r(t) + B_r w(t)
y_r(t) = C_r x_r(t) + v(t)    ...(4)

where x_r(t) \in R^l, l < n, and A_r, B_r, C_r are constant matrices with appropriate dimensions. The same processes w and v are retained [11]. In state aggregation [2], x_r(t) is referred to as the aggregated state:

x_r(t) = \Lambda x(t)    ...(5)

where \Lambda \in R^{l \times n} is the aggregation matrix, rank \Lambda = l, and the elements of (\Lambda\Lambda^T)^{-1} are all non-negative. From Equations (3), (4) and (5):

\Lambda A = A_r \Lambda    ...(6)

Let \Lambda^+ denote the Moore-Penrose inverse of \Lambda; then:

A_r = \Lambda A \Lambda^+, \quad B_r = \Lambda B, \quad C_r = C \Lambda^+    ...(7)

Several approaches to choosing the aggregation matrix have been introduced in the literature. Here, we choose the aggregation matrix by maximizing the independence or conditional independence of the reduced state vector, measured by KLID.

Method 1: Maximizing Independence of the State Vector

The dynamic character of a system is defined by the structure and parameters of its model; however, the "information" of the dynamics is contained in the system states. The dynamic information of the "full-order description" of Equation (3) is contained in x(t), while the dynamic information of the "reduced-order description" of Equation (4) is contained in x_r(t).
Suppose the covariance matrices of x(t) and x_r(t) are \Pi(t) = E\{x(t)x^T(t)\} and \Pi_r(t) = E\{x_r(t)x_r^T(t)\}, respectively. In this paper we focus on the steady-state information. Because the full-order system of Equation (3) is stable, its steady-state covariance \Pi = \lim_{t\to\infty} \Pi(t) is the unique positive definite solution to the Lyapunov equation:

A\Pi + \Pi A^T + BQB^T = 0    ...(8)

when Equation (3) is a continuous-time system, or the unique positive definite solution to:

A\Pi A^T + BQB^T = \Pi    ...(9)

when Equation (3) is a discrete-time system. For the same reason, the steady-state covariance of the system of Equation (4) can be defined by \Pi_r = \lim_{t\to\infty} \Pi_r(t). Since the systems of Equations (3) and (4) are Gaussian, we can define the steady-state entropies of systems (3) and (4), respectively, as:

H(x) = \frac{n}{2}\ln(2\pi e) + \frac{1}{2}\ln\det\Pi    ...(10)

H(x_r) = \frac{l}{2}\ln(2\pi e) + \frac{1}{2}\ln\det\Pi_r = \frac{l}{2}\ln(2\pi e) + \frac{1}{2}\ln\det\Lambda\Pi\Lambda^T    ...(11)

Let us investigate the statistical dependence among the components of the steady-state value of x_r using measure (1). Let x_r = [x_{r1}, \ldots, x_{rl}]^T, and let S_i, i = 1, \ldots, l denote the diagonal elements of \Pi_r. Then the measure of independence among the elements of x_r can be written as:

I(x_r) = \frac{1}{2}\ln\prod_{i=1}^{l} S_i - \frac{1}{2}\ln\det\Lambda\Pi\Lambda^T = \frac{1}{2}\ln\frac{\det(\Lambda\Pi\Lambda^T)^d}{\det\Lambda\Pi\Lambda^T}    ...(12)

where the superscript d denotes the matrix with its off-diagonal elements set equal to zero. From the viewpoint of information, the approximating performance of the reduced-order model is determined by the amount of information of x retained in x_r. When the independence among the components of x_r is maximized, i.e. the redundancy or duplication in the information supplied by x_r is minimized, we obtain the highest efficiency of information retention. So less dependence among the components of x_r implies better approximating performance. Maximizing the independence among the elements of x_r is equivalent to minimizing I(x_r). Let \lambda_i, i = 1, \ldots, l denote the eigenvalues of the correlation matrix of x_r, [\alpha_{ij}], i, j = 1, \ldots, l, where \alpha_{ij} = corr(x_{ri}, x_{rj}).
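The steady-state covariances in Equations (8) and (9) are standard Lyapunov solutions, which can be computed with SciPy. A minimal sketch (the wrapper function name is ours):

```python
import numpy as np
from scipy.linalg import solve_lyapunov, solve_discrete_lyapunov

def steady_state_covariance(A, B, Q, discrete=False):
    """Steady-state state covariance Pi of Equation (8) (continuous)
    or Equation (9) (discrete) for a stable system."""
    W = B @ Q @ B.T
    if discrete:
        # Solves A Pi A^T + W = Pi
        return solve_discrete_lyapunov(A, W)
    # scipy's solve_lyapunov solves A X + X A^T = Q_rhs, so negate W
    return solve_lyapunov(A, -W)

# Scalar checks: dx = -x + w gives Pi = 0.5; x(t+1) = 0.5 x(t) + w gives Pi = 4/3.
A1, B1, Q1 = np.array([[-1.0]]), np.array([[1.0]]), np.array([[1.0]])
print(steady_state_covariance(A1, B1, Q1)[0, 0])                  # 0.5
A2 = np.array([[0.5]])
print(steady_state_covariance(A2, B1, Q1, discrete=True)[0, 0])   # about 1.3333
```

The scalar examples can be checked by hand directly against Equations (8) and (9).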
Then Equation (12) can be rewritten as:

I(x_r) = -\frac{1}{2}\sum_{i=1}^{l} \ln\lambda_i    ...(13)

Hence, in order to minimize I(x_r), we have to maximize \sum_{i=1}^{l} \ln\lambda_i. It can be shown [9] that \sum_{i=1}^{l} \ln\lambda_i is maximum if \lambda_i, i = 1, \ldots, l are the l largest eigenvalues of the correlation matrix of x. Then I(x_r) is minimum when the aggregation matrix \Lambda is chosen as:

\Lambda = [\phi_1, \ldots, \phi_l]^T    ...(14)

where \phi_1, \ldots, \phi_l are the orthonormal eigenvectors corresponding to the l largest eigenvalues of the correlation matrix of x, \Phi = [\phi_{ij}], i, j = 1, \ldots, n, \phi_{ij} = corr(x_i, x_j). Now the reduced-order model with maximum independence can be obtained from Equations (5), (7) and (14). The correlation matrix \Phi (providing the basis upon which the aggregation matrix is computed) can be obtained from \Pi, which is the unique positive definite solution to Equation (8) or (9). It is obvious that the aggregation matrix given by Equation (14) satisfies the conditions that rank \Lambda = l and the elements of (\Lambda\Lambda^T)^{-1} are all non-negative.

Method 2: Maximizing the Conditional Independence of the State Vector

Another concept of independence adopted in ICA is conditional independence [1], which can also be measured by KLID. In this section, we first focus on the conditional information of the state vectors of the full- and reduced-order models when the system outputs are observed, and then deduce another method of model reduction based on the principle of maximizing conditional independence. We assume that the system discussed in this section is discrete-time, and set t = k, k \in Z. The results for continuous-time systems can be obtained in the same way. In order to distinguish this method from the notation of method 1, we denote the aggregation matrix by V in this section; then:

x_r(k) = V x(k), \quad A_r = V A V^+, \quad B_r = V B, \quad C_r = C V^+    ...(15)

Suppose we have a sequence of output observations of the true system, Y^k = \{y(1), y(2), \ldots, y(k)\}.
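Putting Equations (9), (7) and (14) together, method 1 can be sketched numerically for a discrete-time system. This is an illustrative sketch under our own naming and toy system, not the authors' code:

```python
import numpy as np
from scipy.linalg import solve_discrete_lyapunov

def reduce_max_independence(A, B, C, Q, l):
    """Method 1 sketch: Pi from Equation (9), correlation matrix Phi,
    aggregation matrix Lambda from Equation (14), reduced matrices (7)."""
    Pi = solve_discrete_lyapunov(A, B @ Q @ B.T)
    d = 1.0 / np.sqrt(np.diag(Pi))
    Phi = Pi * np.outer(d, d)      # corr(x_i, x_j)
    w, U = np.linalg.eigh(Phi)     # eigenvalues in ascending order
    Lam = U[:, -l:].T              # rows = l dominant orthonormal eigenvectors
    Lp = np.linalg.pinv(Lam)
    return Lam @ A @ Lp, Lam @ B, C @ Lp, Lam

# Toy stable 3-state system reduced to order 2.
A = np.array([[0.5, 0.1, 0.0], [0.0, 0.4, 0.1], [0.0, 0.0, 0.3]])
B, C, Q = np.eye(3), np.ones((1, 3)), np.eye(3)
Ar, Br, Cr, Lam = reduce_max_independence(A, B, C, Q, 2)
print(Ar.shape, np.allclose(Lam @ Lam.T, np.eye(2)))  # (2, 2) True
```

Because the rows of Lambda are orthonormal eigenvectors, \Lambda\Lambda^T = I, so the non-negativity condition on (\Lambda\Lambda^T)^{-1} holds trivially.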
Suppose \hat{x}(k+1) = E[x(k+1) \mid Y^k] is the one-step-ahead Kalman estimate of x(k+1) based on the given Y^k (with k large enough), and let \tilde{x}(k+1) = x(k+1) - \hat{x}(k+1) and \Sigma(k+1) = E[\tilde{x}(k+1)\tilde{x}^T(k+1)]. For system (4), suppose \hat{x}_r(k+1) = E[x_r(k+1) \mid Y^k], \tilde{x}_r(k+1) = x_r(k+1) - \hat{x}_r(k+1), and \Sigma_r(k+1) = E[\tilde{x}_r(k+1)\tilde{x}_r^T(k+1)]. Suppose \tilde{x} = \lim_{k\to\infty}\tilde{x}(k) and \tilde{x}_r = \lim_{k\to\infty}\tilde{x}_r(k) are the steady-state estimation errors, and \Sigma = \lim_{k\to\infty}\Sigma(k) and \Sigma_r = \lim_{k\to\infty}\Sigma_r(k) are the covariance matrices of the steady-state estimation errors of the full- and reduced-order models of Equations (3) and (4), respectively. It is known that \Sigma satisfies the following Riccati equation:

\Sigma = A(\Sigma - \Sigma C^T(C\Sigma C^T + R)^{-1}C\Sigma)A^T + BQB^T    ...(16)

Let Y = \{y(1), y(2), \ldots, y(\infty)\}. From estimation and information theory [3] we get the conditional entropies of the full and reduced states, respectively:

H(x \mid Y) = H(\tilde{x}) = \frac{n}{2}\ln(2\pi e) + \frac{1}{2}\ln\det\Sigma    ...(17)

H(x_r \mid Y) = H(\tilde{x}_r) = \frac{l}{2}\ln(2\pi e) + \frac{1}{2}\ln\det\Sigma_r    ...(18)

Let \tilde{x}_r = [\tilde{x}_{r1}, \ldots, \tilde{x}_{rl}]^T, and let g_i, i = 1, \ldots, l denote the diagonal elements of \Sigma_r. Then the conditional independence among the components of x_r given Y is measured by:

I(x_r \mid Y) = \frac{1}{2}\ln\prod_{i=1}^{l} g_i - \frac{1}{2}\ln\det V\Sigma V^T    ...(19)

Maximizing the conditional independence of x_r is equivalent to minimizing I(x_r \mid Y). It can be concluded, in the same way as in method 1, that in order to maximize the conditional independence among the components of the reduced state vector, we have to set the aggregation matrix as:

V = [\psi_1, \ldots, \psi_l]^T    ...(20)

where \psi_1, \ldots, \psi_l are the orthonormal eigenvectors corresponding to the l largest eigenvalues of the correlation matrix of \tilde{x}, \Psi = [\psi_{ij}], i, j = 1, \ldots, n, \psi_{ij} = corr(\tilde{x}_i, \tilde{x}_j). The reduced-order model with maximum conditional independence can be obtained using Equations (15) and (20).
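The steady-state error covariance \Sigma in Equation (16) is the solution of a discrete algebraic Riccati equation, which SciPy can solve after a standard substitution. A sketch (wrapper name is ours):

```python
import numpy as np
from scipy.linalg import solve_discrete_are

def error_covariance_and_correlation(A, B, C, Q, R):
    """Sigma from the Riccati equation (16) (one-step prediction error
    covariance) and the correlation matrix Psi used in Equation (20).
    Substituting a = A^T, b = C^T turns scipy's DARE into Equation (16)."""
    Sigma = solve_discrete_are(A.T, C.T, B @ Q @ B.T, R)
    d = 1.0 / np.sqrt(np.diag(Sigma))
    Psi = Sigma * np.outer(d, d)
    return Sigma, Psi

# Scalar sanity check against Equation (16) directly.
A = np.array([[0.9]]); B = np.array([[1.0]]); C = np.array([[1.0]])
Q = np.array([[1.0]]); R = np.array([[1.0]])
S, Psi = error_covariance_and_correlation(A, B, C, Q, R)
rhs = A @ (S - S @ C.T @ np.linalg.inv(C @ S @ C.T + R) @ C @ S) @ A.T + B @ Q @ B.T
print(np.allclose(S, rhs), Psi[0, 0])  # True 1.0
```

The eigenvectors of Psi corresponding to its l largest eigenvalues then form the rows of V, exactly as in method 1 but with the error correlation matrix in place of the state correlation matrix.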
The correlation matrix \Psi (providing the basis upon which the aggregation matrix is computed) can be obtained from \Sigma, which is the unique positive solution to Equation (16). It is obvious that the aggregation matrix given by Equation (20) also satisfies the conditions that rank V = l and the elements of (VV^T)^{-1} are all non-negative. In the case of continuous-time systems, the computation of the reduced-order model by maximizing conditional independence is the same as in the discrete-time case, although the Riccati equation is different.

Analysis and Discussion

In this section it is again assumed that the system under discussion is discrete-time. The results for continuous-time systems can be obtained in the same way.

Stability

A common property of these two methods is that the deduced reduced-order models preserve the stability of the original model. It is well known that model (3) is stable if, and only if, for any positive definite matrix W there exists a positive definite solution P to the Lyapunov equation:

APA^T + W = P    ...(21)

Since \Lambda A = A_r\Lambda, we obtain the following equation by multiplying Equation (21) by \Lambda on the left and by \Lambda^T on the right:

A_r \Lambda P \Lambda^T A_r^T + \Lambda W \Lambda^T = \Lambda P \Lambda^T    ...(22)

Let \Lambda' = [\Lambda^T \; \eta_{l+1} \; \cdots \; \eta_n]^T, where \eta_{l+1}, \ldots, \eta_n are the orthonormal eigenvectors corresponding to the n - l smallest eigenvalues of the correlation matrix \Phi. It can be seen that \Lambda P\Lambda^T and \Lambda W\Lambda^T are the l \times l matrices in the top left-hand corner of \Lambda'P(\Lambda')^T and \Lambda'W(\Lambda')^T, respectively. Hence \Lambda P\Lambda^T and \Lambda W\Lambda^T are positive definite if P and W are positive definite. From Equation (22) we can conclude that, when the full-order model is stable, the reduced-order model from method 1 is also stable. The conclusion for method 2 is the same.

On Unstable Systems

However, there is a difference between the two methods. The aggregation matrix in method 1 can be computed only when the steady-state covariance of the full-order system exists.
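The key step in the stability argument, namely that \Lambda P\Lambda^T inherits positive definiteness from P when \Lambda has full row rank, is easy to check numerically. A random, purely illustrative instance:

```python
import numpy as np

rng = np.random.default_rng(0)
M = rng.standard_normal((4, 4))
P = M @ M.T + 4.0 * np.eye(4)       # a positive definite P
Lam = rng.standard_normal((2, 4))   # full row rank with probability 1
eigs = np.linalg.eigvalsh(Lam @ P @ Lam.T)
print(bool(eigs.min() > 0))         # True
```

With \Lambda P\Lambda^T > 0 and \Lambda W\Lambda^T > 0, Equation (22) is a Lyapunov equation certifying the stability of A_r.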
Therefore, method 1 is applicable only when the full-order system is stable. In method 2, the covariance of the state estimation error, \Sigma(k), converges to a constant matrix \Sigma when the system is stable. However, even in the case of unstable systems, the steady-state covariance \Sigma exists and satisfies the Riccati equation (16) if the pair [A, C] is observable and the pair [A, H] is stabilizable, where HH^T = Q. Hence, method 2 is applicable to both stable and unstable systems.

Illustrative Examples

Several examples have been successfully employed to illustrate the efficiency of the present methods. The following two examples are presented.

Example 1. The original 8th-order stable discrete-time model A is taken from [8]:

G(z) = (1.68z^7 + 1.116z^6 - 0.21z^5 + 0.152z^4 - 0.516z^3 - 0.262z^2 + 0.044z - 0.006) / (z^8 - 5.046z^7 - 3.38z^6 + 0.63z^5 - 0.456z^4 + 1.548z^3 + 0.786z^2 - 0.132z + 0.018)

Before applying the methods of this paper to this model, we transformed the transfer function G to a state space model of the form of Equation (3), where the noises w and v are assumed to be mutually independent Gaussian sequences with covariances Q = R = 1. Applying method 1 to system A, we get a 3rd-order model A_{r1}:

G_{r1}(z) = (0.1933z^2 - 0.1606z + 0.01141) / (z^3 - 1.923z^2 + 1.049z - 0.08672)

Figure 1 shows Bode plots of models A and A_{r1} (the frequency responses are only plotted for frequencies smaller than the Nyquist frequency). The 3rd-order reduced model from method 2 is A_{r2}:

G_{r2}(z) = (0.1615z^2 - 0.1253z + 0.005995) / (z^3 - 1.798z^2 + 0.8903z - 0.05204)

Figure 2 shows Bode plots of models A and A_{r2}. Simulation results of this example illustrate the efficiency of the present methods for stable systems. It can be seen that the reduced-order models A_{r1} and A_{r2} are also stable, and possess good approximating performance.

Example 2.
Figure 1. Bode plots of models A (solid line) and A_{r1} (dashed line).
Figure 2. Bode plots of models A (solid line) and A_{r2} (dashed line).

To illustrate the efficiency of the maximum conditional independence method in the case of unstable systems, we consider here a continuous-time unstable model B described by Equation (3), with parameters from [15]:

A =
[ 0  0  0  0  -144 ]
[ 1  0  0  0   -86 ]
[ 0  1  0  0    35 ]
[ 0  0  1  0     3 ]
[ 0  0  0  1    -5 ],

B = [70  114  55  12  1]^T,  C = [0  0  0  0  1].

The noises w and v are again assumed to be mutually independent Gaussian processes with covariances Q and R, respectively, Q = R = 1. The poles of model B are \{2.0565 \pm 1.4622i, -3.9435 \pm 1.7012i, -1.2261\}. In this example, we examine the approximating performance of the reduced-order model by comparing its singular values to those of the original model. The obtained 3rd-order approximating model B_r is:

\dot{x}_r(t) =
[ 35.3574  54.9201  -62.2864 ]          [ -120.2451 ]
[ 19.5113  32.2019  -36.2318 ] x_r(t) + [   -4.6853 ] w(t)
[ 39.8363  61.6846  -71.4941 ]          [  -72.6087 ]

y_r(t) = [0.3861  0.6047  -0.6965] x_r(t) + v(t)

The poles of B_r are \{2.9232 \pm 1.3322i, -9.7813\}. Figure 3 shows the singular values of the full-order model B and the reduced-order model B_r. This example illustrates the usefulness of method 2 in the case of unstable systems.

Figure 3. Singular value plots of models B (solid line) and B_r (dashed line).

Conclusions

By analyzing the information and conditional information descriptions in state space models of linear stochastic systems, this paper has proposed two model reduction methods, with criteria of maximum independence and maximum conditional independence among the components of the reduced state vector, respectively. These methods are based on state aggregation. Simulation results illustrated the efficiency of the proposed methods. It was demonstrated that when the original model is stable, the reduced-order models are also stable.
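The companion-form A of Example 2 (with the first entry of its last column read as -144, the value consistent with the quoted pole set) can be checked against the stated poles:

```python
import numpy as np

# Companion form: characteristic polynomial z^5 + 5z^4 - 3z^3 - 35z^2 + 86z + 144,
# whose roots match the poles quoted for model B.
A = np.array([[0.0, 0, 0, 0, -144],
              [1.0, 0, 0, 0,  -86],
              [0.0, 1, 0, 0,   35],
              [0.0, 0, 1, 0,    3],
              [0.0, 0, 0, 1,   -5]])
coeffs = np.poly(A)                  # characteristic polynomial, leading term first
print(np.round(coeffs).astype(int))  # [  1   5  -3 -35  86 144]
poles = np.linalg.eigvals(A)
print(bool(poles.real.max() > 0))    # True: the model is indeed unstable
```

The positive-real-part pole pair confirms that Example 2 exercises the unstable case, where only method 2 applies.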
In addition, the maximum conditional independence method is applicable not only to stable systems but also to unstable systems.

Acknowledgement

This work is supported by China 973 Project (No. 2002CB312200).

References

1. Akaho, S. 2002. Conditionally independent component analysis for supervised feature extraction. Neurocomputing, 49: 139-150.
2. Aoki, M. 1968. Control of large-scale dynamic systems by aggregation. IEEE Trans. on Automatic Control, 13: 246-253.
3. Caines, P.E. 1988. Linear Stochastic Systems. John Wiley, New York.
4. Comon, P. 1994. Independent component analysis, a new concept? Signal Processing, 36: 287-314.
5. Darbellay, G.A., and Wuertz, D. 2000. The entropy as a tool for analysing statistical dependences in financial time series. Physica A, 287: 429-439.
6. Desai, U.B., and Pal, D. 1984. A transformation approach to stochastic model reduction. IEEE Trans. on Automatic Control, 29: 1097-1100.
7. Geweke, J. 1982. Measurement of linear dependence and feedback between multiple time series. J. American Statistical Association, 77(378): 304-313.
8. Kangsanant, T. 1988. Model reduction of discrete-time systems via power series transformation. Proc. 1988 International Conference on Control, 13-15 April, Oxford, UK, pp. 241-252.
9. Kapur, J.N., and Kesavan, H.K. 1992. Entropy Optimization Principles with Applications. Academic Press, UK, pp. 215-216.
10. Kullback, S. 1959. Information Theory and Statistics. John Wiley, New York.
11. Leland, R. 1999. Reduced-order models and controllers for continuous-time stochastic systems: An information theory approach. IEEE Trans. on Automatic Control, 44: 1714-1719.
12. Tian, Y.C. 1993. Applications of Information Entropy in Nonlinear Systems. PhD thesis, Zhejiang University, P.R. China.
13. Tjärnström, F., and Ljung, L. 2002. L2 model reduction and variance reduction. Automatica, 38: 1517-1530.
14. Wagie, D.A., and Skelton, R.S. 1986.
A projection approach to covariance equivalent realizations of discrete systems. IEEE Trans. on Automatic Control, 31: 1114-1120.
15. Xiao, C.S., Feng, Z.M., and Shan, X.M. 1992. On the solution of the continuous-time Lyapunov matrix equation in two canonical forms. IEE Proceedings-D, 139(3): 286-290.
16. Zhang, H. 2003. Information Descriptions and Approaches in Control Systems. PhD thesis, Zhejiang University, P.R. China.

Received 14 November 2003; Accepted after revision: 17 August 2004.
