INTERNATIONAL JOURNAL FOR NUMERICAL METHODS IN ENGINEERING, VOL. 39, 3933-3951 (1996)

AN ITERATIVE SOLUTION SCHEME FOR SYSTEMS OF WEAKLY CONNECTED BOUNDARY ELEMENT EQUATIONS

K. DAVEY AND I. ROSINDALE
Department of Mechanical Engineering, UMIST, PO Box 88, Manchester M60 1QD, U.K.

SUMMARY
In this paper an iterative scheme of first degree is developed for solving linear systems of equations. The systems investigated are those which are derived from boundary integral equations and are of the form

    sum_{j=1}^{N} H_ij x_j = c_i,    i = 1, 2, ..., N

where H_ij are matrices, and x_j and c_i are column vectors. In addition, N denotes the number of domains and, for i != j, H_ij is considered to be small in some sense. These systems, denoted as weakly connected, are solved using first-order iterative techniques initially developed by the authors for solving single-domain problems. The techniques are extended to solve multi-domain problems. Novel solution strategies are investigated and procedures are developed which are computationally efficient. Computation times are determined for the iterative procedures and for elimination techniques, indicating the benefits of iterative techniques over direct methods for problems of this nature.

KEY WORDS linear systems; indirect methods; boundary elements

1. INTRODUCTION
Techniques for solving systems of linear simultaneous equations can be categorized as either direct or iterative. An iterative method can be computationally more efficient than a direct method provided the scheme converges and the rate of convergence is sufficiently rapid. Moreover, if the solution of a system of equations is approximately known, which is often the case for physical problems, then the number of iterations required can be small. Fast convergence combined with prior knowledge of the solution makes iterative procedures substantially more attractive than elimination techniques.
A number of authors1,2 have discounted the use of stationary iterative schemes, such as Gauss-Seidel3,4 and Richardson,5 for the solution of systems whose coefficients are formulated from integral equations. This is due to the nature of the resulting coefficient matrix in boundary element analysis, which is unsymmetric, and there is no guarantee that it is either diagonally dominant or positive definite. However, some success has been achieved by transforming unsymmetric boundary element systems into equivalent symmetric or equivalent row diagonally dominant6 systems. Unfortunately, in such cases, the cost of employing the transformation algorithm was found to be prohibitively high. In general, direct methods such as Gauss elimination are preferred for solving systems of boundary element equations. Iterative techniques with which a measure of success has been gained include the Lanczos, Conjugate Gradient,3,4,9 Bi-Conjugate Gradient,9,10 and Generalised Minimal Residual13 methods. A number of semi-iterative techniques have also been developed.11,13

CCC 0029-5981/96/233933-19 © 1996 by John Wiley & Sons, Ltd. Received 17 November 1994. Revised 3 January 1996.

The authors have examined the first-order stationary iterative scheme14

    E x^(k+1) = D x^(k) + d

where E and D are square matrices. It was shown that, by considering the matrix D embedded in a vector space15 and reducing its size with respect to a chosen matrix norm, convergence rates can be successfully improved. Equation ordering and parameter matrices were used to reduce the size of D. However, only single-domain problems were considered. In multi-domain BEM analysis, the resulting coefficient matrix is unsymmetric and block banded in structure.9,16 Techniques which are used for solving such systems often take advantage of the numerous blocks of zeros which occur within the system matrix to reduce storage requirements and computation.
Efficient techniques for solving block-banded systems have been developed by Crotty,16 Tomlin,17 Lachat et al.,18,19 Das,20 Kane,21 and Stabrowski.22 Unfortunately, these techniques rely heavily on Gauss elimination. In this paper, systems of boundary integral equations of the form

    sum_{j=1}^{N} H_ij x_j = c_i,    i = 1, 2, ..., N

are investigated, where H_ij are matrices, and x_j and c_i are column vectors. In addition, N denotes the number of domains and, for i != j, H_ij is considered to be small in some sense. Furthermore, H_ii is assumed to be a square invertible matrix. Systems of this type result if the solution domain is divided into regions (subdomains) where large changes in the field variable in one domain do not cause changes of similar magnitude in the others. Problems of this type occur frequently in engineering and applied physics. Heat transfer between regions through a resistive barrier is one example. Deformation of blocks connected together by a relatively pliable material is another. The iterative schemes considered are of the form

    E_i x_i^(k+1) = D_i x_i^(k) + d_i

where E_i and D_i are square matrices and d_i and x_i are column vectors, with i indicating the domain operated on. A number of solution strategies are considered. One approach is to use Gauss-Seidel for iteration between the domains and the first-order scheme developed by the authors14 to solve the system of equations for each domain. Moreover, a scheme that incorporates the features of both the first-order scheme and the block Gauss-Seidel procedure is considered. This generalized scheme is shown to give higher rates of convergence for the examples considered. Although it is recognized in the literature that the order in which equations and domains are processed affects convergence, no simple tests are available to determine the optimum order prior to the application of the solution procedure.
Consequently, in this paper domain order is investigated, and a method for determining a near-optimum domain order is described. The philosophy behind the domain ordering procedure is an extension of that utilized by the authors to determine equation order for single-domain problems.14 The mathematical foundations upon which the first-order scheme developed by the authors14 is based are presented. Proofs are provided which indicate that the matrix norm minimization strategies of equation ordering and parameter matrix selection14 ultimately reduce the number of iterations required for convergence. These proofs reinforce the validity of the tests for the determination of near-optimum domain ordering. The strategy for parameter matrix selection is discussed in detail and the structure of the optimum parameter matrix, i.e. the matrix that enables convergence in one iteration, is introduced. The structure of the parameter matrix effectively limits the minimization strategies, and the implications of this are discussed. The iterative procedures are tested on a boundary element model that is used to predict the temperatures on the surfaces of a die in the pressure die-casting process. Solution times are obtained for the iterative procedures and for elimination techniques.

2. BACKGROUND THEORY
The system of equations considered can be represented in matrix form

    H x = c    (1)

where H is a non-singular square matrix, and x and c are column matrices. It is assumed that H has unit diagonal coefficients, which can always be arranged by row scaling. Consider the multiplication of H by a non-singular square matrix w and the decomposition of H to give

    w(I - L - U) x = w c    (2)

where U and L are strictly upper and strictly lower triangular matrices, respectively. Equation (2) can be manipulated to give an iterative procedure of the form

    (I - wL) x^(k+1) = (I - w + wU) x^(k) + w c    (3)

where k denotes the kth iterate.
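As an illustration of scheme (3), the following NumPy sketch row-scales H to unit diagonal, forms the splitting H = I - L - U, and iterates (I - wL)x^(k+1) = (I - w + wU)x^(k) + wc. This is a minimal sketch, not the authors' code: the test matrix, tolerance and starting vector are illustrative choices.

```python
import numpy as np

# Sketch of iterative scheme (3): (I - wL) x^{k+1} = (I - w + wU) x^k + wc.
# With w = I this reduces to the Gauss-Seidel procedure.

def first_order_iterate(H, c, w, x0, tol=1e-10, max_iter=500):
    n = H.shape[0]
    d = np.diag(H).copy()
    H = H / d[:, None]           # row-scale to unit diagonal, as in eq. (1)
    c = c / d
    L = -np.tril(H, -1)          # splitting H = I - L - U
    U = -np.triu(H, 1)
    I = np.eye(n)
    A = I - w @ L                # lower triangular when w is upper triangular
    B = I - w + w @ U
    x = x0.copy()
    for k in range(max_iter):
        x_new = np.linalg.solve(A, B @ x + w @ c)
        if np.linalg.norm(x_new - x, np.inf) < tol:
            return x_new, k + 1
        x = x_new
    return x, max_iter

# Illustrative diagonally dominant system (guarantees Gauss-Seidel converges)
H = np.array([[4.0, 1.0, 0.5],
              [1.0, 5.0, 1.0],
              [0.5, 1.0, 4.0]])
c = np.array([1.0, 2.0, 3.0])
x, iters = first_order_iterate(H, c, np.eye(3), np.zeros(3))
print(iters, np.allclose(H @ x, c))
```

Row scaling inside the routine leaves the solution of the original system unchanged, so the converged x satisfies Hx = c for the unscaled matrix as well.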
Note that if w is set equal to I then (3) reduces to the Gauss-Seidel procedure. Furthermore, if w is set equal to bI, where b is a parameter, then (3) reduces to a procedure denoted successive over- or under-relaxation,3 depending on the numerical value of the parameter. Consider replacing x^(k) with x + e^(k) in (3), where x is the exact solution and e^(k) is the error. This gives

    e^(k+1) = M e^(k),    M = (I - wL)^(-1) (I - w + wU)    (4)

where M is known as the iteration matrix. A necessary and sufficient condition for the convergence of (3), for all vectors c, is that the eigenvalues of M have absolute values which are less than one, i.e. the spectral radius of M, p(M), is strictly less than one.

3. GENERAL CONCEPTS
In this section, the criteria for the determination of the parameter matrix w are discussed, along with the overall strategy for improving the convergence of iterative scheme (3). The basic idea is to choose the parameter matrix to reduce the size of the matrix I - w + wU, subject to the restriction that I - wL is non-singular and of a form that is easily invertible. In fact, it is desirable, provided this latter restriction is attained, and where possible, to select w so that I - w + wU = 0. This is because equation (3) then reduces to

    (I - wL) x^(k+1) = w c    (5)

and consequently convergence is assured in one iteration. However, it should be emphasized that it is essential that the matrix I - wL is non-singular and of a simpler form than H, making it computationally more desirable to solve (5) rather than (1). In practice, it is impractical to select w so that I - w + wU = 0, as this would require inversion. Consequently, w is selected to reduce the Frobenius norm,15 ||I - w + wU||_F. The reason for using the Frobenius norm, rather than another norm, is that it is differentiable everywhere. In the propositions that follow, the form of w is not restricted unless explicitly stated.

Proposition 1. If H is non-singular and the Frobenius norm ||I - w0 + w0 U||_F = 0, then I - w0 L is non-singular.
Proof.

    ||I - w0 + w0 U||_F = 0  =>  I - w0 + w0 U = 0  =>  I = w0 (I - U)    (6)

    I - w0 L = w0 (I - U - L) = w0 H    (7)

Equation (6) implies that w0 is non-singular, since det w0 = det(I - U)^(-1) = 1. Moreover, I - w0 L is equal to the product of two invertible matrices and so is non-singular.

This proposition ensures that equation (5) is solvable with a unique solution. The parameter matrix w0 = (I - U)^(-1) is considered to be the optimum since I - w0 + w0 U = 0 and equation (3) converges in the minimum number of iterations possible, i.e. one. Note that the optimum matrix w0 is an upper triangular matrix with ones on the diagonal. This is an important observation as it provides some restriction on the choice of w.

Proposition 2. If H is non-singular, there exists a positive real number d with ||I - w + wU||_F < d such that I - wL and w are non-singular.

The following proof makes use of the infinity-norm ||w||_inf = max_i sum_j |w_ij|, along with the inequality ||w||_inf <= n^(1/2) ||w||_F, where w is an n x n matrix. Moreover, use is made of the continuity of the determinant function det: M_n(R) -> R, where M_n(R) is the set of n x n matrices with real coefficients.

Proof. Consider

    ||w - w0||_inf = ||w - (I - U)^(-1)||_inf = ||(I - w(I - U))(I - U)^(-1)||_inf
                  <= ||(I - U)^(-1)||_inf ||I - w + wU||_inf
                  <= n^(1/2) ||(I - U)^(-1)||_inf ||I - w + wU||_F    (8)

Moreover, consider

    ||(I - wL) - (I - w0 L)||_inf = ||(w0 - w) L||_inf
                  <= n^(1/2) ||L||_inf ||(I - U)^(-1)||_inf ||I - w + wU||_F    (9)

These inequalities indicate that as ||I - w + wU||_F -> 0, w -> w0 and I - wL -> I - w0 L = w0 H. In addition, det w -> det w0 = det(I - U)^(-1) = 1 and det(I - wL) -> det(w0 H) = det w0 det H = det H. Since det is a continuous function, there exists an open ball centred at w0 such that det w != 0. Likewise, there exists an open ball centred at w0 H within which det(I - wL) != 0. It immediately follows that d1 exists with ||I - w + wU||_F < d1, so that ||w - w0||_inf < e1 with w non-singular.
In addition, d2 exists with ||I - w + wU||_F < d2, so that ||(I - wL) - w0 H||_inf < e2 with I - wL non-singular. Thus, specifying d = min(d1, d2) completes the proof.

This proposition indicates the desirability of choosing the parameter matrix to reduce the norm ||I - w + wU||_F, since this ensures that, provided the norm is sufficiently small, I - wL is non-singular.

Proposition 3. If H is non-singular, there exists a positive real number e with ||I - w + wU||_F < e so that equation (3) converges.

Proof. Convergence of (3) is assured if the spectral radius p(M) of the iteration matrix M is less than 1, i.e. p(M) < 1. Gershgorin's theorem provides the inequality p(M) <= ||M||_inf, where ||M||_inf = max_i sum_j |m_ij|. Thus,

    p(M) <= ||M||_inf = ||(I - wL)^(-1) (I - w + wU)||_inf
         <= ||(I - wL)^(-1)||_inf ||I - w + wU||_inf
         <= n^(1/2) ||(I - wL)^(-1)||_inf ||I - w + wU||_F    (10)

Moreover, note that

    ||(I - wL)^(-1)||_inf = ||(I - wL)^(-1) - (I - w0 L)^(-1) + (I - w0 L)^(-1)||_inf
         <= ||(I - wL)^(-1) - (I - w0 L)^(-1)||_inf + ||(I - w0 L)^(-1)||_inf    (11)

Proposition 2 ensures that d2 exists with ||I - w + wU||_F < d2, so that ||(I - wL) - w0 H||_inf < e2 with I - wL non-singular. Thus, for ||I - w + wU||_F < d2 there exists a ball centred at w0 H within which I - wL is non-singular. It follows that the uniform boundedness principle15 can be applied to inequality (11) to give

    ||(I - wL)^(-1)||_inf < P

where P is a positive number independent of w. This inequality holds over a ball centred at w0 H. Note, for convenience, it is assumed that the two balls coincide, as this can always be arranged either by decreasing d2 or by increasing P. Hence, specifying e = min(d2, e3), where e3 = (n^(1/2) P)^(-1), provides the inequality

    p(M) < n^(1/2) P e <= 1    (12)

Hence the spectral radius is less than 1 and convergence is assured.

Proposition 4.
If H is non-singular and (3) converges in K iterations with K > 1, then there exists a positive e with ||I - w + wU||_F < e so that equation (3) converges in L iterations with L < K.

Proof. The proof of this proposition follows immediately from Proposition 1, since if the form of w is unrestricted it can be made equal to w0 = (I - U)^(-1), and convergence can be achieved in one iteration.

Propositions 1-4 indicate the desirability of reducing ||I - w + wU||_F. The following two sections examine strategies for the reduction of ||I - w + wU||_F with the provision that I - wL is a lower triangular matrix.

4. EQUATION ORDERING
In this section, the idea of using equation and column ordering of the original matrix H to reduce ||I - w + wU||_F is examined. Note that after ordering the parameter matrix is chosen to minimize ||I - w + wU||_F, and if w = I, then I - w + wU = U and the following inequality holds:

    ||I - w + wU||_F <= ||U||_F <= n^(1/2) ||U||_inf    (13)

The basic idea is to use row (equation) and column ordering to place the smaller terms of H in U, so as to minimize either ||U||_F or ||U||_inf. Once the reordering is complete the parameter matrix is selected to minimize ||I - w + wU||_F. It should be recognized that there is no guarantee that ||I - w + wU||_F will be smaller after reordering, but this is the most likely outcome. The authors examined the effects of reordering in Reference 14, where it was found, for the test cases considered, that interchanging rows and columns to minimize ||U||_inf gave the best results. The mechanics of the reordering scheme are quite simple and are described in detail in Reference 14. It should be recognized, however, that the ordering procedure may not give rise to the smallest achievable value of ||U||_inf. Nevertheless, the procedure is simple to implement and computationally efficient, and is therefore recommended by the authors.

5.
PARAMETER MATRIX
The class of parameter matrices considered here is restricted to the upper bidiagonal form

    w = (w_ij), where the w_ii+1 are arbitrary, w_ii = 1, and all other coefficients are zero    (14)

Note that this matrix is of the same form as the optimum matrix w0, i.e. an upper triangular matrix with ones on the principal diagonal. This is the only form for which I - wL is a lower triangular matrix, thus enabling back substitution. The fact that L is strictly lower triangular has allowed the terms w_ii+1, i = 1, ..., n-1, to be included. The basic idea behind the selection of the parameter matrix is to allow further reduction of the iteration matrix M. The authors demonstrated that ordering to minimize ||U||_inf and selecting the parameter matrix to minimize ||I - w + wU||_F gives the best results for single-domain problems.14 The coefficients w_ii+1 are therefore chosen to minimize ||I - w + wU||_F, subject to certain constraints. In particular, the choice of w should ensure that w^(-1) and (I - wL)^(-1) exist. Consider then the minimization of ||I - w + wU||_F^2, which is of the form

    ||I - w + wU||_F^2 = sum_{i=1}^{n-1} (w_ii+1 + h_ii+1)^2
                       + sum_{i=1}^{n-1} sum_{j=i+2}^{n} (h_ij + w_ii+1 h_i+1j)^2    (15)

Since ||I - w + wU||_F is a strictly convex function,23 it has a unique absolute minimum. This can be determined by setting

    d ||I - w + wU||_F^2 / d w_vv+1 = 0,    v = 1, 2, ..., n-1    (16)

This gives rise to the following set of equations:

    (w_vv+1 + h_vv+1) + sum_{j=v+2}^{n} (h_vj + w_vv+1 h_v+1j) h_v+1j = 0,    v = 1, 2, ..., n-1    (17)

These equations can be used to determine the w_vv+1, i.e.

    w_vv+1 = - ( h_vv+1 + sum_{j=v+2}^{n} h_v+1j h_vj ) / ( 1 + sum_{j=v+2}^{n} h_v+1j^2 ),    v = 1, 2, ..., n-1    (18)

The success of scheme (3) depends upon the ability of the equation ordering procedure and parameter matrix (14) to reduce ||I - w + wU||_F sufficiently to enable convergence.

6. MULTI-DOMAIN PROBLEMS
The three-domain problem (see Figure 1) is used as a vehicle for the development of the iterative procedures.
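The closed-form coefficients (18) are straightforward to compute. A minimal NumPy sketch follows; the random test matrix is illustrative, not taken from the paper, and the code uses 0-based indexing where the paper's v runs from 1 to n-1.

```python
import numpy as np

# Sketch of parameter-matrix selection (18): the superdiagonal entries
# w_{v,v+1} minimizing ||I - w + wU||_F for the bidiagonal form (14).

def optimal_bidiagonal_w(H):
    n = H.shape[0]
    w = np.eye(n)
    for v in range(n - 1):                 # paper's v = 1, ..., n-1
        tail = range(v + 2, n)             # paper's j = v+2, ..., n
        num = H[v, v + 1] + sum(H[v + 1, j] * H[v, j] for j in tail)
        den = 1.0 + sum(H[v + 1, j] ** 2 for j in tail)
        w[v, v + 1] = -num / den
    return w

rng = np.random.default_rng(0)
H = np.eye(5) + 0.1 * rng.standard_normal((5, 5))
np.fill_diagonal(H, 1.0)                   # unit diagonal, as in equation (1)
U = -np.triu(H, 1)                         # splitting H = I - L - U
I = np.eye(5)
w = optimal_bidiagonal_w(H)
norm_w = np.linalg.norm(I - w + w @ U, 'fro')
norm_id = np.linalg.norm(U, 'fro')         # w = I gives ||U||_F
print(norm_w <= norm_id)
```

Since w = I (all superdiagonal terms zero) lies in the feasible set of form (14), the unique minimizer (18) can never give a larger Frobenius norm than the Gauss-Seidel value ||U||_F.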
The procedures are, however, general and easily applicable to a problem with an arbitrary number of domains.

Figure 1. Abstract three domain problem

The system of equations for the three-domain system can be represented in matrix form

          [ H11 H12 H13 ] [ x1 ]   [ c1 ]
    H x = [ H21 H22 H23 ] [ x2 ] = [ c2 ]    (19)
          [ H31 H32 H33 ] [ x3 ]   [ c3 ]

where x1, x2 and x3 are column vectors representing the unknowns in domains 1, 2 and 3, respectively. Note, the off-diagonal submatrices H_ij (i != j) contain the coefficients which represent the connectivity between the domains, and in general are not fully populated, consisting of bands of zeros.9,16 From a computational viewpoint it is undesirable to store these matrices in two-dimensional arrays. However, it will be shown that the iterative schemes allow the coefficients of the off-diagonal matrices to be stored in compacted form in a one-dimensional array or on a computer storage medium. The diagonal matrices H_ii are unsymmetric and fully populated, and there is no guarantee that they are either diagonally dominant or positive definite. These matrices can be stored in two-dimensional arrays.

7. BLOCK GAUSS-SEIDEL ITERATION
Manipulating equation (19) gives a block Gauss-Seidel procedure of the form

    H11 x1^(k+1) = -H12 x2^(k)   - H13 x3^(k)   + c1
    H22 x2^(k+1) = -H21 x1^(k+1) - H23 x3^(k)   + c2    (20)
    H33 x3^(k+1) = -H31 x1^(k+1) - H32 x2^(k+1) + c3

where k is the number of 'block' iterations. This number represents how many times each block of equations has been processed. Replacing x^(k) with x + e^(k), where x is the exact solution and e^(k) is the error, gives

    e^(k+1) = M_GS e^(k)    (21)

where M_GS is the iteration matrix. It is clear on examination of equations (20) and (21) that if H12, H13 and H23 are equal to zero matrices then convergence would be obtained in one (block) iteration. In practice, this would only occur if the domains were totally disconnected. However, if the domains are weakly connected then H12, H13 and H23 will be close to zero matrices, and this may well result in convergence in a small number of iterations. Another consideration is the order of the submatrices of H.
For example, the scheme

    H11 x1^(k+1) = -H13 x3^(k)   - H12 x2^(k)   + c1
    H33 x3^(k+1) = -H31 x1^(k+1) - H32 x2^(k)   + c3    (22)
    H22 x2^(k+1) = -H21 x1^(k+1) - H23 x3^(k+1) + c2

in which the domains are processed in the order 1, 3, 2, may give a different rate of convergence from (20). In general, reordering the matrices will give different performances. A simple test is required to determine the optimum order in advance of the application of the iterative procedure. The philosophy behind the domain ordering approach adopted is an extension of that utilized to determine equation order for a single-domain system of equations. Domain ordering is used to place the smaller off-diagonal submatrices in the upper part of H, in an attempt to minimize the norm

    || [ 0  H12  H13 ] ||
    || [ 0   0   H23 ] ||    (23)
    || [ 0   0    0  ] ||

Note, the diagonal matrices H_ii are assumed to have their diagonal terms set to one. This provides some normalization to combat the effects of scaling. It is apparent that each block iteration requires the inversion of the fully populated matrices H11, H22 and H33. A direct or an iterative scheme could be used to solve these systems. The scheme adopted in this paper is the iterative technique developed by the authors for solving single-domain problems,14 described briefly in Sections 2-5. The reason for choosing this technique is that, when tested on single-domain boundary element systems, the scheme outperformed all other solution techniques considered.14

8. EXTENSION OF FIRST-ORDER SCHEME
Consider the first row of equation (19), i.e.

    H11 x1 = -H12 x2 - H13 x3 + c1    (24)

Multiplying this equation by a parameter matrix w1 of form (14) gives

    w1 H11 x1 = -w1 H12 x2 - w1 H13 x3 + w1 c1    (25)

With H11 = I - L11 - U11, equation (25) can be rearranged into an iterative scheme for domain 1, and treating the second and third rows of (19) in the same way gives

    (I - w1 L11) x1^(k+1) = (I - w1 + w1 U11) x1^(k) - w1 H12 x2^(k)   - w1 H13 x3^(k)   + w1 c1    (26)
    (I - w2 L22) x2^(k+1) = (I - w2 + w2 U22) x2^(k) - w2 H21 x1^(k+1) - w2 H23 x3^(k)   + w2 c2    (27)
    (I - w3 L33) x3^(k+1) = (I - w3 + w3 U33) x3^(k) - w3 H31 x1^(k+1) - w3 H32 x2^(k+1) + w3 c3    (28)

Equations (26)-(28) can be conveniently expressed in the compact form

    (I - w_i L_ii) x_i^(k+1) = (I - w_i + w_i U_ii) x_i^(k) - sum_{j<i} w_i H_ij x_j^(k+1) - sum_{j>i} w_i H_ij x_j^(k) + w_i c_i,    i = 1, 2, 3    (29)

The parameter matrices w1, w2 and w3 are chosen to minimize ||I - w_i + w_i U_ii||_F. Thus, the same parameter matrices and equation ordering procedures are used for (29) as for (20).
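A block Gauss-Seidel sweep of the form (20) can be sketched as follows. For brevity the inner systems are solved directly rather than by the first-order scheme the authors adopt, and the block sizes and matrix values are small illustrative choices, not taken from the paper.

```python
import numpy as np

# Sketch of block Gauss-Seidel (20) for a weakly connected system like (19).

def block_gauss_seidel(Hblocks, cblocks, x0, tol=1e-10, max_sweeps=200):
    N = len(cblocks)
    x = [xi.copy() for xi in x0]
    for k in range(max_sweeps):
        delta = 0.0
        for i in range(N):
            rhs = cblocks[i] - sum(Hblocks[i][j] @ x[j]
                                   for j in range(N) if j != i)
            x_new = np.linalg.solve(Hblocks[i][i], rhs)
            delta = max(delta, np.linalg.norm(x_new - x[i], np.inf))
            x[i] = x_new                 # latest iterates used immediately
        if delta < tol:
            return x, k + 1
    return x, max_sweeps

rng = np.random.default_rng(1)
sizes = [3, 4, 2]
# Weak connectivity: off-diagonal blocks are small relative to diagonal blocks
Hb = [[(np.eye(sizes[i]) + 0.1 * rng.standard_normal((sizes[i], sizes[i])))
       if i == j else 0.05 * rng.standard_normal((sizes[i], sizes[j]))
       for j in range(3)] for i in range(3)]
cb = [rng.standard_normal(s) for s in sizes]
x, sweeps = block_gauss_seidel(Hb, cb, [np.zeros(s) for s in sizes])
Hfull = np.block(Hb)
xfull = np.concatenate(x)
print(sweeps, np.allclose(Hfull @ xfull, np.concatenate(cb)))
```

With off-diagonal blocks this small the block iteration matrix M_GS has a small spectral radius, so very few sweeps are needed, illustrating the weakly connected case discussed above.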
However, in an attempt to reduce the number of block iterations, the domains could be ordered to minimize the norm

    || [ I - w1 + w1 U11    w1 H12             w1 H13           ] ||
    || [ 0                  I - w2 + w2 U22    w2 H23           ] ||    (30)
    || [ 0                  0                  I - w3 + w3 U33  ] ||

However, note that

    [ I - w1 + w1 U11   w1 H12            w1 H13          ]   [ I - w1 + w1 U11   0                 0               ]   [ 0  w1 H12  w1 H13 ]
    [ 0                 I - w2 + w2 U22   w2 H23          ] = [ 0                 I - w2 + w2 U22   0               ] + [ 0  0       w2 H23 ]    (31)
    [ 0                 0                 I - w3 + w3 U33 ]   [ 0                 0                 I - w3 + w3 U33 ]   [ 0  0       0      ]

so minimization can be achieved by domain ordering to minimize equation (23) and selecting the parameter matrices to minimize ||I - w_i + w_i U_ii||_F, i = 1, 2, 3.

The difference between schemes (29) and (20) is that, for scheme (29), the coefficients of x are updated by successive corrections, whereas for scheme (20) they are corrected on a block-by-block basis. This observation suggests that a more general approach could be devised which incorporates the two procedures.

9. GENERALIZED MULTI-DOMAIN ITERATIVE PROCEDURE
An iterative scheme for the solution of equation (19) which incorporates the block Gauss-Seidel and the extended stationary scheme is as follows:

    (I - w1 L11) x1^(k,p+1) = (I - w1 + w1 U11) x1^(k,p) - w1 H12 x2^(k,0) - w1 H13 x3^(k,0) + w1 c1    (32)

where p = 0, 1, 2, ..., P-1 and x1^(k+1,0) = x1^(k,P);

    (I - w2 L22) x2^(k,q+1) = (I - w2 + w2 U22) x2^(k,q) - w2 H21 x1^(k+1,0) - w2 H23 x3^(k,0) + w2 c2    (33)

where q = 0, 1, 2, ..., Q-1 and x2^(k+1,0) = x2^(k,Q);

    (I - w3 L33) x3^(k,r+1) = (I - w3 + w3 U33) x3^(k,r) - w3 H31 x1^(k+1,0) - w3 H32 x2^(k+1,0) + w3 c3    (34)

where r = 0, 1, 2, ..., R-1 and x3^(k+1,0) = x3^(k,R).

The integers P, Q and R indicate the number of iterations performed on each individual block of equations (inner iterative loop), and k denotes the block iterate (outer iterative loop). In the first instance x1^(0,0), x2^(0,0) and x3^(0,0) are set to initial trial values. Note, if P, Q and R are set equal to one, the extended first-order procedure is obtained. If these integers are not specified and equations (32)-(34) are allowed to converge, then the block Gauss-Seidel procedure is obtained. An error analysis of these equations yields
    e1^(k,p+1) = M1 e1^(k,p) - (I - w1 L11)^(-1) w1 (H12 e2^(k,0) + H13 e3^(k,0))    (35)
    e2^(k,q+1) = M2 e2^(k,q) - (I - w2 L22)^(-1) w2 (H21 e1^(k+1,0) + H23 e3^(k,0))    (36)
    e3^(k,r+1) = M3 e3^(k,r) - (I - w3 L33)^(-1) w3 (H31 e1^(k+1,0) + H32 e2^(k+1,0))    (37)

where M_i = (I - w_i L_ii)^(-1) (I - w_i + w_i U_ii), i = 1, 2 or 3. Performing P, Q and R iterations on equations (32)-(34), respectively, gives

    e1^(k+1,0) = M1^P e1^(k,0) - (sum_{i=0}^{P-1} M1^i) P1 w1 (H12 e2^(k,0) + H13 e3^(k,0))    (38)
    e2^(k+1,0) = M2^Q e2^(k,0) - (sum_{i=0}^{Q-1} M2^i) P2 w2 (H21 e1^(k+1,0) + H23 e3^(k,0))    (39)
    e3^(k+1,0) = M3^R e3^(k,0) - (sum_{i=0}^{R-1} M3^i) P3 w3 (H31 e1^(k+1,0) + H32 e2^(k+1,0))    (40)

where P_i = (I - w_i L_ii)^(-1), i = 1, 2 or 3. Gathering the e^(k+1,0) terms on the left-hand side, equations (38)-(40) can be expressed in matrix form as a single relation between e^(k+1,0) and e^(k,0) (41). Thus, for the iterative procedure defined by (32)-(34) the domains should be ordered to minimize the norm

    || [ M1^P   (sum_{i=0}^{P-1} M1^i) P1 w1 H12   (sum_{i=0}^{P-1} M1^i) P1 w1 H13 ] ||
    || [ 0      M2^Q                               (sum_{i=0}^{Q-1} M2^i) P2 w2 H23 ] ||    (42)
    || [ 0      0                                  M3^R                             ] ||

Again, minimization can be achieved by domain ordering to minimize equation (23) and selecting the parameter matrices to minimize ||I - w_i + w_i U_ii||_F, i = 1, 2, 3.

10. ELIMINATION TECHNIQUES
A direct method commonly employed for solving linear systems of algebraic equations is Gauss elimination. The technique involves using row operations to reduce the system matrix H to upper triangular form. The solution is then obtained by simple back substitution. Partial (row) pivoting is usually employed in the forward elimination stage to enhance the stability of the scheme. The operation count for this procedure is O(2n^3/3) floating point operations (flops) for the forward elimination, plus O(n^2) flops for the back substitution, where n is the number of unknowns. A direct technique often employed in cases where the right-hand side vector c is changing, but the coefficient matrix H is constant, is LU decomposition. This procedure involves factorizing the coefficient matrix H into a lower triangular matrix L and an upper triangular matrix U. If Crout's method24 is used to perform the factorization, then the coefficients of H can be replaced by those of L and U, and thus no extra storage penalty is incurred. The factorized system can be written in matrix form

    H x = (L U) x = L (U x) = L y = c    (43)
Once the factorization has been performed, the solution can be obtained by first solving L y = c for the vector y, then U x = y for x; y is obtained by forward substitution, x by back substitution. The factorization requires O(2n^3/3) flops, and the forward and back substitutions O(n^2) flops each. Thus, for a reasonably sized linear system consisting of n equations and n unknowns, Gauss elimination and LU decomposition are approximately equivalent in terms of the number of operations required to obtain the solution. For a multi-domain problem such as that given by equation (19), an efficient procedure would be to factorize each diagonal block H_ii into an equivalent LU form, and then to use block Gauss-Seidel for iteration between the domains. This would require O(2n1^3/3 + 2n2^3/3 + 2n3^3/3) flops for the factorization, where n_i is the number of unknowns in domain i and n = n1 + n2 + n3. Each block iterative sweep would then require O(2n1^2 + 2n2^2 + 2n3^2) flops for the forward plus back substitutions, plus O(4n1n2 + 4n1n3 + 4n2n3) flops for the multiplications by the off-diagonal blocks.

11. COMPUTATION AND STORAGE
This section is concerned with the implementation of the multi-domain solution schemes described in the previous sections, namely the block Gauss-Seidel, extended first-order, and generalized multi-domain procedures. Efficient methods for computation and storage are discussed. In terms of computation it is expedient to formulate the matrices (I - w_i L_ii) and (I - w_i + w_i U_ii), i = 1, 2, ..., N, prior to commencing iteration, as this avoids the need to repeatedly formulate the coefficients of these matrices within the iterative cycle. In practice the coefficients of the diagonal blocks H_ii can be overwritten by these coefficients so as to minimize core storage requirements. For the off-diagonal block matrices w_i H_ij (i != j) there are two options:

(i) w_i H_ij (i != j) can be formulated explicitly, prior to commencing iteration.
(ii) w_i H_ij (i != j) can be formulated implicitly, once per block iteration, as and when required. In this case it is expedient to first form the product H_ij x_j^(k); this vector can then be multiplied by w_i.

It is a simple matter to show that it is only expedient to formulate w_i H_ij (i != j) prior to iteration if the number of block iterations required to obtain the solution exceeds the number of unknowns associated with domain j. Therefore, in most practical situations option (ii) will be the more efficient scheme. Formulating w_i H_ij (i != j) explicitly, prior to iteration, may also reduce the sparsity of the system, as terms may be introduced into positions in H_ij (i != j) previously occupied by zeros. This can be avoided if w_i H_ij (i != j) is formulated implicitly, within the iterative loop. For large problems, it may be impractical to operate solvers entirely in core, as this may result in a large core memory requirement. Utilizing a backing store, such as a disk, to hold coefficients greatly reduces this requirement. The block-structured coefficient matrix lends itself very nicely to efficient storage schemes. One efficient method is to hold the fully populated diagonal blocks in core, but to write the coefficients of the off-diagonal blocks H_ij (i != j) to disk file. These off-diagonal block coefficients can first be assembled into a one-dimensional array in core, then written off to a sequential file. For an N-domain problem it is advantageous to use N sequential files, as this ensures that the off-diagonal block coefficients are easily accessible. Advantage can be taken of the sparseness of these blocks so that only non-zero coefficients need be stored. If core memory is limited then a number of the fully populated diagonal blocks can also be written off to file. The minimum core memory requirement would then be an array capable of holding the largest diagonal block.
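Option (ii) can be sketched as follows, with an off-diagonal block held only as non-zero (row, column, value) triples, as it might be read back from a sequential file. All names, sizes and values here are illustrative, not from the paper.

```python
import numpy as np

# Sketch of option (ii): the coupling term w_i (H_ij x_j) is formed
# implicitly each block iteration, so w_i H_ij is never assembled and the
# sparsity of H_ij is preserved.

def coupling_term(wi, triples, xj, ni):
    v = np.zeros(ni)
    for r, c, h in triples:          # v = H_ij x_j, non-zeros only
        v[r] += h * xj[c]
    return wi @ v                    # then multiply by the parameter matrix

ni, nj = 4, 5
rng = np.random.default_rng(2)
Hij = np.zeros((ni, nj))
Hij[0, 1], Hij[2, 3] = 0.05, -0.02   # sparse, weak coupling block
triples = [(r, c, Hij[r, c]) for r in range(ni) for c in range(nj)
           if Hij[r, c] != 0.0]
wi = np.eye(ni)
wi[0, 1] = 0.3                       # bidiagonal parameter matrix, form (14)
xj = rng.standard_normal(nj)
print(np.allclose(coupling_term(wi, triples, xj, ni), wi @ (Hij @ xj)))
```

Forming H_ij x_j first and then applying w_i costs one sparse matrix-vector product plus one bidiagonal multiply, whereas pre-forming w_i H_ij would both cost more up front and fill in zero positions of H_ij.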
If core memory is severely restricted it is possible to further partition the diagonal blocks to obtain submatrices of manageable size.

12. RESULTS AND DISCUSSION
In this section, a number of examples are tested to ascertain which of the iterative schemes considered is the most efficient, and to demonstrate the effect of domain ordering on convergence rates. The examples considered are three-dimensional potential problems. A boundary element model which models the thermal behaviour of the die blocks and component within the casting process is utilized.25 This boundary element model comprises the simultaneous solution of the integral formulations of both Laplace's and Poisson's equations, and yields the 'steady-state' or time-averaged temperatures on the surfaces of the die and component over a typical casting cycle. Consider a simple two-slide die as shown in Figure 2. Convective-type boundary conditions of the form

    q(x) = h (T_i - T)    (44)

apply over the boundaries of subregions R1 and R2, the die blocks, and R3, the component. T_i is the steady-state surface temperature on domain i, T is the temperature of the surrounding medium, and h is the time-averaged heat transfer coefficient between the surface and the surrounding medium. Incorporation of the boundary conditions into the integral formulations of Laplace's and Poisson's equations, and discretization of the cavity, interfaces, outer die boundaries, component, and cooling channels with surface elements, yields a set of linear simultaneous equations. These equations can be written in matrix form as equation (19), with vectors x1, x2 and x3 containing the unknown temperatures at nodes on the surfaces of subregions R1, R2 and R3, respectively. Four dies are modelled, incorporating two types of elements.
The die cavity, component, and outer surfaces of the die are meshed with linear triangular elements, and linear pipe elements are used to represent the cooling channels within the die blocks.26 A fine mesh is used on the cavity surfaces and the component, with coarser meshes on the outer surfaces of the dies and the cooling channels. Typical values are used for the heat transfer coefficients. The four examples considered are 2, 3, 4 and 9 domain problems, respectively. The 2-zone model generates 464 equations, the 3-zone model 1039, the 4-zone model 1287, and the 9-zone model 7718. Figure 3 shows the surface meshes for the four test components. The tests were carried out on a SUN SPARCstation IPX computer. Test cases are considered for each of the models to ascertain which is the computationally most efficient scheme. For each of the cases the initial trial vector x^(0) is set to 0, which is sufficiently remote from the exact solution.

Figure 2. Two-slide die. Key to superscripts: o - outer die boundary; cc - die cooling channel; c - die cavity/casting interface; d - die block interface

Figure 3. Surface meshes for the four test components: (a) model 1; (b) model 2; (c) model 3; (d) model 4

The iterative procedure is terminated when the residual norm3 is less than 0.0001. The results are presented in Tables I-V. In these tables the number of 'block' iterations is given; this number represents how many times each block of equations has been processed. The total number of iterations given is the sum of the number of iterations performed on the equations within the individual blocks. The computation time is also given. The tests were performed entirely in core. In the first set of tests the following test cases are considered:

(1) Extended first-order scheme.
(2)-(8) Generalized multi-domain iterative procedure.
Test cases (2)-(8) are variations of the generalized multi-domain iterative procedure, distinguished by the maximum number of iterations performed on each block of equations. The maximum number of iterations performed on each block is set equal to i for test case (i), i = 1, 2, ..., 8. It should be recognized, however, that for a particular block the iterative procedure may converge before the prescribed maximum is attained, in which case fewer than i iterations are performed.

Table I. Performance data for model 1

Test case   Block iterations   Total iterations   CPU time (s)
    1              46                 92               34.94
    2              28                110               28.36
    3              21                125               26.82
    4              18                141               27.34
    5              17                160               30.02
    6              16                174               31.45
    7              16                191               33.76
    8              15                200               34.31
    9              15                282               42.79
   10              15                  -               27.25
   11               -                  -               51.50

Note: Model 1. Number of zones = 2, Zone 1 = 270 equations, Zone 2 = 194 equations. Total number of unknowns = 464

Table II. Performance data for model 2

Test case   Block iterations   Total iterations   CPU time (s)
    1              46                138              155.27
    2              30                180              131.33
    3              25                224              135.10
    4              24                281              147.71
    5              22                319              154.57
    6              21                355              163.04
    7              21                395              173.79
    8              21                430              185.35
    9              21                560              225.34
   10              21                  -              164.40
   11               -                  -              587.68

Note: Model 2. Number of zones = 3, Zone 1 = 265 equations, Zone 2 = 295 equations, Zone 3 = 479 equations. Total number of unknowns = 1039

Table III. Performance data for model 3

Test case   Block iterations   Total iterations   CPU time (s)
    1              39                156              181.61
    2              25                200              146.34
    3              22                263              152.83
    4              21                329              167.27
    5              21                398              187.74
    6              20                440              195.95
    7              20                486              210.95
    8              20                529              225.25
    9              20                673              281.91
   10              20                  -              221.27
   11               -                  -             1135.64

Note: Model 3. Number of zones = 4, Zone 1 = 306 equations, Zone 2 = 309 equations, Zone 3 = 510 equations, Zone 4 = 162 equations. Total number of unknowns = 1287

Table IV. Performance data for model 4

Test case   Block iterations   Total iterations   CPU time (s)
    1              55                495              6678.07
    2              32                537              4757.15
    3              28                677              4658.21
    4              25                804              4710.22
    5              25                955              5369.16
    6              23               1061              5429.92
    7              21               1175              5388.69
    8              21               1293              5940.27
    9              21               1685              6175.21
   10              21                  -             13129.53
   11               -                  -            236575.31

Note: Model 4. Number of zones = 9, Zone 1 = 724 equations, Zone 2 = 668 equations, Zone 3 = 531 equations, Zone 4 = 530 equations, Zone 5 = 182 equations, Zone 6 = 294 equations, Zone 7 = 1651 equations, Zone 8 = 1645 equations, Zone 9 = 1493 equations. Total number of unknowns = 7718

Table V. Influence of domain order on convergence rates

Model   Domain order                 Block iterations   Total iterations   CPU time (s)   inf-norm   F-norm
  1     2, 1                               28                 110              28.36         8.12      6.32
  1     1, 2                               28                 111              29.26        11.76      8.91
  2     1, 2, 3                            30                 180             131.33        17.44     11.12
  2     3, 2, 1                            30                 180             131.54        28.58     20.34
  2     2, 1, 3                            30                 180             131.47        21.27     13.98
  2     1, 3, 2                            30                 180             131.48        24.75     17.48
  3     1, 4, 2, 3                         25                 200             146.34        15.36     18.44
  3     3, 2, 4, 1                         26                 207             152.97        36.41     28.48
  3     2, 4, 3, 1                         26                 207             151.17        30.96     27.16
  3     4, 1, 2, 3                         25                 200             147.36        17.58     20.50
  4     9, 5, 1, 6, 2, 4, 3, 8, 7          32                 537            4757.15       136.67     92.33
  4     7, 8, 3, 4, 2, 6, 1, 5, 9          34                 563            5037.05       229.92    130.31
  4     1, 2, 3, 4, 5, 6, 7, 8, 9          33                 551            4898.74       167.41    105.42
  4     9, 8, 7, 6, 5, 4, 3, 2, 1          33                 553            4924.14       197.94    115.73

(9) Block Gauss-Seidel iteration.

In addition, two direct techniques are considered, namely:

(10) LU decomposition with forward and back substitution. Each diagonal block matrix H_ii is factorized into LU form using Crout's algorithm.[24] Partial (row) pivoting is employed to ensure stability of the decomposition procedure. Block Gauss-Seidel iteration is then used between the domains. Each block iteration requires one forward and one back substitution for each diagonal block H_ii.

(11) Gauss elimination with back substitution. Partial (row) pivoting is incorporated in the forward elimination stage to enhance stability. This scheme is applied to the system matrix H as a whole.
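Test case (10) can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: SciPy's lu_factor/lu_solve stand in for Crout's algorithm with partial pivoting, each diagonal block H_ii is factorized once, and each block Gauss-Seidel sweep then costs one forward and one back substitution per block.

```python
# Minimal sketch of test case (10): LU-factorize each diagonal block once,
# then iterate between domains with block Gauss-Seidel.  SciPy routines
# stand in for the Crout algorithm with partial pivoting cited in the
# paper; this is not the authors' code.
import numpy as np
from scipy.linalg import lu_factor, lu_solve

def block_gauss_seidel_lu(H, c, tol=1e-4, max_sweeps=100):
    """Solve sum_j H[i][j] x_j = c[i] for a weakly connected block system."""
    N = len(c)
    lu = [lu_factor(H[i][i]) for i in range(N)]      # factorized once only
    x = [np.zeros_like(ci) for ci in c]              # trial vector x(0) = 0
    for sweep in range(1, max_sweeps + 1):
        for i in range(N):
            # Right-hand side uses the most recently updated blocks x_j.
            rhs = c[i] - sum(H[i][j] @ x[j] for j in range(N) if j != i)
            x[i] = lu_solve(lu[i], rhs)              # forward + back subst.
        r = [c[i] - sum(H[i][j] @ x[j] for j in range(N)) for i in range(N)]
        if np.sqrt(sum(ri @ ri for ri in r)) < tol:
            return x, sweep                          # converged
    return x, max_sweeps
```

For a weakly connected system (dominant H_ii, small H_ij) a handful of sweeps suffices; if the off-diagonal blocks vanish the scheme converges in a single sweep.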
The domain solution order chosen for tests (1)-(10) is the one which gives rise to the minimum infinity-norm for each of the models. Examination of Tables I-IV shows that for the four examples considered convergence is obtained for test cases (1)-(10), and that in each case the CPU time is significantly lower than that for Gauss elimination. Test case (2), the generalized multi-domain procedure with the maximum number of iterations per block set to two, gave the shortest CPU times for models 2 and 3, the 3- and 4-zone models. Test case (3) gave the shortest CPU times for models 1 and 4, the 2- and 9-zone models. It is of interest to note that although fewer iterations were performed for test case (1) than for test cases (2) and (3), this scheme is computationally less efficient. The reason stems from the fact that test case (1) requires significantly more block iterations, so that additional calculations are required in the multiplication of the off-diagonal blocks with x^(k), with k ranging from 1 to the number of block iterations. Compared to Gauss elimination, test case (3) was some 1.9 times faster for model 1 (464 equations) and 50.8 times faster for model 4 (7718 equations); test case (2) was 4.5 times faster for model 2 (1039 equations) and 7.8 times faster for model 3 (1287 equations). Compared to LU decomposition, test case (3) was marginally faster for model 1 and 2.8 times faster for model 4; test case (2) was 1.3 times faster for model 2 and 1.5 times faster for model 3. Insight into why schemes (2) and (3) outperformed the other iterative schemes can be obtained by considering the error propagation at any stage during iteration. To illustrate this, consider the three-zone problem and the correction of a particular block of variables, say the first block, x1. Equation (32) defines the iterative correction for this block.
If iteration is successful then the error e1 in x1 will be reduced. The errors in x2 (e2) and x3 (e3) are unchanged whilst the correction of x1 takes place. It is apparent that there is a limit to the amount by which x1 can be corrected without further correction of e2 and e3. This can be seen more clearly by considering equation (38): as p -> infinity, M^p -> 0 and

    ( sum_{i=0}^{p-1} M^i ) (I - w1 L11)^{-1} w1  ->  H11^{-1}

The first term on the right-hand side of equation (38), M^p e1^(k,0), represents the error in x1 which indicates how well the first block of equations is being satisfied. The term

    ( sum_{i=0}^{p-1} M^i ) (I - w1 L11)^{-1} w1 ( H12 e2^(k,0) + H13 e3^(k,0) )

represents the error in x1 induced by the errors in x2 and x3, i.e. e2^(k,0) and e3^(k,0). It is clear that this error does not necessarily reduce with an increase in p; in fact, it converges to H11^{-1} ( H12 e2^(k,0) + H13 e3^(k,0) ). Once x1 has been corrected, the updated values are used immediately in the subsequent correction of x2 (equation (33)) and x3 (equation (34)). If the errors in x1 have been markedly reduced then this will be beneficial in the subsequent correction of x2 and x3. The results in Tables I-IV suggest that the first two to three iterations on a particular block of equations yield a relatively large correction of the variables, and that the rate of correction reduces with subsequent iterations. Thus, for examples of this nature, constraining the number of iterations per block of equations is beneficial. In the examples tested a restriction of two to three iterations per block was the most efficient: this allows a much larger correction of the variables than a single iteration, whereas little benefit is gained by performing more iterations.

The second set of tests is performed to indicate the influence of domain ordering on convergence rates.
Test case (2), the generalized multi-domain procedure with the maximum number of iterations per block set to two, is utilized for these tests as this scheme gave rise to the shortest CPU times for two of the four examples considered previously. The relevant norms are calculated via equation (23). For the two-zone model both possible zone ordering sequences are investigated, whereas for the three-, four- and nine-zone models the zone orders which give both the lowest and the highest values of the norms are used; in addition, two zone orders which give intermediate values of the norms are examined. The results are presented in Table V. Examination of this table shows that ordering to minimize the infinity-norm gave, as expected, the smallest value for this norm in all the examples tested. This minimized value was smaller than the value obtained when the domains were ordered to maximize the infinity-norm, and also smaller than the values obtained when domain orders yielding intermediate norm values were employed. However, the differences in norm size did not give rise to corresponding differences in the performance of the test cases considered. For the two-zone model the benefit gained by ordering to minimize the infinity-norm was marginal (one iteration). This is expected, since it is a simple matter to prove that for a two-domain problem ordering will have little influence: in this case ordering does not alter the eigenvalues of the iteration matrix. For the three-zone model ordering made no difference to the convergence of the iterative scheme. For the four-zone model the domain orders which gave rise to the two lowest values of the infinity-norm converged in the least number of iterations and the shortest CPU times.
Inspection of the infinity-norm for these two cases reveals that ordering to minimize this norm did in fact give the smallest value for this norm; however, the difference between the two norms was relatively small. For the nine-zone model ordering to minimize the infinity-norm led to convergence in the least number of iterations and the shortest CPU time. The results obtained for the two- and three-zone models indicate that in cases where the number of zones is small there is little scope for improvement, and so block reordering will not necessarily improve convergence rates. However, the results obtained for the four- and nine-zone models are more encouraging, with smaller norms giving improved convergence rates. It is expected that the more domains there are, the greater the importance of domain ordering. In many respects, an N-domain problem has characteristics similar to those of a single-domain problem with n equations. Both the two-domain and the two-equation problem are unaffected by ordering. As n increases, equation ordering can be used successfully to improve convergence. Thus, since an N-domain problem reduces to an n-equation problem as the number of unknowns in each domain approaches one, it is expected that domain ordering is more useful for larger values of N.

13. CONCLUSIONS

This paper is concerned with the development of stationary iterative schemes of first degree for solving linear systems of equations derived from boundary integral equations and of the form sum_j H_ij x_j = c_i, i = 1, 2, ..., N, where N denotes the number of domains, H_ij are matrices, and x_j and c_i are column vectors. In addition, for i != j, H_ij is considered to be small in some sense. The first-order iterative techniques initially developed by the authors for solving single-domain boundary element systems have been adapted and extended to solve multi-domain problems.
An iterative scheme incorporating the block Gauss-Seidel and the extended first-order schemes, and restricting the iterations per block of equations to the range 2-3, was shown to be the most efficient scheme. It has been shown that the order in which the domains are considered becomes more important as the number of domains increases, and strategies for determining the optimum domain order have been developed. In one particular example, a system with 7718 unknowns converged in a time some 51 times faster than that taken for Gauss elimination, and 2.8 times faster than that taken for LU decomposition.

ACKNOWLEDGEMENTS

The authors are grateful to EPSRC and Dynacast International Ltd. for providing financial assistance to carry out this work, which forms part of a project to develop boundary and finite element programs to analyse the pressure die-casting process.

REFERENCES

1. W. J. Mansur, F. C. Araujo and J. E. B. Malaghini, 'Solution of BEM systems of equations via iterative techniques', Int. j. numer. methods eng., 33, 1823-1841 (1992).
2. R. L. Mullen and J. J. Rencis, 'Iterative methods for solving boundary element equations', Comput. Struct., 25, 713-723 (1987).
3. A. Jennings, Matrix Computation for Engineers and Scientists, Wiley, New York, 1977.
4. L. A. Hageman and D. M. Young, Applied Iterative Methods, Academic Press, New York, 1981.
5. L. F. Richardson, 'The approximate arithmetical solution by finite differences of physical problems involving differential equations, with an application to the stresses in a masonry dam', Philos. Trans. Roy. Soc. London A, 210, 307-357 (1910).
6. T. J. Urekew and J. J. Rencis, 'The importance of diagonal dominance in the iterative solution of equations generated from the boundary element method', Int. j. numer. methods eng., 36, 3509-3527 (1993).
7. C. Lanczos, 'An iterative method for the solution of the eigenvalue problem of linear differential and integral operators', Nat. Bur.
Std. J. Res., 45, 255-282 (1950).
8. C. Lanczos, 'Solution of systems of linear equations by minimized iterations', Nat. Bur. Std. J. Res., 49, 33-53 (1952).
9. J. H. Kane, D. E. Keyes and K. Guruprasad, 'Iterative solution techniques in boundary element analysis', Int. j. numer. methods eng., 31, 1511-1536 (1991).
10. R. Fletcher, 'Conjugate gradient methods for indefinite systems', Lecture Notes in Mathematics, Vol. 506, Springer, Berlin, 1976.
11. Y. Saad and M. H. Schultz, 'GMRES: a generalised minimal residual algorithm for solving nonsymmetric linear systems', SIAM J. Sci. Statist. Comput., 7, 856-869 (1986).
12. J. A. Bettess, 'Economical solution techniques for boundary integral matrices', Int. j. numer. methods eng., 19, 1073-1077 (1983).
13. M. Rezayat, 'Fast decomposition of matrices generated by the boundary element method', Int. j. numer. methods eng., 33, 1109-1118 (1992).
14. K. Davey and I. Rosindale, 'An iterative solution scheme for systems of boundary element equations', Int. j. numer. methods eng., 37, 1399-1411 (1994).
15. G. H. Golub and C. F. Van Loan, Matrix Computations, 2nd edn, The Johns Hopkins University Press, London, 1989.
16. J. M. Crotty, 'A block equation solver for large unsymmetric matrices arising in the boundary integral equation method', Int. j. numer. methods eng., 18, 997-1017 (1982).
17. G. R. Tomlin, 'Numerical analysis of continuum problems in zoned anisotropic media', Ph.D. Thesis, University of Southampton, 1972.
18. J. C. Lachat, 'A further development of the boundary integral technique for elastostatics', Ph.D. Thesis, University of Southampton, 1975.
19. J. C. Lachat and J. O. Watson, 'Progress in the use of boundary integral equations, illustrated by examples', Comput. Methods Appl. Mech. Eng., 10, 273-289 (1977).
20. P. C. Das, 'A disc based block elimination technique used for the solution of nonsymmetric fully populated matrix systems', Proc. Int. Symp.
on Recent Developments in BEM, University of Southampton, 1978, pp. 391-404.
21. J. M. Kane, 'Optimisation of continuum structures using a boundary element formulation', Ph.D. Dissertation, University of Connecticut, 1986.
22. M. M. Stabrowski, 'A block equation solver for large unsymmetric linear equation systems with dense coefficient matrices', Int. j. numer. methods eng., 24, 289-300 (1987).
23. M. J. D. Powell, Approximation Theory and Methods, Cambridge University Press, Cambridge, 1981.
24. W. H. Press, B. P. Flannery, S. A. Teukolsky and W. T. Vetterling, Numerical Recipes: The Art of Scientific Computing (FORTRAN Version), Cambridge University Press, Cambridge, 1989.
25. K. Davey and S. Hinduja, 'Modelling the pressure die casting process with the boundary element method: steady state analysis', Int. j. numer. methods eng., 30, 1275-1299 (1990).
26. R. A. Adey, 'Electrostatics problems', in C. A. Brebbia (ed.), Boundary Element Techniques in Computer Aided Engineering, NATO ASI Series, Martinus Nijhoff, Dordrecht, 1984.