INFORMATION TO USERS
This manuscript has been reproduced from the microfilm master. UMI films
the text directly from the original or copy submitted. Thus, some thesis and
dissertation copies are in typewriter face, while others may be from any type of
computer printer.
The quality of this reproduction is dependent upon the quality of the
copy submitted. Broken or indistinct print, colored or poor quality illustrations
and photographs, print bleedthrough, substandard margins, and improper
alignment can adversely affect reproduction.
In the unlikely event that the author did not send UMI a complete manuscript
and there are missing pages, these will be noted.
Also, if unauthorized
copyright material had to be removed, a note will indicate the deletion.
Oversize materials (e.g., maps, drawings, charts) are reproduced by
sectioning the original, beginning at the upper left-hand corner and continuing
from left to right in equal sections with small overlaps.
Photographs included in the original manuscript have been reproduced
xerographically in this copy.
Higher quality 6" x 9" black and white
photographic prints are available for any photographs or illustrations appearing
in this copy for an additional charge. Contact UMI directly to order.
ProQuest Information and Learning
300 North Zeeb Road, Ann Arbor, MI 48106-1346 USA
800-521-0600
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
STEADY-STATE METHODS FOR
SIMULATION OF RF AND MICROWAVE
CIRCUITS
A dissertation
submitted by
Madeline Kleiner
In partial fulfillment of the requirements
for the degree of
Doctor of Philosophy
in
Electrical Engineering
TUFTS UNIVERSITY
May 2002
© 2002, Madeline Kleiner
ADVISER: Professor Mohammed Afsar
UMI Number: 3042933
Copyright 2002 by
Kleiner, Madeline Ann
All rights reserved.
UMI Microform 3042933
Copyright 2002 by ProQuest Information and Learning Company.
All rights reserved. This microform edition is protected against
unauthorized copying under Title 17, United States Code.
ProQuest Information and Learning Company
300 North Zeeb Road
P.O. Box 1346
Ann Arbor, MI 48106-1346
Abstract
Given the growing demand for today’s wireless communication devices, a more
efficient method is needed to determine the steady-state response of circuits frequently found
in microwave and RF applications. Problems with the available methods showed the need for
a low-cost and efficient alternative for steady-state determination of nonlinear circuits.
The shooting-Newton method for steady-state determination has been implemented in
SPICE, the most popular general- purpose circuit simulator. The new steady-state analysis
option incorporates standard Gaussian elimination and different Krylov methods for solution
of the linear system generated by the shooting-Newton algorithm. The steady-state analysis
in SPICE is shown to be much faster than SPICE transient analysis when finding the steady-state solution of strongly nonlinear circuits. Krylov methods show even greater
improvements in determining steady-state for larger circuits. In addition, a mathematically
consistent description of all numerical methods involved in the shooting-Newton method
used in circuit simulation is presented.
Computational issues of the algorithm are considered. The accuracy of the shooting-Newton method with the different solver options is examined. Some
eigenvalues are examined for example circuits. A detailed comparison of the Krylov methods
along with their convergence properties is presented. In addition, a breakdown of the Krylov
methods BiCG and BiCGSTAB has been corrected by changing the initial guess.
Acknowledgements
I wish to express my deepest appreciation to my dissertation committee, Professors
Mohammed Afsar, Misha Kilmer, Karen Panetta, and Arthur Uhlir for their wisdom,
guidance, and friendship. I especially wish to thank Misha Kilmer for helping me understand
and use numerical methods and Arthur Uhlir for his wonderful questions and discussions on
steady-state circuit theory.
I am grateful to Keith Nabors, Joel Phillips, Baolin Yang, Dan Feng, Ilya Yusim,
Ricardo Telichevsky, Sergie Pevchin, and Ken Kundert of Cadence Design Systems for their
willingness to share their knowledge of circuit simulation with me. They allowed me to make
a greater contribution to this field and their expertise will continue to inspire me.
I wish to acknowledge and express my gratitude to Paul Draxler, Thomas Brazil, and
K.C. Gupta from the Microwave Theory and Techniques Society for their interesting and
useful discussions on various aspects of circuit simulation.
I wish to thank my husband Scott Wurcer and my four children, Josh, Jonah, Rachael,
and Claire for their assistance and patience. Thank you Jonah for all that typing and help with
the word processing monster. I also want to acknowledge all the friends, especially
Susan Santucci and Bobbi Tsukahara, who have believed in me. I want to thank the
community where I live, Cambridge Cohousing, for all the support it has given me. Lastly, I
wish to thank my mother, Betty Lawson, and my brothers and sisters, Bruce, Craig, Cynthia,
and Tami for their constant encouragement.
Table of Contents
Chapter 1 ........................................................................ 2
Introduction ..................................................................... 2
1.1 Problems with available methods ............................. 5
1.2 New method and its advantages ............................... 6
1.3 Organization of the thesis ......................................... 8
Chapter 2 ........................................................................ 10
Steady-State Determination for RF and Microwave Circuits ......... 10
2.1 Introduction ............................................................... 10
2.2 Transient analysis in SPICE ..................................... 12
2.2.1 Transient analysis formulation .............................. 13
2.3 Shooting-Newton method ........................................ 16
2.3.1 Derivation of the shooting-Newton method .......... 17
2.3.2 Forming the Jacobian by numerical differentiation ..... 18
2.3.3 Forming the Jacobian by transient analysis ........... 19
2.3.4 Steps in the shooting-Newton method ................... 22
2.4 Harmonic balance method ........................................ 23
2.4.1 Phasor analysis ...................................................... 24
2.4.2 Simple example of harmonic balance method ....... 26
2.4.3 Extensions of the harmonic balance method ......... 30
2.4.4 Other methods ....................................................... 31
2.5 Conclusion ................................................................ 32
Chapter 3 ........................................................................ 34
Numerical Methods Background .................................... 34
3.1 Introduction ............................................................... 34
3.2 How the system of equations is generated ............... 36
3.3 Direct versus iterative methods ................................ 38
3.4 Iterative methods background .................................. 40
3.4.1 Stationary and non-stationary iterative methods ... 41
3.5 Krylov-subspace methods ........................................ 43
3.5.1 Classification of Krylov methods ......................... 45
3.5.2 Conjugate Gradient (CG) method ......................... 48
3.5.3 The Generalized Minimum Residual (GMRES) method ..... 51
3.5.4 BiConjugate Gradient (BiCG) method .................. 55
3.5.5 Conjugate Gradient Squared (CGS) method ......... 60
3.5.6 Quasi-Minimal Residual (QMR) method .............. 65
3.5.7 BiConjugate Gradient Stabilized (BiCGSTAB) method ..... 67
3.6 Summary ................................................................... 69
Chapter 4 ........................................................................ 73
Implementation of the Shooting-Newton Algorithm Using Krylov-Subspace Methods ..... 73
4.1 Implementation in SPICE ......................................... 74
4.2 The Jacobian matrix for the shooting-Newton method ..... 76
4.2.1 Minimizing number of sensitivity matrix formations ..... 76
4.2.2 Damping the sensitivity matrix ............................. 77
4.2.3 Importance of integration method ......................... 78
4.2.4 Eigenvalues and sparsity pattern of the Jacobian ... 78
4.2.5 Implicit matrix options .......................................... 82
4.3 Krylov-subspace linear solver optimizations ........... 83
4.3.1 Stopping criterion .................................................. 83
4.3.2 Maximum number of iterations ............................. 84
4.3.3 Breakdown of the method ..................................... 85
4.3.4 Computational issues ............................................ 87
4.4 Summary ................................................................... 89
Chapter 5 ........................................................................ 90
Results and Comparisons ............................................... 90
5.1 DC supply test circuit ............................................... 91
5.2 Buck converter .......................................................... 95
5.3 Class AB amplifier ................................................... 101
5.4 Class C amplifier ...................................................... 104
5.5 Comparisons ............................................................. 109
Chapter 6 ........................................................................ 111
Summary ......................................................................... 111
Appendix A ..................................................................... 114
SPICE Input files ............................................................ 114
Bibliography ................................................................... 121
Published Papers ............................................................. 129
List of Tables
Table 3.1 Computational aspects of Krylov-subspace methods...............................................71
Table 5.1 Steady-state initial conditions for the DC power supply example using different
Krylov methods inside the shooting methods. (The transient analysis values are at 1.98s.)...92
Table 5.2 Time to calculate steady-state for DC power supply. The size of the sensitivity
matrix is 4 x 4 ............................................................................................................................... 94
Table 5.3 Steady-state initial conditions for the buck converter example using different
Krylov methods inside the shooting-Newton algorithm.............................................................96
Table 5.4 Time to reach steady-state for the Buck converter example to .01 decimal places.
The initial guess for BiCG, BiCGSTAB, and QMR is not zero................................................99
Table 5.5 Different initial guess for the BiCG method............................................................ 100
Table 5.6 Different initial guess for the BiCGSTAB method................................................. 100
Table 5.7 Steady-state initial conditions for the Class AB amplifier example using different
Krylov methods inside the shooting-Newton algorithm........................................................... 102
Table 5.8 Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class AB amplifier............................................................................. 102
Table 5.9 Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class C amplifier................................................................................ 105
Table 5.10 Relative residual for the Class C amplifier............................................................ 105
List of Figures
Figure 2.1 Flow chart for harmonic balance simulation from Eesof manual..........................25
Figure 2.2 Example circuit [Rodriques98]..................................................................................27
Figure 2.3 Circuit divided into linear and nonlinear subnetwork............................................. 28
Figure 3.1 Classification of Krylov-subspace method............................................................... 47
Figure 4.1 Sparsity pattern of the sensitivity matrix for an example circuit of a multiplier...79
Figure 4.2 Sparsity pattern of the sensitivity matrix for a buck converter...............................80
Figure 4.3 Eigenvalue Spectrum for Sensitivity Matrix for Class C Amplifier......................81
Figure 4.4 Eigenvalue Spectrum for Buck Converter................................................................ 82
Figure 5.1 DC supply circuit........................................................................................................ 91
Figure 5.2 SPICE input file with steady-state command........................................................... 92
Figure 5.3 Transient analysis plots for the DC power supply circuit....................................... 93
Figure 5.4 Two periods of the transient analysis when the DC supply circuit is in steady-state. The triangle marker indicates the Krylov-subspace solutions.......................................... 94
Figure 5.5 Circuit diagram for the buck converter......................................................................95
Figure 5.6 Results from GMRES run on buck converter........................................................... 97
Figure 5.7 Transient analysis plots for the two variables, voltage across the capacitor Cf, and
current through the inductor L f .....................................................................................................98
Figure 5.8 Two periods of transient analysis for the buck converter. The Krylov-subspace
methods are marked with a triangle.............................................................................................. 98
Figure 5.9 Class AB amplifier [Meyer89]................................................................................. 101
Figure 5.10 Periodic time-domain response of the Class AB circuit...................................... 103
Figure 5.11 Class C amplifier [Colon73]................................................................................... 104
Figure 5.12 Periodic time domain response for the Class C amplifier example.................... 106
Figure 5.13 Convergence behavior of the residual norm of the BiCG solution algorithm as a
function of iteration number........................................................................................................ 106
Figure 5.14 Convergence behavior of the residual norm of the BiCGSTAB solution
algorithm as a function of iteration number................................................................................ 107
Figure 5.15 Convergence behavior of the residual norm of the CGS solution algorithm as a
function of iteration number........................................................................................................ 107
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Figure 5.16 Convergence behavior of the residual norm of the GMRES solution algorithm as
a function of iteration number..................................................................................................... 108
Figure 5.17 Convergence behavior of the residual norm of the QMR solution algorithm as a
function of iteration number........................................................................................................108
Figure 5.18 Comparison of the shooting-Newton method to standard transient analysis in
SPICE for determination of steady-state (G.E. is Gaussian Elimination)................................110
Figure 5.19 Comparison of Krylov methods to Gaussian elimination for determination of the
steady-state....................................................................................110
Chapter 1
Introduction
The digital revolution has created unprecedented growth in the size and complexity of
RF and microwave circuits. Communication devices such as cellular phones, PDAs, and
pagers use complex integrated circuits that operate in the RF and microwave frequencies.
Given this new growth, circuit designers need a more efficient method for determining the
steady-state response of non-linear RF and microwave circuits. When circuits are in steady-state, important quantities, including power, frequency response, noise, distortion, and
transfer characteristics such as gain and impedance can be evaluated [Steer91a].
Traditionally, the two methods used to determine the steady-state response of circuits
to periodic input have been the harmonic balance method and the transient analysis in SPICE
[Kundert90]. Although harmonic balance is the most widely used method for these types of
circuits, it has difficulty with strongly non-linear circuits that are often needed for RF and
microwave applications. Harmonic balance cannot represent the transient behavior of circuits
and, therefore, chaotic or unwanted oscillations are not observed when circuits are simulated
only using harmonic balance. The second most popular approach is transient analysis. The
transient analysis is allowed to run until all transient signals have died out and only the
steady-state response remains. Two disadvantages of transient analysis are that certain types
of circuits take a very long time to reach steady-state, and it is sometimes difficult to
determine the exact moment when steady-state is reached.
Newer methods, first introduced for use in circuit simulation by Aprille and Trick
[Aprille72a, Aprille72b, Colon73] and later used by Telichevesky [Kundert90,
Telichevesky95, Telichevesky96a, Telichevesky96b], include the shooting-Newton
algorithm for direct steady-state determination. In the shooting-Newton method, the values of
the capacitor voltages and inductor currents are calculated so that when used as the initial
state for simulation in the time-domain the circuit is in steady-state. Information from the
transient analysis in circuit simulators such as SPICE is used for the calculation. The
shooting-Newton algorithm iteratively constructs the steady-state response as the solution of
a two-point boundary value problem [Bailey68, Press92]. The Newton method is used
within the shooting-Newton algorithm to construct a guess for the initial condition. The original method
of Aprille and Trick is inefficient for large circuits because it used standard Gaussian
elimination to solve for the iterate formed during the shooting-Newton method.
Telichevesky et al. use the Generalized Minimum Residual (GMRES) method, but no other
Krylov methods, as a more efficient approach for determining the steady-state.
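The shooting-Newton idea just described — choose the initial state so that one period of transient simulation returns exactly to that state — can be sketched on a toy one-state circuit. Everything here (the RC-like state equation, the names `transition` and `shooting_newton`, the time constant) is an illustrative stand-in, not the SPICE implementation described in this dissertation.

```python
import numpy as np

def transition(x0, T=1.0, steps=200, tau=0.3):
    """Integrate dx/dt = (-x + sin(2*pi*t)) / tau over one period T
    with backward Euler; return the final state x(T) and the scalar
    sensitivity dx(T)/dx(0), propagated alongside the state."""
    h = T / steps
    x, sens = x0, 1.0
    for n in range(1, steps + 1):
        t = n * h
        # backward Euler step: x_new = (x + h*sin(2*pi*t)/tau) / (1 + h/tau)
        x = (x + h * np.sin(2 * np.pi * t) / tau) / (1 + h / tau)
        sens /= 1 + h / tau  # each step scales the sensitivity by 1/(1+h/tau)
    return x, sens

def shooting_newton(x0=0.0, tol=1e-10, max_iter=10):
    """Newton iteration on phi(x0) = x(T; x0) - x0 = 0."""
    for _ in range(max_iter):
        xT, J = transition(x0)
        phi = xT - x0
        if abs(phi) < tol:
            break
        x0 -= phi / (J - 1.0)  # Newton update; dphi/dx0 = J - 1
    return x0

x_ss = shooting_newton()
xT, _ = transition(x_ss)
print(abs(xT - x_ss))  # periodicity residual; essentially zero at steady state
```

Because this toy circuit is linear, phi is affine in x0 and the iteration converges in a single Newton step; for the nonlinear circuits treated here, several iterations and a full sensitivity matrix are required, and solving the linear system at each iteration is where Gaussian elimination or a Krylov method enters.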
This dissertation presents an expanded and more efficient method for achieving
steady-state using the shooting-Newton algorithm. The new steady-state analysis option
incorporates standard Gaussian elimination and different Krylov-subspace methods,
including Generalized Minimum Residual (GMRES), BiConjugate Gradient (BiCG),
Conjugate Gradient Squared (CGS), BiConjugate Gradient Stabilized (BiCGSTAB), and
Quasi-Minimal Residual (QMR) for solution of the system of equations generated by the
shooting-Newton algorithm. The vehicle for this implementation is the popular circuit
simulator, SPICE [Nagle75], whose code is freely available from UC Berkeley. The steady-state analysis and the Krylov-subspace methods are added to SPICE 3f5, the current version
of the popular analog integrated circuit simulator.
In addition, a mathematically consistent description of all numerical methods
involved in the shooting-Newton method used in circuit simulation is presented. The steady-state analysis in SPICE is shown to be much faster than SPICE transient analysis when
finding the steady-state solution of strongly nonlinear circuits, which harmonic balance
simulators have difficulty with. Krylov methods show even greater improvements in
determining the steady-state for larger circuits.
This dissertation also considers the computational issues of the algorithm and
examines the accuracy of the shooting-Newton method with the different
solver options. Some eigenvalues are examined for example circuits for the purpose of
convergence analysis of the Krylov methods. A detailed comparison of the Krylov methods
along with their convergence properties is presented. In addition, a breakdown that typically
occurs with the Krylov methods BiCG and BiCGSTAB applied to these types of matrices
with a standard initial guess has been corrected by changing the initial guess.
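The initial-guess workaround can be sketched with an off-the-shelf Krylov solver: SciPy's `bicgstab` exposes the starting vector as its `x0` argument. The matrix below is a random, well-conditioned stand-in, not one of the dissertation's circuit matrices.

```python
import numpy as np
from scipy.sparse.linalg import bicgstab

rng = np.random.default_rng(0)
n = 50
# Hypothetical stand-in for a shooting-Newton system matrix (I - J):
# nonsymmetric and reasonably well-conditioned.
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

# The only change between the two calls is the starting vector x0 --
# the same knob the dissertation uses to sidestep BiCG/BiCGSTAB
# breakdown on its circuit matrices.
x_zero, info_zero = bicgstab(A, b)               # x0 defaults to the zero vector
x_warm, info_warm = bicgstab(A, b, x0=b.copy())  # nonzero initial guess

print(info_zero, info_warm)  # 0 means the solver converged
```

On this benign random matrix both calls converge; the point is only that the starting vector is a first-class solver parameter, so the cure described above needs no change to the iteration itself.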
1.1 Problems with available methods
Determining the steady-state response for integrated circuits that are lightly damped
and have large time constants is very time consuming. RF circuits used in communication
devices often process signals in which a low-frequency signal is modulated onto a high-frequency
carrier wave. This situation causes problems in transient analysis because the high-frequency
carrier determines how small the time steps must be, while the low-frequency signal
determines the interval over which the simulation must run. The resulting long simulation times
make it difficult to analyze these circuits thoroughly in steady-state. Also, high-Q circuits
are difficult to simulate in SPICE, and these types of circuits occur regularly in
communication systems.
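A back-of-the-envelope count makes the two-timescale cost concrete; the frequencies and resolution below are illustrative values, not figures from this dissertation.

```python
# Count of transient time steps for a circuit that mixes two widely
# separated frequencies (all values are illustrative).
f_carrier = 1e9     # 1 GHz carrier sets how small the time steps must be
f_envelope = 1e3    # 1 kHz modulation sets how long the simulation must run
pts_per_cycle = 20  # points needed to resolve one carrier cycle

interval = 1.0 / f_envelope               # one period of the slow envelope
step = 1.0 / (pts_per_cycle * f_carrier)  # step size resolving the carrier
n_steps = interval / step
print(f"{n_steps:.0e} time steps")  # on the order of 2e+07 steps
```

Tens of millions of time steps for a single envelope period is exactly the regime in which waiting for transients to die out becomes impractical.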
Four methods are commonly used to determine the steady-state response
[Kundert90]: transient analysis, finite difference, shooting-Newton, and harmonic balance. In
SPICE, the most popular analog circuit simulator, the steady-state is determined by allowing
the time-domain transient analysis to run until all the transient signals have died off and all
that remains is the periodic steady-state. With many integrated circuits, this simulation
involves long computer run times. A direct method computes the initial states that, when
applied to a circuit, result in the steady-state response. The finite difference method and the
shooting-Newton method are both so-called direct methods [Kundert90], meaning the time-domain steady-state is determined as the solution of a two-point boundary value problem and
direct matrix solvers are used to update the solution guess. However, the computation time
using the standard matrix methods of solution is still too expensive. Finally, the harmonic
balance method is used extensively and works well for weakly nonlinear circuits, but
becomes computationally expensive for strongly nonlinear circuits. The transient analysis
option in SPICE can simulate these strongly nonlinear circuits, albeit with poor computer
efficiency.
1.2 New method and its advantages
An improved and expanded method has been developed to determine the steady-state
response for analog and RF integrated circuits in the time-domain [Kleiner00, Kleiner01].
The method uses the shooting-Newton algorithm to directly determine the steady-state.
Unlike Telichevesky's implementation, several different Krylov-subspace algorithms have been
coded and implemented that can be used to solve the linear equations generated by this
method.
The expanded steady-state approach is a more efficient method for determining the
steady-state response of analog and microwave circuits that are difficult to simulate by
traditional methods. This class of circuits includes strongly nonlinear communication
circuits. With the implementation of the Krylov-subspace methods, greater computational
efficiency is gained and larger, more complex circuits can be simulated. Different Krylov-subspace algorithms are available to compute the solution so as to take advantage of the
particular matrix being solved.
The shooting-Newton method, along with the other important aspects of the approach
using Krylov-subspace methods, is implemented and tested in SPICE. Quantities computed
during the transient analysis in SPICE are reused in the calculation of the direct
determination of the steady-state response. The shooting-Newton method used to determine
the steady-state response is efficient for strongly nonlinear circuits that are frequently found
in RF and microwave systems. This is due to its relationship with standard transient analysis.
In transient analysis, time steps can be nonuniform and can, therefore, follow the abrupt
transitions that are often encountered in RF and microwave circuits. The shooting-Newton
method uses these same time steps and is therefore insulated from the strong nonlinearity of
the circuit.
Results from simulations are compared with standard transient simulation, standard
shooting-Newton with Gaussian elimination, and shooting-Newton using the Krylov methods
for solution of the iterate generated. The convergence behavior of the different Krylov
methods, including BiCG, CGS, BiCGSTAB, GMRES, and QMR, is presented in
Chapter 5. The matrix sparsity and some eigenvalues are examined in Chapter 4.
This approach is shown to be much more efficient than transient analysis when
finding the steady-state solution of certain classes of circuits, such as self-biasing amplifiers,
narrow-band filters, high-Q circuits, and lightly damped circuits with long time
constants. Because the approach is computationally efficient, it allows for simulation of
larger circuits. This method also shows greater efficiency than standard Gaussian elimination
solution of the shooting-Newton iterate. Implemented in SPICE, this approach is accurate
and efficient for the steady-state response for nonlinear circuits that are found in
communication devices.
For SPICE-based circuit simulators, the time-domain shooting-Newton method is
relatively easy to implement [Ashar85, Quarles89a, Quarles89b, Quarles89c, Quarles89d,
Quarles89e]. The direct shooting-Newton method of steady-state determination was proposed
by Aprille and Trick [Aprille72a, Aprille72b, Ashar89]. Because of its computational cost,
however, the algorithm was limited to relatively small circuits. The newer Krylov-subspace
methods can solve these equations generated by the steady-state response determination with
much greater efficiency. The Krylov-subspace method of GMRES has been implemented
with the direct shooting-Newton method and implicit matrix methods [Telichevesky96a].
However, it is not implemented in public-domain software and does not include other
Krylov-subspace methods for comparison.
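The implicit-matrix idea mentioned in this paragraph — GMRES needs only matrix-vector products, never the matrix itself — can be sketched with SciPy's `LinearOperator`. The sensitivity matrix `J` here is a random stand-in chosen with spectral radius below one; in a real simulator the product would come from an extra transient integration rather than a stored matrix.

```python
import numpy as np
from scipy.sparse.linalg import LinearOperator, gmres

rng = np.random.default_rng(1)
n = 40
# Hypothetical stand-in for a sensitivity matrix J with spectral radius < 1.
J = 0.5 * rng.standard_normal((n, n)) / np.sqrt(n)

def matvec(v):
    # Action of (I - J) on a vector. In a circuit simulator this product
    # can be formed implicitly, so (I - J) never has to be stored.
    return v - J @ v

A = LinearOperator((n, n), matvec=matvec)
b = rng.standard_normal(n)
x, info = gmres(A, b)
print(info, np.linalg.norm(matvec(x) - b))  # info == 0 signals convergence
```

Any solver in this family that touches the matrix only through products (GMRES, BiCG, CGS, BiCGSTAB, QMR) can be driven through the same operator interface, which is what makes the implicit-matrix option natural for Krylov methods.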
1.3 Organization of the thesis
This dissertation presents a more efficient method for finding the steady-state
response of nonlinear integrated circuits with periodic inputs. There are two fundamental
parts to this method. First, the direct approach to steady-state determination is implemented
using the shooting-Newton method in SPICE. Second, several different Krylov-subspace
methods, including BiCG, CGS, GMRES, BiCGSTAB and QMR, are used to solve the linear
equations generated. The steady-state method results in the ability to analyze RF and
microwave circuits that are difficult to examine with existing methods.
Chapter 2 presents three primary methods for determining the steady-state response.
These are transient analysis, shooting-Newton, and harmonic balance. Their derivation,
formulation and typical implementations are also described. The advantages and
disadvantages of the different methods are considered. Some rarely used methods are also
introduced.
In Chapter 3, the numerical linear algebra background is presented. Methods of
solution for the linear system of equations that are generated by the shooting-Newton
algorithm are detailed. Chapter 3 begins with a brief overview of the formulation of the
system of equations for the determination of the steady-state response. Then a background is
given on direct and stationary methods. The main part of the chapter presents the derivation,
computational resources, algorithms, and theory behind the Krylov-subspace iterative
methods used for the efficient solution of the linear system of equations needed for
steady-state determination.
Chapter 4 describes the practical algorithms for the implementation of the shooting-Newton algorithm for steady-state determination using Krylov-subspace methods. These
algorithms are implemented and tested in SPICE 3f5, the most popular freely available
analog integrated circuit simulation program. Two main aspects of implementation are
discussed: the implementation details for the Jacobian needed for the shooting-Newton
iteration and the use of the Krylov-subspace methods for solution of that shooting-Newton
iteration. The chapter concludes with a discussion of important issues for the Krylov-subspace methods, such as stopping criteria, robustness, breakdowns, non-convergence, and memory
and operation counts.
Finally, Chapter 5 presents the results of the implementation of the shooting-Newton
method on typical examples of analog integrated circuits. These circuits were chosen as
examples because their steady-state response has been difficult to determine using traditional
methods. The accuracy and efficiency are demonstrated and compared to those of standard
transient analysis in SPICE. The different Krylov-subspace iterative methods are
implemented and their results compared and analyzed. Chapter 6 summarizes and concludes
the dissertation. It also points out future work that would supplement and extend the present
work, including distributed components and taking advantage of the special structure of the
Jacobian.
Chapter 2
Steady-State Determination for RF and Microwave
Circuits
This chapter presents three primary methods for determining the steady-state
response. These are transient analysis as found in SPICE, the shooting-Newton method, and
the harmonic balance method. Their derivation, formulation and typical implementations are
also described. The transient analysis and shooting-Newton sections start with a
mathematical description of their derivations and then restate those derivations in the notation
used in the circuit simulation literature. The advantages and disadvantages of the different methods are
considered. Some rarely used methods are also introduced.
2.1 Introduction
The determination of the steady-state response of non-linear RF and microwave
circuits is of utmost importance. When circuits are in steady-state, important quantities
including power, frequency response, noise, distortion, and transfer characteristics such as
impedance and gain can be evaluated. Further, RF and microwave circuits have become
larger and more complex in response to the digital revolution, making the efficient
determination of steady-state increasingly important [Maliniak95].
Traditionally, two methods have predominantly been used to determine the steady-state
response of circuits to periodic input: the harmonic balance method and the transient
analysis in SPICE [Gupta81, Steer91a, Maas99]. Harmonic balance is the more widely used
method for these types of circuits. It assumes that the steady-state response contains a
fundamental component of known frequency, plus several harmonics that are believed to be
dominant. The method evaluates the linear part of the circuit in the frequency domain and the
nonlinear part in the time domain. Harmonic balance can have difficulty with the strongly
nonlinear circuits that are often needed for RF and microwave applications. It cannot represent
the transient behavior of circuits, and, therefore, chaotic or unwanted oscillations are not
observed when circuits are simulated only using harmonic balance.
The second most popular method is the transient analysis in SPICE. The transient
analysis is allowed to run until all the transient responses have died out and only the steady-state
response remains. Unfortunately, transient analysis presents two problems. First, depending
on the type of circuits simulated, it can take a very long time to reach steady-state. Secondly,
it can be very difficult to determine the moment when steady-state is reached.
Newer methods include the shooting-Newton algorithm for direct steady-state
determination. This method was first introduced by Aprille and Trick [Aprille72a,
Aprille72b] and later expanded upon by Telichevesky [Telichevesky95, 96a, 96b, 96c]. In
the shooting-Newton method, the steady-state is determined using information from a small
portion of the transient analysis data. The data is used to construct a two-point boundary
value problem. The original method of Aprille and Trick used standard Gaussian
elimination to solve the system of equations generated, but the method was not
efficient for large circuits. Following their work, Telichevesky introduced the Krylov-subspace method GMRES as a reliable and efficient method for steady-state determination.
Before the detailed explanation of the methods of steady-state determination, the term
steady-state is defined. In general terms, the steady-state solution of the differential equations
which are formed to represent the integrated circuit is that solution which is asymptotically
approached as the effect of the initial conditions decays.
2.2 Transient analysis in SPICE
The transient analysis mode in SPICE calculates the time-domain response of the
circuit over a user-specified interval [Nagel71, Pederson84]. SPICE does this by first forming
a set of nonlinear equations that represent the components of the circuit along with its
connections, power supplies, and inputs and then solving these equations. The
interconnection equations use Kirchhoff's Current Law (KCL) and Kirchhoff's Voltage Law
(KVL). KCL states that the sum of all currents flowing out of a node at any instant is zero
and KVL states that the algebraic sum of all branch voltages around a loop at any instant is
zero. Components such as resistors, capacitors, diodes, and inductors are represented by
mathematical models that describe their electrical behavior. The equations constructed are a
set of nonlinear ordinary differential equations (ODEs) that describe the circuit. Then an
input stimulus is applied along with initial conditions specified, and the equations can be
solved for the time-domain response. Because of the nonlinear nature of the ODEs, a
discretization must be applied to arrive at a numerical solution. Steady-state is reached when
all the transient responses of the circuit have died off and only the steady-state response
remains.
2.2.1 Transient analysis formulation
The equation formulation and solution for a non-linear circuit is detailed below. For
the sake of simplicity only nodal analysis is done with resistors, capacitors and current
sources [Pederson84, Nagel71]. The methods presented are easily expanded to modified
nodal analysis to include other devices, such as inductors and voltage sources. The steadystate solution can be found by having the transient analysis run long enough so that all the
transient responses have died out. For some circuits the time required is exceptionally long.
The transient analysis used by SPICE is briefly described with an emphasis on the
aspects that are relevant to steady-state determination [Banzhaf89, Quarles89c,
Vladimirescu94]. In SPICE, the equations are formed (in this discussion, only capacitors,
resistors and current sources are considered) using KCL. The system of differential algebraic
equations generated is as follows:
f(x(t), t) = (d/dt) q(x(t)) + i(x(t)) + u(t) = 0,    (2.1)
where x(t) is the vector of node voltages, i(x(t)) is the vector of resistive node currents,
q(x(t)) is the vector of node charges, and u(t) is the vector of input currents. Each equation
in this system applies KCL to one node in the circuit and, using the
mathematical models of the elements, expresses the currents in terms of the node voltages.
Equation (2.1) states that at each node, the current from the resistors, i(x(t)), the current
from the capacitors, (d/dt) q(x(t)), and the current from the source, u(t), connected to that node
must equal zero. The solution in the interval specified by the user, t ∈ [0, T], is obtained by
discretizing this interval and computing a solution at each time point. At these chosen time
points, a method such as backward-Euler or the trapezoidal method is used to replace the
derivative. Because of the stability of the trapezoidal method and its use in the shooting-Newton method implemented in this dissertation, the trapezoidal algorithm is used to
illustrate the transient analysis solution.
For an ordinary differential equation (ODE) of the form:

dy/dt = g(y, t),    (2.2)

the trapezoid rule estimates the solution, y, at time t_k + h by [Heath97]:

y_{k+1} = y_k + (h/2)[g(y_k, t_k) + g(y_{k+1}, t_{k+1})],

where t_{k+1} = t_k + h and y_k, y_{k+1} denote y(t_k) and y(t_k + h), respectively. From (2.2),
however, g(y_k, t_k) = y′(t_k). Rearranging and using this fact, we see:

(2/h)(y_{k+1} − y_k) − y′(t_k) − g(y_{k+1}, t_{k+1}) = 0.    (2.3)
Now (2.1) can be rewritten as:
(d/dt) q(x(t)) = −i(x(t)) − u(t).    (2.4)
Applying the trapezoidal rule (2.3) to (2.4) with −i(x(t)) − u(t) playing the role of g and q
playing the role of y gives:
(2/h)[q(x(t + h)) − q(x(t))] − q′(x(t)) + i(x(t + h)) + u(t + h) = 0.    (2.5)

(The subscript k has been dropped for simplicity.) The quantities q(x(t)) and q′(x(t)) in
(2.5) are known. Thus the goal at time t + h is to solve (2.5) for the vector x(t + h).
Multiplying (2.5) by h/2 and using (2.4) gives:

F(x(t + h)) = q(x(t + h)) − q(x(t)) + [i(x(t)) + u(t) + i(x(t + h)) + u(t + h)](h/2) = 0.
We have an equation of the form:
F(z) = 0,    (2.6)
where F : Rⁿ → Rⁿ. Newton's method [Heath97] updates the solution estimate, zᵏ, to (2.6)
according to:

J(zᵏ) Δzᵏ = −F(zᵏ),
zᵏ⁺¹ = zᵏ + Δzᵏ,    (2.7)
where z is being used to denote the vector x(t + h), and J denotes the Jacobian matrix of
the vector-valued function F. Now,

F(z) = q(z) − q(x(t)) + [i(x(t)) + u(t) + i(z) + u(t + h)](h/2),    (2.8)

so:

J_jk(z) = ∂F_j(z)/∂z_k = ∂q_j(z)/∂z_k + (h/2) ∂i_j(z)/∂z_k.
The method in (2.7) is continued until

‖zᵏ⁺¹ − zᵏ‖ < ε,    (2.9)

where ε > 0 is a user-specified tolerance and F(zᵏ⁺¹) is close to zero. When this step is
complete, a single time step of transient analysis has been calculated. Newton’s method can
be shown to converge quadratically if the initial guess is close to the solution. This process of
iteration for solution must be done at each of the time steps. For simplicity of presentation,
from now on we refer to zᵏ as the kth iterate.
In order for the transient analysis in SPICE to calculate the steady-state response the
simulation must be continued as long as it takes to have a response with no transients. This
may be a very long time for circuits with high-Q or slowly decaying transients. For many
circuits used in communication devices, the simulation time required by SPICE’s transient
analysis to reach steady-state is too long to be of practical application.
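The time stepping described above can be sketched in a few lines. The following Python fragment is a minimal illustration, not SPICE code; the single-node RC-diode circuit, its element values, and the source waveform are all assumptions chosen for the example. It applies the trapezoidal rule of (2.5) and the Newton iteration of (2.7)-(2.9):

```python
import numpy as np

# Illustrative element values for a single-node RC-diode circuit (assumed).
C, R = 1e-6, 1e3                 # capacitance, resistance
Is, Vt = 1e-14, 0.02585          # diode saturation current, thermal voltage

def i_of(v):                     # resistive node current i(x) in eq. (2.1)
    return v / R + Is * (np.exp(v / Vt) - 1.0)

def di_of(v):                    # di/dv, used in the Jacobian (2.9)
    return 1.0 / R + (Is / Vt) * np.exp(v / Vt)

def u_of(t):                     # input current source, sign as in (2.1)
    return -1e-3 * np.sin(2 * np.pi * 1e3 * t)

def trap_step(v_prev, t, h, tol=1e-12, max_iter=50):
    """Advance one trapezoidal time step (2.5), solved by Newton (2.7)."""
    known = i_of(v_prev) + u_of(t)          # equals -q'(x(t)) by eq. (2.4)
    v = v_prev                              # initial guess: previous point
    for _ in range(max_iter):
        F = (2.0 / h) * C * (v - v_prev) + known + i_of(v) + u_of(t + h)
        dv = -F / ((2.0 / h) * C + di_of(v))
        v += dv
        if abs(dv) < tol:                   # stopping test of eq. (2.9)
            return v
    raise RuntimeError("Newton iteration did not converge")

h, T = 1e-7, 1e-3                           # time step and one input period
v, t = 0.0, 0.0
while t < T:                                # march across one period
    v = trap_step(v, t, h)
    t += h
```

Each call to `trap_step` performs the inner Newton loop of (2.7); the outer `while` loop is the transient march whose length determines how close the waveform is to steady state.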
2.3 Shooting-Newton method
The shooting-Newton method is an iterative method used to determine the solution of
a two-point boundary value problem [Press92]. A two-point boundary value problem exists
when ordinary differential equations are required to satisfy boundary conditions not just at
the initial starting condition but also at some other boundary. In our steady-state analysis
problem, the two boundaries are the end and the beginning of the period. Transient analysis
is an example of an initial value problem. Given an initial starting point, the solution of the
circuit equations marches along from one time point to the next. By determining the initial
starting values of the steady-state solution, we end up at the same point at the end of the period.
The circuit is in steady-state and its output is periodic.
The shooting-Newton method is one of many methods that can be used to solve two-point boundary value problems. In a shooting method, the solution is started from a guessed
initial condition and integrated for one period. Ideally the result matches the conditions
specified by the original guess. Usually this is not the case, and Newton's method is used
to make a better guess. This process of calculating an initial guess, simulating, and checking
the final solution is continued until the circuit is in steady-state.
2.3.1 Derivation of the shooting-Newton method
Aprille and Trick [Aprille72a, 72b] first proposed the use of the shooting-Newton
method to solve the system of equations for the steady-state response. They posed the
problem of finding the steady-state response as a two-point boundary value problem solved
by the shooting-Newton method. If the circuit equations are represented as the system of
equations
f(x(t), t) = (d/dt) q(x(t)) + i(x(t)) + u(t) = 0,    (2.10)
then a constraint for achieving steady-state is that the transient effects have died off. This is
represented by:
x(0) = x(T).    (2.11)
In other words, the solution at the end of the period is the same as the condition at the
beginning of the period, which means that the circuit is in steady-state. The state transition
function can be used to define the two-point boundary value problem according to:
x(t₀) − φ(x(t₀), t₀, T) = 0,    (2.12)
where φ is the state transition function. The state transition function is implicitly derived and
determined by integrating the circuit equations (2.10) over T seconds from the initial state
x(t₀). It is dependent on the initial state, x(t₀), the period of the response, T, and the starting
time, t₀. Because equation (2.12) is nonlinear, the shooting-Newton method is used to solve
the boundary value problem. This results in the following iteration:
x₀ᵏ⁺¹ = x₀ᵏ − [I − J_φ]⁻¹ [x₀ᵏ − φ(x₀ᵏ, t₀, T)],    (2.13)
where J_φ is the Jacobian, also called the sensitivity matrix, and is represented by:

J_φ = (∂φ/∂x)(x(t₀), t₀, T).    (2.14)
The Jacobian is called the sensitivity matrix because it shows the sensitivity of the
final state to the initial state over one period T.
2.3.2 Forming the Jacobian by numerical differentiation
It is necessary to determine the Jacobian matrix before equation (2.13) can be
evaluated. The Jacobian matrix in (2.14) is given by:

J_φ(x₀) = [ ∂x₁(T, x₀)/∂x₀₁   ∂x₁(T, x₀)/∂x₀₂   …   ∂x₁(T, x₀)/∂x₀ₙ
            ∂x₂(T, x₀)/∂x₀₁   ∂x₂(T, x₀)/∂x₀₂   …   ∂x₂(T, x₀)/∂x₀ₙ
            ⋮                  ⋮                      ⋮
            ∂xₙ(T, x₀)/∂x₀₁   ∂xₙ(T, x₀)/∂x₀₂   …   ∂xₙ(T, x₀)/∂x₀ₙ ].    (2.15)
To calculate the elements of this matrix with the initial condition x(t₀) = x₀, first the
circuit equations in (2.10) are solved from t = 0 to t = T with the initial states x₀. Next, they
are solved with the initial states perturbed in the kth component, x₀ₖ + Δx₀ₖ. This change
in the initial conditions must be as small as possible in order to increase
accuracy. The ik element of the Jacobian is approximated by the following equation:

∂xᵢ(T, x₀)/∂x₀ₖ ≈ [xᵢ(T, x₀ₖ + Δx₀ₖ) − xᵢ(T, x₀ₖ)] / Δx₀ₖ,    (2.16)

where xᵢ(T, x₀ₖ) and xᵢ(T, x₀ₖ + Δx₀ₖ) are solutions evaluated at the end of a period T. This is a
forward difference approximation to the first derivative [Heath97].
To form the Jacobian, the equations are solved with the nominal initial condition x₀ and with
each perturbed initial condition x₀ₖ + Δx₀ₖ from t = 0 to t = T. With n capacitors and inductors, all
n states of the system must be perturbed separately in order to compute the n columns of
the Jacobian matrix, so one additional transient solution of the circuit equations is required
per column. Equation (2.13) can then be solved. A more efficient composition of the Jacobian can be done by using sensitivity
networks [Brambilla01, D'Amore91, D'Amore93, D'Amore95].
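The forward-difference construction of (2.16) can be sketched as follows. The `integrate` callback is hypothetical: it stands in for one full period of transient analysis and is not an actual SPICE routine.

```python
import numpy as np

def sensitivity_fd(x0, T, integrate, dx=1e-6):
    """Forward-difference approximation (2.16) of the sensitivity
    matrix J_phi. `integrate(x0, T)` returns the state after one
    period T starting from x0 (assumed supplied by transient analysis)."""
    n = len(x0)
    base = integrate(x0, T)            # nominal solution x(T, x0)
    J = np.empty((n, n))
    for k in range(n):                 # perturb each initial state in turn
        xp = np.array(x0, dtype=float)
        xp[k] += dx
        J[:, k] = (integrate(xp, T) - base) / dx
    return J
```

For a linear circuit whose one-period map is x(T) = M x(0), the routine recovers M itself, which is a convenient sanity check.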
2.3.3 Forming the Jacobian by transient analysis
The Jacobian, or sensitivity matrix, can be calculated more efficiently than by the method
explained in the previous section. In fact, it can be calculated step by step along with the
transient analysis, because the quantities used to form the Jacobian are already needed by the
transient analysis. After one period of simulation, the state transition function and the Jacobian are
formed. One can then solve the iterate in equation (2.13) and determine whether another iteration
should be done.
The dual formation of the state transition function and the Jacobian is best explained
by deriving the equations used during the actual simulation [Heath97]. At time step m of the
transient analysis, discretization of equation (2.4) with the trapezoidal method gives:

f(x_m) = (2/h)[q(x_m) − q(x_{m−1})] − (dq/dt)(x_{m−1}) + i(x_m) + u_m = 0.    (2.17)

Again, the derivative is known from the previous time step:

(dq/dt)(x_{m−1}) = −i(x_{m−1}) − u_{m−1}.    (2.18)

Substituting (2.18) into (2.17) gives:

(2/h)[q(x_m) − q(x_{m−1})] + i(x_{m−1}) + u_{m−1} + i(x_m) + u_m = 0.    (2.19)

The derivative of the jth equation with respect to the kth component, x_{0k}, of x₀ is, by the
chain rule (note that all quantities are column vectors except for 2/h):

(2/h)[∇q_j(x_m)ᵀ (∂x_m/∂x_{0k}) − ∇q_j(x_{m−1})ᵀ (∂x_{m−1}/∂x_{0k})]
    + ∇i_j(x_{m−1})ᵀ (∂x_{m−1}/∂x_{0k}) + ∇i_j(x_m)ᵀ (∂x_m/∂x_{0k}) = 0.    (2.20)

We can rearrange equation (2.20):

[(2/h)∇q_j(x_m) + ∇i_j(x_m)]ᵀ (∂x_m/∂x_{0k})
    = [(2/h)∇q_j(x_{m−1}) − ∇i_j(x_{m−1})]ᵀ (∂x_{m−1}/∂x_{0k}).    (2.21)

Looking at equation (2.21) reveals that the transpose of the first term in square
brackets on the left-hand side is the jth row of the Jacobian J(x_m) for the transient analysis
solution at step t_m, while the second is the jth row of the corresponding matrix J̃(x_{m−1}) at the
previous time step. Combining the equations for j = 1, …, n gives:

J(x_m) (∂x_m/∂x_{0k}) = J̃(x_{m−1}) (∂x_{m−1}/∂x_{0k}),    (2.22)

so the vector ∂x_m/∂x_{0k} in (2.22) is solved for by first computing the matrix-vector
product on the right and then solving the resulting n-vector equation using the factorization of J(x_m).
Note that at the end of the period, ∂x_m/∂x_{0k} is the kth column of J_φ.
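The step-by-step propagation of (2.22) amounts to one extra linear solve per time point. A minimal sketch follows; the per-step matrices J and J̃ are assumed to come from the transient analysis and are placeholders here.

```python
import numpy as np

def propagate_sensitivity(J_list, Jt_list, S0):
    """Propagate dx_m/dx_0 through one period via eq. (2.22):
    at each step m, solve J(x_m) S_m = J~(x_{m-1}) S_{m-1}."""
    S = S0                              # at t = 0, dx/dx0 is the identity
    for J, Jt in zip(J_list, Jt_list):
        S = np.linalg.solve(J, Jt @ S)  # product on the right, then solve
    return S                            # at t = T this is J_phi
```

In a real implementation the LU factors of J(x_m) computed for the transient Newton step are reused here, so the extra cost per step is only matrix-vector products and back-substitutions.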
The above method can also be looked at using the sensitivity circuit approach
[Colon73, Trick, Aprille72a]. There is a sensitivity circuit for each state variable (capacitor
or inductor). The original circuit plus n (number of state variables) sensitivity circuits are
simultaneously integrated to determine the derivatives with respect to initial conditions. For
example, at time t , the Newton-Raphson iterations for the original circuit are completed.
The sensitivity circuits have the same circuit matrix but with different forcing functions. The
LU decomposition of the matrix is already computed in order to obtain the solution of the
original circuit. To obtain the solutions of the sensitivity circuits, one applies this factorization
to the different forcing vectors that result from the n different initial states. For n state
variables, n right-hand-side vectors are created, and the
equations are solved. From their solutions, the columns of the sensitivity matrix are created.
All past stored variables of the original and incremental circuits are updated, and the
algorithm proceeds to the next time point. The process is continued until time t = T , the end
of the period is reached. Now the sensitivity matrix and the solution of the original circuit
can be used to determine the next guess for the initial conditions. The initial condition that
the shooting-Newton method computes is a set of capacitor voltages and inductor currents for
the circuit so that if these voltages and currents are used as the initial conditions for the
transient analysis, the circuit is directly in steady-state. A comparison is made to see if the
steady state response is reached, and the process is stopped if it has been. If not, then the
circuit can be simulated in transient analysis for a few periods, and then the shooting-Newton
algorithm is applied to search for another set of initial conditions that will put the circuit in
steady-state.
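The key economy in the sensitivity-circuit approach is that one factorization serves many right-hand sides. The sketch below uses random matrices as stand-ins for the circuit matrix and forcing vectors (the sizes are arbitrary) and solves the original circuit and all n sensitivity circuits together:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5
A = np.eye(n) + 0.1 * rng.standard_normal((n, n))   # circuit matrix at time t
b = rng.standard_normal(n)        # forcing vector of the original circuit
B = rng.standard_normal((n, n))   # forcing vectors of the n sensitivity circuits

# One factorization of A serves the original circuit and every
# sensitivity circuit: stack all right-hand sides and solve once.
X = np.linalg.solve(A, np.column_stack([b, B]))
x_orig, S = X[:, 0], X[:, 1:]     # circuit solution and sensitivity columns
```

Solving all right-hand sides in one call lets the LAPACK routine factor A once and back-substitute n + 1 times, which mirrors the reuse of the LU decomposition described above.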
2.3.4 Steps in the shooting-Newton method
The following five-step shooting-Newton method can be used to determine the
steady-state response of nonlinear circuits with periodic inputs [Aprille73].
1. Compute the transient response x(t) of the circuit from t = 0 to t = T with initial
state x(0) = x₀. At each time step, we solve a nonlinear system of equations by the
Newton method. We use the calculated states as the initial conditions for the next
time step until we reach the end of the period, where x(T) is estimated (also referred to as
the state transition function).
2. Compute the sensitivity matrix. This is done at the same time as the transient
response is calculated. We now have all the parts needed to solve (2.13).
3. Compute the Newton-Raphson iteration (2.13) using the values for the Jacobian and
original circuit determined by the two previous steps. Here we are computing the
initial conditions that when applied to the circuit put the circuit into steady-state.
4. Return to step 1 unless ‖x(T, x₀(j)) − x₀(j)‖ < ε and ‖x₀(j + 1) − x₀(j)‖ < δ, where
ε and δ are (user-specified) arbitrarily small positive numbers and j refers to the
Newton iteration for computation of the iterate.
5. Stop.
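The five steps above can be sketched as a single outer loop. The `phi` and `jac_phi` callbacks are assumptions standing in for steps 1 and 2, i.e., one period of transient analysis plus the sensitivity computation:

```python
import numpy as np

def shooting_newton(phi, jac_phi, x0, tol=1e-10, max_iter=20):
    """Shooting-Newton iteration (2.13): find x0 with x0 = phi(x0).
    phi integrates the circuit over one period (step 1); jac_phi
    returns the sensitivity matrix J_phi (step 2)."""
    n = len(x0)
    for _ in range(max_iter):
        r = x0 - phi(x0)                   # boundary residual, eq. (2.12)
        if np.linalg.norm(r) < tol:        # convergence test (step 4)
            return x0
        J = jac_phi(x0)                    # sensitivity matrix, eq. (2.14)
        x0 = x0 - np.linalg.solve(np.eye(n) - J, r)   # update, eq. (2.13)
    raise RuntimeError("shooting-Newton did not converge")
```

For a contractive linear map phi(x) = M x + c, the loop reproduces the fixed point (I − M)⁻¹ c in a single Newton step, exactly as (2.13) predicts.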
The shooting-Newton algorithm for steady-state determination is accurate. It works well
for strongly nonlinear circuits and for circuits with transients that die off slowly; SPICE
transient analysis typically has difficulty with the second type. However, because the
Jacobian used for the calculation is dense, standard Gaussian elimination is used to solve
the iteration, and the computational cost can be prohibitive.
2.4 Harmonic balance method
The harmonic balance method is an iterative method. It is based on the assumption
that the steady state solution for a circuit with a sinusoidal input can be approximated using a
linear combination of sinusoids [Gilmore91a, Gilmore91b]. If the steady-state response contains only a few sinusoids, then harmonic balance can find them very
efficiently. The basic nonlinear circuit is divided into two parts, the linear part in the
frequency domain using phasor analysis and the nonlinear part that needs to be analyzed in
the time domain. The time domain voltages and currents are then transformed into frequency
representation using the forward Fourier transform. According to KCL, the currents should
sum to zero at all nodes. Usually, the currents do not sum to zero and an iteration to the
correct solution must be done. An error function is used to calculate adjustments to the
voltage amplitudes and phases. When the error is small enough, the steady state solution has
been obtained. A major advantage of harmonic balance is its ability to incorporate frequency
domain descriptions of distributed components, such as transmission lines. The flow chart
shown in Figure 2.1 demonstrates the procedure used in a commercial harmonic balance
analysis simulator [HP-Eesof94].
2.4.1 Phasor analysis
Harmonic balance is a generalization of phasor analysis [Feldmann96, Filicori88,
Gilmore91a, 91b, Maas99, Rodrigues98, Steer91a, 91b]. Phasor analysis is used in SPICE for
steady-state sinusoidal analysis of linear circuits and of circuits linearized for small-signal analysis. The
waveforms are represented in the following form:
x(t) = A cos(ωt + φ) = Re[X e^{jωt}],    (2.23)

where X = A e^{jφ} is a complex number called a phasor. This form of the equation is then plugged
into the ODEs that describe the circuit behavior, resulting in a system of linear algebraic
equations for the unknown complex phasors. In non-linear circuits often found in RF and
microwave applications, several frequencies can exist even if all time-varying sources are
sinusoids of the same frequency. Contrary to what occurs in linear circuits, different
frequencies interact, producing new output frequencies in non-linear circuits. Steady-state
waveforms in a nonlinear circuit can be expressed as a truncated (generalized) Fourier series:
x(t) = X(0) + Σ_{k=1}^{K} (X_c(k) cos ω_k t + X_s(k) sin ω_k t) = Σ_{k=−K}^{K} X(k) e^{jω_k t},    (2.24)

where ω_{−k} = −ω_k, X(−k) = X*(k), and the set of frequencies is {ω₀ = 0, ω₁, …, ω_K}, where
appreciable power is expected. It is assumed that all the waveforms in the nonlinear circuit
have this form. This representation of the response as a sum of basis functions is
characteristic of all frequency domain methods. As in phasor analysis, the next step is to
Figure 2.1 Flow chart for harmonic balance simulation from the Eesof manual.

substitute into the ODEs and obtain a system of nonlinear algebraic equations for the Fourier
coefficients of the waveforms. The problem of solving a system of ODEs is transformed into
solving a system of algebraic equations.
The general harmonic balance method solves the system of equations of the form:
f(ẋ(t), x(t), t) = 0,    (2.25)
where f is a vector of n nonlinear functions of 2n + 1 variables representing the ODEs,
x(t) is a vector of n variables (state variables) to be determined, and ẋ(t) = dx/dt. The
dependence on t , time, accounts for the periodic forcing terms. The steps involved in a
general harmonic balance solution include:
1. Determine the frequency set (required to describe the solution).
2. Truncate the frequency set (cannot use infinite series of frequencies, so choose a
finite number).
3. Expand (expand the forcing terms as truncated Fourier series: u(t) → U).
4. Express x(t) as a truncated Fourier series (in the same way as step 3, but for x(t)).
5. Substitute into system equations.
6. Solve.
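The six steps can be illustrated on a single-node circuit C dv/dt + G v + g₃v³ = A cos(ω₀t). The element values and the cubic nonlinearity are assumptions made for this example; the FFT provides the time/frequency bridge between the linear and nonlinear evaluations:

```python
import numpy as np

G, C, g3 = 1e-3, 1e-9, 1e-3        # linear conductance, capacitance, cubic term
w0, A = 2 * np.pi * 1e6, 1e-3      # drive frequency and amplitude
K, N = 7, 64                       # harmonics kept (step 2), time samples

def residual(xr):
    """Frequency-domain KCL residual. xr packs the unknown spectrum as
    [Re V_0 .. Re V_K, Im V_1 .. Im V_K] (steps 4-5)."""
    V = np.zeros(N // 2 + 1, dtype=complex)
    V[:K + 1] = xr[:K + 1]
    V[1:K + 1] += 1j * xr[K + 1:]
    v = np.fft.irfft(V, n=N) * N            # spectrum -> time waveform
    Inl = np.fft.rfft(g3 * v**3) / N        # nonlinear current -> spectrum
    F = (G + 1j * w0 * np.arange(N // 2 + 1) * C) * V + Inl
    F[1] -= A / 2                           # source phasor at w0
    return np.concatenate([F[:K + 1].real, F[1:K + 1].imag])

# Step 6: Newton's method with a finite-difference Jacobian.
x = np.zeros(2 * K + 1)
for _ in range(30):
    f0 = residual(x)
    if np.linalg.norm(f0) < 1e-12:
        break
    J = np.empty((x.size, x.size))
    for j in range(x.size):
        xp = x.copy()
        xp[j] += 1e-8
        J[:, j] = (residual(xp) - f0) / 1e-8
    x -= np.linalg.solve(J, f0)
```

The fundamental phasor V₁ = x[1] + j·x[K+1] lands close to the linear-circuit value (A/2)/(G + jω₀C), with a small shift caused by the cubic current.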
2.4.2 Simple example of harmonic balance method
The application of the harmonic balance method for integrated circuits is presented
through a simple example. In Figure 2.2, v_s(t) is an independent voltage source; R1, L1, and
C1 are linear elements. The voltage-controlled nonlinear capacitor C2 is described by its
charge-voltage relationship, q = χ(v_n), and a voltage-controlled nonlinear resistor is described by
i_f = f(v_n).

Figure 2.2 Example circuit [Rodrigues98].
First the nonlinear circuit is partitioned into a linear and nonlinear subnetwork. The
currents in these subnetworks are related by:

i(t) = i_L(t) = −i_NL(t),  i.e.,  i_NL(t) + i_L(t) = 0.    (2.26)
Independent sources may be included in the linear subnetwork. Breaking up the circuit in
this manner allows the method to take advantage of linear techniques to analyze the linear
subnetwork, while keeping the nonlinear subnetwork as small as possible.
The problem of solving the original circuit is now equivalent to solving the two
subnetworks in Figure 2.3 with the constraints on the currents shown in equation (2.26).
Expressing the currents as functions of the voltage allows us to solve for the
unknown voltages. We can then analyze each of the subnetworks individually. Since one
subnetwork is linear, phasor analysis in the frequency domain can be used. The other
subnetwork is nonlinear and must be solved in the time domain.

Figure 2.3 Circuit divided into linear and nonlinear subnetworks.
provides the bridge between the two analyses. We can now proceed through the steps
outlined above for the harmonic balance analysis.
1. Determine the frequency set. vs(t) is a combination of a finite number of sinusoids.
2. Truncate the set of frequencies. The steady-state response is expressed over the
truncated set of waveforms with frequencies:

Λ_K = {ω₀ = 0, ω₁, …, ω_K},

where Λ_K is the set of frequencies.
3. Expand vs (t ) as a Fourier series:
v_s(t) = V_s(0) + Σ_{k=1}^{K} (V_sc(k) cos ω_k t + V_ss(k) sin ω_k t).
For independent sources, usually only a few Fourier coefficients are nonzero.
4. Expand the unknown voltage as a generalized Fourier series:
v_n(t) = V_n(0) + Σ_{k=1}^{K} (V_nc(k) cos ω_k t + V_ns(k) sin ω_k t).
Here we must develop the equations for the linear and for the nonlinear subnetwork.
5. Substitute into equation (2.26), i_NL(t) + i_L(t) = 0, and solve.
In step 1 the frequency set is determined from the input signals and combinations of
those signals. If the frequency set is truncated too aggressively in step 2, an important output
signal can be lost. The set of equations resulting from this expansion
method are solved by using Newton’s method. This sparse linear system can also be solved
by direct, stationary methods or by non-stationary techniques such as the preconditioned
Krylov subspace methods (CGS, BiCGSTAB, TFQMR) [Brachtendorf95a,
Brachtendorf95b]. Stationary iterative methods converge slowly on the linear systems
associated with circuit simulation. The non-stationary techniques, however, usually converge
in a small number of iterations and require fewer computational resources.
This convergence can be met more easily if preconditioning is applied. Instead of the solving
the original system Av = b , a preconditioned system is solved. A preconditioner is an
auxiliary matrix in an iterative method that approximates the coefficient matrix or its inverse
[Templates94]. The preconditioner, or preconditioning matrix, is applied in every step of the
iterative method and aids in convergence. The Jacobian matrix, also called the node-admittance
matrix, is preordered using the Markowitz preordering technique [Heath97].
Comparing the time to simulate a standard, strongly nonlinear operational amplifier
demonstrated a five-to-six-times faster simulation time for steady-state determination than
when using the Gaussian elimination direct method [Feldman96, Filicori88, Gad00,
Gourary97, Gourary00].
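The effect of a preconditioner can be seen on a toy system. In the sketch below the matrix is contrived (it merely stands in for a badly scaled node-admittance matrix); a Jacobi preconditioner M = diag(A) sharply reduces the condition number while leaving the solution unchanged:

```python
import numpy as np

# Contrived, badly row-scaled stand-in for a node-admittance matrix.
S = np.array([[2.0, 1.0, 0.0],
              [1.0, 3.0, 1.0],
              [0.0, 1.0, 2.0]])
D = np.diag([1.0, 1e3, 1e-3])      # wildly different row scales
A = D @ S
b = np.array([1.0, 2.0, 3.0])

# Jacobi preconditioner M = diag(A): solve M^-1 A x = M^-1 b instead.
M_inv = np.diag(1.0 / np.diag(A))
A_pre, b_pre = M_inv @ A, M_inv @ b

print(np.linalg.cond(A))           # huge: iterative methods would crawl
print(np.linalg.cond(A_pre))       # modest: same solution, better behaved
```

Practical preconditioners (incomplete factorizations, the dropping strategies mentioned above) follow the same pattern but never form M⁻¹A explicitly.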
2.4.3 Extensions of the harmonic balance method
Another way to perform steady state analysis of nonlinear circuits is a technique that
is a modification of the standard harmonic balance [Gugleilmi96]. The original algebraic-differential equations that describe the nonlinear circuit are changed to algebraic equations in
the frequency domain. Instead of using the standard Newton’s iteration to solve these
equations exactly, an inexact method was used. The node-admittance matrix was
preconditioned using the positional dropping strategies and numerical dropping based
methodologies. Inexact methods such as GMRES, with and without preconditioning,
produced steady-state determinations at least ten times faster. This new approach to harmonic
balance simulation, which is based on inexact Newton methods and iterative system-solving
techniques, avoided the storage and factorization of the Jacobian matrix [Rizzoli96]. This
allows for very large (>1000 nodes) circuits to be analyzed very efficiently.
An extension of the harmonic balance method was implemented using the standard
setup [Feldmann96]. Krylov subspace methods were used to solve the resulting nonlinear set
of equations. Also detailed is the use of a problem-specific preconditioner for inverting the
harmonic balance Jacobian matrix. A comparison of two Krylov subspace methods, GMRES
and QMR, as applied to the harmonic balance algorithm [Gourary97] showed higher
simulation speed as compared to standard direct methods such as Gaussian elimination. The
GMRES method was clearly more efficient than the QMR method.
An index method was used to generate a mixed frequency index [deCarvalho97] so
that the harmonic balance method could evaluate multitone signals. However, the method
still needed large amounts of memory, was very computer intensive and could not be used
with irregular input spectra. Another harmonic balance modification [Shelkovnikov92]
used multidimensional formulation of the problem. In this approach, the circuit behavior was
described with a multidimensional spectrum. Some spectral components were assumed to
have negligible values and were ignored. The system was then divided into real and
imaginary parts and solved with Newton’s method.
Harmonic balance is an efficient and accurate method for simulating mildly nonlinear
circuits. Strongly nonlinear circuits need special adaptations of the method that limit its
efficiency and sometimes its accuracy. The disadvantages of the harmonic balance method
include the difficulty in the analysis of oscillators. In time-domain simulation, the frequency
of oscillation can be determined, but in harmonic balance the procedure is very complex. In
addition, harmonic balance does not represent the transient response and, therefore, cannot
show chaotic and other non-steady-state behavior.
2.4.4 Other methods
Another method that is rarely used is the perturbation method. After an initial guess
of some linearized equations, the solution is perturbed until the correct answer is obtained.
The Volterra series and the Picard iteration are examples of this approach. The drawbacks of
this approach are that not only do these often fail to converge, but they can also be
inaccurate. Using parallel processing for circuit simulation can allow transient simulation to
reach steady state more quickly [Duff90, Buch94, Larcheveque96].
2.5 Conclusion
The three main methods of steady-state determination were presented: standard
transient analysis in SPICE (also called the brute force method), the harmonic balance
method, and the shooting method [Chua81]. In transient analysis, the differential equations
representing the circuit are allowed to integrate until all the transients have died out. Usually
one starts with some arbitrary initial conditions and keeps solving until there is only the
steady state solution. However, there are major drawbacks to this method. In lightly damped
circuits, the time constants are large and the transient component may last for an extremely
long time and may involve many calculations for signals with high frequency. The widely
used analog circuit simulator SPICE uses the transient analysis method to determine steady
state. SPICE, however, encounters numerous problems with an input signal that contains
both a high-frequency component and a low-frequency component. The high frequency signal
determines the time step taken which makes for a very short integration time step during
transient analysis. The low frequency signal determines the interval over which the solution
must be carried out. This means that the solution must be carried out for a long interval
before all the transients have died out. It is sometimes difficult to determine when they have
died out because they can change very slowly [Brachtendorf95a, Brachtendorf95b].
The most common method used to determine the steady-state for communication
circuits is the harmonic-balance method. Harmonic balance simulators are found in Agilent’s
ADS software and in Microwave Office. Harmonic balance is an expansion frequency
domain method, which approaches the problem of finding the steady state by a trigonometric
polynomial. With this method, the approximate solution is represented as a finite,
trigonometric series. The terms with the same frequency are balanced by a method such as
Galerkin’s. A significant disadvantage to this method is that it uses multidimensional Fourier
analysis to determine the frequency components. Nonlinear signals are represented in the
frequency domain as combinations of a large number of linear signals, also called harmonics.
Nonlinear circuits can have many harmonics and, unfortunately, the simulation computation
increases exponentially with the number of harmonics. Therefore, strongly nonlinear circuits
are almost impossible to simulate using the standard harmonic balance method. Because of
these reasons, specialized, computationally expensive methods must be used for large
nonlinear circuits (in large circuits nonlinear elements such as transistors can each generate
many harmonics). The steady-state solution is slow, extremely memory-intensive, and may
arrive at an incorrect solution.
The third common method for steady-state determination is called the shooting-Newton method. The steady-state solution of the differential equations representing the
circuit is solved by posing the problem as a boundary value problem whose solution is the
steady-state response. The shooting method is an iterative method, where one calculates the
initial conditions to put the circuit in steady-state. The initial states used are calculated by
many methods including the Newton [Aprille72a, Aprille72b] or extrapolation [Skelboe70].
This method works well for small, strongly nonlinear circuits. For large nonlinear circuits,
however, the shooting-Newton method has large computational and storage costs.
Chapter 3
Numerical Methods Background
In this chapter, an overview of the formulation of the system equations for the
determination of the steady-state response is presented, followed by a brief background of
direct and stationary methods [Demmel97, Dongarra96, Greenbaum97, Kelley95, Saad96,
Trefethen97]. The main part of the chapter presents the derivation, computational resources,
algorithms, and theory behind the Krylov-subspace iterative methods which are used in this
thesis to find an efficient solution of the linear system of equations needed to determine
steady-state.
3.1 Introduction
In the previous chapter on circuit simulation, a system of equations was generated by
the shooting-Newton method to determine the steady-state response of an integrated circuit.
The system of equations is used to calculate a set of voltages and currents that when used as
the initial conditions for the circuit results in the circuit’s steady-state response. The
determination of the steady-state response using the shooting-Newton method is an iterative
process, meaning that the equations must be formed and solved numerous times over the
course of the steady-state determination.
In the original paper by Aprille and Trick [Aprille72a, Aprille72b], the equations
were solved using the direct algorithm of Gaussian elimination. If the standard Gaussian
elimination type direct method is used to solve the iterate formed in the shooting-Newton
method, the number of equations that can be in that system of equations is limited. If n is the
number of equations describing the system, then these direct methods use O(n^3) operations
to solve the system of equations and O(n^2) computer storage. This becomes much too costly
for larger systems of equations that are generated by large circuits. Using a direct method,
with no round off or other errors, the exact solution is obtained only when all the operations
are completed. When round-off errors arise and large systems of equations are being solved,
the errors increase, often making the results unacceptable. With iterative methods, the
solution is obtained at each step, and the iteration is stopped when the solution is deemed
accurate enough. In a direct method, the entire method must be completed before there is an
answer. One cannot stop the algorithm before the end with a “less” accurate solution. If the
matrix has a special structure, it is not always preserved with a direct method. For example, a
sparse matrix is filled in during Gaussian elimination.
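Fill-in is easy to demonstrate on an invented "arrow" matrix: elimination (plain NumPy, no pivoting) turns a matrix with 3n − 2 nonzeros into completely dense triangular factors:

```python
import numpy as np

n = 6
# Arrow matrix: nonzeros only on the diagonal, first row, and first column.
A = 4.0 * np.eye(n)
A[0, :] = 1.0
A[:, 0] = 1.0
A[0, 0] = 4.0

L, U = np.eye(n), A.copy()
for k in range(n - 1):                  # Gaussian elimination, no pivoting
    for i in range(k + 1, n):
        L[i, k] = U[i, k] / U[k, k]
        U[i, :] -= L[i, k] * U[k, :]
        U[i, k] = 0.0                   # entry below the pivot is now zero

print(np.count_nonzero(A))              # 3n - 2 = 16 nonzeros in A
print(np.count_nonzero(L) + np.count_nonzero(U))   # the factors fill in
```

Ordering the arrow the other way (dense row and column last) would avoid the fill-in entirely, which is what preordering heuristics aim to achieve.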
An alternative approach is to use a non-direct (iterative) method. An iterative method
starts with an approximate solution and uses a recurrence formula to calculate another
approximate solution. The formula is applied repeatedly until a solution is deemed “good
enough.” Although the iterative methods can solve the system of equations in a finite number
of steps in exact arithmetic, they can be stopped before they reach the exact answer with a
solution that is deemed good enough. Their cost in terms of computer operations and storage
is typically much lower than the cost for direct methods. Depending on the iterative method
used and the accuracy desired, the costs could even be lowered to O(n). Further, iterative
methods are less likely to propagate errors.
3.2 How the system of equations is generated
For a more detailed presentation of the derivation of the steady-state system of
equations presented here, refer to Chapter 2. There, the equations generated for the transient
response of circuits are explained along with the derivation of the shooting-Newton method
of steady-state determination. The system of equations formed for the transient analysis of a
circuit that includes resistors, capacitors and current sources (for simplification of the
discussion) is represented as:
f(x(t), t) = i(x(t)) + d/dt q(x(t)) + u(t) = 0    (3.1)
where x(t) is the vector of node voltages, i(x(t)) is the vector of resistive node currents,
q(x(t)) is the vector of node charges, and u(t) is the vector of inputs. Steady-state is
achieved when the transient effects have died off. This is represented by
x(0) = x(T)    (3.2)
In other words, the solution at the end of the period is the same as the condition at the
beginning of the period. This means that the circuit is in steady-state. The state transition
function can be used to define the two-point boundary value problem, thus:
x(0) − φ(x(t_0), t_0, T) = 0    (3.3)
37
where φ is the state transition function. The state transition function was implicitly derived; it
was calculated at each time point until the end of the period. It is dependent on the initial
state, x_0, the period of the response, T, and the starting time, t_0. We applied the shooting-Newton method to solve the boundary value problem, which results in the following iteration:
x_0(t)^{i+1} = x_0(t)^i − [I − J_φ]^{-1} [x_0^i − φ(x(t_0), t_0, T)]    (3.4)
where J_φ is the Jacobian, also called the sensitivity matrix, and is represented by:
J_φ = ∂φ(x(t_0), t_0, T) / ∂x_0    (3.5)
The sensitivity matrix is computed at the same time as the state transition function.
Quantities needed for the calculation of the sensitivity matrix are already available at each
time point from the transient analysis. It is formed using sensitivity circuits [Aprille72b] and
its calculation is computationally expensive. The equation in (3.4) is solved, and a new initial
condition is determined for the steady-state response. This result is checked to see if the
circuit is in steady-state. If not in steady-state, the shooting-Newton procedure is repeated
and another initial condition guess is calculated [Aprille72a, Aprille72b]. This process is
continued until the steady-state is reached. The initial condition that the shooting-Newton
method computes is a set of capacitor voltages and inductor currents for the circuit so that if
these voltages and currents are used as the initial conditions for the transient analysis, the
circuit is directly in steady-state.
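The loop just described can be sketched for a one-node example. The circuit below is hypothetical (an RC node driven by a sinusoidal current source, with invented element values); φ is evaluated by backward-Euler transient integration over one period, the scalar sensitivity dφ/dx_0 by a finite difference, and the Newton update drives the periodicity error x_0 − φ(x_0) to zero:

```python
import numpy as np

# Hypothetical RC node driven by a sinusoidal current source:
#   C dv/dt = -v/R + Is*cos(w*t)
R, C, Is = 1e3, 1e-6, 1e-3
T = 1e-3                        # assumed period of the drive
w = 2 * np.pi / T

def phi(v0, steps=200):
    """State transition function: integrate one period with backward Euler."""
    h = T / steps
    v = v0
    for m in range(1, steps + 1):
        # Backward Euler: C*(v_new - v)/h = -v_new/R + Is*cos(w*t_new)
        v = (C * v / h + Is * np.cos(w * m * h)) / (C / h + 1.0 / R)
    return v

# Shooting-Newton: solve v0 - phi(v0) = 0.
v0, eps = 0.0, 1e-6
for _ in range(10):
    F = v0 - phi(v0)
    dphi = (phi(v0 + eps) - phi(v0)) / eps   # sensitivity d(phi)/d(v0)
    v0 -= F / (1.0 - dphi)                   # Newton update
    if abs(v0 - phi(v0)) < 1e-12:
        break

print(abs(v0 - phi(v0)))        # periodicity residual: ~0, steady state found
```

In the actual algorithm the sensitivity is assembled from quantities already available during the transient analysis rather than by a finite difference; the finite difference here is only for compactness.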
A drawback is that the solution of this iterate equation is computationally expensive.
Forming and solving the system with the dense sensitivity matrix is the most expensive step. If a method like
Gaussian elimination is used, then the costs are very high and the speed-up of the direct
method of steady-state determination is limited. The newer Krylov-subspace iterative
methods result in much greater computational savings. The computation cost of forming the
sensitivity matrix can be avoided if instead of forming the sensitivity matrix at each transient
analysis step, the quantities that are used to form it are stored and then accessed when they
are needed for the solution of the iterate in the Krylov-subspace method [Telichevesky95a,
Telichevesky95b, Kleiner01, Kleiner00].
3.3 Direct versus iterative methods
The equation (3.4) can be looked at in the familiar form:
A x = b    (3.6)
where A = I − J_φ, x = x_0(t)^{k+1} − x_0(t)^k, and b = φ(x(t_0), t_0, T) − x_0^k.
The equation in (3.6) is solved for x, and the next iterate x_0^{k+1} then follows from x and
x_0^k. In this thesis, the matrix A is always presumed real, square, and nonsingular, so that,
for each real right-hand side vector b, there is a unique real solution x given by
x = A^{-1} b    (3.7)
For our problem the matrix A , also referred to as the sensitivity matrix, is not assumed to be
sparse.
A direct solver is a linear solver that constructs an explicit operator v ↦ A^{-1} v by
computing the inverse A^{-1} itself or a factorization of the matrix, and then applies this
operator to b to find the solution x as in the explicit form (3.7) [Strang76]. A common
example of a direct linear solution method is Gaussian elimination. There are three distinct
stages in the solution process using direct Gaussian elimination. They are:
1. The LU factorization of A , where L is a lower triangular matrix and U an upper
triangular matrix. This decomposition is performed by direct Gaussian elimination,
no pivoting being used.
LUx = b
(3.8)
2. Transformation of (3.8), by elimination, into:
Lc = b    (3.9)
Equation (3.9) is solved by forward substitution. This stage can be performed at the
same time as the LU factorization.
3. Back-substitution to form x: that is, the solution of
Ux = c
(3.10)
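The three stages can be written out directly (a Doolittle-style factorization without pivoting; the matrix and right-hand side are invented):

```python
import numpy as np

A = np.array([[4.0, 3.0, 0.0],
              [3.0, 4.0, -1.0],
              [0.0, -1.0, 4.0]])
b = np.array([24.0, 30.0, -24.0])
n = len(b)

L, U = np.eye(n), A.copy()
for k in range(n - 1):              # stage 1: LU factorization (3.8)
    for i in range(k + 1, n):
        L[i, k] = U[i, k] / U[k, k]
        U[i, :] -= L[i, k] * U[k, :]

c = np.zeros(n)                     # stage 2: forward substitution, Lc = b (3.9)
for i in range(n):
    c[i] = b[i] - L[i, :i] @ c[:i]

x = np.zeros(n)                     # stage 3: back substitution, Ux = c (3.10)
for i in range(n - 1, -1, -1):
    x[i] = (c[i] - U[i, i + 1:] @ x[i + 1:]) / U[i, i]

print(x)                            # agrees with np.linalg.solve(A, b)
```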
If the matrix A has a special structure then special adaptations of Gaussian
elimination could be used. This can save many operations and storage space [Kundert88].
Unfortunately, for our problem the general structure cannot be predicted and therefore, these
special methods cannot be applied.
Floating-point operations, or flops, are the basic arithmetic operations of a computer.
These operations include addition/subtraction and multiplication/division. Gaussian
elimination, QR factorization, and most other algorithms of dense linear algebra fit the
following pattern: there are O(m) steps, each requiring O(m^2) flops, for a total work
estimate of O(m^3). For example, in solving the n × n matrix equation Ax = b by the
Gaussian elimination method, the augmented matrix (A|b) is first reduced to upper-triangular
form. Taking into account the increasing number of zeros as the method goes row by row, the
reduction of the matrix to zeros below the diagonal will require (n−1)^2 + (n−2)^2 + ... + 2^2 + 1^2
multiplications and the same number of subtractions. The next step of back substitution
requires n(n - 1)/2 multiplications and the same number of additions. Combining these two
operation counts, the number of flops required in the Gaussian elimination method is:
n(n−1)(2n−1)/3 + n(n−1) = (2n^3 − 2n)/3.
This is actually a minimum amount because usually row pivoting must occur if small
pivots are encountered. Assuming that a swap occurs for every step, the maximum number of
flops is asymptotic to 2n^3/3 [Strang76].
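The count is easy to verify: summing the reduction and back-substitution operations above reproduces the closed form (2n³ − 2n)/3:

```python
def gauss_flops(n):
    # Reduction: (n-1)^2 + ... + 2^2 + 1^2 multiplications, same subtractions.
    reduce_ops = 2 * sum(k * k for k in range(1, n))
    # Back substitution: n(n-1)/2 multiplications, same number of additions.
    backsub_ops = 2 * (n * (n - 1) // 2)
    return reduce_ops + backsub_ops

for n in (3, 10, 100):
    assert gauss_flops(n) == (2 * n**3 - 2 * n) // 3
    print(n, gauss_flops(n))
```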
3.4 Iterative methods background
The non-direct methods are often called iterative methods. Iterative methods begin
with an initial guess and compute a sequence of approximate solutions that converge to the
exact solution. An iterative solver is a linear solver that produces a sequence .r0,.r,,.. ,,xk,...
of approximations to the solution. The step going from
to x k is called the kth iteration
of the solver. .t0 is the initial guess and is usually set to the zero vector. At step k of an
iterative method the iterate is updated as:
x_k = x_{k−1} + α_k p_k    (3.11)
where α_k is a scalar and p_k is a direction vector.
The accuracy of an iterative method solution depends on the number of iterations
performed. An iterative solver usually terminates if the approximation is “close” enough to
the solution as determined by the user [Barrett94]. The original matrix A is not transformed
meaning that if it is sparse it is not filled in, and if it has some special structure that structure is
preserved. The amount of work involved in the solution using an iterative method depends on
the convergence rate and the desired accuracy. Since the most expensive operation is matrix-vector multiplication, the work is proportional to the flops required to perform a matrix-vector product times the number of iterations. For some iterative methods convergence can
be very slow or irregular. The convergence rate for an iterative solver depends on the
spectrum of the coefficient matrix and on the condition number. The rate of convergence can
be helped by the technique of preconditioning.
There are many iterative methods. Some are stationary and some are non-stationary.
A very useful and efficient class of iterative methods is called Krylov-subspace methods. The
primary advantage of Krylov-subspace methods is that they can substantially reduce the
operation count and storage for linear system solutions.
3.4.1 Stationary and non-stationary iterative methods
Iterative methods can be more computationally efficient for the solution of the system
of equations generated by the shooting-Newton steady-state determination. In the iterative
methods a possible solution is calculated, checked, and recalculated until the answer is close
enough. This process can involve far fewer flops and less computer storage than direct
methods. Because of this efficiency, much larger circuits can be analyzed and the steady-state response determined by the shooting-Newton method.
There are two general classes of iterative methods, stationary and non-stationary
[Freund92, Rheinboldt98, Strang76, Steward98]. In stationary iterative methods the
transition from the calculated solution x_k to x_{k+1} does not depend on the history of the iteration.
In non-stationary methods, the present iterate is dependent on previous iterates. Stationary
iterative methods can be expressed in the form:
x^k = B x^{k−1} + c    (3.12)
where x^k is the guess at step k, B and c are quantities that do not depend on the
iteration step, and x^{k−1} is the previously computed iterate. There are four main stationary
methods: 1) the Jacobi method, 2) the Gauss-Seidel method, 3) the Successive Over-relaxation (SOR) method, and 4) the Symmetric Successive Over-relaxation (SSOR) method. In the
Jacobi stationary method, strong diagonal dominance is needed for convergence. A system
where the equations can be ordered so that each diagonal entry is larger in magnitude than
the sum of the magnitudes of the other coefficients in that row is called diagonally dominant.
The Gauss-Seidel algorithm uses the most recent values of the solution approximation, x^k, to
improve the convergence rate. The Gauss-Seidel method’s convergence depends on the order
in which the equations are examined and computed while obtaining the new iterate. It has
faster convergence than Jacobi, but slower convergence than non-stationary methods. SOR
and SSOR are adaptations of Gauss-Seidel method. They can be used to accelerate
convergence of the Gauss-Seidel method or to converge to a solution when Gauss-Seidel
fails.
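A minimal sketch of the Jacobi and Gauss-Seidel sweeps on a small, invented, diagonally dominant system; Gauss-Seidel, using the freshly updated components, converges noticeably faster here:

```python
import numpy as np

A = np.array([[10.0, -1.0, 2.0],
              [-1.0, 11.0, -1.0],
              [2.0, -1.0, 10.0]])
b = np.array([6.0, 25.0, -11.0])
exact = np.linalg.solve(A, b)
off = A - np.diag(np.diag(A))       # off-diagonal part of A

def jacobi(iters):
    x = np.zeros(3)
    for _ in range(iters):
        x = (b - off @ x) / np.diag(A)   # all components from the old iterate
    return x

def gauss_seidel(iters):
    x = np.zeros(3)
    for _ in range(iters):
        for i in range(3):               # uses freshly updated components
            x[i] = (b[i] - A[i, :i] @ x[:i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

for it in (5, 10):
    print(it, np.linalg.norm(jacobi(it) - exact),
              np.linalg.norm(gauss_seidel(it) - exact))
# Gauss-Seidel's error is orders of magnitude smaller at each count.
```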
Although the stationary iterative methods can use fewer operations and computer
storage than non-stationary ones, they converge for a small subset of problems where there is
diagonal dominance. Since the system of equations generated for the shooting-Newton
method is not usually diagonally dominant, nor can that property be known ahead of time,
the non-stationary iterative methods are a better choice.
3.5 Krylov-subspace methods
The Krylov subspace is defined as the space spanned by successively larger groups of
vectors:
K_k(A, b) = span{b, A b, A^2 b, ..., A^{k−1} b}    (3.13)
Krylov methods are based on Krylov subspaces and form a class of iterative methods for the
solution of linear systems. Krylov algorithms include the Arnoldi-based Generalized
Minimal Residual (GMRES) algorithm, and the Lanczos-based Conjugate Gradient (CG),
Conjugate Gradients Squared (CGS), BiConjugate Gradient Stabilized
(BiCGSTAB), and Quasi-Minimal Residual (QMR) algorithms (Figure 3.1). Krylov-subspace methods are non-stationary iterative methods [Ipsen98, Shewchuk94]. This means
that the solution to the system Ax = b is iterated, and at each iteration the information used to
calculate the iterate changes. The general form of a non-stationary iterative method is:
x_k = x_{k−1} + α_k p_k    (3.14)
where p_k is a search direction and α_k is a scalar found by some criterion according to the
exact method used. The scalar and direction vector are usually computed by taking inner
products of residuals or other vectors arising from the iterative method used. Non-stationary
methods differ from stationary methods in that the computations involve information that
changes from one step to another. In other words, information changes as the iteration index
changes.
Krylov-subspace methods start with an initial guess, x_0, and generate successive
approximations to the answer. Given matrix A and vector b, the associated Krylov
sequence is the set of vectors b, A b, A^2 b, A^3 b, .... During the iteration at step k, a Krylov-
subspace method produces the approximate solution from the Krylov space at that step that
was generated by the vector b. In all the Krylov algorithms, the solution iterates can be
thought of as matrix polynomials. For example, in GMRES, CG, BiCG, and QMR:
x_k ∈ span{b, A b, ..., A^{k−1} b}  if x_0 = 0.    (3.15)
(For simplicity we will assume that x_0 = 0; else x_k ∈ span{r_0, A r_0, ..., A^{k−1} r_0}.) Then
x_k ∈ span{b, A b, ..., A^{k−1} b} means x_k is a linear combination of the vectors
b, A b, A^2 b, ..., A^{k−1} b, i.e.
x_k = a_0 b + a_1 (A b) + a_2 (A^2 b) + ... + a_{k−1} (A^{k−1} b)    (3.16)
where a_0, ..., a_{k−1} are scalars. Now if we factor out the b on the right we observe:
x_k = (a_0 + a_1 A + a_2 A^2 + ... + a_{k−1} A^{k−1}) b = p_{k−1}(A) b    (3.17)
where p_{k−1} is the polynomial of degree k − 1
p_{k−1}(t) = a_0 + a_1 t + a_2 t^2 + ... + a_{k−1} t^{k−1}.    (3.18)
Since x_k can be written in this form, the residual can also be written as a polynomial:
r_k = b − A x_k = b − A p_{k−1}(A) b
    = (I − A p_{k−1}(A)) b
    = q_k(A) b    (3.19)
where q_k is the degree-k polynomial q_k(t) = 1 − a_0 t − a_1 t^2 − a_2 t^3 − ... − a_{k−1} t^k with the
property that q_k(0) = 1. If we had taken x_0 ≠ 0, then x_k = x_0 + p_{k−1}(A) r_0 and r_k = q_k(A) r_0. One
can think of the differences among CG, GMRES, BiCG, and QMR in terms of how the coefficients of
the polynomial are chosen. For GMRES, q_k(t) is the minimal-residual polynomial:
q_k(t) = argmin_{p(t) ∈ Π_k, p(0) = 1} ||p(A) r_0||_2    (3.20)
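The polynomial view can be made concrete. The naive sketch below forms the Krylov basis {b, Ab, ..., A^{k−1} b} explicitly and minimizes ||b − A x_k||_2 over its span (the diagonal test matrix is invented; a real GMRES implementation builds an orthonormal Arnoldi basis instead of raw powers of A, which quickly become ill-conditioned):

```python
import numpy as np

# Invented SPD test matrix with distinct eigenvalues 1..6.
n = 6
A = np.diag(np.arange(1.0, n + 1))
b = np.ones(n)

def krylov_min_residual(k):
    """x_k = p_{k-1}(A) b chosen to minimize ||b - A x_k||_2 (with x_0 = 0)."""
    K = np.column_stack([np.linalg.matrix_power(A, j) @ b for j in range(k)])
    y, *_ = np.linalg.lstsq(A @ K, b, rcond=None)
    return K @ y

res = [np.linalg.norm(b - A @ krylov_min_residual(k)) for k in range(1, n + 1)]
print(res)   # non-increasing, and essentially zero once k reaches n
```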
The difference between methods such as CGS and BiCGSTAB is more easily
understood by thinking of the Krylov methods as performing different polynomial manipulations.
This chapter will first present a general background of Krylov-subspace methods, including
the original Krylov method, CG, followed by GMRES, and the group of similar methods of
BiCG, CGS, QMR, and BiCGSTAB. We note that CG is not applicable to most matrices
since these are not SPD. However, we use CG as a jumping-off point to introduce the other
Krylov methods.
In most cases the Krylov-subspace methods converge to a solution much faster than
stationary solvers and in less time than a direct solver. In addition, because they require much
less storage, Krylov-subspace methods offer an efficient method for solving the system of
equations generated by the shooting-Newton method of steady-state determination.
3.5.1 Classification of Krylov methods
This section starts by defining important terminology used in describing the Krylov
methods [Strang76, Greenbaum97].
1. Span: if the vector space V consists of all the linear combinations of particular
vectors w_1, ..., w_l, these vectors are said to span the space V. Every vector v in V
can be expressed as some combination of the w's:
v = c_1 w_1 + ... + c_l w_l
for some coefficients c_i.
2. Basis: the basis for a vector space is a set of vectors that are linearly independent and
span the space.
3. Orthogonal: Two vectors are orthogonal if their dot product is zero. They are said to
be perpendicular.
4. Linear independence: if non-zero vectors v_1, ..., v_k are mutually orthogonal (every
vector is orthogonal to every other vector), then they are linearly independent.
5. Orthogonal subspace: two subspaces of the same space are orthogonal if every vector
in one subspace is orthogonal to every vector in the other subspace. For example, for
subspace V containing vectors v and subspace W containing vectors w:
v^T w = 0
for all v and w.
6. Gram-Schmidt orthogonalization: Any set of vectors a_1, ..., a_n can be converted into a
set of orthogonal vectors by the Gram-Schmidt process. First, v_1 = a_1, and then each
v_i is made orthogonal to the preceding v_1, ..., v_{i−1}:
v_i = a_i − Σ_{j=1}^{i−1} (v_j^T a_i / v_j^T v_j) v_j.
For the current vector the components in the direction of the previous vectors are
subtracted. For every choice of i, the subspace spanned by the original a_1, ..., a_i is
also spanned by v_1, ..., v_i. The final vectors are then normalized.
7. Upper Hessenberg matrix: a matrix is said to have Hessenberg form if all the
diagonals below the first subdiagonal are zero.
8. Tridiagonal matrix: a matrix is tridiagonal if all the nonzeros entries lie on the main
diagonal and the two adjacent diagonals.
9. Biorthogonalization: W^T V = D = V^T W, where D is a diagonal matrix. Let the v_j's
be the basis for the span {r_0, A r_0, ..., A^{j−1} r_0} and the w_j's be the basis for the
span {r̃_0, A^T r̃_0, ..., (A^T)^{j−1} r̃_0}. Then biorthogonalization gives
W_{k+1}^T V_{k+1} = a diagonal matrix of size (k+1) × (k+1), and reduction to
tridiagonal form is related to the fact that A V_k = V_{k+1} T_{k+1}. Therefore,
W_{k+1}^T A V_k = W_{k+1}^T V_{k+1} T_{k+1} = D T_{k+1}, where T_{k+1} is tridiagonal.
Then, D^{-1} W_{k+1}^T A V_k = T_{k+1}.
10. A-orthogonal: two vectors x and y are A-orthogonal if x^T A y = 0.
11. Conjugate: the complex conjugate of a + ib is a - ib .
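The Gram-Schmidt process of item 6 takes only a few lines (the example vectors are invented); the result is checked to be orthonormal:

```python
import numpy as np

def gram_schmidt(vectors):
    """Orthonormalize linearly independent vectors (item 6, then normalize)."""
    basis = []
    for a in vectors:
        v = a.astype(float)
        for q in basis:            # subtract components along previous vectors
            v = v - (q @ a) * q
        basis.append(v / np.linalg.norm(v))
    return np.column_stack(basis)

Q = gram_schmidt([np.array([1.0, 1.0, 0.0]),
                  np.array([1.0, 0.0, 1.0]),
                  np.array([0.0, 1.0, 1.0])])
print(Q.T @ Q)    # identity matrix: the columns are orthonormal
```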
There are two general classifications of the Krylov methods (Figure 3.1). The original
Krylov method of CG produces a tridiagonal matrix in the process of orthogonalization. The
two classifications result from the adaptation of the CG method for nonsymmetric and/or not
positive definite matrices. GMRES is based on the Arnoldi Gram-Schmidt process and
results in a matrix that is not tridiagonal but preserves the orthogonalization of the basis
vectors. Other Krylov-subspace methods give up the orthogonal basis. In CGS, BiCG, QMR,
and BiCGSTAB, the tridiagonal matrix is formed by the biorthogonalization of the system.
In the following subsections, equation (3.6), Ax = b, is solved. The coefficient matrix A
being referred to is the sensitivity matrix, I − J_φ.
Conjugate Gradient (CG): tridiagonal, orthogonalization
GMRES: Hessenberg, orthogonalization
CGS, BiCG, QMR, BiCGSTAB: tridiagonal, biorthogonalization
Figure 3.1 Classification of Krylov-subspace methods.
3.5.2 Conjugate Gradient (CG) method
The CG method is the oldest and best known of the Krylov-subspace methods
[Saad95, Shewchuk94]. It calculates successive approximations to the solution, along with
the residuals, direction vectors, and scalars used in updating the approximations. The
CG method is used to solve symmetric positive definite (SPD) systems, which is usually not
the case for the systems of interest here. Matrix A is symmetric if:
A = A^T.    (3.21)
Matrix A is positive definite if
x^T A x > 0 for all x ≠ 0.    (3.22)
The CG method, described below, is the basis for the iterative non-stationary Krylov-subspace methods. It can only be used when the coefficient matrix is symmetric positive
definite, a condition that is generally not met for the coefficient matrix generated by the
shooting-Newton iteration. The CG method is explained here because it exemplifies the other
Krylov-subspace methods that can be used in circuit simulation. The tridiagonal form of the
matrix makes for relatively easy solution of the linear system. The other methods are Krylov-subspace methods similar to CG; orthogonalization is abandoned for
biorthogonalization in the CGS, QMR, BiCG, and BiCGSTAB methods. In the GMRES
method the orthogonalization is preserved but the matrix is no longer tridiagonal but of
Hessenberg form.
CG is based on the symmetric Lanczos algorithm, which is a special case of the
Arnoldi algorithm for symmetric systems. In the Arnoldi method, the matrix A is reduced to
Hessenberg form by an orthogonalization process. When the system is symmetric, the
resulting Hessenberg matrix is tridiagonal. This makes for short recurrences in both the
Arnoldi method and in the solution process. This tridiagonal matrix also means that only a
few vectors need to be stored.
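A minimal pure-Python sketch of the symmetric Lanczos process just described may make the short recurrence concrete (illustrative only; the function and variable names are not from this work). Only the two most recent basis vectors are needed at each step, and the projected matrix V^T A V is tridiagonal, with the alpha coefficients on its diagonal and the beta coefficients on its off-diagonals:

```python
def matvec(A, x):
    """Dense matrix-vector product."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def lanczos(A, v, m):
    """Symmetric Lanczos: three-term recurrence producing an orthonormal basis V
    and the entries of a tridiagonal matrix (alphas diagonal, betas off-diagonal)."""
    nrm = dot(v, v) ** 0.5
    V = [[vi / nrm for vi in v]]
    alphas, betas = [], []
    for j in range(m):
        w = matvec(A, V[j])
        if j > 0:                       # short recurrence: only two previous vectors needed
            w = [wi - betas[-1] * vi for wi, vi in zip(w, V[j - 1])]
        a = dot(w, V[j])
        alphas.append(a)
        w = [wi - a * vi for wi, vi in zip(w, V[j])]
        bnorm = dot(w, w) ** 0.5
        if bnorm < 1e-12 or j == m - 1:  # breakdown, or requested steps completed
            break
        betas.append(bnorm)
        V.append([wi / bnorm for wi in w])
    return V, alphas, betas

# Example: a small symmetric matrix; V comes out orthonormal and V^T A V tridiagonal
A = [[2.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 4.0]]
V, alphas, betas = lanczos(A, [1.0, 0.0, 0.0], 3)
```

Because A is symmetric, (A v_j, v_i) vanishes for |i - j| > 1, which is exactly the tridiagonal structure the text refers to.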
In the Lanczos method, an orthogonal basis for the Krylov-subspace is constructed
and the coefficient matrix undergoes a similarity transformation to tridiagonal form [Saad96].
In this process, the residuals created during the Lanczos method are orthogonal to each other.
Also, the search directions (or pointer vectors) generated are A-orthogonal or conjugate. We
can use these two conditions to show the derivation of the CG algorithm from the Lanczos
method. The vector of the iterated approximation to the solution can be represented as:
x_{j+1} = x_j + α_j p_j   (3.23)
Then, the residual vectors must satisfy the recurrence:
r_{j+1} = r_j - α_j A p_j   (3.24)
If the r_j's are orthogonal, then:
(r_j - α_j A p_j, r_j) = 0   (3.25)
And, therefore, the following scalar relationship must be true:
α_j = (r_j, r_j) / (A p_j, r_j).   (3.26)
It is known that the next search direction p_{j+1} is a linear combination of r_{j+1} and p_j, and
again using a scalar to guarantee the required orthogonality and conjugacy:
p_{j+1} = r_{j+1} + β_j p_j   (3.27)
Thus, because A p_j is orthogonal to p_{j-1}:
(A p_j, r_j) = (A p_j, p_j - β_{j-1} p_{j-1}) = (A p_j, p_j).   (3.28)
Now we can rewrite (3.26) in a more familiar form:
α_j = (r_j, r_j) / (A p_j, p_j).   (3.29)
Also, since p_{j+1} is orthogonal to A p_j, we set the scalar β as follows:
β_j = - (r_{j+1}, A p_j) / (p_j, A p_j).   (3.30)
From equation (3.24) we can write:
A p_j = - (1/α_j)(r_{j+1} - r_j).   (3.31)
And substitution of (3.31) into (3.30) yields:
β_j = (1/α_j) (r_{j+1}, r_{j+1} - r_j) / (A p_j, p_j) = (r_{j+1}, r_{j+1}) / (r_j, r_j).   (3.32)
These relations compose the CG method.
The Conjugate Gradient Algorithm [Saad96]
1. Compute r_0 = A x_0 - b, and set p_0 = r_0.
2. For j = 0, 1, ..., until convergence Do:
3.   α_j = (r_j, r_j) / (A p_j, p_j)
4.   x_{j+1} = x_j + α_j p_j
5.   r_{j+1} = r_j - α_j A p_j
6.   β_j = (r_{j+1}, r_{j+1}) / (r_j, r_j)
7.   p_{j+1} = r_{j+1} + β_j p_j
8. EndDo
Importantly, in the CG method the jth iterate of the approximate solution, x_{j+1}, is
constructed as an element of the following Krylov subspace:
x_{j+1} ∈ x_0 + span{r_0, A r_0, ..., A^j r_0}, so that (x_{j+1} - x)^T A (x_{j+1} - x)
is minimized. In this minimization x is the exact solution of Ax = b. This minimum is
guaranteed to exist in general only if A is symmetric positive definite (SPD).
The CG algorithm must store matrix A, with m rows and m columns, or be able to
form the matrix-vector products on the fly, along with the vectors x, p, Ap, and r of length m. The
computational steps are of order O(m^2) for the one dense matrix-vector product, O(m)
for the three saxpy vector updates, and O(m) for the two inner products. This means that if
there are m steps per iteration with the CG method, then the algorithm uses O(m^3) flops,
which produces no significant improvement over the direct methods. The computational
savings only come into play when the number of steps involved in the solution is less than m or
if A is sparse or has some other special structure.
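To make the steps above concrete, the following is a minimal pure-Python sketch of the CG iteration for a small dense SPD system (illustrative only; the function and variable names are assumptions, not part of this work, and the code uses the r_0 = b - Ax_0 sign convention, which only flips the sign of the residual relative to the listing above):

```python
def matvec(A, x):
    """Dense matrix-vector product: the O(m^2) cost per step discussed above."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cg(A, b, x0, tol=1e-10, maxiter=100):
    """Conjugate Gradient for a symmetric positive definite A."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]   # r_0 = b - A x_0
    p = list(r)
    rr = dot(r, r)
    for _ in range(maxiter):
        if rr ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rr / dot(Ap, p)                          # (r_j, r_j)/(A p_j, p_j)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * v for ri, v in zip(r, Ap)]
        rr_new = dot(r, r)
        beta = rr_new / rr                               # (r_{j+1}, r_{j+1})/(r_j, r_j)
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        rr = rr_new
    return x

# Example: a 2x2 SPD system; the exact solution is (1/11, 7/11)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = cg(A, b, [0.0, 0.0])
```

In exact arithmetic the iteration terminates in at most m steps (here, two), consistent with the O(m^3) worst-case flop count noted above.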
3.5.3 The Generalized Minimum Residual (GMRES) method
GMRES is the generalized minimum residual algorithm used to solve non-symmetric
linear systems of equations. It is based on the conjugate gradient (CG) algorithm, which can
only be used efficiently for coefficient matrices that are symmetric and positive definite. CG
uses the symmetric Lanczos method to reduce the coefficient matrix to upper Hessenberg form. In
the CG method, the basis vectors of the Krylov subspace are orthogonal, and the Hessenberg
matrix becomes tridiagonal. With methods such as BiCG, CGS, and BiCGSTAB, the CG method
is changed to solve non-symmetric coefficient matrices. The orthogonalization is abandoned
for the biorthogonal basis vectors produced by the non-symmetric Lanczos process
(Figure 3.1), and the tridiagonal property of the Hessenberg matrix is maintained. In GMRES, the
tridiagonal property is abandoned and the orthogonalization is kept. Unfortunately, this means
that instead of needing to save only a few of the basis vectors, all of them must be saved. A real
positive aspect of the GMRES algorithm is that the norm of the residual is minimized at
every iteration and, therefore, the convergence behavior is monotonic.
In GMRES, an orthonormal basis of the Krylov subspace is constructed and the
resulting Hessenberg matrix and basis vectors are used to construct a solution that minimizes
the residual of the system Ax = b [Saad86, Barrett94]. The Arnoldi method is used to
construct these orthonormal vectors and reduce the coefficient matrix A to upper Hessenberg
form. In the Arnoldi method at the mth step, the following equation is formed:
A V_m = V_{m+1} H̄_m   (3.33)
where V_m is a matrix whose columns are [v_1, v_2, ..., v_m] and H̄_m is an (m+1)×m matrix. V_m
is an orthonormal system. The H̄_m matrix has one more row than the H_m matrix. The only
nonzero entry in this extra row is the element h_{m+1,m} in the ((m+1), m) position. The vectors
that make up the columns of the V_m matrix and the matrix H̄_m satisfy an important relation
that will be used to derive the GMRES algorithm.
Any vector x can be written as:
x = x_0 + V_m y   (3.34)
where y is a vector of length m and V_m is the Krylov-subspace basis matrix described above. The
residual of a system of equations is defined to be r = Ax - b. We now define a norm of the
residual to be a function of y:
J(y) = || b - Ax ||_2 = || b - A(x_0 + V_m y) ||_2   (3.35)
We can define β = || r_0 ||_2 and v_1 = r_0 / β. Then we can use the relationship in
equation (3.34) to obtain:
b - Ax = b - A(x_0 + V_m y)   (3.36)
where v_1 is the first orthonormal basis vector and can be represented as v_1 = V_{m+1} e_1, and
e_1 is the first unit vector of length (m+1). Now, since the column vectors of V_{m+1} are
orthonormal by construction:
J(y) = || b - A(x_0 + V_m y) ||_2 = || β e_1 - H̄_m y ||_2.   (3.37)
The GMRES approximation to the solution x is in the Krylov subspace and is chosen
to minimize the norm in equation (3.35). So we construct the approximation as in equation
(3.34), x_m = x_0 + V_m y_m, choosing y_m at that step to minimize the norm in equation (3.37):
y_m = argmin_y || β e_1 - H̄_m y ||_2.
The determination of y_m is an (m+1)×m least-squares problem with a Hessenberg
structure. It can be solved by a QR factorization in a specialized manner using Givens
rotations. Givens rotations are used to annihilate single nonzero elements of matrices in the
reduction to triangular form. The factorization of H̄_m is updated progressively as each column
appears during the Arnoldi process. This allows the residual norm to be calculated without
calculating the approximation x_m. Therefore, we can choose when this norm is small enough
to perform the calculation of x_m. The following algorithm results:
GMRES Algorithm [Saad96]
1. Compute r_0 = b - Ax_0, β = ||r_0||_2, and v_1 = r_0 / β
2. Define the (m+1)×m matrix H̄_m = {h_ij}, 1 ≤ i ≤ m+1, 1 ≤ j ≤ m. Set H̄_m = 0
3. For j = 1, 2, ..., m Do:
4.   Compute w_j = A v_j
5.   For i = 1, 2, ..., j Do:
6.     h_ij = (w_j, v_i)
7.     w_j = w_j - h_ij v_i
8.   EndDo
9.   h_{j+1,j} = ||w_j||_2. If h_{j+1,j} = 0, set m = j and go to 12
10.  v_{j+1} = w_j / h_{j+1,j}
11. EndDo
12. Compute y_m, the minimizer of ||β e_1 - H̄_m y||_2, and x_m = x_0 + V_m y_m
The computational costs for GMRES depend on the number of steps involved in the
solution. At each step m, there is one matrix-vector product, m+1 inner products, and m+1
saxpys. The storage also increases with the step number because all of the orthogonal basis
vectors must be saved. This means that for a matrix of order n, the storage requirement is the
storage for the matrix plus (m+5)·n.
GMRES is the only Krylov method discussed here that minimizes the true residual at
each step of the iteration. It, therefore, has monotonic convergence. GMRES involves only one
matrix-vector product with the coefficient matrix, instead of the two required by the
biorthogonalization-based algorithms. The main disadvantages of GMRES are that the amount of storage and
the operation count increase as the iteration progresses. A solution to this is to restart the GMRES method
after a fixed number of steps and use the final answer of the first series as the starting guess
of the next. Unfortunately, when to restart GMRES is difficult to determine.
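The process described above can be sketched in a few dozen lines of pure Python (illustrative only; the names are assumptions). For clarity, the small least-squares problem in step 12 is solved here by normal equations rather than by the progressively updated Givens-rotation QR factorization a production code would use:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def solve_dense(M, rhs):
    """Small dense solve by Gaussian elimination with partial pivoting."""
    n = len(rhs)
    M = [row[:] for row in M]
    rhs = rhs[:]
    for c in range(n):
        piv = max(range(c, n), key=lambda r: abs(M[r][c]))
        M[c], M[piv] = M[piv], M[c]
        rhs[c], rhs[piv] = rhs[piv], rhs[c]
        for r in range(c + 1, n):
            f = M[r][c] / M[c][c]
            for k in range(c, n):
                M[r][k] -= f * M[c][k]
            rhs[r] -= f * rhs[c]
    y = [0.0] * n
    for r in range(n - 1, -1, -1):
        y[r] = (rhs[r] - sum(M[r][k] * y[k] for k in range(r + 1, n))) / M[r][r]
    return y

def gmres(A, b, x0, m):
    """GMRES: Arnoldi basis, then minimize ||beta*e1 - Hbar y||."""
    r0 = [bi - axi for bi, axi in zip(b, matvec(A, x0))]
    beta = dot(r0, r0) ** 0.5
    V = [[ri / beta for ri in r0]]                 # v_1 = r_0 / beta
    cols = []                                      # cols[j] holds column j of Hbar
    for j in range(m):
        w = matvec(A, V[j])
        col = []
        for i in range(j + 1):                     # modified Gram-Schmidt
            hij = dot(w, V[i])
            col.append(hij)
            w = [wk - hij * vk for wk, vk in zip(w, V[i])]
        hnext = dot(w, w) ** 0.5
        col.append(hnext)
        cols.append(col)
        if hnext < 1e-12:                          # "happy breakdown": exact solution found
            break
        V.append([wk / hnext for wk in w])
    k = len(cols)
    Hbar = [[cols[j][i] if i <= j + 1 else 0.0 for j in range(k)] for i in range(k + 1)]
    g = [beta] + [0.0] * k                         # beta * e_1
    # Least squares via normal equations (fine for a tiny example)
    N = [[sum(Hbar[i][p] * Hbar[i][q] for i in range(k + 1)) for q in range(k)] for p in range(k)]
    rhs = [sum(Hbar[i][p] * g[i] for i in range(k + 1)) for p in range(k)]
    y = solve_dense(N, rhs)
    x = list(x0)
    for j in range(k):
        x = [xi + y[j] * vj for xi, vj in zip(x, V[j])]
    return x

# Example: full (m = n) GMRES recovers the exact solution (1/11, 7/11)
A = [[4.0, 1.0], [1.0, 3.0]]
b = [1.0, 2.0]
x = gmres(A, b, [0.0, 0.0], m=2)
```

A restarted GMRES(m) would simply call this in a loop, passing the previous result back in as x0, which caps the basis storage at m vectors at the cost of the restart-timing difficulty noted above.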
3.5.4 BiConjugate Gradient (BiCG) method
When the coefficient matrix of a system of equations is not symmetric positive definite, the CG algorithm cannot efficiently solve the system of equations. In the CG
method, an orthonormal basis for the Krylov subspace is generated during the reduction of
the coefficient matrix to tridiagonal form. Two basic approaches to using a conjugate-gradient-like method are possible (Figure 3.1). The first is to abandon the tridiagonal
formation and form an upper-Hessenberg matrix, where all the basis vectors need to be saved.
This approach is used in GMRES. The second approach is to abandon the orthogonal
basis for a biorthogonal one. The latter approach is used for the BiCG, QMR, CGS, and
BiCGSTAB algorithms, which are all based on the CG method.
The BiConjugate Gradient (BiCG) algorithm is based on the reduction of
the coefficient matrix A to tridiagonal form. The basis vectors formed to accomplish this are
generated by the non-symmetric Lanczos method. In the non-symmetric Lanczos algorithm,
two sets of residual and direction vectors are generated. In CG, the residual vectors were
orthogonal to all the previous residual vectors, and the direction vectors were A-conjugate to
the previous direction vectors. Now, with a non-symmetric coefficient matrix, we cannot
maintain these relationships and produce a tridiagonal matrix for A. Instead, another set of
residual and direction vectors are generated. They are called the “pseudo-residuals” and the
“pseudo-direction” vectors. The following relationship is ensured by the non-symmetric
Lanczos method:
(r̃_i, r_j) = 0 for i ≠ j,   (3.38)
where r̃_j is the “pseudo-residual” vector and r_j is the residual vector. Equation (3.38) means
that the pseudo-residual is orthogonal to the previous residual vectors and that the residual
vectors are orthogonal to the previous pseudo-residuals. This is called the biorthogonality of
the residuals.
Another relationship that the non-symmetric Lanczos also maintains is the
biconjugacy of the search directions. This means that:
(p̃_i, A p_j) = 0 for i ≠ j.   (3.39)
In other words, the pseudo-direction vector is A-conjugate to the previous direction
vectors and the direction vector is A-conjugate to the previous pseudo-direction vectors.
Using the two relationships in equations (3.38) and (3.39), the BiCG method can be
derived in much the same way as the CG method. The vector of the iterated approximation to
the solution can be represented as:
x_{j+1} = x_j + α_j p_j   (3.40)
Although not usually solved for, there is another (dual) system that can be solved for and is included
here for the derivation:
x̃_{j+1} = x̃_j + α_j p̃_j   (3.41)
Then, both of the residual vectors can be represented as:
r_{j+1} = r_j - α_j A p_j   (3.42)
r̃_{j+1} = r̃_j - α_j A^T p̃_j   (3.43)
If the two residual vectors are to be orthogonal, then that means that:
(r_j - α_j A p_j, r̃_j) = 0
(r_j, r̃_j - α_j A^T p̃_j) = 0   (3.44)
Therefore, the following scalar relationship must be true:
α_j = (r_j, r̃_j) / (A p_j, p̃_j).   (3.45)
We also know that the next search directions are a linear combination of the residual
and previous search direction. This relationship is represented below with the proper scaling
of the previous direction:
p_{j+1} = r_{j+1} + β_j p_j   (3.46)
p̃_{j+1} = r̃_{j+1} + β_j p̃_j   (3.47)
A consequence of the above relations is that:
(A p_j, r̃_j) = (A p_j, (p̃_j - β_{j-1} p̃_{j-1})),
where equation (3.47) has been rearranged to express the pseudo-residual. Expanding this equation:
(A p_j, r̃_j) = (A p_j, p̃_j) - β_{j-1}(A p_j, p̃_{j-1}).
We note that:
(A p_j, p̃_i) = (p_j, A^T p̃_i) = 0 for i ≠ j,   (3.48)
and therefore,
(A p_j, r̃_j) = (A p_j, p̃_j).   (3.49)
Then:
α_j = (r_j, r̃_j) / (A p_j, p̃_j).   (3.50)
In addition, writing the requirement that p_{j+1} be orthogonal to A^T p̃_j:
(p_{j+1}, A^T p̃_j) = 0   (3.51)
where p_{j+1} = r_{j+1} + β_j p_j. Substituting in (3.51):
((r_{j+1} + β_j p_j), A^T p̃_j) = (r_{j+1}, A^T p̃_j) + β_j (p_j, A^T p̃_j) = 0,
so that:
β_j = - (r_{j+1}, A^T p̃_j) / (p_j, A^T p̃_j).   (3.52)
From equation (3.43) we find:
A^T p̃_j = - (1/α_j)(r̃_{j+1} - r̃_j),   (3.53)
and, looking at the denominator of equation (3.52), we can see:
(p_j, A^T p̃_j) = ((r_j + β_{j-1} p_{j-1}), A^T p̃_j).
With the condition in equation (3.39), we see that:
(p_j, A^T p̃_j) = (r_j, A^T p̃_j).   (3.54)
We can now substitute (3.53) and (3.54) into (3.52) to yield:
β_j = (r̃_{j+1}, r_{j+1}) / (r̃_j, r_j).   (3.55)
Combining these relationships results in the following algorithm for BiCG.
Biconjugate Gradient (BiCG) Algorithm [Saad96]
1. Compute r_0 := b - Ax_0. Choose r̃_0 such that (r_0, r̃_0) ≠ 0.
2. Set p_0 := r_0, p̃_0 := r̃_0
3. For j = 0, 1, ..., until convergence Do:
4.   α_j = (r_j, r̃_j) / (A p_j, p̃_j)
5.   x_{j+1} = x_j + α_j p_j
6.   r_{j+1} = r_j - α_j A p_j
7.   r̃_{j+1} = r̃_j - α_j A^T p̃_j
8.   β_j = (r_{j+1}, r̃_{j+1}) / (r_j, r̃_j)
9.   p_{j+1} = r_{j+1} + β_j p_j
10.  p̃_{j+1} = r̃_{j+1} + β_j p̃_j
11. EndDo
The storage used for the BiCG algorithm includes the coefficient matrix A and its
transpose, along with the vectors of length m: x, p, p̃, Ap, A^T p̃, r, r̃. The computational
resources used are two matrix-vector products of O(n^2), five vector updates
(saxpys) of cost O(n), and two inner products of O(n) flops.
There are two main advantages to BiCG. First, a tridiagonal matrix is generated, and
the system is relatively easy to solve. Second, the tridiagonal coefficient matrix requires only
a few of the generated vectors to be kept in storage. One disadvantage of the BiCG method,
however, is that breakdowns sometimes occur in the solution. Because there is no
minimization of a norm as in the CG method, the convergence behavior is irregular. While
methods like GMRES require one matrix-vector product per iteration, BiCG requires two
matrix-vector products. The biggest disadvantage is that the transpose of the coefficient
matrix A is needed, and this matrix is not always readily available.
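The BiCG recurrences above can be sketched in a few lines of pure Python (illustrative only; the names are assumptions). Note the extra matrix-vector product with A^T, which the discussion identifies as the method's biggest disadvantage:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def transpose(A):
    return [list(col) for col in zip(*A)]

def bicg(A, b, x0, tol=1e-10, maxiter=100):
    """BiConjugate Gradient: pseudo (shadow) vectors are driven by A^T."""
    At = transpose(A)
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    rt = list(r)                 # pseudo-residual; choosing r~_0 = r_0 gives (r_0, r~_0) != 0
    p, pt = list(r), list(rt)
    rho = dot(rt, r)             # (r_j, r~_j)
    for _ in range(maxiter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(Ap, pt)                        # (r_j, r~_j)/(A p_j, p~_j)
        x = [xi + alpha * pi for xi, pi in zip(x, p)]
        r = [ri - alpha * v for ri, v in zip(r, Ap)]
        Atpt = matvec(At, pt)                            # the transpose product BiCG needs
        rt = [ri - alpha * v for ri, v in zip(rt, Atpt)]
        rho_new = dot(rt, r)
        beta = rho_new / rho                             # (r_{j+1}, r~_{j+1})/(r_j, r~_j)
        p = [ri + beta * pi for ri, pi in zip(r, p)]
        pt = [ri + beta * pi for ri, pi in zip(rt, pt)]
        rho = rho_new
    return x

# Example: a small nonsymmetric system with exact solution (1, 1)
A = [[3.0, 2.0], [1.0, 2.0]]
b = [5.0, 3.0]
x = bicg(A, b, [0.0, 0.0])
```

In exact arithmetic the biorthogonality conditions force termination in at most n steps, provided none of the inner products in the denominators vanishes (the breakdown case mentioned above).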
3.5.5 Conjugate Gradient Squared (CGS) method
Based on the BiCG method, the conjugate gradient squared (CGS) algorithm is used
to solve systems of equations where the coefficient matrix is not required to be symmetric
positive definite as in the CG algorithm. The CGS method is based on the generation of a
sequence of iterates whose residuals r'_j are formed by squaring the residual
polynomial:
r'_j = φ_j^2(A) r_0.   (3.56)
The CGS method does not require a matrix-vector product with the transpose of the
coefficient matrix. However, it still requires two matrix-vector products per iteration step as
in BiCG, but both products are with the standard coefficient matrix. Convergence behavior
with CGS is often much better than the behavior seen with the BiCG method. The residual is
not minimized for this method, and, therefore, the convergence can be very erratic.
The conjugate gradient squared algorithm is a Krylov-subspace method that was
developed to overcome some of the shortcomings of the BiCG algorithm. The shortcomings
that are addressed include erratic convergence and needing to have the transpose of the
coefficient matrix. CGS is based on the non-symmetric Lanczos process, where we write the
residual vector at step j of the BiCG process as the product of r_0, the initial residual, and a
jth-degree polynomial in A as [Saad96]:
r_j = φ_j(A) r_0   (3.57)
where φ_j is a certain polynomial of degree j satisfying the constraint φ_j(0) = 1.
This same polynomial satisfies the pseudo-residual of BiCG:
r̃_j = φ_j(A^T) r̃_0.
Similarly, the conjugate-direction polynomial π_j(t) is given by:
p_j = π_j(A) r_0,   (3.58)
in which π_j is a polynomial of degree j. We can then write the pseudo-search direction as:
p̃_j = π_j(A^T) r̃_0.   (3.59)
The derivation of the CGS method relies on simple algebra. It is based on the BiCG
method with polynomial representations of the residual and search-direction vectors. First, the
recurrences for the squared polynomials are derived. If we define the polynomials by
recurrences, we would write:
φ_{j+1}(t) = φ_j(t) - α_j t π_j(t)   (3.60)
π_{j+1}(t) = φ_{j+1}(t) + β_j π_j(t).   (3.61)
We can then square equations (3.60) and (3.61) to get:
φ_{j+1}^2(t) = φ_j^2(t) - 2 α_j t π_j(t) φ_j(t) + α_j^2 t^2 π_j^2(t)   (3.62)
π_{j+1}^2(t) = φ_{j+1}^2(t) + 2 β_j φ_{j+1}(t) π_j(t) + β_j^2 π_j^2(t).   (3.63)
If we introduce one of the cross terms, φ_{j+1}(t) π_j(t), as a third recurrence, then the
equations in (3.62) and (3.63) would form an updateable recurrence system. For the other
cross term, we use the following:
φ_j(t) π_j(t) = φ_j(t)(φ_j(t) + β_{j-1} π_{j-1}(t)) = φ_j^2(t) + β_{j-1} φ_j(t) π_{j-1}(t).   (3.64)
Then, if we combine all the terms (leaving out the variable (t) for clarity):
φ_{j+1}^2 = φ_j^2 - α_j t (2 φ_j^2 + 2 β_{j-1} φ_j π_{j-1} - α_j t π_j^2)   (3.65)
φ_{j+1} π_j = φ_j^2 + β_{j-1} φ_j π_{j-1} - α_j t π_j^2   (3.66)
π_{j+1}^2 = φ_{j+1}^2 + 2 β_j φ_{j+1} π_j + β_j^2 π_j^2.   (3.67)
The recurrences defined above are the basis of the CGS algorithm. Now we define:
r_j = φ_j^2(A) r_0   (3.68)
p_j = π_j^2(A) r_0   (3.69)
q_j = φ_{j+1}(A) π_j(A) r_0.   (3.70)
If we write the recurrences for the quantities in equations (3.68) - (3.70), then we have:
r_{j+1} = r_j - α_j A (2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j)   (3.71)
q_j = r_j + β_{j-1} q_{j-1} - α_j A p_j   (3.72)
p_{j+1} = r_{j+1} + 2 β_j q_j + β_j^2 p_j.   (3.73)
The auxiliary vector is defined to be:
d_j = 2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j.   (3.74)
We can then use the following operations to compute the approximate solution, starting with
r_0 := b - Ax_0, p_0 := r_0, q_0 := 0, β_0 := 0:
α_j = (r_j, r̃_0) / (A p_j, r̃_0)
d_j = 2 r_j + 2 β_{j-1} q_{j-1} - α_j A p_j
q_j = r_j + β_{j-1} q_{j-1} - α_j A p_j
x_{j+1} = x_j + α_j d_j
r_{j+1} = r_j - α_j A d_j
β_j = (r_{j+1}, r̃_0) / (r_j, r̃_0)
p_{j+1} = r_{j+1} + β_j (2 q_j + β_j p_j)
Usually, a simplification of the algorithm is constructed by using another auxiliary vector:
u_j = r_j + β_{j-1} q_{j-1}.
Then the following relationships are true:
d_j = u_j + q_j
q_j = u_j - α_j A p_j
p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j),
so that vector d_j is no longer needed. The algorithm is listed below.
Conjugate Gradient Squared Algorithm [Saad96]
1. Compute r_0 := b - Ax_0; r̃_0 arbitrary.
2. Set p_0 := u_0 := r_0.
3. For j = 0, 1, 2, ..., until convergence Do:
4.   α_j = (r_j, r̃_0) / (A p_j, r̃_0)
5.   q_j = u_j - α_j A p_j
6.   x_{j+1} = x_j + α_j (u_j + q_j)
7.   r_{j+1} = r_j - α_j A (u_j + q_j)
8.   β_j = (r_{j+1}, r̃_0) / (r_j, r̃_0)
9.   u_{j+1} = r_{j+1} + β_j q_j
10.  p_{j+1} = u_{j+1} + β_j (q_j + β_j p_j)
11. EndDo
A matrix-vector product with the transpose of A is not required, which is important if
the transpose of A is not readily available. Instead, two matrix-vector products with A are
computed at each iteration. One step of the CGS algorithm uses about the same number of
arithmetic operations as one step of the BiCG algorithm. Again, as in BiCG, the residual is
not minimized and, consequently, the method produces erratic convergence. In some cases the error is
even magnified instead of reduced. Besides the two matrix-vector products with A, there are
13n flops for vector updates and two inner products. The memory requirements are storage for
x, b, r, and A, and in addition four n-vectors r̃, p, u, q.
As mentioned above, no matrix-vector product with the transpose of A is
performed. Instead, two matrix-vector products are done with the matrix A. One can
compare this to the GMRES algorithm, which has only one matrix-vector product. The CGS
algorithm converges twice as quickly as the BiCG algorithm. Unfortunately, it also diverges
twice as quickly. The residual vector is calculated from the squaring of the polynomial and,
hence, the rounding errors can be greater, resulting in an inaccurate answer.
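A minimal pure-Python sketch of the CGS iteration in the simplified u-vector form listed above (illustrative only; the names are assumptions). Both matrix-vector products use A itself, never A^T:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def cgs(A, b, x0, tol=1e-10, maxiter=100):
    """Conjugate Gradient Squared: transpose-free, two products with A per step."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    rt0 = list(r)                 # fixed shadow vector r~_0 (here chosen as r_0)
    u = list(r)
    p = list(r)
    rho = dot(r, rt0)
    for _ in range(maxiter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(Ap, rt0)                       # (r_j, r~_0)/(A p_j, r~_0)
        q = [ui - alpha * v for ui, v in zip(u, Ap)]     # q_j = u_j - alpha A p_j
        uq = [ui + qi for ui, qi in zip(u, q)]
        x = [xi + alpha * w for xi, w in zip(x, uq)]     # x_{j+1} = x_j + alpha (u_j + q_j)
        Auq = matvec(A, uq)
        r = [ri - alpha * w for ri, w in zip(r, Auq)]
        rho_new = dot(r, rt0)
        beta = rho_new / rho
        u = [ri + beta * qi for ri, qi in zip(r, q)]     # u_{j+1} = r_{j+1} + beta q_j
        pq = [qi + beta * pi for qi, pi in zip(q, p)]
        p = [ui + beta * w for ui, w in zip(u, pq)]      # p_{j+1} = u_{j+1} + beta (q_j + beta p_j)
        rho = rho_new
    return x

# Example: the same nonsymmetric system used in the BiCG discussion; solution (1, 1)
A = [[3.0, 2.0], [1.0, 2.0]]
b = [5.0, 3.0]
x = cgs(A, b, [0.0, 0.0])
```

Because the CGS residual is φ_j^2(A) r_0, it vanishes whenever the underlying BiCG residual does, so on this small system the iteration terminates as quickly as BiCG but without ever touching A^T.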
3.5.6 Quasi-Minimal Residual (QMR) method
Two essential parts of the Krylov-subspace methods are the construction of a
suitable basis for the Krylov subspace and the formation of the solution iterates. In QMR, the
non-symmetric Lanczos method is used to form the basis (Figure 3.1) [Bucker94, Barrett94].
The Lanczos algorithm produces the matrix V_n = [v_1 v_2 ... v_n], whose columns are n
basis vectors spanning K_n(r_0, A). The QMR iterates are of the form:
x_n = x_0 + V_n z_n,   z_n ∈ C^n   (3.75)
for some parameter vector z_n. From the definition of the biorthogonalization, we have
A V_n = V_{n+1} H̄_n, so the corresponding QMR residual is:
r_n = V_{n+1} (β e_1 - H̄_n z_n)   (3.76)
where β = ||b||, e_1 = (1, 0, ..., 0)^T is used for the first unit vector in R^{n+1}, and H̄_n ∈ C^{(n+1)×n}
denotes an upper Hessenberg matrix generated by the Lanczos algorithm. Rather than
minimizing the norm of the residuals, which is desirable but too expensive in terms of work
and storage, the QMR approach merely considers the expression in parentheses in (3.76).
The QMR iterates are then defined by (3.75), where the parameter vector z_n ∈ C^n is chosen
as the solution of the least-squares problem:
z_n = argmin_z || β e_1 - H̄_n z ||   (3.77)
where z ∈ C^n. Note that a true minimization of the residual norm does not take place; instead a
different, data-dependent norm is minimized, which hopefully is not much different from the
true norm. The general method for QMR is shown below.
Quasi-Minimal Residual (QMR) algorithm [Saad96]
1. Compute r_0 := b - Ax_0 and γ_1 := ||r_0||_2, w_1 := v_1 := r_0 / γ_1
2. For m = 1, 2, ..., until convergence Do:
3.   Compute α_m, δ_{m+1} and v_{m+1}, w_{m+1} by the Lanczos biorthogonalization
4.   Update the QR factorization of T̄_m
5.   Apply the rotation Ω_m to T̄_m and g_m
6.   p_m = (v_m - Σ_{i=m-2}^{m-1} t_{im} p_i) / t_{mm}
7.   x_m = x_{m-1} + γ_m p_m
8.   If |γ_{m+1}| is small enough then Stop
9. EndDo
The choice of an adequate stopping criterion is an important aspect of any
implementation of QMR or any other method. Usually, the decision whether an iterate x_n is close
enough to the solution is based on the residual r_n = b - Ax_n. QMR does not compute the true
residual at each iteration, so the "other" (quasi-) residual is watched, and when it is small enough the true
residual is calculated.
Some disadvantages of BiCG can be overcome by using the quasi-minimal residual
method. QMR shows smoother convergence and can be implemented with look-ahead to
avoid the breakdowns that can occur (with the LU factorization) in the BiCG method. Like
BiCG, each step of QMR computes two matrix-vector products, one with the coefficient
matrix and the other with its transpose. It also has two inner products and 12 saxpy
operations.
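Both GMRES (step 12 of its algorithm) and QMR reduce to the same small computation: minimizing ||β e_1 - H̄ z|| for an (m+1)×m Hessenberg (for QMR, tridiagonal) matrix by Givens rotations. A minimal sketch of that least-squares core (illustrative pure Python, not the full look-ahead QMR; the names are assumptions):

```python
def hessenberg_lstsq(H, beta):
    """Minimize ||beta*e1 - H z||_2 for an (m+1) x m upper-Hessenberg H.
    Givens rotations annihilate the subdiagonal; back-substitution gives z.
    The last transformed right-hand-side entry is the minimal residual norm,
    which is how the residual can be monitored without forming the iterate."""
    m = len(H[0])
    R = [row[:] for row in H]
    g = [beta] + [0.0] * m                       # beta * e_1
    for j in range(m):
        a, bsub = R[j][j], R[j + 1][j]
        d = (a * a + bsub * bsub) ** 0.5
        c, s = (1.0, 0.0) if d == 0.0 else (a / d, bsub / d)
        for k in range(j, m):                    # rotate rows j and j+1
            R[j][k], R[j + 1][k] = c * R[j][k] + s * R[j + 1][k], -s * R[j][k] + c * R[j + 1][k]
        g[j], g[j + 1] = c * g[j] + s * g[j + 1], -s * g[j] + c * g[j + 1]
    z = [0.0] * m
    for i in range(m - 1, -1, -1):               # back-substitution on the triangular part
        z[i] = (g[i] - sum(R[i][k] * z[k] for k in range(i + 1, m))) / R[i][i]
    return z, abs(g[m])

# Example: a 3 x 2 Hessenberg least-squares problem with an exact fit,
# taken from a 2x2 Arnoldi process (z = (3*sqrt(5)/11, -sqrt(5)/11), residual 0)
H = [[4.0, 1.0], [1.0, 3.0], [0.0, 0.0]]
z, res = hessenberg_lstsq(H, 5.0 ** 0.5)
```

In a real implementation the rotations are applied incrementally, one per new Arnoldi or Lanczos column, so the residual estimate is available at every step for the stopping test discussed above.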
3.5.7 BiConjugate Gradient Stabilized (BiCGSTAB) method
The BiCGSTAB method is similar to CGS in that it is a transpose-free variant of the
BiCG method [van der Vorst92]. In the CGS method, the residual polynomial is squared,
while in BiCGSTAB the residual polynomial is the product of an arbitrary polynomial and
the residual polynomial used in CGS with the initial residual:
r'_j = ψ_j(A) φ_j(A) r_0   (3.78)
where φ_j is the residual polynomial associated with the BiCG algorithm and ψ_j is the
new polynomial. This new polynomial is defined to smooth out the convergence of the CGS
method and describes a steepest-descent update. The new polynomial, ψ_j(t), is defined by
the following recurrence:
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
68
0Vi(O = (l - a ; / V y(0
where the scalar
Q)}
is determined to achieve a steepest descent step in the residual direction.
The derivation of the recurrence relations is similar to the CGS method. [Saad96].
BiCGSTAB algorithm [Saad96]
1. Compute r_0 := b - Ax_0; r̃_0 is arbitrary;
2. p_0 := r_0
3. For j = 0, 1, ..., until convergence Do:
4.   α_j = (r_j, r̃_0) / (A p_j, r̃_0)
5.   s_j = r_j - α_j A p_j
6.   ω_j = (A s_j, s_j) / (A s_j, A s_j)
7.   x_{j+1} = x_j + α_j p_j + ω_j s_j
8.   r_{j+1} = s_j - ω_j A s_j
9.   β_j = ((r_{j+1}, r̃_0) / (r_j, r̃_0)) · (α_j / ω_j)
10.  p_{j+1} = r_{j+1} + β_j (p_j - ω_j A p_j)
11. EndDo
For the BiCGSTAB method there are two matrix-vector products with the matrix A,
four inner products (two more than the CGS method), and 12 linear computational operations
on vectors (i.e., vector updates). The storage needed is the n×n matrix A and 10n more.
The BiCGSTAB algorithm does not use the transpose of the matrix A in the
calculation of the recurrences. This is advantageous for cases where the transpose of the
matrix A is not readily available. It is constructed to smooth out the convergence of the CGS
method. The method can also be thought of as a product of the BiCG method and the
GMRES method. The residual vector is minimized locally by this local GMRES step.
However, when the GMRES step stagnates, the residual vector is not minimized and the
algorithm breaks down.
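A minimal pure-Python sketch of the BiCGSTAB iteration above (illustrative only; the names are assumptions). The early exit when ||s|| is already small is a standard guard: ω would otherwise be computed from a near-zero vector, which is exactly the stagnation-related breakdown mentioned above:

```python
def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def bicgstab(A, b, x0, tol=1e-10, maxiter=100):
    """BiCGSTAB: a BiCG half-step followed by a one-dimensional (GMRES-like) smoothing step."""
    x = list(x0)
    r = [bi - axi for bi, axi in zip(b, matvec(A, x))]
    rt0 = list(r)                # fixed shadow vector r~_0 (here chosen as r_0)
    p = list(r)
    rho = dot(r, rt0)
    for _ in range(maxiter):
        if dot(r, r) ** 0.5 < tol:
            break
        Ap = matvec(A, p)
        alpha = rho / dot(Ap, rt0)
        s = [ri - alpha * v for ri, v in zip(r, Ap)]     # s_j = r_j - alpha A p_j
        if dot(s, s) ** 0.5 < tol:                       # half-step already converged
            x = [xi + alpha * pi for xi, pi in zip(x, p)]
            break
        As = matvec(A, s)
        omega = dot(As, s) / dot(As, As)                 # local steepest-descent minimization
        x = [xi + alpha * pi + omega * si for xi, pi, si in zip(x, p, s)]
        r = [si - omega * v for si, v in zip(s, As)]
        rho_new = dot(r, rt0)
        beta = (rho_new / rho) * (alpha / omega)
        p = [ri + beta * (pi - omega * v) for ri, pi, v in zip(r, p, Ap)]
        rho = rho_new
    return x

# Example: the same nonsymmetric system as before; solution (1, 1)
A = [[3.0, 2.0], [1.0, 2.0]]
b = [5.0, 3.0]
x = bicgstab(A, b, [0.0, 0.0])
```

As in the other transpose-free methods, only products with A itself appear; the smoothing factor (1 - ω_j A) is what distinguishes the update from the plain CGS recurrences.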
3.6 Summary
If the standard Gaussian-elimination-type direct method is used to solve for the iterate
formed in the shooting-Newton method, the number of equations that can be in that system is
limited. These direct methods use O(n^3) operations to solve the system of equations and
O(n^2) computer storage, which becomes much too costly for the larger systems of equations
that are generated by large circuits. Using a direct method, with no round-off or other errors, the
exact solution is obtained after a finite number of operations. When round-off errors arise
and large systems of equations are being solved, the errors increase and can make the results
unacceptable. All the steps in a direct method such as Gaussian elimination must be done
before a solution is obtained. The algorithm cannot be stopped before the end with a "less"
accurate solution. If the matrix has a special structure, it is not always preserved with a direct
method. For example, a sparse matrix is filled in during Gaussian elimination.
An alternative approach is to use a non-direct (iterative) method. An iterative method
starts with an approximate solution and uses a recurrence formula to calculate another
approximate solution. The formula is applied repeatedly until a solution is deemed "good
enough". Although the iterative methods can solve the system of equations in a finite number
of steps, they can be stopped before they reach the exact answer. Their cost in terms of
computer operations and storage can be much lower than the cost for direct methods, which
involve O(n^2) storage and O(n^3) flops. Depending on the iterative method used and the
accuracy desired, the costs could be lowered to O(n). Further, iterative methods are also
less sensitive to the propagation of errors [Greenbaum97].
The two types of iterative method are stationary and non-stationary. The stationary
methods are efficient for diagonally dominant coefficient matrices. This property is not
generally found in the coefficient matrices for steady-state determination. More attractive
are the Krylov-subspace methods [Saad86, Vuik92]. These include CGS, BiCG,
QMR, GMRES, and BiCGSTAB, discussed above. The computational aspects of the Krylov-subspace methods are presented in Table 3.1. GMRES has the fewest operations when the
number of iterations is low. QMR and BiCG need the transpose of the matrix, while CGS and
BiCGSTAB do not need the transpose, but require two multiplications of the original matrix.
The storage of GMRES increases as the iteration number increases because the orthogonal
vectors must be stored.
The convergence behavior varies for the different Krylov methods. For some
methods, such as CGS and BiCG, the residual is not reduced at each step of the iteration. The
GMRES, QMR and BiCGSTAB algorithms can show fast convergence. CGS can have fast
convergence, but its convergence can be very irregular.
METHOD      INNER PRODUCTS   SAXPYS   MATRIX-VECTOR PRODUCTS   STORAGE REQUIREMENT
CG          2                3        1                        Matrix + 6n
GMRES       k+1              k+1      1                        Matrix + (k+1)n
BICG        2                5        2                        Matrix + 10n
QMR         2                8+4      2                        Matrix + 16n
CGS         2                6        2                        Matrix + 11n
BICGSTAB    2                6        2                        Matrix + 10n

Table 3.1 Computational aspects of Krylov-subspace methods.
The Biconjugate Gradient (BiCG) method generates two sequences of vectors that are
made mutually orthogonal (i.e. biorthogonal). One of the sequences is generated by the
original coefficient matrix, while the other is generated by the transpose of that matrix. BiCG
uses much less storage than GMRES, but its convergence can be irregular. It does not
minimize the residual or the error at each iteration as does GMRES. BiCG also needs two
matrix-vector products at each iteration.
The Quasi-Minimal Residual (QMR) method applies a least-squares solution and
update to the BiCG residuals. QMR uses less storage than GMRES and does not minimize
the residual or error, but minimizes a different data-dependent quantity that is usually not so
far from the residual. Typically, QMR has better convergence than BiCG, but it requires
matrix-vector multiplications with both the original matrix and its transpose at each iteration step.
The Conjugate Gradient Squared (CGS) algorithm is used to solve systems of
equations where the coefficient matrix is not required to be symmetric positive definite as in
the CG algorithm. Although CGS is derived from the BiCG method, it does not need the
transpose of the coefficient matrix. The CGS method is based on the generation of a
sequence of iterates whose residuals r'_j are formed by the squaring of the residual
polynomial (3.56). The CGS method does not require a matrix-vector product with the
transpose of the coefficient matrix. However, it still requires two matrix-vector products per
iteration step as in BiCG, but both products are with the standard coefficient matrix.
Convergence behavior with CGS, because of the squaring of the residual polynomial in
(3.56), is twice as fast (or twice as slow). The residual or error is not minimized for this
method and, therefore, the convergence can be very erratic.
The BiCGSTAB method is similar to CGS in that it is a transpose-free variant of the
BiCG method. In the CGS method, the residual polynomial is squared, while in BiCGSTAB
the residual polynomial is the product of an arbitrary polynomial and the residual polynomial
used in CGS with the initial residual (3.78). This new polynomial is defined to smooth out
the convergence of the CGS method and describes a steepest-descent update.
The GMRES method abandons the tridiagonal matrix in favor of true orthogonalization.
Because of this, the residual is minimized at each step. It requires only one matrix-vector
product per iteration step, but requires the storage of all the orthogonalization vectors up to
the present iteration. When the number of iterations is small, this storage is not a
problem and GMRES is the method of choice.
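As a concrete illustration of the preceding description, a minimal unrestarted GMRES with modified Gram-Schmidt orthogonalization can be sketched as follows. This is illustrative NumPy code, not the SPICE implementation; the function name and tolerance are chosen here.

```python
import numpy as np

def gmres(A, b, tol=1e-10, max_iter=None):
    """Minimal unrestarted GMRES with modified Gram-Schmidt.

    All orthogonalization vectors are stored, so memory grows with the
    iteration count, as noted in the text."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    beta = np.linalg.norm(b)
    V = [b / beta]                            # orthonormal Krylov basis
    H = np.zeros((max_iter + 1, max_iter))    # Hessenberg matrix
    for j in range(max_iter):
        w = A @ V[j]
        for i in range(j + 1):                # modified Gram-Schmidt
            H[i, j] = V[i] @ w
            w = w - H[i, j] * V[i]
        H[j + 1, j] = np.linalg.norm(w)
        # small least-squares problem  min || beta*e1 - H y ||
        e1 = np.zeros(j + 2)
        e1[0] = beta
        y = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)[0]
        resid = np.linalg.norm(e1 - H[:j + 2, :j + 1] @ y)
        if resid < tol or H[j + 1, j] < 1e-14:
            break                             # converged or happy breakdown
        V.append(w / H[j + 1, j])
    return np.column_stack(V[:len(y)]) @ y
```

Note that the residual is available from the small least-squares problem, so (as discussed in Chapter 4) the iterate itself need only be assembled once at the end.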
Chapter 4
Implementation of the Shooting-Newton Algorithm
Using Krylov-Subspace Methods
This chapter describes the practical algorithms for the implementation of the
shooting-Newton algorithm for steady-state determination using Krylov-subspace methods.
These algorithms have been implemented and tested in SPICE 3f5, the most popular analog
integrated circuit simulation program. Two main aspects of the implementation are discussed
below: the formation of the Jacobian needed for the shooting-Newton iteration, and the use
of the Krylov-subspace methods for the solution of the shooting-Newton iterate. The addition
of an analysis to SPICE is well documented [Nagle78, Ashar89, Quarles89c, Quarles89d,
Cohen] and is expedited by the modular structure of the SPICE code.
The implementation of the shooting-Newton algorithm depends on the efficient
formation of the Jacobian, also called the sensitivity matrix. Using quantities already
available during transient analysis, the matrix can be easily formed [Colon73]. The structure
of that matrix, along with its eigenvalues, can give insight into the nature of the nonlinear
system and the ease of solution. The integration method used during transient analysis also
impacts the steady-state determination. The importance of using a method like the
trapezoidal method is shown.
Another important implementation issue is the use of Krylov-subspace methods. The
important issues include stopping criteria, robustness, breakdowns, non-convergence and
memory and operation counts.
4.1 Implementation in SPICE
The implementation in SPICE is relatively straightforward. An augmented version of
SPICE 3f5 was created in which algorithms for the computation of the periodic steady state,
including Krylov-subspace methods for its solution, are implemented. The new analysis
method is modeled after earlier work [Ashar89]. When using the steady-state analysis, one
can specify the type of circuit (autonomous or nonautonomous), the time period, the
minimum number of time points per period of transient analysis, and the desired accuracy
with which convergence to steady-state is tested.
When the steady-state analysis is invoked, an initialization routine sets up the
data structures and initializes the variables used in the analysis. The user may specify the
minimum number of time points per simulation period, the accuracy desired, the amount of
damping, if any, of the Jacobian and the method of solution of the iterate: Gaussian
elimination, BiCG, QMR, BiCGSTAB, CGS, or GMRES.
The program then performs a few periods of transient analysis so that the initial
transients can die down; four periods was sufficient for the examples used. This also
allows for observation of the transient response for chaotic or unstable responses. Next, the
error, or the difference between the states of the capacitors and inductors at the beginning
and end of the period, is calculated. If the error is small enough, then during the next loop
the sensitivity matrix is calculated at the same time the transient analysis is done.

PSEUDOCODE FOR IMPLEMENTATION IN SPICE
1.  Initialize and set up data structures, run three periods of the transient analysis
2.  For j = 1 ... maximum iterations
3.      if (CalculateSensitivityMatrix = TRUE)
4.          do one period of transient analysis
5.          calculate sensitivity matrix at the same time
6.      else (CalculateSensitivityMatrix = FALSE)
7.          do one period of transient analysis
8.      Compute error and determine if in steady-state
9.      if (Check = TRUE)
10.         if (steady-state) return
11.         else if (error > threshold)
12.             Check = FALSE
13.             CalculateSensitivityMatrix = FALSE
14.         else if (error < threshold)
15.             calculate new initial state using state-transition function and Jacobian
16.             apply shooting method for calculation
17.             use different Krylov methods to solve iterate
18.             CalculateSensitivityMatrix = FALSE
19.     if (Check = FALSE) & (error < threshold)
20.         Check = TRUE
21.         CalculateSensitivityMatrix = TRUE
22. End For loop

Note that the state transition function is merely the result of
one period of transient analysis using a specified initial condition. The Jacobian, also called
the sensitivity matrix, is computed at the same time using sensitivity circuits. Since the same
quantities are required for its computation as are generated by the transient analysis, the
calculation occurs step by step with the state transition function. Once the Jacobian and the
state transition function are available, the functions that solve for the iterate are called.
After the initial conditions are determined (conditions that may put the circuit into
steady-state), the circuit is simulated for another period and the error is calculated again.
The error is checked to see if the steady-state response has been reached, whether it is too large to calculate a
new iterate for a new initial guess, or if it needs to be simulated for another period and the
error checked again. This algorithm is looped through until there are too many iterations, or
the circuit is in steady-state.
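The loop described above can be illustrated with a toy scalar analogue (not the SPICE implementation): one period of trapezoidal transient analysis supplies the state-transition value and its sensitivity, and a Newton update on phi(x) - x = 0 produces the new initial state. The ODE, time constant, and step counts below are invented purely for illustration.

```python
import math

def transition_and_sensitivity(x0, T=1.0, steps=200, tau=0.3):
    """One period of trapezoidal transient analysis for the toy ODE
    x' = (sin(2*pi*t) - x)/tau, returning the state-transition value
    phi(x0) and the sensitivity d(phi)/d(x0), computed step by step."""
    h = T / steps
    x, s, t = x0, 1.0, 0.0
    a = h / (2.0 * tau)
    for _ in range(steps):
        u0 = math.sin(2.0 * math.pi * t)
        u1 = math.sin(2.0 * math.pi * (t + h))
        x = ((1.0 - a) * x + a * (u0 + u1)) / (1.0 + a)  # trapezoidal step
        s = s * (1.0 - a) / (1.0 + a)                    # sensitivity step
        t += h
    return x, s

def shooting_newton(x0, tol=1e-12, max_iter=20):
    """Newton iteration on phi(x) - x = 0 (scalar case for clarity)."""
    for _ in range(max_iter):
        phi, J = transition_and_sensitivity(x0)
        err = phi - x0
        if abs(err) < tol:
            break
        x0 = x0 - err / (J - 1.0)    # solve (J - I) dx = -(phi - x)
    return x0

x_star = shooting_newton(0.0)
```

Because this test circuit is linear, the shooting-Newton update converges in a single iteration; the remaining passes only verify that phi(x_star) equals x_star.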
4.2 The Jacobian matrix for the shooting-Newton method
Calculation of the Jacobian matrix, also known as the sensitivity matrix, is done using the
efficient method described in Chapter 2. It is calculated step by step along with the transient
analysis because the quantities used for Jacobian formation are also needed by the transient
analysis. After one period of simulation, the state transition function and the Jacobian are formed. One
can then solve the iterate in equation (3.13) and determine whether another iteration should be
done. Although this method is more efficient than using the integration method, the cost of
forming the sensitivity matrix for the shooting-Newton iteration is still O(n^3).
4.2.1 Minimizing number of sensitivity matrix formations
Since the formation of the sensitivity matrix is so computationally expensive,
practical implementations must be used to make sure the sensitivity matrix is computed only
when needed [Colon73]. The shooting-Newton algorithm has quadratic convergence if the
initial guess is close enough to the solution. If the initial guess is not close to the solution,
then overshoot may occur or, if many solutions exist, the method may find the wrong one. The
shooting-Newton algorithm implemented in SPICE has strong convergence properties
because it is based on transient analysis which can choose nonuniform time steps to control
the error, and hence the convergence [Kundert90]. This shields the outer iteration from the
nonlinearity of the problem. To ensure that the guess is close to the solution, a number of
practical adaptations of the shooting-Newton algorithm are made.
First, the transient analysis is run for four periods before it is checked to see how
much difference there is between cycles. This will allow fast transients to die off. It also
allows the observation of the transient analysis response for chaotic or unstable responses. If
the error is small enough then the sensitivity matrix is calculated for the next transient
analysis cycle along with the state transition function. The shooting-Newton iteration can
then proceed, with the error checked and the transient analysis and sensitivity calculations
repeated as dictated.
Second, the sensitivity matrix is formed only when the error between transient
analysis cycles is smaller than a threshold value. The transient analysis simulation continues
until the threshold value between cycles is reached and then during the next transient analysis
cycle the sensitivity matrix is computed at each time step. The shooting-Newton method is
then used to calculate a new set of initial values and checked by another transient analysis
cycle to see if the circuit is in steady-state.
4.2.2 Damping the sensitivity matrix
Another important practical feature of the shooting-Newton method in
SPICE is to damp the shooting-Newton Jacobian so that a guess cannot overshoot the
solution. Damping reduces the effect of the Jacobian on the iteration [Fan75, Colon73]. The
damping coefficient is calculated by:
(4.1)
where η is the damping coefficient, m is user specified, and e is an error measure
specified as:

e = max( || φ(x(t0), t0, T) − x(t0) || )                         (4.2)

The error measure gives an indication of how close the initial condition guess is to
the steady-state response. Inspection of equations (4.1) and (4.2) shows that with a large error
the effect of the Jacobian is small, and with a small error the full Jacobian is used in the
calculation of the iterate.
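A sketch of one damped update follows. The damping law η = 1/(1 + m·e) used here is an illustrative stand-in with the limiting behavior described above; it is not equation (4.1) from this dissertation, which is not reproduced in this text.

```python
import numpy as np

def damped_shooting_update(x, phi_x, J, m=10.0):
    """One damped shooting-Newton update.  With eta -> 0 the step reduces
    to the plain fixed-point update x_new = phi(x); with eta -> 1 it is
    the full Newton step.  The damping law eta = 1/(1 + m*e) is an
    assumed, illustrative stand-in for equation (4.1); e is the error
    measure of equation (4.2)."""
    e = np.max(np.abs(phi_x - x))            # error measure (4.2)
    eta = 1.0 / (1.0 + m * e)                # assumed damping law
    dx = np.linalg.solve(eta * J - np.eye(len(x)), -(phi_x - x))
    return x + dx
```

With a large error, η is small and the step approaches a plain transient-analysis continuation; as the error shrinks, η approaches one and the full Jacobian is used, recovering the quadratic Newton step.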
4.2.3 Importance of integration method
An important aspect of the transient analysis and shooting-Newton algorithm is that
the method of integration assures the accuracy of the solution. If the backward or forward
Euler method is used, the error can be unacceptable [Nagle75]. Therefore, the implementation uses
the trapezoidal integration method exclusively for the transient analysis and shooting-Newton
method. The user may also set the minimum number of points to be simulated in each period.
This sets the maximum time step, which can aid in convergence and accuracy. Setting the
time step too small can cause convergence problems because of the chance of trying to
analyze a discontinuous region.
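The accuracy argument can be checked on the linear test equation x' = λx: over one simulated interval, the trapezoidal rule's second-order error is visibly smaller than backward Euler's first-order error. The value of λ and the step counts below are illustrative, not taken from the dissertation's experiments.

```python
import math

def backward_euler(lam, x0, h, steps):
    # x_{n+1} = x_n + h*lam*x_{n+1}  =>  x_{n+1} = x_n / (1 - h*lam)
    x = x0
    for _ in range(steps):
        x = x / (1.0 - h * lam)
    return x

def trapezoidal(lam, x0, h, steps):
    # x_{n+1} = x_n + (h*lam/2)*(x_n + x_{n+1})
    r = (1.0 + h * lam / 2.0) / (1.0 - h * lam / 2.0)
    x = x0
    for _ in range(steps):
        x = r * x
    return x

lam, x0, T, steps = -2.0, 1.0, 1.0, 50
h = T / steps
exact = x0 * math.exp(lam * T)
err_be = abs(backward_euler(lam, x0, h, steps) - exact)
err_tr = abs(trapezoidal(lam, x0, h, steps) - exact)
```

For this case the trapezoidal error is roughly two orders of magnitude below the backward Euler error at the same step size, which is why the steady-state implementation insists on the trapezoidal method.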
4.2.4 Eigenvalues and sparsity pattern of the Jacobian
The structure and eigenvalues of the sensitivity matrix can give insight into the
difficulty involved in solving for the iterate formed at each shooting-Newton application.
Sparsity patterns for many circuits were examined (Figure 4.1 and Figure 4.2).
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
Figure 4.1 Sparsity pattern of the sensitivity matrix for an example circuit of a
multiplier.
As can be seen from the examples in Figure 4.1 and Figure 4.2, there are a variety of
nonzero patterns for the circuits examined. This pattern of the sensitivity matrix is repeated
for each iteration of the shooting-Newton method. Techniques that use this repeated pattern for
faster matrix-vector multiplications were implemented, with the matrix entries stored
in the sparse matrix format.
Eigenvalues were determined for the sensitivity matrices arising from some of the
typical circuits that benefit from the steady-state shooting-Newton analysis. The eigenvalues
Figure 4.2 Sparsity pattern of the sensitivity matrix for a buck converter (nz = 474).
were found to have a negative real part (Figures 4.3 and 4.4). Figure 4.3 describes a circuit
with eleven states and a sensitivity matrix of dimension 11x11. In Figure 4.4, the buck
converter has a 32x32 sensitivity matrix.
The Krylov-subspace solution exists in a lower-dimensional space if the minimal
polynomial of A has smaller degree. The minimal polynomial q(t) of A is constructed from the
eigenvalues of A [Ipsen98]. The smaller the degree of the minimal polynomial, the smaller
the dimension needed for the description of the inverse of A, and therefore the faster the
method will converge.
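This connection can be verified numerically. Assuming a diagonalizable matrix with only two distinct eigenvalues (so its minimal polynomial has degree two), the exact solution of Ax = b already lies in the two-dimensional Krylov space span{b, Ab}; the example below is a constructed sketch, not one of the circuit matrices.

```python
import numpy as np

# Symmetric matrix with only two distinct eigenvalues {2, 5}: its minimal
# polynomial is (t-2)(t-5), degree 2, so A^{-1}b lies in span{b, Ab}.
rng = np.random.default_rng(0)
Q, _ = np.linalg.qr(rng.standard_normal((6, 6)))
A = Q @ np.diag([2.0, 2.0, 2.0, 5.0, 5.0, 5.0]) @ Q.T
b = rng.standard_normal(6)

K = np.column_stack([b, A @ b])            # Krylov basis of dimension 2
y = np.linalg.lstsq(A @ K, b, rcond=None)[0]
x = K @ y                                  # best solution within K_2
```

Here the least-squares residual within the two-dimensional Krylov space is already zero (indeed A² = 7A − 10I gives A⁻¹b = 0.7b − 0.1Ab), mirroring why clustered eigenvalue spectra lead to rapid Krylov convergence.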
Figure 4.3 Eigenvalue Spectrum for Sensitivity Matrix for Class C Amplifier (11 states).
For the limited number of examples examined, the clustered spectrum of the
eigenvalues shows that a preconditioner is not needed to aid convergence. The well-behaved
eigenvalues result from the shooting-Newton method being a two-level Newton algorithm.
The variable time step in the inner Newton loop protects the outer shooting-Newton iteration
from the nonlinearities present in the original circuit. The circuit looks almost linear and,
therefore, the shooting-Newton iteration converges very rapidly.
Figure 4.4 Eigenvalue Spectrum for Buck Converter (32 states).
4.2.5 Implicit matrix options
The sensitivity matrix is generally dense. A sparse matrix structure would make its
formation cost less. An alternative method of solution is to not form the sensitivity matrix
at all, but to store all the components needed for its formation: the capacitance values and the
factored transient analysis Jacobian [Telichevsky96a]. Then, when the solution of the iterate
is carried out, the required matrix-vector multiplications are done with the same sparse
methods used in transient analysis. This can reduce the operation count to O(kn) (where k is
the number of time steps) instead of O(n^2). One drawback of the method, however, is that
all the needed items must be stored for each time step, and the amount of storage can be very
prohibitive. For the shooting-Newton implementation here, the sensitivity matrix is
calculated at each transient analysis time step.
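The matrix-implicit idea can be sketched as follows. The per-step factors are kept as small dense matrices for clarity, so this shows only the data flow of such products, not the sparse factorization machinery or the storage layout of [Telichevsky96a].

```python
import numpy as np

def sensitivity_matvec(v, step_factors):
    """Apply the shooting-Newton sensitivity matrix to a vector without
    ever forming it.  step_factors holds, for each time step, the pair
    ((C/h + G_k), C/h) of a backward-Euler discretization; each step
    costs one solve and one multiply, giving O(k*n)-type work for
    sparse systems instead of a dense O(n^2) product."""
    for lhs, rhs in step_factors:
        v = np.linalg.solve(lhs, rhs @ v)   # propagate through one step
    return v

# Toy example: 3 states, 4 time steps of a linear circuit
rng = np.random.default_rng(1)
n, h = 3, 1e-3
C = np.eye(n)
factors = []
for _ in range(4):
    G = np.diag(rng.uniform(1.0, 2.0, n))   # per-step conductances
    factors.append((C / h + G, C / h))

v = rng.standard_normal(n)
Jv = sensitivity_matvec(v, factors)

# Reference: the explicitly formed sensitivity matrix
J = np.eye(n)
for lhs, rhs in factors:
    J = np.linalg.solve(lhs, rhs) @ J
```

The implicit product and the explicitly formed matrix agree; the difference is purely in cost and storage, which is the trade-off discussed above.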
4.3 Krylov-subspace linear solver optimizations
The implementation of a Krylov-subspace linear solver can be accomplished with a
relatively small amount of code; what is really needed is an efficient implementation of these
methods. The main concerns of an efficient implementation are the stopping criteria for ending
the iteration, and the issues around convergence and non-convergence within the maximum
number of iterations. It is also important to handle breakdowns of a method, if they occur. An
iterative method can also stagnate, with two successive iterates remaining the same; this must
be dealt with or no progress will be made in the iteration. The computational issues of memory
usage and operation counts must also be addressed. This section considers these issues
to arrive at an efficient implementation of the Krylov-subspace iterative methods in
the determination of the steady-state response of integrated circuits with periodic inputs.
4.3.1 Stopping criterion
The choice of an adequate stopping criterion for the Krylov-subspace methods is an
important aspect of efficient implementation. A good stopping criterion should [Barrett94,
Greeen, Kelley] identify when the error is small enough to stop. It should also stop when the
error is no longer decreasing, or is decreasing too slowly. Lastly, the number of iterations should
be limited, for there is no guarantee that the Newton method will converge. In fact, the
shooting-Newton method is only locally convergent and, in general, its convergence depends
on the initial guess being close to the solution. Here, the decision whether an iterate x_n is
close enough to the solution is based on the residual:

r_n = b − A x_n .                                                (4.3)
Unfortunately, in QMR only the quasi-residual is calculated. The real residual shown in
equation (4.3) must be calculated separately to test for convergence, which adds another
matrix-vector multiplication to the algorithm. An efficient implementation is to calculate the
real residual only when the quasi-residual is smaller than a specified value.
For all the Krylov-subspace methods that were implemented (BiCG, CGS,
BiCGSTAB, QMR, and GMRES), the iteration is ended when the real residual is smaller than
the user-specified tolerance. Importantly, for GMRES, the calculation of the iterate is
only done when the residual is less than the specified value. This saves the expensive
operations involved in its calculation.
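The requirements above can be collected into a single test; the tolerance, stall ratio, and iteration cap below are illustrative defaults, not the values used in the dissertation's implementation.

```python
def should_stop(r_norm, b_norm, r_norm_prev, iteration,
                tol=1e-8, stall_ratio=0.999, max_iter=100):
    """Combined stopping test for a Krylov iteration: report convergence
    when the relative residual is small, stop when the residual has
    stopped (or nearly stopped) decreasing, and always bound the number
    of iterations.  Thresholds here are illustrative."""
    if r_norm <= tol * max(b_norm, 1e-300):
        return "converged"
    if r_norm_prev is not None and r_norm > stall_ratio * r_norm_prev:
        return "stalled"
    if iteration >= max_iter:
        return "max_iterations"
    return "continue"
```

Scaling the tolerance by the norm of the right-hand side keeps the test meaningful across circuits whose state variables differ by orders of magnitude.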
4.3.2 Maximum number of iterations
Another stopping criterion for the Krylov-subspace methods is the restriction of the
maximum number of iterations that can occur. This was set at n because the size of the
matrix is n x n . The GMRES algorithm has been shown [Saad86] to converge in at most n
steps. Since the Krylov-methods were being compared, the worst case of n iterations for an
n x n matrix was allowed. When restarts were implemented the maximum number of
iterations was reset to n .
4.3.3 Breakdown of the method
BiCG, CGS, BiCGSTAB, and QMR all have the possibility of a breakdown of the
method. This means that some scalar quantity has been calculated that is either very close to
zero or extremely large. Looking at the pseudo-code for BiCGSTAB from Chapter 2,
reproduced below, the breakdowns occur because of a division by zero.
BiCGSTAB algorithm
1.  Compute r_0 := b − A x_0 ; r̃_0 arbitrary
2.  p_0 := r_0
3.  For j = 0, 1, ..., until convergence Do:
4.      α_j := (r_j, r̃_0) / (A p_j, r̃_0)
5.      s_j := r_j − α_j A p_j
6.      ω_j := (A s_j, s_j) / (A s_j, A s_j)
7.      x_{j+1} := x_j + α_j p_j + ω_j s_j
8.      r_{j+1} := s_j − ω_j A s_j
9.      β_j := (r_{j+1}, r̃_0) / (r_j, r̃_0) · (α_j / ω_j)
10.     p_{j+1} := r_{j+1} + β_j (p_j − ω_j A p_j)
11. EndDo
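A runnable rendering of this pseudo-code (illustrative NumPy, not the SPICE code) is given below, with guards at the divisions whose vanishing denominators correspond to the breakdowns discussed in the text.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, max_iter=None):
    """BiCGSTAB following the pseudo-code above, with explicit checks
    where a vanishing denominator would cause a breakdown."""
    n = len(b)
    max_iter = n if max_iter is None else max_iter
    x = np.zeros(n)
    r = b - A @ x
    r_tilde = r.copy()                 # arbitrary dual residual
    p = r.copy()
    for _ in range(max_iter):
        Ap = A @ p
        denom = Ap @ r_tilde
        if abs(denom) < 1e-30:         # breakdown: (A p_j, r~_0) = 0
            raise ArithmeticError("BiCGSTAB breakdown in alpha")
        alpha = (r @ r_tilde) / denom
        s = r - alpha * Ap
        if np.linalg.norm(s) < tol:    # converged mid-step
            return x + alpha * p
        As = A @ s
        omega_denom = As @ As
        if omega_denom < 1e-30:        # breakdown: omega_j undefined
            raise ArithmeticError("BiCGSTAB breakdown in omega")
        omega = (As @ s) / omega_denom
        x = x + alpha * p + omega * s
        r_new = s - omega * As
        if np.linalg.norm(r_new) < tol:
            return x
        beta = (r_new @ r_tilde) / (r @ r_tilde) * (alpha / omega)
        p = r_new + beta * (p - omega * Ap)
        r = r_new
    return x
```

In this sketch a detected breakdown simply raises an error; the implementation described in the text instead restarts the iteration, for example with a nonzero initial guess.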
There are four types of breakdowns [Greenbaum97]. The first is caused by an
invariant subspace: the Krylov space K_m(A^T, r̃_0) is A^T-invariant, so that the new dual
residual r̃_j in the method is a linear combination of all the previous dual residuals, and is
thus orthogonal to r_j; the denominator of the coefficient α_j is then zero. The second
breakdown occurs because the dual residuals are orthogonal; the biorthogonal methods break
down with a division by zero in the computation of the biorthogonalization scalar β_j. The
other types of breakdown occur when no one-dimensional or two-dimensional minimization
is needed and the line search parameter ω_j is zero. The breakdowns for the examples in
this dissertation presented with ω_j = 0. The breakdown was avoided in these experiments by
implementing an initial guess not equal to zero. Once the different initial guess was
implemented, the methods did not need the available restart. Usually, this restart will allow the
algorithm to converge. BiCGSTAB, if interpreted as the product of BiCG and repeated
GMRES, is subject to the same kind of breakdown as BiCG. CGS has the same potential
for breakdowns as BiCG, and the method is restarted with the current approximation. Full
QMR employs look-ahead strategies to avoid the Lanczos process breakdowns; a simpler
version of QMR was implemented here and is, therefore, still susceptible to the Lanczos
process breakdown. Again, the algorithm is restarted and usually converges. Another fix
implemented in BiCGSTAB was a restart with an initial guess that was not the usual
zero vector; on simulations where the breakdown of the method was occurring, this nonzero
initial guess allowed the iteration to continue to convergence.
GMRES is not subject to the kind of breakdowns that plague the biorthogonal
methods BiCG, CGS, BiCGSTAB, and QMR. It does have a problem with the
orthogonalization vectors, which may become non-orthogonal because of accumulated
roundoff errors [Kelley]. If this happens, the residual, and hence the solution, could be
inaccurate. To remedy the problem, the modified Gram-Schmidt variant is implemented.
Unrestarted GMRES was used.
4.3.4 Computational issues
The most time-consuming operation in the Krylov iterative methods is the matrix-vector
multiplication. The sensitivity matrix is multiplied by a vector at each GMRES
iteration; for the other Krylov-subspace methods there are two matrix-vector
multiplications per iteration, so the matrix-vector multiplication implementation takes on an
even greater importance. The most efficient procedure would take advantage of some special
structure of the sensitivity matrix. Unfortunately, the specific structure of the sensitivity
matrix varies and is generally unpredictable.
Although the sensitivity matrices do not generally exhibit a sparse structure, some do.
It was felt that the added computational cost of representing dense matrices in a sparse
format would be offset by the savings for the matrices and circuit systems that did
produce a sparse matrix structure. Some of the tested circuits revealed a sparsely structured
Jacobian during the shooting-Newton method. A common format for handling sparse matrices is
the compressed row storage scheme [Barrett94]. This was implemented in the enhanced
steady-state version of SPICE. The compressed row storage format makes no assumption
about the sparsity structure of the matrix. No zeros are stored; instead, pointers are used to
address the needed values for the matrix-vector multiplication. The nonzero entries of each
row are kept in contiguous memory cells and three one-dimensional arrays are created. One array
contains the nonzero values of the matrix, one array contains the integer column indices of
these values, and one array of pointers indicates the beginning of each row in the original
matrix. The pseudo-code for this kind of matrix-vector product, c = A x for an n x n matrix,
is shown below:

For i = 1, ..., N
    initialize c(i) = 0
    For k = row_Pointer(i), ..., row_Pointer(i+1) - 1
        c(i) = c(i) + value(k) * x(col_index(k))
BiCG and QMR also need a matrix-vector multiplication with the transpose of the
sensitivity matrix, d = A^T x, and the pseudo-code is shown below:

For i = 1, ..., N
    initialize d(i) = 0
For i = 1, ..., N
    For k = row_Pointer(i), ..., row_Pointer(i+1) - 1
        d(col_index(k)) = d(col_index(k)) + value(k) * x(i)
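The two pseudo-code fragments can be written as a small runnable sketch (plain Python lists, 0-based indices rather than the 1-based indices of the pseudo-code):

```python
def crs_matvec(values, col_index, row_pointer, x):
    """c = A x with A in compressed row storage (0-based indices)."""
    n = len(row_pointer) - 1
    c = [0.0] * n
    for i in range(n):
        for k in range(row_pointer[i], row_pointer[i + 1]):
            c[i] += values[k] * x[col_index[k]]
    return c

def crs_matvec_transpose(values, col_index, row_pointer, x):
    """d = A^T x using the same row-wise storage: row i of A scatters
    into the entries of d selected by its column indices."""
    n = len(row_pointer) - 1
    d = [0.0] * n
    for i in range(n):
        for k in range(row_pointer[i], row_pointer[i + 1]):
            d[col_index[k]] += values[k] * x[i]
    return d

# A = [[1, 0, 2],
#      [0, 3, 0],
#      [4, 0, 5]]
values      = [1.0, 2.0, 3.0, 4.0, 5.0]
col_index   = [0, 2, 1, 0, 2]
row_pointer = [0, 2, 3, 5]
```

The transpose product walks the same arrays in the same order; only the roles of the row index and column index are exchanged, which is why CRS serves both BiCG-style products without a second copy of the matrix.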
For both matrix-vector multiplications, the storage is much less than the n^2 locations
required for the full matrix: the compressed row storage format needs only
2 * (# nonzeros) + n + 1 storage locations. This format also reduces the 2n^2 operation count
of the dense product to roughly two operations per nonzero element, because there are no
multiplies by the zero elements.
However, the sensitivity matrix is not always sparse (see Figures 4.1 and 4.2). For
dense matrices, implementing the sparse data structure involves creating indices to identify
the locations of the nonzero entries for use in the shooting-Newton iterate matrix
multiplication. For matrices that are not sparse, this imposes a more complicated and
much less efficient representation than a simple two-dimensional array. The sparsity of the
shooting-Newton sensitivity matrix cannot be predicted. However, the structure does stay the
same for each shooting-Newton iteration, so the structure frame can be reused.
4.4 Summary
Efficient implementation of the shooting-Newton method for the determination of the
steady-state response of integrated circuits with periodic inputs centers on the main
time-consuming parts of the algorithm: the formation of the Jacobian for the iterate and
the solution of the iterate formed. The approach of sensitivity circuits was implemented to
form the sensitivity matrix at the same time the transient analysis is done, and the
Krylov-subspace methods were implemented efficiently.
Chapter 5
Results and Comparisons
This chapter presents the results of the implementation of the shooting-Newton
method for finding the steady-state response of integrated circuits. The method is tested with
four different circuits that have difficulty reaching the steady-state response with
traditional methods: a DC power supply, a buck power converter, a
Class C amplifier, and a Class AB amplifier. The shooting-Newton method is shown to be
more efficient for these circuits than standard transient analysis in SPICE. The different
Krylov-subspace iterative methods, including BiCG, CGS, GMRES, QMR, and BiCGSTAB,
are compared with standard Gaussian elimination for the solution of the iterate generated by
the shooting-Newton method. An analysis of the level of accuracy is then given for the Class C
amplifier. The chapter closes with a detailed analysis of the convergence behavior of the
different Krylov methods for the Class C amplifier example.
The shooting-Newton method for determining the steady-state response of the simple
DC supply circuit [Skelboe80] in Figure 5.1 is both efficient and accurate. The SPICE input
file, along with the steady-state commands used for the simulation, is shown in Figure 5.2.
Figure 5.1 DC supply circuit.
5.1 DC supply test circuit
For this example, Gaussian elimination or one of the Krylov-subspace methods
(GMRES, CGS, BiCG, or BiCGSTAB) was used to solve the system of equations generated
by the shooting-Newton method. The standard SPICE transient analysis was run until
steady-state was achieved and all the transient outputs had died out.
Table 5.1 compares the capacitor voltages and inductor current values of the
shooting-Newton variants in steady state to the values of transient analysis. The data reveals
that the shooting-Newton methods and the transient analysis are accurate to at least four
digits. Plots for the same variables from the transient analysis run in SPICE are shown in
Figure 5.3. Two cycles of the transient analysis along with plots of the Krylov-subspace
variants are shown in Figure 5.4. The results for the Krylov-subspace methods are almost
identical to the transient analysis output, showing that the non-stationary iterative methods
obtain initial conditions for the state variables that put the circuit directly into
steady-state. Table 5.2 shows the amount of time saved when the direct shooting-Newton method is
DC power supply from Skelboe's paper
vin 1 0 sin(0 20 50)
d1 1 2 mod1
.model mod1 d(is=1e-16 cjo=2pf)
c1 1 2 1uF
c2 3 0 1mF
c3 4 0 1mF
l1 3 4 0.1H
r1 2 3 5
r2 4 0 1k
* commands for steady-state analysis:
* steady ge non_auto 20ms 50 .01 uic 10.0
* steady gmres non_auto 20ms 50 .01 uic 3.0
* steady bicgstab non_auto 20ms 50 .01 uic 10.0
* steady cgs non_auto 20ms 50 .01 uic 10.0
.end
Figure 5.2 SPICE input file with steady-state commands.
              TRANSIENT    GAUSSIAN
              ANALYSIS     ELIMINATION   BICG         BICGSTAB     CGS          GMRES
V(C3) volts   17.87730     17.86965      17.86965     17.86965     17.86965     17.86965
V(C2) volts   17.75859     17.74859      17.74859     17.74859     17.74859     17.74859
V(C1) volts   -17.79250    -17.77805     -17.77805    -17.77805    -17.77805    -17.77805
I(L1) amps    .01765934    .01779229     .01779229    .01779229    .01779229    .01779229

Table 5.1 Steady-state initial conditions for the DC power supply example using
different Krylov methods inside the shooting method. (The transient analysis values
are at 1.98 s.)
Figure 5.3 Transient analysis plots for the DC power supply circuit.
used for obtaining the solution. With a small circuit such as the DC power supply, the
Krylov-subspace methods do not show significant improvement over the regular Gaussian
elimination method for solving the Newton-Raphson iterate. Importantly, the direct
shooting-Newton method is much more efficient than standard transient analysis in SPICE. Because of
the excellent convergence properties of the shooting-Newton method when close to the solution,
higher accuracy is easily obtained at little computational cost.
METHOD USED FOR            TIME (SECONDS)    SHOOTING-NEWTON    TOTAL TRANSIENT
STEADY-STATE               TO REACH          ITERATIONS         ANALYSIS CYCLES
DETERMINATION              STEADY-STATE
Transient analysis         .1833             -                  67
Gaussian elimination       .06667            2                  19
GMRES                      .0500             2                  19
CGS                        .06667            2                  19
BiCG                       .050              2                  19
BiCGSTAB                   .1000             2                  19

Table 5.2 Time to calculate steady-state for DC power supply. The size of the
sensitivity matrix is 4 x 4.
Figure 5.4 Two periods of the transient analysis when the DC supply circuit is in
steady-state. The triangle marker indicates the Krylov-subspace solutions.
Figure 5.5 Circuit diagram for the buck converter.
5.2 Buck converter
A buck converter, shown in Figure 5.5, is more complex than the DC supply circuit
[Trajkovic98, Lau97]. This circuit's high number of inductors and capacitors makes it very
difficult to simulate the steady-state response using standard SPICE transient analysis. Buck
converters are used in many communications systems, including cellular phones and PDAs.
The buck converter includes thirty-two state variables (capacitors and inductors).
Figure 5.6 shows the SPICE output for a GMRES run. Values for two of the more interesting
state variables for transient and iterative analysis are given in Table 5.3. The accuracy of the
              TRANSIENT    GAUSSIAN
              ANALYSIS     ELIMINATION   BICG        BICGSTAB     CGS         GMRES
V(Cf) volts   3.367924     3.364743      3.364743    3.370084     3.35289     3.350897
I(Lf) amps    0.7085792    0.7030570     0.703057    0.7024051    0.70974     0.702582

Table 5.3 Steady-state initial conditions for the buck converter example using
different Krylov methods inside the shooting-Newton algorithm.
voltages and currents is within .001 percent. Figure 5.7 provides plots of the state variables,
the voltage across the capacitor Cf, and the current through the inductor Lf from a transient
analysis. It is important to note that the transient responses die off slowly before the circuit
settles into its steady-state response. Figure 5.8 shows two periods of the steady-state waveforms obtained
from the transient analysis (the Krylov-subspace results are marked with a triangle). The
plots reveal that the shooting-Newton method is as accurate as the transient analysis in
SPICE.
Table 5.4 illustrates that the shooting-Newton algorithm takes significantly less time
to calculate the steady-state than standard transient analysis found in standard SPICE. To
solve for the steady-state, transient analysis requires over 2000 cycles, while the Krylov
methods use less than 100, GMRES and BiCG being the most efficient. The sensitivity
matrix used in the solution of the shooting-Newton iterate was sparse (Figure 4.2) and the
eigenvalues are clustered (Figure 4.4). The biorthogonal Krylov methods, CGS, BiCG, and
BiCGSTAB, all had breakdowns in their solution and were fixed by restarting the iteration
with an initial guess of 0.1, instead of the usual initial guess of zero. Without this adaptation
these methods took over twice the amount of time to calculate a solution.
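The clustered eigenvalues are what keep the Krylov iteration counts small: the tighter the spectrum clusters away from the origin, the fewer iterations are needed. The effect can be sketched numerically; the matrix below is a hypothetical stand-in with eigenvalues clustered around one (not the actual sensitivity matrix), and `gmres_residuals` is an illustrative helper rather than SPICE code:

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200
# Hypothetical stand-in for a sensitivity matrix: identity plus a small
# perturbation, so the eigenvalues cluster around 1.
A = np.eye(n) + 0.01 * rng.standard_normal((n, n))
b = rng.standard_normal(n)

def gmres_residuals(A, b, k):
    """Relative residuals of the GMRES minimisation over growing Krylov subspaces."""
    Q = [b / np.linalg.norm(b)]          # orthonormal Krylov basis
    res = []
    for _ in range(k):
        w = A @ Q[-1]
        for q in Q:                      # Gram-Schmidt orthogonalisation
            w = w - (q @ w) * q
        Q.append(w / np.linalg.norm(w))
        basis = np.array(Q[:-1]).T       # current Krylov basis as columns
        y, *_ = np.linalg.lstsq(A @ basis, b, rcond=None)
        res.append(np.linalg.norm(b - A @ basis @ y) / np.linalg.norm(b))
    return res

res = gmres_residuals(A, b, 8)
print(res)   # drops by several orders of magnitude within a handful of steps
```

With a clustered spectrum the residual contracts roughly by the cluster radius per iteration, which is consistent with the small iteration counts reported in the tables of this chapter.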
Spice 1 -> steady gmres non_auto 500ns 100 .01 uic 10.0
Warning: vsq: no DC value, transient time 0 value used
Using gmres algorithm.
this is the value of start_time : 0.000000
the ckt has 32 states.
The Steady-State has been reached
Values of the state variables in the circuit are the following
cf             : 3.350897e+00 Volts
cx             : 3.807514e+00 Volts
c:u9:inv:out   : 4.999996e+00 Volts
c:u8:inv:out   : -6.047986e-04 Volts
c:u7:b2:out    : 1.582466e-06 Volts
c:u7:a2:out    : 1.956966e-06 Volts
c:u7:b1:out    : 4.999996e+00 Volts
c:u7:a1:out    : 4.999996e+00 Volts
c:u712:inv:out : 4.999997e+00 Volts
cout71         : 3.224653e-06 Volts
c:u71:inv:out  : 3.224653e-06 Volts
c:u6:b2:out    : 2.999999e+00 Volts
c:u6:a2:out    : 2.999999e+00 Volts
c:u6:b1:out    : -9.999978e-01 Volts
c:u6:a1:out    : -9.999978e-01 Volts
c:u612:inv:out : 1.689046e-06 Volts
cout61         : 4.999998e+00 Volts
c:u61:inv:out  : 4.999998e+00 Volts
c:u5:inv:out   : 2.762276e-06 Volts
c:u4:inv:out   : 4.999125e+00 Volts
c5             : 7.278655e-01 Volts
c:u3:inv:out   : 1.110857e-02 Volts
c:u2:b2:out    : -9.778298e-01 Volts
c:u2:a2:out    : 3.996061e+00 Volts
c:u2:b1:out    : 1.981769e+00 Volts
c:u2:a1:out    : 1.981769e+00 Volts
c:u13:inv:out  : 4.971984e+00 Volts
cout12         : 2.398492e-06 Volts
c:u12:inv:out  : 2.398492e-06 Volts
cout1          : 4.999998e+00 Volts
c:u1:inv:out   : 4.999998e+00 Volts
lf             : 7.165829e-01 Amps
The period of waveform: 5.000000e-07
The number of NR iterations required : 8
Number of cycles required : 37
CPU time for the steady-state analysis = 19.966667 sec
Figure 5.6 Results from GMRES run on buck converter.
Figure 5.7 Transient analysis plots for the two state variables: the voltage across
the capacitor Cf and the current through the inductor Lf.
Figure 5.8 Two periods of transient analysis for the buck converter (panels: V(cf)
and I(lf) versus time in seconds). The Krylov-subspace methods are marked with a
triangle.
METHOD USED FOR STEADY-    TIME (SECONDS)    SHOOTING-NEWTON    TOTAL TRANSIENT
STATE DETERMINATION        TO REACH          ITERATIONS         ANALYSIS CYCLES
                           STEADY-STATE
Transient analysis         350.0             -                  1750
Gaussian elimination       73.6500           63                 286
GMRES                      14.5333           17                 60
CGS                        18.1833           17                 60
BiCG                       18.5000           17                 78
BiCGSTAB                   18.6833           19                 58
QMR                        18.3833           17                 59

Table 5.4 Time to reach steady-state for the buck converter example to an accuracy
of .01. The initial guess for BiCG, BiCGSTAB, and QMR is not zero.
Originally, the Gaussian elimination method did not use pivoting, which resulted in
very long simulation times. When pivoting was added, the method was able to converge to the
steady-state solution. The close values of the times for the GMRES and Gaussian elimination
methods may point to the need for a pre-conditioner; GMRES took too many steps to
converge.
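The pivoting failure described above can be reproduced in a few lines: with a tiny leading pivot, elimination without row exchanges loses the solution to rounding error, while partial pivoting recovers it. This is an illustrative numpy sketch (the routine `gauss_solve` and the 2 × 2 test system are hypothetical, not the SPICE implementation):

```python
import numpy as np

def gauss_solve(A, b, pivot=True):
    """Gaussian elimination with optional partial pivoting, then back-substitution."""
    A = A.astype(float).copy()
    b = b.astype(float).copy()
    n = len(b)
    for k in range(n - 1):
        if pivot:
            p = k + np.argmax(np.abs(A[k:, k]))      # largest pivot in column k
            A[[k, p]], b[[k, p]] = A[[p, k]], b[[p, k]]
        for i in range(k + 1, n):
            m = A[i, k] / A[k, k]
            A[i, k:] -= m * A[k, k:]
            b[i] -= m * b[k]
    x = np.zeros(n)
    for i in range(n - 1, -1, -1):                   # back-substitution
        x[i] = (b[i] - A[i, i + 1:] @ x[i + 1:]) / A[i, i]
    return x

# Tiny leading pivot: without pivoting, the huge multiplier 1/1e-17 swamps
# the second row and the first solution component is lost.
A = np.array([[1e-17, 1.0], [1.0, 1.0]])
b = np.array([1.0, 2.0])
print(gauss_solve(A, b, pivot=False))  # x ≈ [0, 1]: first component lost
print(gauss_solve(A, b, pivot=True))   # x ≈ [1, 1]: the true solution
```

Swapping the largest available pivot into place at each column keeps the multipliers bounded by one, which is why adding pivoting let the buck converter simulation converge.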
An important problem with the Krylov methods was investigated on the buck
converter example. The BiCGSTAB and BiCG algorithms had a breakdown, a division by zero,
during the iterations. The change of initial guess for the iterative method was implemented as
described in Chapter 4. Table 5.5 shows the results of the simulation using the Krylov
method BiCG for solution inside the shooting-Newton method. The same information is
presented in Table 5.6 for BiCGSTAB.
INITIAL GUESS    TIME (SECONDS) TO     SHOOTING-NEWTON    TOTAL TRANSIENT
(BiCG)           REACH STEADY-STATE    ITERATIONS         ANALYSIS CYCLES
0.0              171.56                100                536
0.1              9.83                  21                 54
0.01             63.05                 41                 208
0.001            18.50                 17                 78
0.0001           79.98                 70                 310

Table 5.5 Different initial guesses for the BiCG method.
INITIAL GUESS    TIME (SECONDS) TO     SHOOTING-NEWTON    TOTAL TRANSIENT
(BiCGSTAB)       REACH STEADY-STATE    ITERATIONS         ANALYSIS CYCLES
0.0              53.28                 40                 189
0.1              43.25                 34                 178
0.01             16.40                 15                 67
0.001            18.68                 13                 83
0.0001           15.75                 16                 63

Table 5.6 Different initial guesses for the BiCGSTAB method.
The data indicates that an initial guess of .001 allows the methods to converge efficiently
to a solution without the previous breakdown problems. The breakdown in both
methods occurs because the two residuals calculated in BiCG and BiCGSTAB are
already orthogonal. This causes the denominator of the line-search parameter to take the
value zero, and this division by zero causes the breakdown of the method.
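The mechanism can be seen directly in the BiCG coefficient rho = r̃·r: if the shadow residual starts out orthogonal to the true residual, the very first division fails, while a nonzero initial guess perturbs r0 so that the inner product is generically nonzero. Below is a contrived numerical illustration; the vectors are chosen to force the orthogonality and are not taken from the simulator:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 6
A = rng.standard_normal((n, n))
b = np.array([1.0, 0, 0, 0, 0, 0])

# BiCG's first coefficient is rho = rtilde . r0.  If the shadow residual
# happens to be orthogonal to r0, the method divides by zero and breaks down.
rtilde = np.array([0.0, 1, 0, 0, 0, 0])   # contrived shadow start, orthogonal to b

r0_zero_guess = b - A @ np.zeros(n)        # initial guess x0 = 0  ->  r0 = b
rho_zero = rtilde @ r0_zero_guess
print(rho_zero)                            # 0.0 -> breakdown

r0_shifted = b - A @ np.full(n, 0.001)     # small nonzero initial guess, as in the text
rho_shifted = rtilde @ r0_shifted
print(rho_shifted)                         # generically nonzero -> iteration proceeds
```

Shifting the starting point changes r0 by -0.001·A·1, so the offending inner product almost never vanishes, which is exactly why the nonzero initial guess removed the breakdowns in Tables 5.5 and 5.6.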
Figure 5.9 Class AB amplifier [Meyer89].
5.3 Class AB amplifier
To test the validity of the shooting-Newton solutions, a transient analysis was run on
a Class AB monolithic power amplifier (Figure 5.9) until the steady-state was reached. The
plots from this transient analysis run are shown in Figure 5.10. Table 5.7 presents the values
of the capacitor voltage and inductor current from the transient analysis and compares them
to the initial values calculated by the shooting-Newton method. The shooting-Newton
approach to steady-state determination showed accuracy to seven digits. In Table 5.8 the
time and iterations required are presented. It took only 1 shooting-Newton iteration with 5
periods of transient analysis to reach the steady-state. The standard transient analysis showed
METHOD                  V(Cout) (volts)    V(Cin) (volts)
CGS                     -7.50480           -8.37212
GMRES                   -7.504807          -8.372127
Transient analysis      -7.504807          -8.372127
Gaussian elimination    -7.504807          -8.372127
BiCG                    -7.50480           -8.37212
BiCGSTAB                -7.50480           -8.37212

Table 5.7 Steady-state initial conditions for the Class AB amplifier example using
different Krylov methods inside the shooting-Newton algorithm.
METHOD USED FOR STEADY-    TIME (SECONDS)    SHOOTING-NEWTON    TOTAL TRANSIENT
STATE DETERMINATION        TO REACH          ITERATIONS         ANALYSIS CYCLES
                           STEADY-STATE
Transient analysis         2125+             -                  1000+
Gaussian elimination       0.41667           1                  5
GMRES                      0.43333           1                  5
CGS                        0.38333           1                  5
BiCG                       0.43333           1                  5
BiCGSTAB                   0.41667           1                  5

Table 5.8 Summary of Krylov-subspace methods and transient analysis results of
steady-state determination for a Class AB amplifier.
a response with transients dying off very slowly and required over 1000 periods of transient
analysis to reach the same accuracy. The accuracy of the steady-state solution compared to
standard transient analysis is shown in Figure 5.10. All of the Krylov-subspace methods
showed the same improvements in accuracy. Because the circuit has few state variables, only
two, the size of the system of equations created for the shooting-Newton iteration is small
(2 × 2), and more efficiency from the Krylov methods is neither expected nor seen for such a
small system.
Figure 5.10 Periodic time-domain response of the Class AB circuit: the solution by
transient analysis compared with the shooting-Newton solutions using Gaussian
elimination, GMRES, and BiCG.
Figure 5.11 Class C amplifier [Colon73].
5.4 Class C amplifier
The Class C amplifier shown in Figure 5.11 contains many state variables, and the
convergence behavior of the Krylov-subspace methods can be observed. In SPICE 3f5, the
direct steady-state determination using the shooting-Newton method resulted in significant
savings in computational time and resources (Table 5.9). In the traditional SPICE transient
analysis, the circuit was simulated for over 3000 cycles and still was not in steady-state. The
shooting-Newton methods required only 35 cycles of transient analysis and 26 shooting-Newton
iterations. A further advantage of the shooting-Newton method is that it provides an
accurate method of determining the steady-state response (Figure 5.12). The Krylov-subspace
methods for solution of the iterate generated by this method showed slightly more
efficiency than the regular Gaussian elimination method (Table 5.9).
The convergence behavior of the solution within the shooting-Newton algorithm is
examined for GMRES, CGS, BiCG, BiCGSTAB, and QMR. Plots for the convergence
behavior and the data are provided in Figures 5.13-5.17 and Table 5.10, respectively. The
BiCG, BiCGSTAB, and CGS have somewhat erratic convergence with the relative residual
norm increasing, instead of decreasing at each step. They are still able to proceed with the
next iteration and finally to converge. QMR and GMRES demonstrate monotonic
convergence.
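The erratic-versus-monotonic contrast is easy to reproduce with the textbook BiCG recurrence, recording the relative residual after every step in the same way as Table 5.10. The matrix here is an arbitrary well-conditioned nonsymmetric test matrix, not the Class C sensitivity matrix, and `bicg_history` is an illustrative sketch rather than the SPICE code:

```python
import numpy as np

def bicg_history(A, b, iters):
    """Unpreconditioned textbook BiCG, returning x and ||r_k||/||b|| per step."""
    x = np.zeros_like(b)
    r = b - A @ x
    rt = r.copy()                     # shadow residual
    p, pt = r.copy(), rt.copy()
    rho_old = rt @ r
    hist = [1.0]
    for _ in range(iters):
        q, qt = A @ p, A.T @ pt
        alpha = rho_old / (pt @ q)
        x = x + alpha * p
        r = r - alpha * q
        rt = rt - alpha * qt
        hist.append(np.linalg.norm(r) / np.linalg.norm(b))
        if hist[-1] < 1e-12:          # converged; stop before rho degenerates
            break
        rho = rt @ r
        beta = rho / rho_old
        rho_old = rho
        p = r + beta * p
        pt = rt + beta * pt
    return x, hist

rng = np.random.default_rng(2)
n = 60
A = np.eye(n) + 0.05 * rng.standard_normal((n, n))
b = rng.standard_normal(n)
x, hist = bicg_history(A, b, 40)
print(hist[-1])   # small, although the path there need not be monotonic
```

Nothing in the recurrence forces `hist` to decrease at every step, which matches the non-monotonic BiCG, BiCGSTAB, and CGS columns of Table 5.10; GMRES, by contrast, minimizes the residual over a nested subspace and so cannot increase.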
METHOD USED FOR STEADY-    TIME (SECONDS)    SHOOTING-NEWTON    TRANSIENT
STATE DETERMINATION        TO REACH          ITERATIONS         ANALYSIS CYCLES
                           STEADY-STATE
Transient analysis         0.24233           -                  24
Gaussian elimination       0.18333           4                  14
GMRES                      0.11667           4                  14
CGS                        0.13333           4                  14
BiCG                       0.15000           4                  14
BiCGSTAB                   0.15000           4                  14
QMR                        0.13333           4                  14

Table 5.9 Summary of Krylov-subspace methods and transient analysis results of
steady-state determination for a Class C amplifier. The size of the sensitivity matrix is
11 × 11.
ITERATION    BICG            BICGSTAB        CGS             GMRES           QMR
1            1.000000e-00    1.000000e-00    1.000000e-00    1.000000e-00    1.000000e-00
2            2.946002e-01    2.910341e-01    7.255579e-01    2.827648e-01    7.232270e-01
3            2.677884e-01    5.074055e-02    2.103980e-01    1.989180e-01    4.519170e-01
4            6.665791e-02    1.473556e-01    6.425751e-02    4.739450e-02    5.191843e-02
5            1.869852e-01    8.361029e-03    3.061802e-01    4.734953e-02    2.960304e-03
6            7.523472e-03    2.355485e-05    6.346682e-04    9.724925e-04    1.466368e-03
7            5.114317e-05    1.924730e-05    3.709610e-08    7.632758e-07    9.746117e-07
8            1.453166e-08    1.857176e-09    2.569748e-14    1.134104e-09    1.277015e-08
9            4.463113e-10    3.890520e-13    -               -               1.143564e-09

Table 5.10 Relative residual for the Class C amplifier.
Figure 5.12 Periodic time-domain response for the Class C amplifier example: the
solution by transient analysis compared with the shooting-Newton solutions using
BiCGSTAB, CGS, and QMR.
Figure 5.13 Convergence behavior of the residual norm of the BiCG solution
algorithm as a function of iteration number.
Figure 5.14 Convergence behavior of the residual norm of the BiCGSTAB solution
algorithm as a function of iteration number.
Figure 5.15 Convergence behavior of the residual norm of the CGS solution
algorithm as a function of iteration number.
Figure 5.16 Convergence behavior of the residual norm of the GMRES solution
algorithm as a function of iteration number.
Figure 5.17 Convergence behavior of the residual norm of the QMR solution
algorithm as a function of iteration number.
5.5 Comparisons
The shooting-Newton method for the determination of the steady-state response for
periodic inputs was implemented in SPICE 3f5. The iterate generated by the method could be
solved using standard Gaussian elimination, or the Krylov-subspace methods of GMRES,
BiCG, QMR, CGS, and BiCGSTAB. This method was shown to be efficient and accurate
when analyzing circuits that have difficulty reaching steady-state in the standard transient
analysis available in SPICE 3f5. This greater efficiency means that the method provides
shorter simulation times and requires significantly less computer storage.
The Krylov-subspace methods provided even greater computer efficiency. All the
Krylov methods are significantly faster than the transient analysis approach of simulation
until all the transients have died out. The Krylov-subspace methods give mixed results in
terms of efficiency over the standard solution method of Gaussian elimination. The GMRES,
QMR, and BiCGSTAB algorithms show the fastest convergence to a solution. The biorthogonal
methods of BiCG and BiCGSTAB must have a non-zero initial guess in the iteration in order
to find a solution consistently, because these methods do not reduce the residual at every step
and can break down.
The results are summarized in Figure 5.18 and Figure 5.19 for the shooting-Newton
method of steady-state determination. Using GMRES, BiCG, QMR, CGS, and BICGSTAB
iterative methods produces the expected significant improvement over regular transient
analysis determination of steady-state. Using the shooting-Newton algorithm for direct
calculation of the steady-state response shows improvement over the transient analysis. For
small circuits the gains made by using Krylov-subspace methods over Gaussian elimination
are not readily apparent. The larger the circuit, however, the more benefit is derived.
Figure 5.18 Comparison of the shooting-Newton method to standard transient
analysis in SPICE for determination of steady-state for the power supply, buck
converter, Class AB amplifier, and Class C amplifier (G.E. is Gaussian elimination).
Figure 5.19 Comparison of Krylov methods to Gaussian elimination for
determination of the steady-state for the power supply, buck converter, Class AB
amplifier, and Class C amplifier.
Chapter 6
Summary
This dissertation presented an improved and expanded method for determining the
steady-state response of analog RF and microwave integrated circuits in the time domain.
The approach, which is more efficient than previous methods, used the standard shooting-Newton
algorithm to directly determine the steady-state response of a circuit with a
periodic input. It is implemented by adding a steady-state analysis option to the popular
circuit simulator SPICE. The method also involves the implementation of different Krylov-subspace
algorithms that can be used to solve the resulting iterative equations generated by
this method. Importantly, the steady-state solver in SPICE is freely available to the general
public.

The improved steady-state method was developed to efficiently determine the steady-state
response for specific analog and microwave circuits that have difficulty with the standard
methods presently available. This class of circuits includes strongly nonlinear
communication circuits. Standard simulators, such as SPICE, used transient analysis to
determine the steady-state response by simulating until all the transients died out.
Unfortunately, for the type of circuits used in communication devices, this takes much too
long for a detailed analysis to be made. Harmonic balance methods are widely used for
steady-state determination but are not optimal for strongly nonlinear circuits. This
dissertation found the shooting-Newton method a good choice for steady-state determination.
The shooting-Newton algorithm is a direct approach that was used to determine the
steady-state of integrated circuits. It poses the steady-state response as a two-point
boundary value problem. The solution of the two-point boundary value problem is the set
of capacitor voltages and inductor currents that, when used as initial conditions for the
circuit being analyzed, results in the steady-state response. The calculation involves the
solution of an iterative equation generated by solving the nonlinear system with the
Newton-Raphson approach. Historically, this system has been solved by methods such as
Gaussian elimination, which can use as much as O(n^2) computer storage and
O(n^3) computation cost, limiting the method to relatively small circuits.
Implementation of the Krylov-subspace methods was found to produce greater
computational efficiency and the ability to simulate larger, more complex circuits.
Readily available Krylov-subspace algorithms were used to compute the solution so as to
take advantage of the particular matrix being solved. Reuse of the structure of the matrix
was also implemented to take advantage of any special structure that was present. The
implementation of the Krylov-subspace methods, together with special structure, can reduce the
storage to O(n) and the computation cost to O(n^2) or even O(n).
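The two-point boundary-value formulation above can be sketched on a one-state circuit: integrate one period from a guessed initial condition, form the mismatch phi_T(v0) - v0, and apply Newton using the (here 1 × 1) sensitivity. Everything below — the component values, step counts, and the finite-difference sensitivity — is an illustrative choice, not the SPICE implementation:

```python
import math

# Toy periodically driven RC circuit (hypothetical values): C v' = (u(t) - v)/R.
R, C, T = 1.0, 1.0, 1.0      # RC = 1 s time constant, drive period T = 1 s
steps = 1000
h = T / steps

def transient(v0):
    """One period of backward-Euler transient analysis from initial state v0."""
    v = v0
    for k in range(1, steps + 1):
        u = math.sin(2 * math.pi * k * h / T)          # periodic source
        v = (v + h * u / (R * C)) / (1 + h / (R * C))  # implicit update
    return v

# Shooting-Newton: solve the periodicity condition F(v0) = phi_T(v0) - v0 = 0.
v0 = 0.0
for _ in range(5):
    F = transient(v0) - v0
    eps = 1e-6
    sens = (transient(v0 + eps) - transient(v0)) / eps  # 1x1 sensitivity dphi/dv0
    v0 = v0 - F / (sens - 1.0)                          # Newton update

print(v0, abs(transient(v0) - v0))  # steady-state initial condition, residual ~ 0
```

For this linear test circuit a single Newton step already lands on the periodic steady state; the strongly nonlinear circuits of Chapter 5 need several iterations, and it is the linear solve inside each Newton update that the Krylov-subspace methods accelerate.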
The shooting-Newton method along with the other aspects of the approach was
implemented and tested using the most popular analog circuit simulator, SPICE. Quantities
computed during the transient analysis in SPICE were reused in the calculation of the direct
determination of the steady-state response. The shooting-Newton method was efficient for
strongly nonlinear circuits that were often encountered in RF systems. Results from
simulations were compared with standard transient simulation, standard shooting-Newton
with Gaussian elimination, and shooting-Newton with solution by the Krylov-subspace
methods. The results showed that the shooting-Newton method has considerable time and
computation savings compared to the transient analysis method in standard SPICE. Further,
the results showed that the Krylov-subspace methods are usually more efficient than
Gaussian elimination. The convergence behavior of the different Krylov methods
(BiCG, CGS, BiCGSTAB, GMRES, and QMR) was presented along with matrix
characteristics. The sensitivity matrix was shown to be well conditioned and needed few
Krylov-subspace iterations to reach solution.
This dissertation presented a more efficient and expanded approach for finding the
steady-state solution of certain classes of circuits, including self-biasing amplifiers,
narrow-band filters, circuits with high Qs, and lightly damped circuits with long time constants.
However, a drawback of the method is that it cannot include transmission lines, which are
critical for microwave and RF applications. The inclusion of transmission lines in this
method is a significant direction for future work. Another important direction would be the
investigation of the structure and numerical properties of the Jacobian (sensitivity) matrix
used during the shooting-Newton method.
Appendix A
SPICE Input files
DC power supply from Skelboe's paper
vin 1 0 sin(0 20 50)
d1 1 2 mod1
.model mod1 d (is=1e-16 cjo=2pf)
c1 1 2 1uF
c2 3 0 1mF
c3 4 0 1mF
l1 3 4 0.1H
r1 2 3 5
r2 4 0 1k
.options reltol=1e-9
.tran 1ms 2
*my attempts
*plot (4)
*steady horn non_auto 20ms 50 1e-3 duic 1e-2
*plot v(4)
*commands for steady-state analysis:
*steady act non_auto 20ms 50 .01 uic 10.0
*steady ext non_auto 20ms 50 .01 uic 3.0
*steady tran non_auto 20ms 50 .01 uic 10.0
.end

Figure A.1 SPICE input file for the DC power supply.
Wai’s buck converter with MOSFET switch models
* This model does not have the vco
* Period of input is 500ns
.model
.model
.model
.model
.model
.model
.model
dd
npn NPN (TR=l0p TF=10p)
pnp PNP (TR=10p TF=10p)
swml SW (VT=2.5 RON=lg ROFF=lu)
swm2 SW (VT=2.5 RON=lu ROFF=lg)
swml2 SW (VT=0.995 RON=lg ROFF=lu)
nmos NMOS (LEVEL= 1 VTO=0.2 KP=1.20)
.subckt inv in out
xinv in out 0 nandl
.ends inv
.subckt nandl in outp outn
cout outp outn .Olp
rout outp outn 1 .0 k
b_gn outn outp 1=0.0025-0.0015929*atan(300*v(in)-750)
.ends nandl
.subckt andl in outp outn
cout outp outn .Olp
rout outp outn 1.0 k
b_gn outn outp 1=0.0025+0.0015929*atan(300*v(in)-750)
.ends andl
.subckt nand2 ina inb out
vdd vdd 0 dc 5
xal ina vdd out andl
xbl inb vdd out andl
xa2 ina out nab nandl
xb 2 inb nab 0 nandl
.ends nand2
vdd vdd 0 dc 5
vsq sq 0 pulse(0 5 0 2ns 2ns 248ns 500ns)
** narrow pulse gen
xul sq outl inv
coutl outl 0 5p
x u l 2 outl o u tl 2 inv
coutl2 outl2 0 5p
xul3 outl2 outl3 inv
xu2 sq out 13 out2 nand2
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
117
* 175ns 1-shot
xu3 out2 out3 inv
q5 vdd out3 e5 npn
r5 out3 e5 253.6k
c5 e5 0 lp
xu4 e5 out4 inv
xu5 out4 out5 inv
* deadtime controllers
xu61 out5 out61 inv
cout61 out61 0 6 p
xu612 out61 out612 inv
xu6 out612 out5 out6 nand2
xu71 out4out71 inv
cout71 out71 0 16p
xu712 out71 out712 inv
xu7 out712 out4 out7 nand2
xu 8 out6 Is inv
xu9 out7 hs inv
* buck converter
m_gsw3 vin hs x x nmos L=1 W=1
dsw3 x vin d
m_gsw4 x Is 0 0 nmos L=1 W=1
dsw4 0 x d
cx x 0 400p
If x out 292n
cf out cf lOu
ref cf 0 10 m
rl 1 out 0 1k
vin vin 0 dc 8
.tran In lOu
.end
Figure A.2 SPICE input file for the Buck convertor
* A DC to 1GHz class AB monolithic power amplifier
* designed by R.G. Meyer, U.C. Berkeley
* Sept. 1988
* input = 101
* output = 100
* vcc = 99
*
* R. G. Meyer and W. D. Mack.
* "A wide-band class AB monolithic power amplifier".
* IEEE Journal of Solid-State Circuits, vol. 24, no. 1, Feb 1989.
.opt acct lvltim=2 itl5=5000
.width out=80
.model nb220ew npn
+ is=3.38e-17 bf=205 nf=0.978 vaf=22 ikf=2.05e-2 ise=0
+ ne=1.5 br=62 nr=1 var=2.2 isc=0 nc=1.5 rb=115
+ rbm=28.3 re=1 rc=30.5 cje=1.08e-13 vje=.995 mje=0.46 tf=1e-11
+ xtf=1 itf=1.5e-2 ptf=0 cjc=2.2e-13 vjc=0.42 mjc=0.22 xcjc=0.1
+ tr=4e-10 cjs=1.29e-13 vjs=0.65 mjs=0.31 xtb=1.5 eg=1.232
+ xti=2.148 fc=0.875
* irb=4.8e-5
* Sources
Vin 102 0 sin 0 1.0 100MegHz
Rin 101 102 50
Rout 100 0 50
Vpos 99 0 9.5
* input and output matching
Cin 101 1 1uF
Cout 100 4 1uF
R 4 3 35
R4 2 3 525
* Level shifter, input stage, lower path
R13 1 5 35
Q18 5 5 6 0 nb220ew
Q17 6 6 7 0 nb220ew
R12 7 0 2k
R14 1 8 35
Q16 8 7 0 0 nb220ew
* amplifier, input stage, lower path
Q ll
99
99
15
0
nb220ew
Q12
Q13
Q14
Q15
R ll
Q4
R6
15
14
13
15
14
13
12
11
10
12
10
8
0
9
14
13
12
11
40
9
60
0
0
0
0
nb220ew
nb220ew
nb220ew
nb220ew
0
nb2 2 0 ew
*amplifier, output stage, lower path
nb2 2 0 ew
Q2
9
26
3
0
0
R2
4.5
26
* amplifier, input stage, upper path
2
R3
1
35
24
R5
99
400
2
nb220ew 3
24
23
0
Q3
nb 2 2 0 ew
23
0
23
0
Q5
* amplifier, output stage, upper path
nb2 2 0 ew
24
25
99
0
Qi
2
R15
3
25
* bias
RIO
Q10
Q9
R9
Q8
Q7
R8
R16
Q6
R7
network
16
99
16
16
17
17
21
99
21
18
20
18
0
19
20
2
22
0
20
0
3
3
3
3
6
6
6
2 .8 k
17
18
9k
0
0
nb2 2 0 ew
nb2 2 0 ew
20
0
0
nb2 2 0 ew
nb2 2 0 ew
0
nb2 2 0 ew
19
240
3.4k
22
80
.op
.tran .1ns 100ns
.plot tran v(100) v(101) v(102)
.end

Figure A.3 SPICE input file for the Class AB amplifier.
Class C amplifier from Trick's paper
r1 1 0 50
r2 6 0 50
c1 1 2 100pf
c2 2 0 10pf
c3 3 0 10pf
c4 4 0 10pf
c5 5 0 10pf
c6 5 6 100pf
c7 7 0 100pf
l8 2 3 0.025uh
l9 3 0 1.2uh
l10 4 5 .025uh
l11 5 7 1.2uh
q1 4 0 3 mod1
.model mod1 NPN(IS=1e-16 BF=80 RB=100 VA=50 tf=.1ns)
vdc 7 0 30v
iin 1 0 sin(0 0.1 100MEGHz)
.options reltol=1e-9
*plot v(6)
*commands for steady-state analysis:
*steady act non_auto .01us 50 .01 uic 10.0
*steady ext non_auto .01us 50 .01 uic 3.0
*steady tran non_auto .01us 50 .01 duic 10.0
.end

Figure A.4 SPICE input file for the Class C amplifier.
Bibliography
[Antognetti84]
P. Antognetti, D. O. Pederson, and H. de Man editors. Computer
Design Aids for VLSI Circuits. Martinus Nijhoff Publishers, 1984,
pp. 29-112.
[Aprille72a]
Thomas J. Aprille, Jr. and Timothy N. Trick. Steady-state analysis
of nonlinear circuits with periodic inputs. Proceedings of the
IEEE, vol. 60, no. 1, January 1972, pp. 108—114.
[Aprille72b]
Thomas J. Aprille, Jr. and Timothy N. Trick. A computer
algorithm to determine the steady-state response of nonlinear
oscillators. IEEE Transactions on Circuit Theory, vol. CT-19, no.
4, July 1972, pp. 354-360.
[Ashar89]
P. N. Ashar. Implementation of algorithms for the periodic
steady-state analysis of nonlinear circuits, Masters Thesis,
University of California Berkeley, March 1989.
[Bailey68]
Paul B. Bailey, Lawrence F. Shampine, and Paul E. Waltman.
Nonlinear Two Point Boundary Value Problems. Academic Press,
1968.
[Banzhaf89]
Walter Banzhaf. Computer-Aided Circuit Analysis Using Spice.
Prentice Hall Press, 1989.
[BarmadaOO]
Sami Barmada and Marco Raugi. Transient Numerical Solutions of
Nonuniform MTL Equations with Nonlinear Loads by Wavelet
Expansion in Time or Space Domain. IEEE Transactions on
Circuits and Systems, vol. 47, no. 8, August 2000.
[Barrett94]
Richard Barrett, Michael Berry, Tony F. Chan, James Demmel,
June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo,
Charles Romine, Henk van der Vorst. Templates for the Solution
of Linear Systems: Building Blocks for Iterative Methods. Society
for Industrial and Applied Mathematics, 1994.
[Borchers96]
Carsten Borchers, Lars Hedrich, and Erich Barke. Equation-Based
Behavioral Model Generation for Nonlinear Analog Circuits.
University of Hanover, Germany, 1996.
[Brachtendorf95a]
H. G. Brachtendorf, G. Welsch, and R. Laur. A simulation tool
for the analysis and verification of the steady state of circuit
designs. In International Journal of Circuit Theory and
Applications, vol. 23. John Wiley & Sons, 1995, pp. 311-323.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
122
[Brachtendorf95b]
H. G. Brachtendorf, G. Welsch, and R. Laur. Simulation of the
steady-state of circuits using an expansion technique and Krylov
subspace methods for solving the resulting linear indefinite
systems. In SAMS, vol. 18-19. Overseas Publishers Association,
1995, pp. 603-606.
[BrambillaOl]
Angelo and Dario D'Amore. M ethod for steady-state simulation
of strongly nonlinear circuits in the time domain. EEEE
Transactions on circuits and Systems— I: Fundamental Theory and
Applications, vol. 48, no. 7, July 2001, pp. 885-889.
[Brown90]
Peter N. Brown and Youcef Saad. Hybrid Krylov methods for
nonlinear systems of equations. SIAM J. Sci Stat Comput, Vol. 11,
no. 3, pp. 450-481, May 1990.
[Buch94]
Premal Buch, Shen Lin, Vijay Nagasamy, and Ernest S. Kuh.
Techniques for Fast Circuit Simulation Applied to Power
Estimation of CMOS Circuits. Electronics Research Laboratory,
University of California, Berkeley 1994
[Bucker94]
Martin Bucker and Achim Baserman. A Comparison of QMR,
CGS and TFQMR on a Distributed Memory Machine. Central
Institute for Applied Mathematics, Julich, 1994
[Chua75]
Leon O. Chua and Pen-Min Lin. Computer-Aided Analysis of
Electronic Circuits: Algorithms & Computational Techniques.
Prentice Hall Press, 1975.
[Colon73]
Francis Robert Colon and Timothy N. Trick. Fast periodic steady-state
analysis for large-signal electronic circuits. IEEE Journal of
Solid-State Circuits, vol. SC-8, no. 4, August 1973, pp. 260-269.
[D'Amore91]
Dario D'Amore and Mauro Santomauro. Steady state analysis of
non-linear circuits using SPICE. IEEE International Symposium on
Circuits and Systems, vol. 5, 1991, pp. 2733-2736.
[D'Amore93]
Dario D'Amore and William Fornaciari. A SPICE-based approach
to steady state circuit analysis. In International Journal of Circuit
Theory and Applications, vol. 21, pp. 437-442. John Wiley &
Sons, Ltd., 1993.
[D'Amore95]
Dario D'Amore and Margherita Pillan. Efficient steady state
simulation of nonlinear integrated circuits. In Proc. Midwest
Symposium on Circuits and Systems, pp. 724—727. IEEE Circuits
and System Society, 1995.
[Demmel97]
James W. Demmel, Applied Numerical Linear Algebra. Society
for Industrial & Applied Mathematics, 1997.
[Dongarra96]
Jack Dongarra, Andrew Lumsdaine, Roldan Pozo, and Karin A.
Remington. Iterative Methods Library IML++, version 1.2, April
1996.
Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
123
[Duff90]
Iain S. Duff and David K. Kahaner. Two Japanese Approaches to
Circuit Simulation. Asian Science and Technology, 1990.
[Feldmann96]
Peter Feldmann, Bob Melville, David Long. Efficient frequency
domain analysis of large nonlinear analog circuits. IEEE 1996
Custom Integrated Circuits Conference, pp. 461-464.
[Filicori88]
Fabio Filicori and Vito A. Monaco. Computer-aided design of
non-linear microwave circuits. In Alta Frequenza, vol. LVII, no. 7,
September 1988, pp. 355—378.
[FIueck96]
Alexander J. Flueck and Hsiao-Dong Chiang. Solving the
nonlinear power flow equations with a Newton process and
GMRES. Proceedings of the IEEE International Symposium on
Circuits and Systems, 1996, pp. 657-660.
[Freund92]
Roland W. Freund, Gene H. Golub, and Noel M. Nachtigal.
Iterative solution of linear systems. In Acta Numerica, 1992,
pp. 1—44.
[GadOO]
Emad Gad, Roni Khazaka, Michel S. Nakhla, and Richard
Griffith. A Circuit Reduction Technique for Finding the Steady-State
Solution of Nonlinear Circuits. IEEE Transactions on
Microwave Theory and Techniques, vol. 48, no. 12, December 2000.
[Gilmore91a]
Rowan J. Gilmore and Michael B. Steer. Nonlinear circuit
analysis using the method of harmonic balance— a review of the
art. Part I. Introductory concepts. In International Journal of
Microwave and Millimeter-Wave Computer-Aided Engineering,
vol. 1, no. 1, pp. 22-37. John Wiley & Sons, Inc., 1991.
[Gilmore91b]
Rowan J. Gilmore and Michael B. Steer. Nonlinear circuit
analysis using the method of harmonic balance— a review of the
art. Part II. Advanced concepts. In International Journal of
Microwave and Millimeter-Wave Computer-Aided Engineering,
vol. 1, no. 2, pp. 159-180. John Wiley & Sons, Inc., 1991.
[Gourary00]
M. M. Gourary, S. G. Rusakov, S. L. Ulyanov, M. M. Zharov, S. J.
Hamm, and B. J. Mulvaney. Computational method of stability
investigation for large analog circuits. IEEE International
Symposium on Circuits and Systems, May 28-31, 2000, pp. II-168-II-171.
[Gourary97]
M. M. Gourary, S. G. Rusakov, S. L. Ulyanov, M. M. Zharov, K.
K. Gullapalli, and B. J. Mulvaney. Iterative solution of linear
systems in harmonic balance analysis. IEEE MTT-S Digest, 1997,
pp. 1507-1510.
[Greenbaum97]
Anne Greenbaum. Iterative Methods for Solving Linear Systems.
Society for Industrial and Applied Mathematics, 1997.
[Guglielmi96]
Nicola Guglielmi. Inexact Newton methods for the steady state
analysis of nonlinear circuits. In Mathematical Models and
Methods in Applied Sciences, vol. 6, no. 1. World Scientific
Publishing Company, 1996, pp. 43-57.
[Gunupudi98?]
Pavan K. Gunupudi and Michel S. Nakhla. Model-reduction of
nonlinear circuits using Krylov-space techniques.
[Gupta81]
K.C. Gupta, Ramesh Garg and Rakesh Chadha. Computer-Aided
Design of Microwave Circuits. Artech House, 1981
[Hanselman98]
Duane Hanselman and Bruce Littlefield. Mastering Matlab 5: A
Comprehensive Tutorial and Reference. New Jersey, 1998.
[Heath97]
Michael T. Heath. Scientific Computing: An Introductory Survey.
McGraw-Hill, Boston, 1997.
[Ipsen98]
Ilse C. F. Ipsen and Carl D. Meyer. The idea behind Krylov
methods. The American Mathematical Monthly, vol. 105, no. 10,
December 1998, pp. 889-899.
[Itoh00]
Tatsuo Itoh. Full Wave Time-Domain CAD Tools for Distributed
Non-Linear Microwave Circuits. Final Report for MICRO Project
99-053, 2000.
[Kelley95]
C. T. Kelley. Iterative Methods for Linear and Nonlinear
Equations. Society for Industrial and Applied Mathematics, 1995.
[Kleiner00]
Madeline A. Kleiner and Mohammed N. Afsar. Determining the
steady-state responses in RF circuits using GMRES, CGS, and
BiCGSTAB solution in sSPICE for Linux. IEEE MTT-S Int.
Microwave Symposium Dig., June 2000, pp. 87-90.
[Kleiner01]
Madeline A. Kleiner and Mohammed N. Afsar. Steady-state
determination for RF circuits using Krylov-subspace methods in
SPICE. 2001.
[Kundert88]
Kenneth S. Kundert and Alberto Sangiovanni-Vincentelli. Finding
the steady-state response of analog and microwave circuits. Proc.
IEEE Custom Integrated Circuits Conference, vol. LVII, no. 7,
September 1988, pp. 379-387.
[Kundert90]
Kenneth S. Kundert, Jacob K. White, and Alberto Sangiovanni-Vincentelli. Steady-State Methods for Simulating Analog and
Microwave Circuits. Kluwer Academic Publishers, 1990.
[Kundert95]
Kenneth S. Kundert. The Designer’s Guide to Spice & Spectre.
Kluwer Academic Publishers, 1995.
[Kundert97]
Ken Kundert. Simulation methods for RF integrated circuits.
Proc. 1997 IEEE/ACM International Conference on Computer-Aided Design, November 1997, pp. 752-765.
[Kundert99]
Kenneth S. Kundert. Introduction to RF simulation and its
application. IEEE Journal of Solid-State Circuits, vol. 34, no. 9,
September 1999, pp. 1298-1319.
[Larcheveque96]
R. Larcheveque and E. Ngoya. Compressed transient analysis
speeds up the periodic steady state analysis of nonlinear
microwave circuits. 1996 IEEE MTT-S Digest, pp. 1369-1372.
[Lau97]
W. Lau. An integrated controller for a high frequency buck
converter. Master's thesis, University of California at Berkeley,
1997.
[Maas99]
Stephen A. Maas. An Essay on Nonlinear Analysis in RF Design.
Applied Wave Research, Inc., 1999.
[Maliniak95]
Lisa Maliniak. RF simulator overcomes speed, capacity obstacles.
Penton Publishing, Inc., 1995.
[McCalla71]
William J. McCalla and Donald O. Pederson. Elements of
computer-aided circuit analysis. IEEE Transactions on Circuit
Theory, vol. CT-18, no. 1, January 1971, pp. 14-26.
[Meyer89]
Robert G. Meyer, William D. Mack. A wide-band class AB
monolithic power amplifier, IEEE Journal of Solid-State Circuits,
vol. 24, no. 1, Feb 1989, pp. 7-12.
[Nagel71]
Laurence Nagel and Ronald Rohrer. Computer analysis of
nonlinear circuits, excluding radiation (CANCER). IEEE Journal
of Solid-State Circuits, August 1971, pp. 166-181.
[Nagel75]
Laurence Nagel. SPICE2: A Computer Program to Simulate
Semiconductor Circuits. University of California, Berkeley, May
1975. Available through Electronics Research Laboratory
Publications, U. C. B., 94720, Memorandum No. UCB/ERL
M75/520.
[Nichols94]
K. G. Nichols, T. J. Kazmierski, M. Zwolinski, and A. D. Brown.
Overview of SPICE-like circuit simulation algorithms. In IEE
Proc.-Circuits Devices Syst., vol. 141, no. 4, August 1994.
Institution of Electrical Engineers, pp. 242-250.
[Ogrodzki]
Jan Ogrodzki. Circuit Simulation Methods and Algorithms. CRC
Press, 1994.
[Part-Enander96]
Eva Part-Enander et al. The MATLAB Handbook. Addison-Wesley, 1996.
[Pederson84]
Donald O. Pederson. A historical review of circuit simulation.
IEEE Transactions on Circuits and Systems, vol. CAS-31, no. 1,
January 1984, pp. 103-111.
[Press92]
William H. Press, Brian P. Flannery, Saul A. Teukolsky, William
T. Vetterling. Numerical Recipes in C: The Art of Scientific
Computing. Cambridge University Press, 1992.
[Quarles89a]
Thomas L. Quarles. Analysis of Performance and Convergence
Issues for Circuit Simulation. Ph.D. dissertation, University of
California, Berkeley, April 1989. Available through Electronics
Research Laboratory Publications, U. C. B., 94720, Memorandum
No. UCB/ERL M89/42.
[Quarles89b]
Thomas L. Quarles. The front end to simulator interface. Part of
Ph.D. dissertation Analysis of Performance and Convergence Issues
for Circuit Simulation, University of California, Berkeley, April
1989.
[Quarles89c]
Thomas L. Quarles. The SPICE3 implementation guide. Part of
Ph.D. dissertation Analysis of Performance and Convergence Issues
for Circuit Simulation, University of California, Berkeley, April
1989.
[Quarles89d]
Thomas L. Quarles. Adding devices to SPICE3. Part of Ph.D.
dissertation Analysis of Performance and Convergence Issues for
Circuit Simulation, University of California, Berkeley, April 1989.
[Quarles89e]
Thomas L. Quarles. SPICE3 version 3C1 users guide. Part of Ph.D.
dissertation Analysis of Performance and Convergence Issues
for Circuit Simulation, University of California, Berkeley, April
1989.
[Quarles89f]
Thomas L. Quarles. Benchmark circuits: Results for SPICE3. Part
of Ph.D. dissertation Analysis of Performance and Convergence
Issues for Circuit Simulation, University of California, Berkeley,
April 1989.
[Razavi98]
Behzad Razavi. RF IC design challenges. In Proc. Design
Automation Conference 1998, pp. 408-413. Association of
Computing Machinery.
[Rheinboldt98]
Werner C. Rheinboldt. Methods for Solving Systems of Nonlinear
Equations. Society for Industrial and Applied Mathematics,
Philadelphia, 1998.
[Rodrigues98]
Paulo J. C. Rodrigues. Computer-Aided Analysis of Nonlinear
Microwave Circuits. Artech House, Inc., 1998.
[Saad81]
Y. Saad. Krylov subspace methods for solving large unsymmetric
linear systems. In Mathematics of Computation, vol. 37, no. 155,
July 1981. American Mathematical Society, pp. 105-126.
[Saad96]
Yousef Saad. Iterative Methods for Sparse Linear Systems. PWS
Publishing Co., 1996.
[Saad99]
Yousef Saad and Henk A. van der Vorst. Iterative solution of
linear systems in the 20-th Century. University of Minnesota
Supercomputing Institute Research Report UMSI 99/152,
September 1999.
[Sangiovanni98]
Alberto L. Sangiovanni-Vincentelli. Design and verification of
analog and RF integrated circuits. Proc. Final Report 1998-99 for
MICRO Project 98-134.
[Simon93]
C. Simon, M. Sadkane, and S. Mottet. Newton-GMRES method
for coupled nonlinear systems arising in semiconductor device
simulation. Simulation of Semiconductor Devices and Processes,
vol. 5, September 1993, pp. 81-84.
[Skelboe80]
Stig Skelboe. Computation of the periodic steady-state response
of nonlinear networks by extrapolation methods. IEEE
Transactions on Circuits and Systems, vol. CAS-27, no. 3, March
1980, pp. 161-175.
[Shewchuk94]
Jonathan Richard Shewchuk. An Introduction to the Conjugate
Gradient Method Without the Agonizing Pain. CMU-CS-94-125.
[Sleijpen93]
Gerard L. G. Sleijpen and Diederik R. Fokkema. Bi-CGSTAB(L)
for linear equations involving unsymmetric matrices with complex
spectrum. In Electronic Transactions on Numerical Analysis. Kent
State University, vol. 1, September 1993, pp. 11-32.
[Strang76]
Gilbert Strang. Linear Algebra and Its Applications. Academic
Press, 1976.
[Steer91a]
Michael B. Steer. Simulation of nonlinear microwave circuits —
an historical perspective and comparisons. IEEE MTT-S Digest,
1991, pp. 599-602.
[Steer91b]
Michael B. Steer, Chao-Ren Chang, and George W. Rhyne.
Computer-aided analysis of nonlinear microwave circuits using
frequency-domain nonlinear analysis techniques: The state of the
art. In International Journal of Microwave and Millimeter-Wave
computer-Aided Engineering, vol. 1, no. 2, pp. 181—200. John
Wiley & Sons, Inc., 1991.
[Stewart98]
G.W. Stewart. Afternotes Goes to Graduate School: Lectures on
Advanced Numerical Analysis. Society for Industrial and Applied
Mathematics, 1998.
[Telichevesky95]
Ricardo Telichevesky, Kenneth S. Kundert, and Jacob K. White.
Efficient steady-state analysis based on matrix-free Krylov-subspace methods. Proc. 1995 Design Automation Conference,
June 1995. Association of Computing Machinery, pp. 480-484.
[Telichevesky96a]
R. Telichevesky, K. Kundert, I. Elfadel, and J. White. Fast
Simulation Algorithms for RF Circuits. Proceedings of IEEE
Custom Integrated Circuits Conference, May 1996, pp. 437-444.
[Telichevesky96b]
Ricardo Telichevesky, Ken Kundert, and Jacob White. Receiver
characterization using periodic small-signal analysis. Proc. IEEE
Custom Integrated Circuits Conference, 1996, pp. 449-452.
[Telichevesky96c]
Ricardo Telichevesky, Ken Kundert, and Jacob White. Efficient
AC and noise analysis of two-tone RF circuits. In Proc. Design
Automation Conference, Association of Computing Machinery,
June 1996, pp. 292-297.
[Trajkovic98]
Ljiljana Trajkovic, Eula Fung, and Seth Sanders. HomSPICE:
simulator with homotopy algorithms for finding DC and steady-state solutions of nonlinear circuits. Proc. IEEE Int. Symposium
on Circuits and Systems, June 1998.
[Trefethen97]
Lloyd N. Trefethen and David Bau. Numerical Linear Algebra.
SIAM, 1997.
[Trick75]
Timothy N. Trick and Francis Robert Colon. Computation of
capacitor voltage and inductor current sensitivities with respect to
initial conditions for the steady-state analysis of nonlinear periodic
circuits. IEEE Transactions on Circuits and Systems, vol. CAS-22,
no. 5, May 1975, pp. 391-396.
[Valtonen94]
Martti Valtonen. Microwave Circuit Simulation Using Higher-Order Dynamic Elements. Circuit Theory Laboratory Report CT-20, October 1994.
[van der Vorst92]
H. A. van der Vorst. Bi-CGSTAB: a fast and smoothly
converging variant of Bi-CG for the solution of nonsymmetric
linear systems. SIAM Journal on Scientific and Statistical
Computing, vol. 13, no. 2, March 1992, pp. 631-644.
[Vladimirescu94]
Andrei Vladimirescu. The Spice Book. John Wiley and Sons,
1994.
[Vuik92]
C. Vuik and H. A. van der Vorst. A comparison of some GMRES-like methods. In Linear Algebra and its Applications. Elsevier
Science Publishing Co., Inc., 1992, pp. 131-162.
[YangOOa]
Baolin Yang and Joel Phillips. A multi-interval Chebyshev
collocation method for efficient high-accuracy RF circuit
simulation. In Design Automation Conference 2000, Los Angeles,
CA. Association for Computing Machinery, 2000, pp. 178-183.
[YangOOb]
Baolin Yang and Dan Feng. Efficient finite-difference method for
quasi-periodic steady-state and small signal analyses. IEEE, 2000,
pp. 272-276.
Published Papers
Steady-State Determination for RF Circuits Using Krylov-subspace
Methods in SPICE
Madeline A. Kleiner and Mohammed N. Afsar
Department of Electrical Engineering and Computer Science
Tufts University, Medford, MA 02155
Abstract — A SPICE-based direct shooting-Newton method for the determination of the steady-state response of RF circuits has been developed. Different Krylov-subspace methods, including GMRES, CGS, BiCG, QMR, and BiCGSTAB, were used to solve the iterative equations generated by the shooting-Newton algorithm. The public-domain circuit simulator, SPICE, was used for the implementation of the new steady-state analysis. Compared to standard transient analysis for the determination of the steady-state response for non-linear circuits encountered in RF design, this new method is much more efficient. RF circuits that are difficult to simulate are evaluated. For larger circuits, the GMRES, QMR, and BiCGSTAB algorithms show the most improvement in the time to calculate the steady-state.
I. INTRODUCTION

Given today's increasing demand for communication devices, including cellular telephones and other cordless devices, more effective methods are needed to examine the integrated circuits that make up these devices in the steady-state mode. Quantities such as power, distortion, and noise are evaluated in the steady state and need to be studied in detail [1]. Standard simulators, such as SPICE, can use transient analysis to determine the steady-state response by simulating until all the transients have died out. Unfortunately, for the type of circuits used in communication devices, this takes much too long for a detailed analysis to be made. Harmonic balance methods are not optimal for the strongly non-linear circuits that we need to evaluate. Direct time-domain methods are a good choice for steady-state determination [2]-[3].

For SPICE-based circuit simulators, the time-domain shooting-Newton method is relatively easy to implement [4]-[5]. The direct shooting-Newton method of steady-state determination was proposed by Aprille and Trick [2], and when implemented in an older version of SPICE [4], used the traditional Gaussian elimination to solve the iterate. Because of the computation costs, this limited the use of the algorithm to relatively small circuits. The newer Krylov-subspace methods can solve the equations generated by the steady-state response determination with much greater efficiency. The Krylov-subspace method of GMRES has been implemented with the direct shooting-Newton method [6]. However, it is not implemented in public-domain software, and does not include other Krylov-subspace methods for comparison.

There are two parts to this new method. First, the direct approach to steady-state determination is implemented in SPICE; second, several different Krylov-subspace methods, including BiCG and QMR, are used to solve the iterate generated by the direct method [5]. Examples from microwave circuits are used to illustrate the new method.

II. STEADY-STATE DETERMINATION
The shooting-Newton direct method of steady-state determination for circuits with periodic input was implemented in SPICE 3f5. In addition, the Krylov-subspace methods were also implemented in SPICE 3f5 for the solution of the iterate generated by that method. Direct methods, such as the shooting-Newton method presented here, were used to find the initial state needed to put the circuit directly in steady state [2]. If the circuit equations are represented as the system

dx/dt = f(x, t),                                        (1)

where x and f are n-vectors, the vector f is periodic in time, t, with period T. A constraint for achieving steady state is that the transient effects have died off. This is represented by

x(0) = x(T).                                            (2)

In other words, the solution at the end of the period is the same as the condition at the beginning of the period. This means that the circuit is in steady state. The state transition function can be used to define the two-point boundary value problem, thus:

x(0) - φ(x(t0), t0, T) = 0,                             (3)

0-7803-6540-2/01/$10.00 (C) 2001 IEEE
where φ is the state transition function. The state transition function was implicitly derived; it was calculated at each timepoint until the end of the period. It depends on the initial state, x0, the period of the response, T, and the starting time, t0. We applied the shooting-Newton method to solve the boundary value problem, which results in the iteration

x0^(k+1) = x0^k - [I - Jφ]^(-1) [x0^k - φ(x0^k, t0, T)],        (4)

where k is the iteration index and Jφ is the sensitivity matrix

Jφ = ∂φ(x(t0), t0, T) / ∂x0.                                    (5)

The sensitivity matrix was computed at the same time as the state transition function. Quantities needed for the calculation of the sensitivity matrix were already available at each timepoint from the transient analysis. The forming of the sensitivity matrix is computationally expensive. The iterate was solved and, using a user-defined limit, was checked for convergence. If not converged, the circuit was resimulated and another initial guess was used [2]-[4]. This process was continued until the steady state was reached. The shooting-Newton method computed a set of capacitor voltages and inductor currents for the circuit so that, if these voltages and currents are used as the initial conditions for the transient analysis, the circuit is directly in steady-state.
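The iteration of equations (1)-(5) can be sketched in a few lines. The Python fragment below is only an illustration: the one-state circuit equation, the RK4 integrator, the step count, and the finite-difference sensitivity are assumptions chosen for the example, not the SPICE C implementation (which accumulates the sensitivity during the transient analysis itself).

```python
import math

def f(x, t):
    # Illustrative one-state circuit equation with a periodic drive:
    # dx/dt = -x - 0.5*x**3 + cos(2*pi*t), period T = 1.
    return -x - 0.5 * x**3 + math.cos(2.0 * math.pi * t)

def transition(x0, T=1.0, steps=400):
    # State transition function phi(x0, 0, T): integrate one period (RK4).
    h = T / steps
    x, t = x0, 0.0
    for _ in range(steps):
        k1 = f(x, t)
        k2 = f(x + 0.5 * h * k1, t + 0.5 * h)
        k3 = f(x + 0.5 * h * k2, t + 0.5 * h)
        k4 = f(x + h * k3, t + h)
        x += h * (k1 + 2 * k2 + 2 * k3 + k4) / 6.0
        t += h
    return x

def shooting_newton(x0=0.0, tol=1e-10, max_iter=20):
    # Newton iteration (4) on the boundary condition x0 - phi(x0) = 0.
    # The scalar sensitivity d(phi)/dx0 is estimated here by a finite
    # difference, which costs one extra period of integration.
    for k in range(max_iter):
        phi = transition(x0)
        r = x0 - phi
        if abs(r) < tol:
            return x0, k
        eps = 1e-6
        dphi = (transition(x0 + eps) - phi) / eps
        x0 = x0 - r / (1.0 - dphi)   # x0 <- x0 - (1 - J)^(-1) (x0 - phi)
    return x0, max_iter

x_ss, iters = shooting_newton()
```

Starting the transient analysis from `x_ss` then reproduces the periodic steady state immediately, since `x_ss` and `transition(x_ss)` agree to the tolerance.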
The accuracy of the direct steady-state determination and the accuracy of the Krylov-subspace methods are shown in Fig. 1. The circuit simulated is a DC to 1 GHz class AB amplifier [7]. This circuit is difficult to simulate because the bias circuitry transients take a long time to settle out.
III. KRYLOV-SUBSPACE METHODS
The Krylov-subspace methods of GMRES, BiCG, QMR, CGS, and BiCGSTAB were incorporated into SPICE 3f5 for the solution of the linear system of equations generated by the shooting-Newton method to determine the steady-state response [8]. These methods are for general matrices, including the non-symmetric type that is generated by the shooting-Newton iteration [8]. In the following subsections, equation (4) was solved; the matrix being referred to is the sensitivity matrix Jφ.

The convergence figures were generated using a common-base Class C amplifier [9] operated at microwave frequencies. Appropriate changes were made to the original circuit for microwave operation. The circuit contained a total of 11 capacitors and inductors. The residual error was calculated within the Krylov-subspace method. The circuit was difficult to simulate using transient analysis because the biasing circuitry takes thousands of cycles for the transients to settle out.
A. GMRES

The Generalized Minimal Residual (GMRES) method generated a sequence of orthogonal vectors. The vectors were generated using a special method for Krylov subspaces called the Arnoldi method. It used these vectors to compute a least-squares solution. One drawback of this method is that all the orthogonal vectors must be stored, so for large circuits this storage need could be very large. It used the actual matrix and not its transpose for solution. The convergence behavior for the Class C amplifier is shown in Fig. 2.
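A compact, full-memory version of this Arnoldi/least-squares process can be written down directly. The NumPy sketch below is a textbook transcription for illustration only, not the C code used in SPICE; the small random test matrix simply stands in for the non-symmetric sensitivity matrix.

```python
import numpy as np

def gmres(A, b, tol=1e-10, max_iter=50):
    """Textbook (unrestarted) GMRES: build an orthonormal Krylov basis
    with Arnoldi, then pick the iterate minimizing the residual norm
    by a small least-squares solve."""
    n = len(b)
    m = min(max_iter, n)
    Q = np.zeros((n, m + 1))          # all basis vectors are stored
    H = np.zeros((m + 1, m))          # upper Hessenberg matrix
    beta = np.linalg.norm(b)          # starting guess x0 = 0, so r0 = b
    Q[:, 0] = b / beta
    for j in range(m):
        v = A @ Q[:, j]               # products with A only (no A^T)
        for i in range(j + 1):        # modified Gram-Schmidt
            H[i, j] = Q[:, i] @ v
            v -= H[i, j] * Q[:, i]
        H[j + 1, j] = np.linalg.norm(v)
        e1 = np.zeros(j + 2)
        e1[0] = beta
        # Least squares: minimize ||beta*e1 - H y|| over the Krylov space.
        y, *_ = np.linalg.lstsq(H[:j + 2, :j + 1], e1, rcond=None)
        res = np.linalg.norm(e1 - H[:j + 2, :j + 1] @ y)
        if res < tol or H[j + 1, j] == 0.0:
            return Q[:, :j + 1] @ y, j + 1
        Q[:, j + 1] = v / H[j + 1, j]
    return Q[:, :m] @ y, m

# Small non-symmetric system standing in for the shooting-Newton iterate.
rng = np.random.default_rng(1)
A = np.eye(8) + 0.3 * rng.standard_normal((8, 8))
b = rng.standard_normal(8)
x, iters = gmres(A, b)
```

The growing storage for `Q` is visible above, which is exactly the drawback noted in the text.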
Fig. 1. Two periods of the transient analysis of the Class AB amplifier [7] in steady-state. The curves for all the methods lie on top of each other, as they should.

Fig. 2. Convergence behavior for Krylov-subspace method GMRES.
B. BiCG

BiCG is the Biconjugate Gradient method. It generated two sequences of vectors that are made mutually orthogonal to each other, called bi-orthogonal. One of the sequences was generated by the original coefficient matrix, and the other by the transpose of that matrix. It used much less storage than GMRES, but had problems with convergence, and had two matrix-vector products at each iteration. Fig. 3 shows its convergence behavior for the microwave example circuit.

Fig. 3. Convergence behavior for Krylov-subspace method BiCG.

C. QMR

The Quasi-Minimal Residual (QMR) method applied a least-squares solve and update to the BiCG residuals. QMR uses less storage than GMRES. It required matrix-vector multiplications by the original matrix and its transpose at each iteration step. Fig. 4 shows the convergence behavior for the QMR solution.

Fig. 4. Convergence behavior for the Krylov-subspace method QMR.

D. CGS

The Conjugate Gradient Squared (CGS) method is a variant of BiCG. It reformulated BiCG so that only the original matrix was needed, avoiding transpose-vector operations. Fig. 5 shows the convergence behavior for the CGS solution.

Fig. 5. Convergence behavior for Krylov-subspace method of CGS.

E. BiCGSTAB

The Biconjugate Gradient Stabilized (BiCGSTAB) method is a variant of BiCG that used different updates to avoid using the transpose of the matrix. The convergence behavior for the example is shown in Fig. 6.

Fig. 6. Convergence behavior for the Krylov-subspace method BiCGSTAB.

IV. RESULTS
In SPICE 3f5, the direct steady-state determination
using the shooting-Newton method resulted in significant
savings in computational time and resources (Table 1).
The shooting-Newton method is also an accurate way to
determine the steady-state response (Fig. 1). The system is in steady-state when the error is less than the user-specified value. The error measured was obtained using the maximum difference between the state of the circuit after a cycle of transient analysis and the initial state used for that cycle. Even greater efficiency was found using the Krylov-subspace methods for solution of the iterate generated by this method (Table 1).
Table 1. Summary of Krylov-subspace methods and transient analysis results of steady-state determination for a Class C amplifier.

Method Used for          Time (Seconds)   Shooting-Newton   Transient
Steady-State             to Reach         Iterations        Analysis
Determination            Steady-State                       Cycles
----------------------------------------------------------------------
Transient analysis       67.8             -                 >3000
Gaussian Elimination     1.76             26                35
GMRES                    1.53             26                35
CGS                      1.68             26                35
BiCG                     1.79             26                35
BiCGSTAB                 1.53             26                35
QMR                      1.53             26                35
The convergence behavior (Fig. 2 through Fig. 6) for the solution of the iterate generated by the shooting-Newton method pointed out the difficulty in convergence for some of the Krylov-subspace methods. For some methods, such as CGS and BiCG, the residual is not reduced at each step of the iteration. The GMRES, QMR, and BiCGSTAB algorithms converged the fastest. CGS also had fast convergence, but its convergence was very irregular.
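This kind of per-iteration residual history is easy to reproduce outside SPICE. The sketch below is an illustrative NumPy transcription of the van der Vorst BiCGSTAB recurrence applied to a random non-symmetric test matrix (the matrix and seed are assumptions, not the amplifier's sensitivity matrix); the recorded norms correspond to the kind of curves plotted in Figs. 2 through 6.

```python
import numpy as np

def bicgstab(A, b, tol=1e-10, max_iter=200):
    """van der Vorst's BiCGSTAB; returns the solution and the history of
    residual norms, one entry per iteration, for convergence plots."""
    x = np.zeros_like(b)
    r = b - A @ x
    r_hat = r.copy()                    # fixed shadow residual
    rho = alpha = omega = 1.0
    v = np.zeros_like(b)
    p = np.zeros_like(b)
    history = [np.linalg.norm(r)]
    for _ in range(max_iter):
        rho_new = r_hat @ r
        beta = (rho_new / rho) * (alpha / omega)
        rho = rho_new
        p = r + beta * (p - omega * v)  # BiCG-like direction update
        v = A @ p
        alpha = rho / (r_hat @ v)
        s = r - alpha * v
        t = A @ s
        omega = (t @ s) / (t @ t)       # stabilizing local minimization
        x = x + alpha * p + omega * s
        r = s - omega * t
        history.append(np.linalg.norm(r))
        if history[-1] < tol:
            break
    return x, history

# Well-conditioned non-symmetric test system (illustrative only).
rng = np.random.default_rng(2)
A = np.eye(30) + 0.1 * rng.standard_normal((30, 30))
b = rng.standard_normal(30)
x, history = bicgstab(A, b)
```

Plotting `history` for CGS or BiCG variants typically shows the non-monotone behavior described above, whereas GMRES produces a monotonically non-increasing curve by construction.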
V. CONCLUSION

The shooting-Newton direct steady-state determination was implemented in SPICE 3f5. The iterate generated by the method can be solved by using standard Gaussian elimination, or the different Krylov-subspace methods of GMRES, BiCG, QMR, CGS, and BiCGSTAB.

The direct method of steady-state determination is shown to be efficient and accurate (Fig. 1) when analyzing RF circuits that are difficult for the standard transient analysis available in SPICE 3f5 (Table 1). This efficiency means shorter simulation times and less computer storage needs.

The Krylov-subspace methods make for even greater computer efficiency (Table 1). All the Krylov methods are much faster than the transient analysis approach of simulating until all the transients have died out. The Krylov-subspace methods give mixed results in terms of efficiency (Table 1) over the standard solution method of Gaussian elimination. The GMRES, QMR, and BiCGSTAB algorithms show the fastest convergence to solution. Analysis with larger RF circuits is needed in order to investigate whether one Krylov-subspace method would be more computationally efficient in terms of storage and operation count for RF circuits.

The direct steady-state determination using Krylov-subspace methods was shown to be an efficient and accurate method. Its application to larger RF circuits should result in even greater computer resource savings.

ACKNOWLEDGEMENT

The authors wish to acknowledge the assistance of Kenneth Kundert at Cadence Design for the microwave example circuit used for analysis, and Misha Kilmer of Tufts University for many discussions on Krylov methods.

REFERENCES

[1] K. S. Kundert, J. K. White, and A. Sangiovanni-Vincentelli, Steady-State Methods for Simulating Analog and Microwave Circuits. Boston, MA: Kluwer Academic Publishers, 1990.
[2] T. J. Aprille and T. N. Trick, "Steady-state analysis of nonlinear circuits with periodic inputs," Proc. IEEE, vol. 60, pp. 108-114, January 1972.
[3] R. Telichevesky, K. Kundert, I. Elfadel, and J. White, "Fast simulation algorithms for RF circuits," Proc. of the Custom Integrated Circuits Conference, May 1996.
[4] P. N. Ashar, "Implementation of algorithms for the periodic steady-state analysis of nonlinear circuits," Master's Thesis, University of California Berkeley, March 1989.
[5] M. A. Kleiner and M. N. Afsar, "Determining the steady-state responses in RF circuits using GMRES, CGS, and BiCGSTAB solution in sSPICE for Linux," 2000 IEEE MTT-S Int. Microwave Symp. Dig., pp. 87-90, June 2000.
[6] R. Telichevesky, K. S. Kundert, and J. K. White, "Efficient steady-state analysis based on matrix-free Krylov-subspace methods," Proc. 1995 Design Automation Conference, Santa Clara, California, June 1995.
[7] R. G. Meyer and W. D. Mack, "A wide-band class AB monolithic power amplifier," IEEE Journal of Solid-State Circuits, vol. 24, no. 1, pp. 7-12, Feb. 1989.
[8] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods. Philadelphia: SIAM, 1994.
[9] F. R. Colon and T. N. Trick, "Fast periodic steady-state analysis for large-signal electronic circuits," IEEE Journal of Solid-State Circuits, vol. 8, no. 4, pp. 260-269, August 1973.
DETERMINING THE STEADY-STATE RESPONSES IN RF CIRCUITS
USING GMRES, CGS, AND BICGSTAB SOLUTION IN sSPICE FOR LINUX
Madeline A. Kleiner, Mohammed N. Afsar, Fellow IEEE
Department of Electrical Engineering and Computer Science
Tufts University
Medford, MA.
ABSTRACT

sSPICE, an extension of SPICE 3f5 for Linux, is under development to determine the steady-state responses for RF integrated circuits. The shooting-Newton algorithm for steady-state determination is incorporated with solution of the resulting iterates using the Krylov-subspace methods GMRES, CGS, and BICGSTAB. The direct method shows much more computer efficiency than the regular transient analysis in SPICE. Only small improvements are seen using the explicit matrix instead of Gaussian elimination methods. Using the implicit form of GMRES resulted in improvements over the explicit GMRES and over the shooting-Newton Gaussian elimination method.
INTRODUCTION

Determining the steady-state response for integrated circuits that are lightly damped and have large time constants is computationally expensive. Simulating high-Q circuits is also computationally expensive, and these types of circuits occur regularly in communication systems. Four methods are commonly used to determine the steady-state response [1]: transient analysis, finite difference, shooting-Newton, and harmonic balance. In SPICE, the most popular analog circuit simulator, the steady-state is determined by allowing the time-domain transient analysis to run until all the transient signals have died off and all that remains is the periodic steady-state. With many integrated circuits this simulation takes a long time. The use of SPICE, therefore, is limited to integrated circuits having few components. The finite difference method and the shooting-Newton method both use the direct method [2]. The direct method treats the determination of the steady-state as the solution of a two-point boundary value problem. But using the standard matrix methods of solution, this method is still too expensive. Lastly, the harmonic balance method is used extensively and works well for weakly nonlinear circuits, but becomes computationally expensive for strongly nonlinear circuits. SPICE can simulate these strongly nonlinear circuits, but with poor computer efficiency.

Much work has explored the use of newer numerical methods to take advantage of matrix sparsity and the freedom from having to explicitly form the matrix needed for solution. GMRES, a robust non-stationary iterative method, has been extensively reported as a speed-up for steady-state determination in the time domain [3,4]. Both GMRES and QMR have been used in the harmonic balance method to speed up steady-state solution on linear circuits [5]. Numerical methods not included in the non-stationary iterative methods have also been explored. Homotopy methods, for example, have been used to find the steady-state solution [6], but, unfortunately, they were found to be computationally expensive.
sSPICE as presented here solves some of these problems. It is an extension of SPICE 3f5, the analog circuit simulator developed at the University of California Berkeley. It uses the same direct methods for steady-state determination as incorporated into the SPICE 3c1 version [6,7]. GMRES, CGS, and BICGSTAB, all non-stationary iterative methods, are incorporated into the code and compiled on Linux. A version of GMRES is available that uses the implicit matrix technique [4]. Users may choose either the standard transient analysis in SPICE or the shooting-Newton method in combination with the GMRES, CGS, or BICGSTAB method to solve the equations
0-7803-5687-X/00/$10.00 (c) 2000 IEEE
formed in the solution. Results from the comparison of the different methods are presented below.
1. ITERATIVE METHODS
Non-stationary iterative methods are an efficient means of solving non-symmetric linear systems, such as the large system of equations generated during the shooting-Newton method of steady-state determination. They are Krylov-subspace methods and variations on the conjugate gradient method for symmetric, positive definite systems. Iterative methods are used to solve the system represented as Ax = b. The main differences between the non-symmetric iterative methods lie in (1) the method of constructing the basis vectors for the Krylov subspaces and (2) the choice of the iterate [9]. For GMRES, an orthogonalization method, the matrix A itself is used. BICGSTAB and CGS use a biorthogonalization method, in which the matrix

[ A   0   ]
[ 0   A^T ]

is used to construct the Krylov subspace. Direct solvers such as Gaussian elimination have a workload of O(m) steps, each requiring O(m^2) work, for a total work estimate of O(m^3) [8]. This severely limits the number of equations that can make up the system being solved. The iterative methods theoretically require the same amount of work, but this applies only in the worst case. By using the sparsity and special structure of the large matrices generated by the shooting-Newton method of solution for the steady-state response, the amount of work can be reduced to O(m) at best, or at least O(m^2). The Krylov-subspace algorithms also allow one to avoid forming the matrix A above. Instead of actually forming this matrix, its parts can be used directly in the matrix-vector multiplications needed by GMRES, CGS, and BICGSTAB. This approach allows much larger circuits to be analyzed.
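As an illustrative sketch of the matrix-free idea (not the sSPICE implementation; the factor sizes and values below are invented), SciPy's LinearOperator lets GMRES apply A = I - J one sparse factor at a time, so the dense product J is never formed:

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import LinearOperator, gmres

# Suppose J = C_s ... C_2 C_1 is a product of sparse per-timestep
# factors, and we must solve (I - J) x = b. The product J is dense,
# but the matvec below never forms it: it just pushes a vector
# through each sparse factor in turn.
m, steps = 200, 8
rng = np.random.default_rng(0)
factors = [diags([rng.uniform(0.1, 0.4, m), rng.uniform(0.05, 0.2, m - 1)],
                 [0, -1], format="csr") for _ in range(steps)]

def matvec(v):
    w = v.copy()
    for C in factors:              # w <- J v, one sparse factor at a time
        w = C @ w
    return v - w                   # (I - J) v

A = LinearOperator((m, m), matvec=matvec, dtype=np.float64)
b = rng.standard_normal(m)
x, info = gmres(A, b)              # GMRES only ever calls matvec
ok = np.linalg.norm(matvec(x) - b) <= 1e-3 * np.linalg.norm(b)
print(info == 0, ok)
```

Only matrix-vector products with the sparse factors are ever performed, which is what makes an implicit-matrix Krylov solver memory- and time-efficient for large circuits.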
The sSPICE code implementing these methods was written in C and incorporated into the University of California Berkeley SPICE 3f5 [9]. Users can specify direct steady-state determination using either standard Gaussian elimination or the non-stationary iterative methods GMRES, CGS, or BICGSTAB.
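The three solver choices can also be compared outside the simulator; a minimal sketch using SciPy on an invented non-symmetric, diagonally dominant test system (not one of the circuit matrices):

```python
import numpy as np
from scipy.sparse import diags
from scipy.sparse.linalg import gmres, cgs, bicgstab

# Small non-symmetric sparse test system. Diagonal dominance means all
# three non-stationary Krylov methods should converge here, though
# their iteration counts and robustness differ on harder problems.
n = 300
rng = np.random.default_rng(1)
A = diags([4.0 + rng.uniform(0.0, 1.0, n),
           rng.uniform(0.0, 1.0, n - 1),     # superdiagonal
           -rng.uniform(0.0, 1.0, n - 1)],   # subdiagonal breaks symmetry
          [0, 1, -1], format="csr")
b = rng.standard_normal(n)

results = {}
for name, solver in [("GMRES", gmres), ("CGS", cgs), ("BICGSTAB", bicgstab)]:
    x, info = solver(A, b)                   # info == 0 means converged
    resid = np.linalg.norm(A @ x - b) / np.linalg.norm(b)
    results[name] = (info, resid)
    print(name, info == 0, resid < 1e-4)
```

All three share the same call shape, which is why sSPICE can offer them as interchangeable back ends for the same shooting-Newton iterate.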
2. STEADY-STATE
The determination of the steady-state response of many wireless communication devices driven by periodic inputs is of great interest to designers. If an analog circuit simulator such as SPICE is used for the analysis, the differential equations generated by the system are integrated over a time interval long enough for the transient waveforms to die out. With rapidly decaying transients, SPICE can determine the steady-state response with great efficiency. But many RF circuits have transients that decay very slowly, and it becomes almost impossible to integrate the differential equations over the entire transient response needed. Direct methods, such as the shooting-Newton method presented here, instead find the initial state that puts the circuit directly in steady-state [2]. If the circuit equations are represented as the system:
(1) dx/dt = f(x, t),

where x and f are n-vectors and f is periodic in time t with period T. A constraint for achieving steady-state is that the transient effects have died off. This is represented by:

(2) x(0) = x(T).

In other words, the solution at the end of the period is the same as the condition at the beginning of the period, which means the circuit is in steady-state. The state transition function can be used to define the two-point boundary value problem:

(3) x(0) - φ(x(t0), t0, T) = 0,

where φ is the state transition function. The state transition function is implicitly derived; it is calculated at each timepoint until the end of the period. It depends on the initial state x0, the period of the response T, and the starting time t0. Applying the shooting-Newton method to solve the boundary value problem results in the following iteration:

(4) x0^(k+1) = x0^(k) - [I - Jφ]^(-1) [x0^(k) - φ(x0^(k), t0, T)],

where k is the iteration index and Jφ is the sensitivity matrix:

(5) Jφ = ∂φ(x0, t0, T)/∂x0.
The sensitivity matrix can be computed at the same time as the state transition function. The quantities needed to calculate the sensitivity matrix are already available at each timepoint from the transient analysis, so one simply uses them to form the matrix. The iterate is solved and, based on a user-defined limit, is either considered converged or the circuit is simulated again with another initial guess [2,7]. This process continues until steady-state is reached. The shooting-Newton method thus computes a set of capacitor voltages and inductor currents such that, if these voltages and currents are used as the initial conditions for the transient analysis, the circuit is directly in steady-state.
The solution of this iterate equation is computationally expensive. Forming and solving the sensitivity matrix is the most expensive step, because the matrix is dense. If a method like Gaussian elimination is used, the costs are very high and the speed-up of the direct method of steady-state determination is limited. The newer iterative methods yield much greater computational savings. An even greater savings in computation time can be realized if, instead of forming the sensitivity matrix at each transient analysis step, the quantities are stored and then accessed when needed for the solution of the iterate in the Krylov-subspace method [1,3].
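As a concrete, single-state sketch of iteration (4), consider an RC low-pass filter driven by a sinusoid. The circuit and component values below are invented for illustration, and the sensitivity Jφ is approximated by a finite difference rather than computed alongside the transient analysis as in sSPICE; because this circuit is linear, the shooting-Newton iteration converges in essentially one step:

```python
import numpy as np
from scipy.integrate import solve_ivp

# Hypothetical single-state circuit: RC low-pass driven by sin(2*pi*t/T).
R, C, T = 1e3, 1e-6, 1e-3          # 1 kOhm, 1 uF, 1 ms period (invented)

def f(t, v):                        # eq (1): dv/dt = f(v, t)
    return (np.sin(2 * np.pi * t / T) - v) / (R * C)

def phi(v0):                        # state transition function over one period
    sol = solve_ivp(f, (0.0, T), [v0], rtol=1e-10, atol=1e-12)
    return sol.y[0, -1]

v0, eps = 0.0, 1e-6
for k in range(5):                  # shooting-Newton iteration, eq (4)
    F = v0 - phi(v0)                # residual of periodicity condition (3)
    if abs(F) < 1e-9:
        break
    J = (phi(v0 + eps) - phi(v0)) / eps   # sensitivity Jphi, eq (5), by finite difference
    v0 = v0 - F / (1.0 - J)
print(abs(v0 - phi(v0)) < 1e-8)     # x(0) == x(T): periodic steady state
```

The converged v0 is exactly the initial condition that, handed to a transient analysis, starts the circuit directly in steady-state.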
Table 1: Test Circuits

  Circuit #   Circuit                            # States
  1           DC Power Supply                    4
  2           Class C amplifier - low Q          5
  3           Class C amplifier - high Q         5
  4           Class C amplifier                  11
  5           Colpitts Oscillator - Q=50         3
  6           Colpitts Oscillator - Q=100        3
  7           Colpitts Oscillator - high freq.   3
  8           EC-Colpitts Oscillator             2

3. RESULTS
Table 1 lists the circuits used for simulation. Using SPICE 3f5, their steady-state responses require very long transient simulation times. The results for the shooting-Newton method of steady-state determination are summarized in Figure 1 and Figure 2. The relative CPU time is the time of the iterative method divided by the time for Gaussian elimination. The GMRES, CGS, and BICGSTAB iterative methods produce the expected significant improvement over determining the steady-state by regular transient analysis. However, BICGSTAB seems to have difficulty converging for some circuits, and for circuit 7 it converged to an incorrect solution. Using the shooting-Newton algorithm for direct calculation of the steady-state response shows improvement over transient analysis. However, the large improvements expected from the iterative methods used to solve the shooting-Newton iterate are not evident. Using the implicit matrix method with GMRES resulted in more significant time savings.
Because of memory overflow problems, simulation of larger example circuits was not possible. The next round of examples will be run on a PC with more swap space available for Linux. Other non-stationary Krylov-subspace methods will also be implemented and compared.
4. REFERENCES
[1] K. S. Kundert, J. K. White, and A. Sangiovanni-Vincentelli, Steady-State Methods for Simulating Analog and Microwave Circuits. Boston, MA: Kluwer Academic Publishers, 1990.
[2] T. J. Aprille and T. N. Trick, "Steady-state analysis of nonlinear circuits with periodic inputs," Proc. IEEE, vol. 60, pp. 108-114, January 1972.
[3] R. Telichevesky, K. S. Kundert, and J. K. White, "Efficient steady-state analysis based on matrix-free Krylov-subspace methods," Proc. 1995 Design Automation Conference, Santa Clara, California, June 1995.
[4] R. Telichevesky, K. Kundert, I. Elfadel, and J. White, "Fast simulation algorithms for RF circuits," Proc. of the Custom Integrated Circuits Conference, May 1996.
[5] M. M. Gourary, S. G. Rusakov, S. L. Ulyanov, M. M. Zharov, K. K. Gullapalli, and B. J. Mulvaney, "Iterative solution of linear systems in harmonic balance analysis," 1997 IEEE MTT-S Digest, pp. 1507-1510.
[6] L. Trajkovic, E. Fung, and S. Sanders, "HomSPICE: simulator with homotopy algorithms for finding DC and steady-state solutions of nonlinear circuits," Proc. IEEE Symposium on Circuits and Systems, Monterey, California, June 1998.
[7] P. N. Ashar, "Implementation of algorithms for the periodic steady-state analysis of nonlinear circuits," Master's Thesis, UC Berkeley, March 1989.
[8] L. N. Trefethen and D. Bau, III, Numerical Linear Algebra, Philadelphia: SIAM, 1997.
[9] R. Barrett, M. Berry, T. F. Chan, J. Demmel, J. Donato, J. Dongarra, V. Eijkhout, R. Pozo, C. Romine, and H. van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, Philadelphia: SIAM, 1994.
Figure 1. Comparison of GMRES, CGS, and BICGSTAB methods (relative CPU time for circuits 1-8).
Figure 2. Comparison of explicit and implicit GMRES (relative CPU time).