Reproduced with permission of the copyright owner. Further reproduction prohibited without permission.
EFFICIENT HARMONIC BALANCE MODELING OF LARGE MICROWAVE CIRCUITS

by
STEVEN GLEN SKAGGS

A thesis submitted to the Graduate Faculty of North Carolina State University in partial fulfillment of the requirements for the Degree of Doctor of Philosophy

DEPARTMENT OF ELECTRICAL AND COMPUTER ENGINEERING

Raleigh
1999

APPROVED BY:

Chair of Advisory Committee

UMI Number: 9933902
Copyright 1999 by Skaggs, Steven Glen. All rights reserved.
UMI Microform 9933902. Copyright 1999, by UMI Company. All rights reserved. This microform edition is protected against unauthorized copying under Title 17, United States Code.
UMI, 300 North Zeeb Road, Ann Arbor, MI 48103

Abstract

Skaggs, Steven Glen. Efficient Harmonic Balance Modeling of Large Microwave Circuits. (Under the direction of Michael B. Steer.)

The purpose of this research is to provide improvements in the harmonic balance technique for the simulation of large microwave circuits. The harmonic balance technique becomes less efficient with an increase in circuit size, as the nonlinear system of equations to be solved becomes large. Since a Jacobian matrix of rank N costs O(N^3) floating point operations to decompose, it is desirable to reduce both the number of times the Jacobian must be evaluated and the amount of processing required for matrix decomposition. For Newton-Raphson based harmonic balance, the Jacobian may be approximated in such a way that decomposition is less expensive.
Additionally, the approximated Jacobian is often superior to the original formulation of the Jacobian in terms of the number of iterations required for simulation convergence. Similarly, Krylov subspace based methods may be improved by using Jacobian preconditioners. This linear iterative technique has been shown to be more efficient for large moderately nonlinear microwave circuits. This study explores the effects of different preconditioners for linear iterative solvers in the harmonic balance technique.

Biographical Summary

Steven Glen Skaggs was born December 12, 1966 in Lafayette, IN. He received the Bachelor of Science degree in electrical engineering summa cum laude from North Carolina State University in 1989. In 1991, he received the Master of Science degree, also from North Carolina State University. His Master's thesis research was the extraction of microwave transistor model parameters using tree annealing optimization. From 1994 to 1995, he was employed by Compact Software in Paterson, NJ as a senior microwave circuit engineer, working on a commercial harmonic balance simulator. Since August of 1995, he has been employed by Avant! Corporation, first as a software developer and more recently as a product specialist. His research interests include harmonic balance simulation of microwave circuits and device modeling and parameter extraction.

Acknowledgements

First of all, I must thank my wife Leah and my son Luke for their sacrifice during all this, especially in the last year. I am looking forward to having more time to spend with you two. This work is dedicated to you.
Next, I am deeply indebted to Michael Steer for his guidance and direction during my graduate and even undergraduate education. Michael also served as my “conscience” on occasion when I let my full time job responsibilities prevent me from progressing on my thesis.

Speaking of my conscience, thanks to Dad for keeping up with my progress. Let's go play some golf now.

Thanks also to those who have served as my supervisors while I have been working full time. Thanks, Jason Gerber, Mark Basel, Jeff Byrd, John Studders, and Keith Lanier. You guys were all supportive and understanding about my graduate studies.

Thanks also to Carlos Christoffersen and Shunmin Wang for your assistance in the past year.

And finally, thanks to my many friends and co-workers who never laughed in my face when I talked about “finally finishing my PhD.”

Table of Contents

List of Figures
List of Tables
1 Introduction
  1.1 Motivation
  1.2 Thesis Overview
2 Review of Harmonic Balance Techniques
  2.1 Posing the Harmonic Balance Analysis Problem
  2.2 Relaxation Methods
  2.3 Newton Methods
  2.4 Inexact Newton Methods
    2.4.1 Iterative Linear Solvers
    2.4.2 Incomplete LU Decomposition
  2.5 Summary
3 Development of a Harmonic Balance Simulator
  3.1 Newton-based Harmonic Balance Analysis
    3.1.1 Forming the Jacobian Matrix
  3.2 Improvements in Numerical Techniques for Harmonic Balance
    3.2.1 Sparse Matrix Techniques
    3.2.2 Oversampling
    3.2.3 The State Variable Approach
    3.2.4 The Chord Method
    3.2.5 Jacobian Approximation and Preconditioning Techniques
4 Newton-based Harmonic Balance with Approximate Jacobian Matrices
  4.1 Distributed Amplifier
    4.1.1 Using the Full Jacobian Matrix
    4.1.2 Using Block Jacobian Matrix Techniques
    4.1.3 Using the Linear Jacobian along with the Diagonal of the Nonlinear Jacobian
    4.1.4 Using the Linear Jacobian only
    4.1.5 Using a Threshold Value for Nonlinear Jacobian Contributions
    4.1.6 Summary
  4.2 Nonlinear Transmission Lines
    4.2.1 A 10 diode NLTL
    4.2.2 A 47 diode NLTL
    4.2.3 Summary
5 Harmonic Balance Simulation Using Inexact Newton Methods with a Preconditioned Jacobian Matrix
  5.1 Simulation of a Distributed Amplifier
    5.1.1 Low Input Power
    5.1.2 High Input Power
  5.2 Simulation of a Short NLTL
    5.2.1 Low input voltage
    5.2.2 High input voltage
  5.3 Simulation of a Longer NLTL
  5.4 Summary
6 Conclusions and Future Research
  6.1 Discussion
  6.2 Suggestions for Further Research

List of Figures

2.0.1 Circuit partitioned into linear and nonlinear subcircuits.
3.2.1 Diagonal blocking scheme using 16 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used.
3.2.2 Diagonal blocking scheme using 8 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.
3.2.3 Diagonal blocking scheme using 4 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used.
Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.
3.2.4 Diagonal blocking scheme using 2 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.
3.2.5 Level one off-diagonal blocking scheme. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.
3.2.6 Level two off-diagonal blocking scheme. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.
3.2.7 Level three off-diagonal blocking scheme. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.
4.1.1 Distributed Amplifier Circuit
4.1.2 Magnitude of the output voltage spectrum of the distributed amplifier in dBm with input power levels of 0 (○), 10 (+), and 20 dBm (□). The x-axis represents frequency in Hz.
4.1.3 Runtime in machine cycles for simulation of the distributed amplifier for different input power levels.
4.1.4 Number of Newton-Raphson iterations required for convergence of simulation of the distributed amplifier circuit with respect to input power level.
4.1.5 Magnitude of nonlinear Jacobian contributions with respect to proximity to the diagonal of the Jacobian in the simulation of the distributed amplifier circuit. The x-axis is the absolute difference between the row and column indices of the nonlinear entry, while the y-axis is the absolute value of the entry. The entries of the Jacobian at 0 dBm input power are represented by ○, while the entries of the Jacobian at 20 dBm input power are represented by +.
4.1.6 Runtime in machine cycles for harmonic balance of the distributed amplifier circuit with respect to input power level. The upper curve corresponds to calculating the Jacobian at every step in the Newton-Raphson solving process, while the lower curve corresponds to using the same Jacobian throughout the solving process.
4.1.7 Number of iterations required for simulation of the distributed amplifier with different blocking schemes.
4.1.8 Simulation runtime of the distributed amplifier for different blocking schemes. Runtime is given in machine cycles.
4.1.9 Number of iterations required for simulation of the distributed amplifier using nonlinear Jacobian contributions on the matrix diagonal only.
4.1.10 Simulation runtime required for simulation of the distributed amplifier using nonlinear Jacobian contributions on the matrix diagonal only.
4.1.11 Number of iterations required for simulation of the distributed amplifier when only linear Jacobian contributions are used.
4.1.12 Simulation runtime (in machine cycles) required for the distributed amplifier when only linear Jacobian contributions are used.
4.1.13 Simulation runtime (in machine cycles) required for the distributed amplifier at low power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value.
4.1.14 Number of iterations required for convergence of the Newton-Raphson method for the distributed amplifier at low power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value.
4.1.15 Simulation runtime (in machine cycles) required for the distributed amplifier at high power when Jacobian entries are subject to threshold levels.
The x-axis represents the base 10 logarithm of the threshold value.
4.1.16 Number of iterations required for convergence of the Newton-Raphson method for the distributed amplifier at high power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value.
4.2.1 Nonlinear transmission line unit cell.
4.2.2 Measured (solid line) and simulated (dashed line) output waveform for the 47 diode NLTL.
4.2.3 Magnitude of voltage output spectrum of the 10-Diode NLTL with 1 (○), 3 (+), 6 (□), and 9 (×) volt input AC voltages.
4.2.4 Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 1 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence.
4.2.5 Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 3 volt AC input voltage.
The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence.
4.2.6 Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 6 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence.
4.2.7 Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 9 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence.
4.2.8 Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 1 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence.
4.2.9 Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 2 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence.
4.2.10 Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 3 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence.
4.2.11 Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 4 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence.
5.1.1 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 0 dBm.
5.1.2 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit.
The input power level is 5 dBm.
5.1.3 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 10 dBm.
5.1.4 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 15 dBm.
5.2.1 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 1 volt.
5.2.2 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 2 volts.
5.2.3 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 3 volts.
5.2.4 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 4 volts.
5.2.5 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 5 volts.
5.2.6 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 6 volts.
5.3.1 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 20 diode NLTL. The input AC voltage level is 1 volt.
5.3.2 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 20 diode NLTL.
The input AC voltage level is 2 volts.
5.3.3 Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 20 diode NLTL. The input AC voltage level is 3 volts.

List of Tables

3.2.1 Relative Approximate Runtime for LU Decomposition of a Rank 135 Jacobian Matrix
4.1.1 Average magnitude of nonlinear Jacobian contributions within each frequency partition with input power level of 0 dBm. Each row corresponds to the frequency of the unknown while each column corresponds to the frequency of the error function. The final Jacobian used in the simulation is shown, for which the average of all the nonlinear Jacobian contributions is 1.349e-3. The first row and column are frequencies in GHz.
4.1.2 Average magnitude of nonlinear Jacobian contributions within each frequency partition with input power level of 20 dBm. Each row corresponds to the frequency of the unknown while each column corresponds to the frequency of the error function. The final Jacobian used in the simulation is shown, for which the average of all the nonlinear Jacobian contributions is 1.482e-3. The first row and column are frequencies in GHz.
4.2.1 Blocking approximation schemes.
A number by itself refers to the number of diagonal blocks used. A number preceded by the letter “o” indicates an off-diagonal blocking scheme.
4.2.2 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 1 volt.
4.2.3 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 3 volts.
4.2.4 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 6 volts.
4.2.5 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 9 volts.
4.2.6 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 1 volt.
4.2.7 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 2 volts.
4.2.8 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 3 volts.
4.2.9 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 4 volts.
5.1.1 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 0 dBm.
5.1.2 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 5 dBm.
5.1.3 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 10 dBm.
5.1.4 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 15 dBm.
5.2.1 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 1 volt.
5.2.2 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 2 volts. N/C denotes no convergence.
5.2.3 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 3 volts. N/C denotes no convergence.

5.2.4 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 4 volts. N/C denotes no convergence.

5.2.5 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 5 volts. N/C denotes no convergence.

5.2.6 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 6 volts. N/C denotes no convergence.

5.3.1 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 20 diode NLTL circuit with an input voltage of 1 volt.

5.3.2 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 20 diode NLTL circuit with an input voltage of 2 volts.

5.3.3 Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 20 diode NLTL circuit with an input voltage of 3 volts.

6.1.1 Runtime required for harmonic balance simulation of a 20 diode NLTL using the exact Newton method and the inexact Newton method.
The runtimes given are the best obtained from all available Jacobian approximations and preconditioners.

Chapter 1
Introduction

1.1 Motivation

As communication technologies continue to advance, there is a growing need for an increased use of the frequency spectrum. The move to higher frequency systems provides many benefits, including smaller antenna sizes, wider bandwidth, and higher resolution for imaging. Traditionally, travelling-wave tubes have been the only devices available that can produce the necessary tens or hundreds of watts of power at millimeter wave frequencies. Unfortunately, these devices suffer the disadvantages of large size and weight, and also require high voltage power supplies. On the other hand, solid state devices designed for millimeter wave applications are small and do not need high voltage power supplies, but provide limited amounts of power. Therefore, in order to produce the power required for high frequency applications, the power from many solid state devices must be combined.

In order to shorten the design cycle of power-combining circuits, efficient CAD tools are required. Harmonic balance analysis is now an integral part of designing nonlinear microwave and millimeter wave circuits. With an increase in the number of devices and the level of input power to be simulated, harmonic balance simulators are being pushed to their limits. This thesis will present advances in CAD techniques for designing quasi-optical systems. In particular, improvements will be developed for numerical techniques for harmonic balance analysis, both in terms of simulation run time and memory storage requirements.
1.2 Thesis Overview

Chapter 2 is a review of the state-of-the-art of the harmonic balance technique. The review discusses the latest published techniques, including the use of iterative linear solvers in Krylov subspace techniques.

In Chapter 3, the building blocks of a general-purpose harmonic balance simulator are developed. There is an emphasis on the numerical techniques necessary to optimize the speed and accuracy of the simulator. Consideration is also given to the memory storage requirements for simulation of large nonlinear circuits. Several different schemes for Jacobian matrix approximation and preconditioning are presented. These schemes have been implemented in both a traditional Newton-Raphson solver as well as a linear iterative solver.

The application of Jacobian approximations for Newton-Raphson-based harmonic balance simulation is demonstrated in Chapter 4. A distributed amplifier circuit and nonlinear transmission line circuits of different lengths are simulated using the different approximations. These techniques are analyzed in terms of their effects upon simulator performance.

Chapter 5 presents the use of inexact Newton methods in the harmonic balance simulation of the distributed amplifier and nonlinear transmission line circuits. The approximations used for Newton-Raphson-based simulations are used as preconditioners for the iterative linear solver. These preconditioners are compared with respect to their impact upon simulator performance.

Chapter 6 draws conclusions from this study and presents suggestions for future work.
Chapter 2
Review of Harmonic Balance Techniques

Harmonic balance analysis is widely used for steady-state simulation of nonlinear microwave circuits. This technique involves partitioning a circuit into two subcircuits, one containing only linear devices and the other containing only the nonlinear devices, as shown in Figure 2.0.1. The steady-state voltage and current waveforms are represented in the frequency domain by a truncated Fourier spectrum, the form of which is assumed a priori. The problem then becomes one of determining the voltages and currents at the interface nodes. In this chapter, formulation of the harmonic balance equations and different techniques for solving them will be discussed.

Figure 2.0.1: Circuit partitioned into linear and nonlinear subcircuits.

2.1 Posing the Harmonic Balance Analysis Problem

If the circuit in Figure 2.0.1 has $N$ interface nodes and if $v$ is the vector of node voltage waveforms, then the application of Kirchhoff's current law (KCL) to each node results in a system of equations [1]

$$f(v, t) = i(v(t)) + \frac{d}{dt} q(v(t)) + \int_{-\infty}^{t} y(t - \tau)\, v(\tau)\, d\tau + i_s(t) = 0 \qquad (2.1.1)$$

where $i(v(t))$ is the sum of nonlinear currents entering the interface nodes, $q(v(t))$ is the sum of charges entering the interface nodes, $y$ is the impulse response of the linear subcircuit, and $i_s$ is the set of independent external source currents. This formulation is well suited to finding the nonlinear response to the circuit state variables, as the nonlinearities are generally defined in the time domain. However, a convolution operation is necessary for the calculation of the linear response. If this equation is transformed to the frequency domain, the linear response is easily calculated by multiplication with the modified nodal admittance matrix.
In that case, Equation 2.1.1 becomes

$$F(V) = I(V) + \Omega\, Q(V) + Y V + I_S = 0 \qquad (2.1.2)$$

where $\Omega$ is a matrix of frequency components representing the time differentiation in the frequency domain. This is the equation that harmonic balance techniques try to solve. The nonlinear response to the interface state variables is calculated in the time domain and transformed to the frequency domain via the fast Fourier transform (FFT), while the linear response is conveniently calculated in the frequency domain. Generally some form of Newton's method is employed to solve the system of nonlinear equations $F(V) = 0$. While this equation was derived with reference to node voltages as the unknown variables for ease of notation, the harmonic balance equations can easily accommodate currents as the state variables as well. In the formulation of Rizzoli et al., the state variables and equations are quite general and reference both voltage and current unknowns at the subcircuit interface [2].

A discrete set of analysis frequencies is chosen a priori for the steady-state solution. For single-tone analysis, the frequencies chosen are the fundamental excitation tone and some number of its harmonics, including dc. Multi-tone analysis involves the noncommensurate excitation tones, dc, and intermodulation products up to a given order. For a circuit with $N$ interface nodes and $K$ frequency components, the system $E(X)$ of $NK$ equations in $NK$ unknowns must be solved.

For small nonlinear circuits excited with low input power sources, the nonlinear problem to be solved is not terribly difficult, since $N$ is small and a small $K$ is sufficient for low input power. However, as $N$ and $K$ increase, the traditional Newton methods for solving Equation 2.1.2 rapidly become more expensive.
Higher drive levels produce more coupling between signal frequencies, so a richer frequency spectrum is necessary to describe the signal, which increases $K$. Higher power levels also mean a less well-conditioned system of equations. Naturally, $N$ will increase with an increase in the number of nonlinear devices to be simulated. The ability of a given solution technique to solve Equation 2.1.2 for increasing $N$ and $K$ values is of paramount importance. Additionally, the solution technique of choice must be able to converge even when the system becomes increasingly ill-conditioned due to high driving power levels. The focus of this chapter is the set of different techniques that have been employed to solve this system of equations. The merits and liabilities of each technique will be discussed.

2.2 Relaxation Methods

One of the simplest techniques for solving the harmonic balance system of equations is to use the fixed-point relaxation method, or "p-factor" method, of Hicks and Khan [3]. This method involves the iterative equation

$$i_{k}^{j+1}(\omega) = p\, i_{l,k}^{j}(\omega) + (1 - p)\, i_{nl,k}^{j}(\omega) \qquad (2.2.1)$$

where $i_{l,k}^{j}(\omega)$ is the linear current and $i_{nl,k}^{j}(\omega)$ is the nonlinear current at the $k$th node and the $j$th iteration. No Jacobian matrix need be calculated or inverted, so the computation cost is minimal for this method. An important element of this method is choosing the right value for $p$, $0 < p < 1$, which can greatly affect convergence. Hicks and Khan and others [4], [5] have explored the convergence properties of this technique. A multiple-reflection method used by Kerr [6] was shown to be a special case of the p-factor method. Camacho-Peñalosa [5] developed an algorithm for determining the optimal p factor.
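The role of the relaxation factor can be illustrated with a minimal sketch. It is not the Hicks and Khan current update itself: it applies the same kind of relaxed update, x <- p g(x) + (1 - p) x, to a hypothetical scalar fixed-point problem x = g(x) with g(x) = exp(-4x), chosen only because the unrelaxed iteration (p = 1) fails to converge on it. The function names, starting point, and tolerances are assumptions made for illustration.

```python
import math


def relax(g, x0, p, tol=1e-12, max_iter=200):
    """Relaxed fixed-point iteration x <- p*g(x) + (1 - p)*x.

    Returns (x, iterations used, converged flag).
    """
    x = x0
    for j in range(max_iter):
        gx = g(x)
        if abs(gx - x) < tol:        # fixed point reached: g(x) == x
            return x, j, True
        x = p * gx + (1.0 - p) * x   # blend new estimate with old one
    return x, max_iter, False


def g(x):
    # Hypothetical test map; its fixed point is repelling when p = 1.
    return math.exp(-4.0 * x)


x_half, iters_half, ok_half = relax(g, 0.0, p=0.5)
x_full, iters_full, ok_full = relax(g, 0.0, p=1.0)
print(ok_half, iters_half, x_half)
print(ok_full, iters_full)
```

With p = 0.5 the iteration settles on the fixed point, while with p = 1 it oscillates without converging. This mirrors the sensitivity to the choice of p noted above.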
While the relaxation method is a simple, fast, and efficient algorithm, it is not a robust method. Its performance generally degrades rapidly with an increase in the number of nonlinear elements being analyzed. Also, unlike more computationally intensive methods, it is not well suited for analysis of circuits excited with two or more noncommensurate frequencies.

2.3 Newton Methods

The Newton-Raphson technique is widely used in traditional harmonic balance simulators. This technique provides quadratic convergence provided the initial solution guess is sufficiently close to the minimum of the error function. The Newton-Raphson method requires the calculation of the Jacobian matrix $J(X) = \partial F / \partial X$, where $F(X)$ is the KCL error as described in Equation 2.1.2 and $X$ is the vector of unknowns. The well-known Newton update equation is given by

$$X^{j+1} = X^{j} - \alpha\, J^{-1}(X^{j})\, F(X^{j}) \qquad (2.3.1)$$

where $j$ denotes the $j$th Newton-Raphson iteration, $\alpha$ is a scalar damping factor, $J(X)$ is the Jacobian matrix, and $F(X)$ is the KCL error function. The iterative process continues until $\|F(X^{j})\|$ is sufficiently small. Note that the most expensive part of this algorithm in terms of CPU time and memory usage is the formulation, inversion, and solution of the Jacobian matrix. Inverting the matrix is an $O(N^3)$ operation for an $N \times N$ matrix.

There are several ways to reduce the impact of this expensive process. Chang, Heron, and Steer introduced the block Jacobian method [8], in which only blocks along the diagonal of the Jacobian were used. This allows the individual blocks to be inverted rather than the entire Jacobian matrix.
Due to the structure of the Jacobian matrix, the linear contributions are all located inside these blocks. They also used the Samanskii method, which involves reuse of the same inverted Jacobian for more than one iteration. These methods are considered to be best suited to mildly nonlinear systems due to the inherent inaccuracies of the Jacobian.

Rizzoli used sparse matrix techniques to improve simulation performance. He used a Jacobian template such that some specific matrix elements were automatically set to zero so that specialized matrix solvers could be used [10]. While this technique takes advantage of the faster solving capability, its ability to handle high power levels seems suspect. Higher drive levels generally result in a more dense Jacobian, and important information could be lost with this technique.

As for handling high power circuit simulations, several techniques have been suggested. One of the most commonly used is the norm-reducing technique of Yeager and Dutton [11], which adjusts the Newton update direction such that the resulting Newton iteration step is an optimal one. Rizzoli used this technique in conjunction with parametric modeling of nonlinear devices to extend the power handling capabilities of his simulator [12] [13] [14]. The parametric modeling technique is useful with the state variable approach described in [14]. Instead of defining the system of unknowns to be strictly voltages or currents, the unknowns are chosen to be parametric state-variables that map to voltages or currents depending on their value.
For example, a parametric model of a pn junction diode can be defined as

$$v(t) = \begin{cases} V_1 + \dfrac{1}{\alpha} \ln\left[1 + \alpha (x - V_1)\right] & \text{if } V_1 < x \\ x & \text{if } x \le V_1 \end{cases} \qquad (2.3.2)$$

$$i(t) = \begin{cases} I_s \exp(\alpha V_1) \left[1 + \alpha (x - V_1)\right] - I_s & \text{if } V_1 < x \\ I_s \left[\exp(\alpha x) - 1\right] & \text{if } x \le V_1 \end{cases} \qquad (2.3.3)$$

where $x$ is the parametric state variable and $V_1$ is a threshold voltage determined empirically. Due to the diode's exponential nonlinearity, this type of modeling is helpful in preventing wild guesses in the Newton solver, which could occur at the early stages of harmonic balance simulation.

Another problem to be addressed is the performance of Newton methods in harmonic balance when a circuit with a large number of nonlinear elements is simulated. The increase in CPU time with increasing problem size is significant due to the $O(N^3)$ process of direct inversion of the Jacobian. While the inexact Newton methods to be discussed later seem best suited to this problem, Rizzoli has also proposed a so-called "hierarchical harmonic-balance" technique [15]. The unknown variables are subdivided into master and slave sets, and a two-level Newton iteration technique is employed to solve the circuit. The applicability of this algorithm seems to be dependent on the circuit topology.

2.4 Inexact Newton Methods

The main drawback to using Newton's method for solving Equation 2.1.2 is the cost of inverting the Jacobian. Actually, the Jacobian is usually not inverted, but LU decomposed to solve

$$J y = b \qquad (2.4.1)$$

where $b$ is the error vector for the given iterate. Once the LU factorization is done, the same decomposed matrix can be reused as many times as desired in the Samanskii method. Nevertheless, although the inverted Jacobian is not formed explicitly, the decomposition of the Jacobian is still an $O(N^3)$ process.
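The parametric diode model given earlier in this section (Equation 2.3.2 for the voltage and its companion expression for the current) can be sketched as follows. The function name and the numerical values of the saturation current, exponent coefficient, and threshold are hypothetical and chosen only for illustration. The sketch shows why the mapping tames the exponential: above the threshold the current grows linearly in the state variable x, while the pair (v, i) still lies on the ideal diode curve.

```python
import math


def diode_parametric(x, v1, alpha, i_s):
    """Map the state variable x to (voltage, current) for a parametric diode.

    Below the threshold v1 the state variable is the junction voltage itself.
    Above it, the voltage grows logarithmically and the current linearly in x.
    """
    if x <= v1:
        v = x
        i = i_s * (math.exp(alpha * x) - 1.0)
    else:
        v = v1 + math.log(1.0 + alpha * (x - v1)) / alpha
        i = i_s * math.exp(alpha * v1) * (1.0 + alpha * (x - v1)) - i_s
    return v, i


# Hypothetical parameter values (assumptions, not taken from the thesis).
ALPHA = 38.7      # 1/V
I_S = 1e-12       # A
V1 = 0.6          # V

for x in (0.2, 0.6, 1.0, 5.0):
    v, i = diode_parametric(x, V1, ALPHA, I_S)
    print(f"x = {x:4.1f}  v = {v:.4f} V  i = {i:.4e} A")
```

Even a large trial value of x, such as a poor early Newton guess, maps to a moderate junction voltage and a finite current, whereas evaluating the plain exponential at the same voltage would be numerically unusable.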
2.4.1 Iterative Linear Solvers

To avoid this computational bottleneck, several authors [16], [17], [18] have proposed using iterative linear solvers for solving Equation 2.4.1. Melville et al. first proposed this in [16], describing the use of the QMR algorithm [19]. The advantage of using iterative linear solvers for Equation 2.4.1 is that only matrix-vector multiplication is necessary, so that the solving time increases only slightly faster than linearly with an increasing number of unknowns. Melville forms the following equation for Jacobian calculation:

$$J = G P T P^{-1} + C P T D P^{-1} \qquad (2.4.2)$$

where $G$ is a diagonal matrix of time domain derivatives of nonreactive circuit elements, $C$ is a diagonal matrix of time domain derivatives of reactive elements, $T$ is a linear operator representing the time-to-frequency Fourier transform, $D$ represents the time differentiation operator, and $P$ is a data permutation operator. For a circuit with $n$ nodes and $N$ harmonic balance analysis frequencies, we can calculate the order of operations necessary to solve Equation 2.4.1. Multiplications of the diagonal matrices $G$ and $C$ with a vector can be accomplished in $O(nN)$ operations. Applications of $T$ may be implemented through the FFT, costing $O(nN \log N)$ operations. The time differentiation operator $D$ as well as the $P$ and $P^{-1}$ matrices can be applied in $O(nN)$ operations [16]. Thus, multiplications between the Jacobian matrix and a vector require $O(nN \log N)$ operations.

The drawback of iterative linear methods, however, is that they do not converge reliably. To assist in convergence, some sort of preconditioner is employed. Melville used a preconditioner to modify Equation 2.4.1 to

$$\tilde{J}^{-1} J x = \tilde{J}^{-1} b \qquad (2.4.3)$$

which has the same solution.
Ideally, the preconditioner $\tilde{J}$ should be a good approximation to $J$ and also easy to invert. Instead of solving Equation 2.4.3, he solved

$$\tilde{J} z = b. \qquad (2.4.4)$$

Melville reported the use of the Jacobian matrix linearized around the dc operating point of the circuit. This can be easily found from the circuit admittance matrix, which must be formed at the outset of the harmonic balance simulation for calculation of the linear response at each Newton iteration. The Jacobian matrix is organized such that Jacobian contributions from the linear circuit are located in $N$ blocks along the diagonal when $N$ analysis frequencies are used. Thus, this preconditioner may be inverted block-wise, which is significantly less expensive than a full matrix inversion, especially for a circuit with a large number of unknowns. Melville reported that this technique was best suited for large circuits operating in a mildly nonlinear regime.

In [16], Feldmann and Melville extended their technique to handle more general nonlinear circuits. In fact, the main addition to their earlier Krylov subspace based algorithm was to improve on their preconditioning matrix. Instead of using the linear Jacobian contributions only, the nonlinear contributions which appear in the diagonal blocks of the Jacobian are used as well. They report that for stronger nonlinearities, some information in off-diagonal blocks must also be included in the preconditioner, resulting in the loss of the block-diagonal structure. While the block-diagonal structure is very helpful in factorization, the matrix is still much more sparse than the original Jacobian matrix, resulting in performance improvements.

Rizzoli also explored the use of inexact Newton methods for large microwave circuits [18].
He contrasts the exact and inexact Newton methods by writing the exact Newton update

$$X_{i+1} = X_i + n_i \qquad (2.4.5)$$

where $n_i$ is the exact Newton update. For a large number of circuit unknowns $N$, Rizzoli states that it may be more efficient to use an inexact Newton update $s_i$, such that

$$\| E(X_i) + J(X_i)\, s_i \| \le f_i\, \| E(X_i) \| \qquad (2.4.6)$$

and

$$X_{i+1} = X_i + s_i \qquad (2.4.7)$$

where $f_i$ ($0 \le f_i < 1$) is a forcing term. When $f_i = 0$, the update reduces to the exact update given in Equation 2.4.5. Otherwise, $f_i$ serves as a measure of how much the inexact Newton update differs from the exact update. In Rizzoli's implementation, the choice of $f_i$ is updated at each iteration. He reports using an initial $f_0 = 0.5$ and calculating subsequent values via the equation

$$f_i = \frac{\| E(X_i) - E(X_{i-1}) - J(X_{i-1})\, s_{i-1} \|}{\| E(X_{i-1}) \|}. \qquad (2.4.8)$$

Once the forcing term $f_i$ is chosen, the Newton equation is approximately solved until Equation 2.4.6 is satisfied. The GMRES iterative solver was chosen to perform this approximate solution. As with the work presented by Melville and Feldmann, the convergence properties of the iterative solver were improved by selecting an appropriate preconditioner. Rizzoli replaces the exact Newton equation

$$J(X_i)\, n_i = -E(X_i) \qquad (2.4.9)$$

with

$$J(X_i)\, P_i^{-1} P_i\, n_i = -E(X_i) \qquad (2.4.10)$$

where $P_i$ is a nonsingular matrix of rank $N$. Similar to Melville and Feldmann, Rizzoli stresses the importance of $P_i$ being a close approximation to $J$ which is easier to invert. Once a preconditioner has been defined, the initial guess for $s_i$ is defined by

$$s_i^{(0)} = -P_i^{-1} E(X_i) \qquad (2.4.11)$$

and a set of real vectors of length $N$ is defined:

$$K_i^{(1)} = \left[ I_N - J(X_i)\, P_i^{-1} \right] E(X_i)$$
$$K_i^{(q)} = J(X_i)\, P_i^{-1} K_i^{(q-1)} \quad (q > 1) \qquad (2.4.12)$$

where $I_N$ is the identity matrix. The vector space spanned by the vectors $K_i^{(q)}$, $1 \le q \le Q$, is called a Krylov subspace of dimension $Q$. The $Q$th-order approximation of $s_i$ is then given by

$$s_i^{(Q)} = s_i^{(0)} + P_i^{-1} \sum_{q=1}^{Q} a_q K_i^{(q)} \qquad (2.4.13)$$

making the corresponding residual

$$r_i^{(Q)} = E(X_i) + J(X_i)\, s_i^{(Q)} = K_i^{(1)} + \sum_{q=1}^{Q} a_q K_i^{(q+1)}. \qquad (2.4.14)$$

A least squares method is used to find the $a_q$ coefficients such that $\| r_i^{(Q)} \|$ is minimized. If Equation 2.4.6 is satisfied, $s_i^{(Q)}$ is taken as the approximate Newton update. For sufficiently large $Q$ this is guaranteed to be the case, since $\lim_{Q \to \infty} s_i^{(Q)} = n_i$ [20]. Rizzoli claims to use a smaller dimension Krylov subspace for the first few approximate updates, gradually increasing the dimension as the final solution is approached [21], but that generally $Q < 50$ is sufficient for most harmonic balance problems [22].

Telichevesky et al. also have reported using iterative methods for solving Equation 2.4.1 [23]. They point out that if GMRES is used for solving a linear system, the matrix representing the system need not be explicitly formed, as only matrix-vector products are necessary. Again, the need for a preconditioner to help in convergence for this method is stressed. The block Jacobian as a preconditioner is reported as being very effective for mildly nonlinear problems. However, the authors claim that for more strongly nonlinear systems, the loss of information when off-diagonal blocks are discarded often is too great to be overcome. In this situation, the authors use higher order finite-difference techniques in the time domain to precondition the Jacobian information.
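The inexact Newton iteration described above (the forcing-term condition of Equation 2.4.6, the forcing-term update of Equation 2.4.8, and the preconditioned initial guess of Equation 2.4.11) can be sketched on a small problem. Several things here are assumptions made to keep the example dependency-free: the two-unknown test system is hypothetical, a preconditioned Richardson iteration is swapped in for GMRES, the preconditioner is simply the diagonal of the Jacobian, and the cap of 0.9 on the forcing term is an added safeguard rather than part of the reported method.

```python
import math


def residual(x):
    """Hypothetical nonlinear system E(X) = 0 (diagonally dominant Jacobian)."""
    return [4.0 * x[0] + x[1] + math.exp(x[0]) - 3.0,
            x[0] + 5.0 * x[1] + x[1] ** 3 - 2.0]


def jacobian(x):
    return [[4.0 + math.exp(x[0]), 1.0],
            [1.0, 5.0 + 3.0 * x[1] ** 2]]


def matvec(a, v):
    return [sum(aij * vj for aij, vj in zip(row, v)) for row in a]


def norm(v):
    return math.sqrt(sum(c * c for c in v))


def inexact_newton(x, tol=1e-10, max_outer=50, max_inner=200):
    f = 0.5                                   # initial forcing term f_0
    e = residual(x)
    for outer in range(max_outer):
        if norm(e) < tol:
            return x, outer
        jac = jacobian(x)
        diag = [jac[k][k] for k in range(len(x))]     # preconditioner P = diag(J)
        s = [-ek / dk for ek, dk in zip(e, diag)]     # s^(0) = -P^{-1} E
        for _ in range(max_inner):
            r = [ek + jk for ek, jk in zip(e, matvec(jac, s))]   # E + J s
            if norm(r) <= f * norm(e):        # inexact Newton condition met
                break
            # Richardson step (stand-in for GMRES): s <- s - P^{-1} r
            s = [sk - rk / dk for sk, rk, dk in zip(s, r, diag)]
        x_new = [xk + sk for xk, sk in zip(x, s)]
        e_new = residual(x_new)
        # Forcing term for the next step: size of the linearization error.
        lin = matvec(jac, s)
        num = norm([a - b - c for a, b, c in zip(e_new, e, lin)])
        f = min(0.9, num / norm(e))
        x, e = x_new, e_new
    return x, max_outer


solution, outer_steps = inexact_newton([0.0, 0.0])
print(solution, outer_steps, norm(residual(solution)))
```

Early outer steps are solved only loosely, which saves inner iterations while the linear model is still a poor predictor. As the linearization error shrinks, the forcing term shrinks with it and the steps approach exact Newton steps.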
2.4.2 Incomplete LU Decomposition

The use of Incomplete LU (ILU) decomposition in circuit simulators has been reported by Eickhoff and Engl [24]. While the application in this paper is not harmonic balance simulation, the problem of solving a circuit with many nodes via approximate Newton techniques is discussed. The Jacobian matrix is split such that

$$J = Q + R \quad \text{where} \quad Q = LU \qquad (2.4.15)$$

and remainder $R \neq 0$. The ILU factors $L$ and $U$ usually are defined as the standard LU decomposition without any of the fill-in elements generated during the decomposition process. Eickhoff and Engl have defined a method in which some of the fill-ins are used. The "level" of a given fill-in element is defined by the manner in which it was generated. The main diagonal entries are assigned level 0, while off-diagonal entries are assigned level 1. A fill-in element generated from two off-diagonal entries is assigned level 2, and so on. The level assigned to a fill-in element is equal to the sum of the levels of its parents. The maximum level $L$ of entries allowed then defines an $L$-level ILU scheme.

The ILU method then is a technique in which some Jacobian matrix elements are neglected in the solving process. In contrast to the preconditioning schemes already mentioned, this technique can be thought of as preconditioning applied during the matrix decomposition stage. Determining the appropriate ILU level corresponds to finding the appropriate preconditioner.

2.5 Summary

The harmonic balance technique is a mature algorithm for solving nonlinear high frequency circuits. The challenges of this technique are those problems involving a large number of nonlinear devices, strongly nonlinear devices, and circuits driven at high power levels.
Common to all these challenges is the problem of solving a large nonlinear system of equations via some form of Newton's method. As this iterative process progresses, a large linear system of equations must be solved multiple times along the way. Whether these linear problems are solved directly or indirectly, the most successful methods employ some sort of preconditioner to approximate the Jacobian matrix. A preconditioner must be found that is sufficiently close to the actual Jacobian, but at the same time is easier to invert. Finding the appropriate preconditioner is of the utmost importance to all the techniques discussed.

Chapter 3
Development of a Harmonic Balance Simulator

The harmonic balance techniques developed in this study have been implemented in a simulator called TRANSIM. This simulator has the ability to perform several different types of analysis, including both a Newton-based and an iterative linear solver-based harmonic balance analysis. The focus of this chapter is the development of a Newton-based harmonic balance simulator, but many of the numerical techniques have been implemented in an iterative linear solver as well.

3.1 Newton-based Harmonic Balance Analysis

As discussed in Chapter 2, the nonlinear circuit is first partitioned into its linear and nonlinear subcircuits. This results in a nonlinear subcircuit with $N$ ports, where $N$ is the number of nonlinear terminals not connected to ground, or the common node of the circuit. Each of the $N$ ports connects to a terminal of a device of the nonlinear subcircuit. The DC solution at the interface nodes is usually found first and used as the initial guess for the interface state variables.
Since no FFT calculations are required and the number of state variables is greatly reduced, the DC solution is generally easy to find through Newton's method.

The linear response to each interface voltage is found by matrix-vector multiplication

$$I_L = Y V \qquad (3.1.1)$$

where $I_L$ is the vector of linear currents, $Y$ is the modified nodal admittance matrix, and $V$ is the vector of interface voltages. In order to find the nonlinear response to the interface voltages, an inverse FFT is first performed on each frequency-domain voltage to obtain the time-domain vector $v(t)$, which contains the time-domain voltage values at time points evenly spaced through one period of the fundamental frequency. The time-domain nonlinear currents in $i(t)$ are then found through evaluation of the nonlinear device models and converted back to the frequency domain via the FFT. The Kirchhoff's current law (KCL) error function is then defined in the frequency domain as

$$F(V) = I_L(V) + I_{NL}(V). \qquad (3.1.2)$$

This nonlinear system of equations must then be solved such that

$$\| F(V) \| < \epsilon \qquad (3.1.3)$$

for some sufficiently small user-defined $\epsilon$.

3.1.1 Forming the Jacobian Matrix

The Jacobian matrix consists of derivatives of the error function in Equation 3.1.2 with respect to the unknown interface quantities $V$. The derivatives of $I_L$ with respect to $V$ are easily found through Equation 3.1.1. The derivative of a linear current with respect to the interface voltages will be the linear admittance seen at the corresponding port of the linear circuit. This information is readily available in the modified nodal admittance matrix.
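The residual evaluation of Equations 3.1.1 to 3.1.3 can be sketched for a single interface node. This is a minimal illustration under stated assumptions, not TRANSIM code: a plain discrete Fourier transform stands in for the FFT, the source current is written as a separate term rather than being absorbed into the linear current, the phasor convention is the one given in the docstring, and the admittances, source, and cubic device are hypothetical values.

```python
import cmath
import math


def to_time(V, n_samples):
    """Phasors V[0..K] -> real samples over one fundamental period.

    Convention assumed here: v(t) = V[0] + sum_k Re(V[k] * exp(j*k*w*t)).
    """
    samples = []
    for n in range(n_samples):
        theta = 2.0 * math.pi * n / n_samples
        v = complex(V[0]).real
        for k in range(1, len(V)):
            v += (V[k] * cmath.exp(1j * k * theta)).real
        samples.append(v)
    return samples


def to_freq(samples, n_harm):
    """Real samples -> phasors for harmonics 0..n_harm (plain DFT)."""
    n = len(samples)
    phasors = []
    for k in range(n_harm + 1):
        acc = sum(x * cmath.exp(-2j * math.pi * k * m / n)
                  for m, x in enumerate(samples))
        phasors.append(acc / n if k == 0 else 2.0 * acc / n)
    return phasors


def hb_residual(V, Y, I_src, device, n_samples):
    """KCL error per harmonic: linear current + nonlinear current - source."""
    v_t = to_time(V, n_samples)              # frequency domain -> time domain
    i_t = [device(v) for v in v_t]           # evaluate the nonlinearity in time
    I_nl = to_freq(i_t, len(V) - 1)          # time domain -> frequency domain
    return [Y[k] * V[k] + I_nl[k] - I_src[k] for k in range(len(V))]


def residual_norm(F):
    return math.sqrt(sum(abs(f) ** 2 for f in F))


# Hypothetical one-node example: DC plus three harmonics, cubic conductance.
Y = [0.02, 0.02 + 0.01j, 0.02 + 0.02j, 0.02 + 0.03j]
I_src = [0.0, 0.01, 0.0, 0.0]
V_guess = [0.0, 0.4, 0.0, 0.0]
F = hb_residual(V_guess, Y, I_src, lambda v: 0.01 * v ** 3, n_samples=16)
print(residual_norm(F))
```

A Newton solver would repeat this evaluation with updated voltage phasors until the residual norm falls below the chosen tolerance. The number of time samples is deliberately larger than twice the highest harmonic so that the cubic term does not alias.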
The nonlinear Jacobian contributions $\partial I_{NL} / \partial V$ are not as straightforward to calculate. As reported in [7], the frequency domain derivatives can be found directly from the time-domain derivatives and the FFT. The real and imaginary components of the complex variables are stored separately as real numbers, requiring the computation of four derivatives to describe the relationship between current and voltage at any given frequency. For example, the derivative $\partial I / \partial V$, where $I$ and $V$ are complex quantities, is represented by the matrix

$$\begin{bmatrix} \dfrac{\partial \Re(I)}{\partial \Re(V)} & \dfrac{\partial \Re(I)}{\partial \Im(V)} \\ \dfrac{\partial \Im(I)}{\partial \Re(V)} & \dfrac{\partial \Im(I)}{\partial \Im(V)} \end{bmatrix} \qquad (3.1.4)$$

where $\Re$ and $\Im$ represent the real and imaginary parts, respectively. Note that DC quantities are strictly real and therefore do not require as much storage space in the Jacobian. For example, for DC current quantities, the derivatives of the imaginary current are not necessary, but the derivatives of the real current with respect to both real and imaginary AC voltage contributions must be found. Of course, only one number is required to represent the derivative of a DC quantity with respect to another DC quantity, since both are real numbers.

For a circuit with $N$ interface nodes, there are $N$ unknown DC quantities. If there are $M - 1$ AC analysis frequencies, there are an additional $2(M - 1)$ unknown AC quantities per node, since the AC quantities are expressed in terms of real and imaginary parts. Thus, for a circuit with $N$ interface nodes and $M$ analysis frequencies including DC, there will be $2N(M - 1) + N$ or $N(2M - 1)$ unknown quantities.

3.2 Improvements in Numerical Techniques for Harmonic Balance

Once the Jacobian has been calculated, the Newton update can be obtained through Equation 2.3.1.
However, since this equation requires the inversion of the Jacobian, it can be quite expensive to evaluate directly. Also, the extensive use of the FFT in the calculation of both the Jacobian and the nonlinear system response can introduce numerical error to the simulation. Both of these situations point out the need for fast and accurate numerical techniques for harmonic balance simulation.

Several numerical techniques have been implemented to improve both the accuracy and runtime of the simulator. For improvement of simulation speed and memory storage requirements, a sparse matrix package has been implemented in TRANSIM. The simulator also has the capability to reuse the decomposed Jacobian matrix multiple times to further improve simulation runtime. Additionally, the state-variable methods discussed in Chapter 2 are used in TRANSIM to improve the convergence abilities of the simulator. To avoid the numerical noise introduced by repeated use of the FFT, a technique called oversampling is used. Finally, there are several different matrix preconditioning techniques available in TRANSIM. A detailed description of these techniques follows.

3.2.1 Sparse Matrix Techniques

Since the Jacobian matrix can be excessively large, a sparse matrix package was implemented in TRANSIM. This package stores only the non-zero entries with their row and column indices. Obviously, one of the advantages to this approach is a reduction in the amount of memory necessary to store the Jacobian, but there are also advantages in terms of calculation speed. No operations are performed on the zero-valued matrix components as they do not exist in memory.
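
A minimal sketch of this kind of storage is given below, assuming a plain coordinate layout (parallel lists of row index, column index, and value). The class and function names are hypothetical, and the actual data structures of the TRANSIM sparse package may differ. The byte counts assume 8-byte double precision values and 4-byte integer indices, the sizes quoted in this section for the distributed amplifier example.

```python
class CooMatrix:
    """Coordinate-format sparse matrix: only nonzero entries are stored."""

    def __init__(self, n):
        self.n = n
        self.rows, self.cols, self.vals = [], [], []

    def add(self, row, col, value):
        # Zero entries are never stored, so they cost neither memory nor time.
        if value != 0.0:
            self.rows.append(row)
            self.cols.append(col)
            self.vals.append(value)

    def matvec(self, x):
        """Multiply by a vector, touching only the stored entries."""
        y = [0.0] * self.n
        for r, c, v in zip(self.rows, self.cols, self.vals):
            y[r] += v * x[c]
        return y


def sparse_bytes(nnz, value_bytes=8, index_bytes=4):
    """Storage for nnz values plus one row index and one column index each."""
    return nnz * (value_bytes + 2 * index_bytes)


def dense_bytes(n, value_bytes=8):
    """Storage for a full n-by-n matrix of values."""
    return n * n * value_bytes
```

With 2822 nonzero entries in a rank 135 matrix, `sparse_bytes(2822)` gives 45152 bytes against `dense_bytes(135)` at 145800 bytes.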
In general, the overhead associated with allocating and managing the sparse matrix structures is trivial compared to the improvements in runtime and memory storage. For example, a distributed amplifier circuit with nine unknown variables and eight analysis frequencies has a Jacobian matrix with rank 135 ($N(2M-1)$). The number of nonzero elements for the Jacobian is 2822, requiring the storage of 2822 double precision quantities along with 2822 row indices and 2822 column indices. Each double precision number requires eight bytes, and each integer requires four, resulting in a total of 45152 bytes to hold all the Jacobian information. If the zero entries were also stored in memory, we would need space for 135 x 135 = 18225 double precision numbers, or 145800 bytes. This means that a dense matrix would need more than three times as much memory as the sparse matrix requires.

The sparse matrix package in TRANSIM also contains optimized routines for LU decomposition and solving. The total simulation runtime for the distributed amplifier circuit using sparse matrices is about 26 seconds on a Sun Microsystems Ultra 1 workstation. When standard matrix solving techniques are used, the runtime is about one minute. This runtime improvement is due exclusively to the use of sparse matrices.

3.2.2 Oversampling

As reported in [9], aliasing error can be introduced into harmonic balance simulation due to the truncated spectrum used in finding the steady-state solution. Frequency domain voltages are transformed via the FFT to the time domain and applied to the nonlinear circuit elements to determine the nonlinear current response.
If an insufficient number of harmonics is used in the FFT, the voltages applied to the nonlinear elements will be incorrect, causing the current response to be incorrect. This alters the error surface so that the simulation may be inaccurate. Additionally, even if the voltage is accurately expressed with one set of analysis frequencies, the nonlinear current response may be inaccurately represented due to a larger required bandwidth. This is particularly important when highly nonlinear circuits are simulated at high drive levels. One way to avoid this problem is to increase the number of analysis frequencies, but this will increase the size of the Jacobian matrix. Since we need to invert or at least approximately invert the Jacobian, which is an $O(N^3)$ process, increasing the number of analysis frequencies can be quite expensive.

However, the oversampling technique implemented in TRANSIM increases only the number of frequencies used during the FFT. The quantities to be transformed via FFT are padded with zeros. For example, if there are eight analysis frequencies, one might oversample by a factor of two. The quantity being transformed would have frequencies nine through sixteen set to zero. The time-domain quantity then would have twice as many timepoints, providing a more accurate waveform for time-domain simulation. When the time-domain response is transformed back to the sixteen frequencies, only the first eight frequencies are considered for calculating the harmonic balance error and Jacobian.

3.2.3 The State Variable Approach

Thus far the harmonic balance technique has been discussed in terms of the voltages being the unknown quantities and the resulting linear and nonlinear currents forming the error function.
As reported in Chapter 2, the state variables need not be restricted to node voltages only. For strongly nonlinear devices such as diodes, a parametric model may be used that defines both voltage and current as a function of a state variable $x$. A parametric model would be defined as

$$v(t) = u\left[x(t), \frac{dx}{dt}, \ldots, \frac{d^n x}{dt^n}, x_D(t)\right] \tag{3.2.1}$$

$$i(t) = w\left[x(t), \frac{dx}{dt}, \ldots, \frac{d^n x}{dt^n}, x_D(t)\right] \tag{3.2.2}$$

where $x_D(t)$ is a vector of time-delayed state variables. An example of a parametric model was given in Equation 2.3.2 for a pn junction diode. Note that for high values of the state variable, the device current increases linearly while the voltage increases logarithmically. The current-voltage relationship remains the same, but the relationship between the unknown state variable and the device current is no longer exponential.

TRANSIM uses state variables to define all its nonlinear element models. For some devices such as nonlinear resistors the state variable is merely the device voltage, while for others such as diodes, a parametric model is used. The main benefit of using the parametric model is that it can eliminate wild guesses in the early stages of the simulation.

3.2.4 The Chord Method

Because calculation and inversion of the Jacobian is computationally expensive, it is desirable to avoid this cost whenever possible. Frequently, the Jacobian need not be recalculated at the next iteration of Newton's method. The reuse of the calculated and inverted Jacobian matrix is called the chord method. This method can greatly expedite the simulation of mildly nonlinear circuits. The tradeoff in accuracy of the Jacobian can become significant for more strongly nonlinear circuits, requiring more frequent Jacobian recalculation.
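
The idea can be sketched as a Newton loop that keeps its LU factors until progress stalls. The routine names, the dense pure-Python LU, and the choice to always accept the step are assumptions made for illustration, not the TRANSIM implementation. The five percent reuse criterion is the one stated for the simulations of Chapter 4.

```python
import math

def lu_factor(matrix):
    """LU factors with partial pivoting; returns (lu, perm)."""
    n = len(matrix)
    lu = [row[:] for row in matrix]
    perm = list(range(n))
    for k in range(n):
        p = max(range(k, n), key=lambda r: abs(lu[r][k]))
        if lu[p][k] == 0.0:
            raise ValueError("singular Jacobian")
        lu[k], lu[p] = lu[p], lu[k]
        perm[k], perm[p] = perm[p], perm[k]
        for r in range(k + 1, n):
            lu[r][k] /= lu[k][k]
            for c in range(k + 1, n):
                lu[r][c] -= lu[r][k] * lu[k][c]
    return lu, perm

def lu_solve(lu, perm, rhs):
    """Solve A x = rhs using factors from lu_factor."""
    n = len(lu)
    y = [rhs[perm[i]] for i in range(n)]
    for i in range(n):                      # forward substitution (unit lower)
        for j in range(i):
            y[i] -= lu[i][j] * y[j]
    for i in reversed(range(n)):            # back substitution (upper)
        for j in range(i + 1, n):
            y[i] -= lu[i][j] * y[j]
        y[i] /= lu[i][i]
    return y

def norm(vector):
    return math.sqrt(sum(v * v for v in vector))

def chord_newton(f, jacobian, x0, tol=1e-10, min_reduction=0.05, max_iter=200):
    """Newton iteration that reuses LU factors while the error keeps dropping.

    Returns (solution, iterations, number of Jacobian evaluations).
    """
    x = list(x0)
    residual = f(x)
    error = norm(residual)
    factors = None
    jacobian_evals = 0
    for iteration in range(max_iter):
        if error < tol:
            return x, iteration, jacobian_evals
        if factors is None:                 # factor only when the old one is stale
            factors = lu_factor(jacobian(x))
            jacobian_evals += 1
        step = lu_solve(factors[0], factors[1], [-r for r in residual])
        x = [xi + si for xi, si in zip(x, step)]
        residual = f(x)
        new_error = norm(residual)
        if new_error > (1.0 - min_reduction) * error:
            factors = None                  # less than 5% reduction: recalculate
        error = new_error
    raise RuntimeError("chord iteration did not converge")
```

On a mildly nonlinear system the loop typically takes more iterations than full Newton but far fewer Jacobian evaluations, which is the tradeoff described above.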
3.2.5 Jacobian Approximation and Preconditioning Techniques

The most computationally expensive part of harmonic balance analysis is solving a large linear system of equations at each iteration of Newton's method. Several authors previously mentioned have used preconditioners to approximate the Jacobian for iterative linear techniques. The same sort of approximations can also be made for Newton-based simulation. The selection of an appropriate preconditioner is of great importance for reducing the amount of computing resources needed for simulation.

In TRANSIM, selecting the preconditioner or approximate Jacobian amounts to selecting which Jacobian elements to eliminate before matrix decomposition. Several different schemes have been implemented, but all of these schemes affect only the nonlinear contributions to the Jacobian. The linear contributions are necessary for simulation convergence and are never eliminated in any of the preconditioning schemes.

For the purposes of this discussion, the term "matrix sparsity" will be used to quantify the percentage of matrix elements which will be ignored by an approximation scheme. If a given approximation uses 25 percent of the possible matrix entries, the matrix sparsity is said to be 75 percent. If only 10 percent of all matrix entries are used, the sparsity is 90 percent, and so on.

Using only linear contributions to the Jacobian

One preconditioning technique is to use only the linear contributions to the Jacobian.
Since the linear contributions to the Jacobian are constant with respect to the values of the unknowns at the interface nodes, the Jacobian must be calculated and decomposed only once at the outset of simulation. Additionally, the nonlinear Jacobian entries need not be calculated at all. As explained earlier, the nonlinear Jacobian entries are calculated via the FFT, so removing the need for these calculations can significantly improve simulation runtime. The decomposed matrix can be reused via the chord method at every iteration of Newton's method. Also, because all the Jacobian information is stored in blocks along the diagonal, the blocks can be decomposed individually, saving time for this $O(N^3)$ process.

For example, the distributed amplifier circuit with nine unknowns and eight analysis frequencies produces a Jacobian of rank 135. The linear Jacobian contributions are located in eight blocks along the diagonal. The first block contains the DC Jacobian contributions and has rank 9. The other seven blocks each are rank 18. Making the approximation that the LU decomposition time for a matrix is equal to $KN^3$, where $K$ is a constant, we can estimate the runtime improvement made by inverting the matrix block-wise. Table 3.2.1 shows the runtime for each of the eight blocks along with the runtime for decomposing the entire matrix. The block-wise method of decomposing the matrix requires roughly 98 percent less time than decomposing the entire matrix.

| Block No. | Runtime |
|---|---|
| 1 | $K(9^3)$ |
| 2 | $K(18^3)$ |
| 3 | $K(18^3)$ |
| 4 | $K(18^3)$ |
| 5 | $K(18^3)$ |
| 6 | $K(18^3)$ |
| 7 | $K(18^3)$ |
| 8 | $K(18^3)$ |
| total (block-wise) | $K(41553)$ |
| total (entire matrix) | $K(2460375)$ |

Table 3.2.1: Relative Approximate Runtime for LU Decomposition of a Rank 135 Jacobian Matrix

This technique is easily implemented and can work for weakly nonlinear circuits, but it is not robust. Circuits with strong nonlinearities require at least some nonlinear information in the Jacobian for convergence of Newton's method to be achieved.

Relaxation Method

Another technique for approximating the Jacobian matrix is to use only those nonlinear Jacobian contributions which occur on the matrix diagonal. This is equivalent to the relaxation method discussed in Chapter 2. No interactions between circuit quantities at different frequencies are considered with this method, nor are interactions between circuit quantities at different nodes. Adding only diagonal entries to the linear Jacobian does not add any complexity to the process of decomposing the matrix, since all the diagonal elements will already contain contributions from the linear circuit. However, now that only some nonlinear contributions are being used, the Jacobian may need to be recalculated and decomposed. Also, because the harmonic balance method calculates the nonlinear circuit response and derivatives in the time domain for one complete period of the fundamental frequency, all frequency domain derivatives are calculated at once. Thus, the nonlinear Jacobian contributions must be calculated for the entire matrix. The Jacobian information is still contained in blocks along the diagonal, which allows for faster LU decomposition.
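
The saving available from a block-diagonal structure can be checked with the same $KN^3$ cost model used for Table 3.2.1. The short sketch below only reproduces that arithmetic. The function names are hypothetical and the constant $K$ is factored out, so the results are relative costs rather than runtimes.

```python
def lu_cost(rank):
    """Relative LU decomposition cost under the K*N^3 model (K factored out)."""
    return rank ** 3

def blockwise_saving(block_ranks):
    """Compare decomposing diagonal blocks one by one with the full matrix.

    Returns (block-wise cost, full-matrix cost, fractional saving).
    """
    block_cost = sum(lu_cost(r) for r in block_ranks)
    full_cost = lu_cost(sum(block_ranks))
    return block_cost, full_cost, 1.0 - block_cost / full_cost

# One rank 9 DC block and seven rank 18 AC blocks, as in the distributed amplifier.
block_cost, full_cost, saving = blockwise_saving([9] + [18] * 7)
```

For these block sizes the costs are 41553 and 2460375, a saving of about 98.3 percent.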
While this method is more robust than not using any nonlinear information at all in the Jacobian, it still is not sufficient for strongly nonlinear circuits, as no frequency coupling information is considered. Also, the relationship between unknown circuit variables and the error quantities is considered only for quantities at the same node. For example, the relationship between drain current and gate-source voltage for a MESFET is not considered in this technique, since this relationship does not pertain to quantities at the same node.

Block Jacobian Methods

The block Jacobian method expands the range of nonlinear Jacobian contributions considered. The block size can be varied such that more frequency coupling can be considered, allowing for a more accurate representation of the Jacobian while maintaining the block structure which reduces LU decomposition runtime.

In order to understand the block Jacobian technique fully, it is helpful to have some insight into the structure of the Jacobian matrix. The contributions of the linear elements occur only between variables at the same frequency. Only the nonlinear elements can produce currents at frequencies different from that of the exciting voltages. If the Jacobian is structured such that the relationships between circuit parameters are grouped by frequency, all linear contributions to the Jacobian will be contained in blocks along the matrix diagonal. The only entries in the off-diagonal blocks will be those contributed by circuit nonlinearities.
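
The frequency-grouped ordering can be made concrete with a small index map. The sketch uses the notation of Section 3.1.1 ($N$ interface nodes, $M$ analysis frequencies including DC). The function names are hypothetical, and the interleaving of real and imaginary parts within each AC block is an assumption about layout rather than a documented TRANSIM convention.

```python
def jacobian_rank(n_nodes, n_freqs):
    """Number of real unknowns: N at DC plus 2N at each of the M-1 AC frequencies."""
    return n_nodes * (2 * n_freqs - 1)

def unknown_index(node, freq, part, n_nodes):
    """Row/column index of one real unknown in a frequency-grouped ordering.

    freq 0 is DC and holds real parts only; part is 'r' or 'i'.
    """
    if freq == 0:
        if part != "r":
            raise ValueError("DC quantities are strictly real")
        return node
    offset = n_nodes + (freq - 1) * 2 * n_nodes    # skip DC block and earlier AC blocks
    return offset + 2 * node + (0 if part == "r" else 1)

def frequency_of(index, n_nodes):
    """Frequency block that a given index belongs to."""
    if index < n_nodes:
        return 0
    return 1 + (index - n_nodes) // (2 * n_nodes)
```

With this ordering a linear element can only fill entries whose row and column satisfy `frequency_of(row, N) == frequency_of(col, N)`, which is the block-diagonal property described above. For nine nodes and eight frequencies, `jacobian_rank(9, 8)` gives 135.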
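For example, a circuit with $M$ interface nodes and $K$ analysis frequencies would have a matrix structure of the following form

$$
\begin{bmatrix}
\dfrac{\partial F_{00r}}{\partial X_{00r}} & \dfrac{\partial F_{10r}}{\partial X_{00r}} & \cdots & \dfrac{\partial F_{M0r}}{\partial X_{00r}} & \dfrac{\partial F_{01r}}{\partial X_{00r}} & \dfrac{\partial F_{01i}}{\partial X_{00r}} & \cdots & \dfrac{\partial F_{MKi}}{\partial X_{00r}} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial F_{00r}}{\partial X_{M0r}} & \dfrac{\partial F_{10r}}{\partial X_{M0r}} & \cdots & \dfrac{\partial F_{M0r}}{\partial X_{M0r}} & \dfrac{\partial F_{01r}}{\partial X_{M0r}} & \dfrac{\partial F_{01i}}{\partial X_{M0r}} & \cdots & \dfrac{\partial F_{MKi}}{\partial X_{M0r}} \\
\dfrac{\partial F_{00r}}{\partial X_{01r}} & \dfrac{\partial F_{10r}}{\partial X_{01r}} & \cdots & \dfrac{\partial F_{M0r}}{\partial X_{01r}} & \dfrac{\partial F_{01r}}{\partial X_{01r}} & \dfrac{\partial F_{01i}}{\partial X_{01r}} & \cdots & \dfrac{\partial F_{MKi}}{\partial X_{01r}} \\
\dfrac{\partial F_{00r}}{\partial X_{01i}} & \dfrac{\partial F_{10r}}{\partial X_{01i}} & \cdots & \dfrac{\partial F_{M0r}}{\partial X_{01i}} & \dfrac{\partial F_{01r}}{\partial X_{01i}} & \dfrac{\partial F_{01i}}{\partial X_{01i}} & \cdots & \dfrac{\partial F_{MKi}}{\partial X_{01i}} \\
\vdots & \vdots & & \vdots & \vdots & \vdots & & \vdots \\
\dfrac{\partial F_{00r}}{\partial X_{MKi}} & \dfrac{\partial F_{10r}}{\partial X_{MKi}} & \cdots & \dfrac{\partial F_{M0r}}{\partial X_{MKi}} & \dfrac{\partial F_{01r}}{\partial X_{MKi}} & \dfrac{\partial F_{01i}}{\partial X_{MKi}} & \cdots & \dfrac{\partial F_{MKi}}{\partial X_{MKi}}
\end{bmatrix}
\tag{3.2.3}
$$

where $\partial F_{nfq}/\partial X_{NFQ}$ is the derivative of $F$ at node $n$ and frequency $f$ with respect to $X$ at node $N$ and frequency $F$, where $q$ and $Q$ are either $r$ or $i$, denoting the real and imaginary parts of $F$ and $X$. As explained above, the linear admittance matrix contributes only to those matrix elements where $f = F$. These elements occur only on blocks along the diagonal of the matrix. The matrix can then be rewritten in terms of the different frequency blocks as

$$
\begin{bmatrix}
B_{0,0} & B_{0,1} & B_{0,2} & \cdots & B_{0,K} \\
B_{1,0} & B_{1,1} & B_{1,2} & \cdots & B_{1,K} \\
B_{2,0} & B_{2,1} & B_{2,2} & \cdots & B_{2,K} \\
\vdots & \vdots & \vdots & & \vdots \\
B_{K,0} & B_{K,1} & B_{K,2} & \cdots & B_{K,K}
\end{bmatrix}
\tag{3.2.4}
$$

where each block $B_{i,j}$ contains all derivatives of the error function at the $i$th frequency with respect to state variables at the $j$th frequency.

Block Jacobian techniques fall into two different categories, diagonal blocking and off-diagonal blocking. Diagonal blocking occurs when only blocks that occur on the matrix diagonal are used, as shown in Figures 3.2.1 through 3.2.4. Note that the used elements from each row and column of the matrix are represented by only one block. For the strictly diagonal block method, the number of blocks used can range from two up to the number of analysis frequencies, increasing by powers of two.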
Each block contains information pertaining to the same number of frequency components. For example, if there are eight analysis frequencies, blocking schemes using two, four, and eight blocks are available. The eight block scheme would result in blocks with one frequency each, while the blocks in the four block scheme would contain information about two frequencies. Additionally, the DC components of the Jacobian may be included, but this prevents the block-wise decomposition of the matrix, as the DC components are located along the top and left sides of the matrix as shown in Equations 3.2.3 and 3.2.4.

Figure 3.2.1: Diagonal blocking scheme using 16 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used.

Figure 3.2.2: Diagonal blocking scheme using 8 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.

Figure 3.2.3: Diagonal blocking scheme using 4 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.

Figure 3.2.4: Diagonal blocking scheme using 2 blocks along the matrix diagonal. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.

The other blocking schemes are based on the proximity of a given block to the diagonal blocks when the highest level of diagonal blocking is used. In this case, there are $K$ diagonal blocks for a circuit with $K$ analysis frequencies. If the block row and column indices are $i$ and $j$ respectively, a level $q$ off-diagonal blocking scheme would include all blocks for which $|i - j| \le q$. Figures 3.2.5 through 3.2.7 show the structure of the Jacobian matrix for levels one through three off-diagonal blocking schemes.

Figure 3.2.5: Level one off-diagonal blocking scheme. The shaded areas indicate the location of matrix entries which will be used. Each block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.

Figure 3.2.6: Level two off-diagonal blocking scheme. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.

Figure 3.2.7: Level three off-diagonal blocking scheme. The shaded areas indicate the location of matrix entries which will be used. Each small shaded block represents all derivatives of the error function at a given frequency with respect to all the state variables at a second given frequency.

For circuits with moderate nonlinearities, the diagonal block Jacobian method generally outperforms the methods previously discussed. When the number of blocks is equal to the number of analysis frequencies, the extra nonlinear contributions make little difference in the overall runtime and memory storage requirements compared to the linear matrix. The stronger the nonlinearities, the larger the blocks must be for convergence to be obtained, and the increase in necessary runtime is proportional to the increase in block size. Using the off-diagonal schemes in these situations can help provide convergence with an improvement in runtime over using the full Jacobian matrix, though the loss of the block-diagonal structure limits the improvement available.

The block techniques have a similar effect to the norm-reducing techniques described in [11]. Error function contributions at a given frequency are most strongly dependent upon state variables at the same and nearby frequencies. The block Jacobian methods ensure that these relationships are emphasized, heavily biasing the direction of the Newton step. The result is that the system solution is often found more quickly than when the full Jacobian matrix is used.

Thresholding Techniques

The thresholding technique involves selecting a minimum limit a matrix entry must satisfy to be included in the Jacobian preconditioner. The matrix entries which do not meet this criterion are discarded. Obviously, some care must be taken in selecting a threshold value. TRANSIM allows this value to be specified by an absolute value or by some percentage of the average value of Jacobian entries.
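
A thresholding rule of this kind might look like the sketch below. It is illustrative only: the function name and the dictionary layout are hypothetical, the comparison is assumed to be made on magnitudes, and the average is assumed to be taken over the nonlinear entries being filtered. The text does not say which entries TRANSIM averages over. Linear contributions are assumed to be added back separately, since the preconditioning schemes never discard them.

```python
def threshold_entries(nonlinear, absolute=None, percent_of_mean=None):
    """Keep nonlinear Jacobian entries whose magnitude meets a minimum limit.

    nonlinear       : dict mapping (row, col) -> value
    absolute        : limit given directly as a magnitude, or
    percent_of_mean : limit given as a percentage of the mean magnitude
    """
    if (absolute is None) == (percent_of_mean is None):
        raise ValueError("give exactly one of absolute or percent_of_mean")
    if not nonlinear:
        return {}
    if absolute is not None:
        limit = absolute
    else:
        mean_magnitude = sum(abs(v) for v in nonlinear.values()) / len(nonlinear)
        limit = mean_magnitude * percent_of_mean / 100.0
    return {key: v for key, v in nonlinear.items() if abs(v) >= limit}
```

A relative limit adapts to the drive level, since the nonlinear entries grow with input power, whereas an absolute limit keeps the same meaning from one simulation to the next.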
It is important to note that the largest Jacobian values generally occur within the diagonal blocks. For this reason, thresholding techniques generally do not provide a noticeable improvement over choosing an effective blocking scheme.

Chapter 4

Newton-based Harmonic Balance with Approximate Jacobian Matrices

In this chapter we will examine the simulation of two types of nonlinear circuits in detail and analyze the performance of Newton-based harmonic balance simulation using the Jacobian approximations discussed in Chapter 3. The different Jacobian approximations will be compared and a general rule of thumb for determining the best approximation for each type of circuit will be discussed.

4.1 Distributed Amplifier

The first circuit to be simulated is the distributed amplifier circuit shown in Figure 4.1.1. This circuit uses three MESFET devices distributed along transmission lines. Simulation performance was measured with an input frequency of 4 GHz and input power levels of 0, 5, 10, 15, and 20 dBm. The magnitude of the output voltage spectrum is shown in Figure 4.1.2.

Figure 4.1.1: Distributed Amplifier Circuit

The unknown state variables for this circuit are the gate and drain voltages for each of the three transistors and the current through the three voltage sources, for a total of nine unknown variables. As shown in Chapter 3, the Jacobian matrix will have rank $N(2M - 1)$ for a circuit with $N$ unknown quantities and $M$ analysis frequencies. With eight analysis frequencies, this results in a Jacobian matrix of rank 135.
In each simulation, all linear contributions to the Jacobian are used. Each approximation method has different criteria for determining which nonlinear Jacobian contributions are used. Once a Jacobian has been formulated and reduced to an LU matrix, it is reused via the chord method until the residual error is reduced by less than five percent, at which time the Jacobian is recalculated. All simulations of this circuit were run on a Sun Microsystems Sparc-20. Runtime performance is measured in machine cycles as reported by Quantify, a product of Pure-Atria for analyzing software performance.

Figure 4.1.2: Magnitude of the output voltage spectrum of the distributed amplifier in dBm with input power levels of 0 (○), 10 (+), and 20 dBm (□). The x-axis represents frequency in GHz.

4.1.1 Using the Full Jacobian Matrix

Using the full Jacobian matrix is the case of not using an approximation at all. Simulations performed in this manner will provide a baseline for measuring performance improvements. Additionally, insight into the structure of the full matrix is needed to determine how best to approximate the Jacobian. The total harmonic balance runtime for each of the simulated input powers is shown in Figure 4.1.3, and the corresponding number of Newton-Raphson iterations is shown in Figure 4.1.4.

Figure 4.1.3: Runtime in machine cycles for simulation of the distributed amplifier for different input power levels. [Axes: machine cycles versus input power (dBm).]
Figure 4.1.4: Number of Newton-Raphson iterations required for convergence of simulation of the distributed amplifier circuit with respect to input power level. [Axes: number of iterations versus input power (dBm).]

As expected, the simulation runtime increases with the amount of input power applied to the circuit in simulation. Larger input power generally results in a more dense Jacobian matrix and a larger magnitude in the nonlinear contributions to the Jacobian. Thus, the computation and LU decomposition of the Jacobian is more expensive for higher input power.

In the case of the circuit at hand, except for the 20 dBm input power level, the Jacobian is calculated only once for each input power level considered. This initial Jacobian is reused throughout the simulation because the resulting harmonic balance error at each iteration is always at least 5% lower than the error at the previous iteration. The Jacobian is recalculated and decomposed once for the 20 dBm input power simulation, resulting in the large increase in simulation time seen in Figure 4.1.3. The first Jacobian calculation assumes only the DC solution at the linear-nonlinear circuit interface, so this Jacobian will be the same regardless of the AC input power level. Thus, the difference in performance is due to the relative inaccuracy of the initial Jacobian at each of the following iterations. For low input power, the nonlinear contributions are small enough that the initial Jacobian is a relatively accurate representation of any Jacobian recalculated during the iterative Newton-Raphson process.
This representation is not quite as accurate for the higher input power levels, resulting in slower convergence.

The magnitudes of the output voltage spectra for input powers of 0, 10, and 20 dBm are shown in Figure 4.1.2. Clearly, the output spectrum of the amplifier increases superlinearly with an increase in input power. This is especially evident at the higher analysis frequencies. A high input power level results in a flatter output spectrum with larger values at all frequencies.

Further insight into the structure of the Jacobian matrix for this particular circuit may be gained by examining the Jacobian at different iterations and at different power levels throughout the solution process. For this examination the Jacobian matrix was recalculated at each step in the iterative process. Regardless of input power, the nonlinear Jacobian contributions occurred in the same matrix locations for this circuit. This implies that the behavior of the circuit is not strongly nonlinear, as increasing input power does not result in a more dense Jacobian matrix. However, as shown in Figure 4.1.5, the magnitude of off-diagonal Jacobian entries increases considerably with an increase in input power, meaning that the relationship between circuit quantities at different frequencies is stronger as input power increases. For higher input power levels, where the circuit quantities become larger at higher analysis frequencies, the off-diagonal entries become more important in obtaining convergence.

Each block of the Jacobian represents the derivative of the error function at one particular frequency $f_e$ with respect to the state variables at a particular frequency $f_x$.
The diagonal blocks represent the case where $f_e = f_x$. Table 4.1.1 and Table 4.1.2 show the average nonlinear contribution to the Jacobian within each block at input power levels of 0 dBm and 20 dBm respectively. The diagonal blocks and their neighbors generally have averages greater than the average of all nonlinear contributions. The linear Jacobian contributions, not included in these tables, would greatly increase the average magnitude in the diagonal blocks. With an increase in input power, the average values within the blocks generally increase, particularly for the off-diagonal blocks. This corresponds to the increase in the contributions to the error function at higher frequencies resulting from higher input power.

Figure 4.1.5: Magnitude of nonlinear Jacobian contributions with respect to proximity to the diagonal of the Jacobian in the simulation of the distributed amplifier circuit. The x-axis is the absolute difference between the row and column indices of the nonlinear entry, while the y-axis is the absolute value of the entry. The entries of the Jacobian at 0 dBm input power are represented by ○, while the entries of the Jacobian at 20 dBm input power are represented by +.

| GHz | 0 | 4 | 8 | 12 | 16 | 20 | 24 | 28 |
|---|---|---|---|---|---|---|---|---|
| 0 | 3.984e-3 | 3.152e-3 | 1.289e-3 | 1.245e-3 | 7.597e-5 | 1.095e-3 | 1.325e-3 | 9.465e-4 |
| 4 | 2.658e-3 | 2.019e-3 | 1.712e-3 | 1.132e-3 | 6.654e-4 | 5.875e-4 | 7.174e-4 | 6.864e-4 |
| 8 | 2.612e-4 | 1.309e-3 | 1.438e-3 | 2.275e-3 | 1.50e-3 | 1.00e-3 | 4.345e-5 | 8.151e-4 |
| 12 | 6.701e-4 | 8.487e-4 | 1.909e-3 | 2.026e-3 | 2.345e-3 | 1.398e-3 | 8.417e-4 | 4.689e-4 |
| 16 | 6.777e-5 | 3.01e-4 | 9.486e-4 | 1.908e-3 | 1.844e-3 | 2.702e-3 | 1.755e-3 | 1.125e-3 |
| 20 | 6.707e-4 | 3.566e-4 | 5.752e-4 | 1.115e-3 | 2.374e-3 | 2.334e-3 | 2.908e-3 | 1.682e-3 |
| 24 | 2.55e-4 | 2.702e-4 | 2.789e-5 | 5.258e-4 | 1.222e-3 | 2.453e-3 | 2.268e-3 | 3.175e-3 |
| 28 | 2.38e-4 | 3.395e-4 | 4.693e-4 | 3.169e-4 | 6.827e-4 | 1.406e-3 | 2.885e-3 | 2.702e-3 |

Table 4.1.1: Average magnitude of nonlinear Jacobian contributions within each frequency partition with input power level of 0 dBm. Each row corresponds to the frequency of the unknown while each column corresponds to the frequency of the error function. The final Jacobian used in the simulation is shown, for which the average of all the nonlinear Jacobian contributions is 1.349e-3. The first row and column are frequencies in GHz.

Another important figure of merit for evaluating the performance of the simulator is the number of Jacobian calculations and the number of times the calculated Jacobians are reused during simulation. The calculation and LU decomposition of the Jacobian is computationally intensive, and thus can be very expensive. When the Jacobian is reused, the amount by which the error is reduced with each reuse is generally less than the amount by which the error would be reduced by recalculating and decomposing the Jacobian.
However, this cost is more than offset by the savings from not recalculating and decomposing the Jacobian, as seen in Figure 4.1.6, which shows the runtime for recalculating the Jacobian at every iteration.

Table 4.1.2: Average magnitude of nonlinear Jacobian contributions within each frequency partition with input power level of 20 dBm. Each row corresponds to the frequency of the unknown while each column corresponds to the frequency of the error function. The final Jacobian used in the simulation is shown, for which the average of all the nonlinear Jacobian contributions is 1.482e-3. The first row and column are frequencies in GHz.

GHz    0         4         8         12        16        20        24        28
0      3.934e-3  3.144e-3  1.353e-3  7.707e-4  7.88e-4   2.002e-3  2.016e-3  1.312e-3
4      2.620e-3  1.976e-3  1.779e-3  1.302e-3  9.912e-4  1.131e-3  1.249e-3  1.017e-3
8      3.547e-4  1.403e-3  1.656e-3  2.474e-3  1.591e-3  7.727e-4  5.142e-4  1.504e-3
12     3.103e-4  9.784e-4  1.970e-3  2.285e-3  2.376e-3  1.337e-3  7.563e-4  9.873e-4
16     6.740e-5  4.996e-4  8.666e-4  1.866e-3  1.927e-3  2.833e-3  1.88e-3   8.13e-4
20     1.108e-3  6.924e-4  4.088e-4  1.073e-3  2.407e-3  2.571e-3  3.04e-3   1.625e-3
24     2.965e-4  4.651e-4  2.836e-4  3.966e-4  1.145e-3  2.455e-3  2.397e-3  3.312e-3
28     3.940e-4  4.936e-4  8.034e-4  6.693e-4  3.795e-4  1.367e-3  2.972e-3  2.954e-3

The number of iterations required for convergence when the Jacobian is not reused is not significantly lower than with our method of recalculating the Jacobian only when the error is not reduced by five percent or more, and the runtime is considerably higher. The reason for this is that the Jacobian simply does not change much between iterations. Due to the mild to moderate nonlinearity of the distributed amplifier circuit, it is evident that this circuit is amenable to approximation techniques.
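The reuse policy discussed here (keep the decomposed Jacobian and recalculate it only when an iteration fails to reduce the error by at least five percent) can be sketched as follows. This is a minimal illustration under stated assumptions, not the simulator's implementation: the function names, the small test system, and the use of an explicit inverse as a stand-in for stored LU factors are all hypothetical.

```python
import numpy as np

def newton_with_jacobian_reuse(residual, jacobian, x0, tol=1e-10,
                               min_reduction=0.05, max_iter=200):
    """Newton-Raphson that reuses the Jacobian while the error keeps dropping.

    Hypothetical helper; returns (solution, iterations, jacobian_evaluations).
    """
    x = np.asarray(x0, dtype=float)
    r = residual(x)
    err = np.linalg.norm(r)
    # Explicit inverse used as a stand-in for stored LU factors (assumption).
    j_inv = np.linalg.inv(jacobian(x))
    n_evals = 1
    n_iter = 0
    while err >= tol and n_iter < max_iter:
        x = x - j_inv @ r
        r = residual(x)
        new_err = np.linalg.norm(r)
        # Recalculate only if the error was not reduced by min_reduction.
        if new_err > (1.0 - min_reduction) * err:
            j_inv = np.linalg.inv(jacobian(x))
            n_evals += 1
        err = new_err
        n_iter += 1
    if err >= tol:
        raise RuntimeError("no convergence within max_iter iterations")
    return x, n_iter, n_evals

# Illustrative mildly nonlinear system (assumed): A x + 0.1 x^3 = b.
A = np.array([[4.0, 1.0], [1.0, 3.0]])
b = np.array([1.0, 2.0])

def example_residual(x):
    return A @ x + 0.1 * x**3 - b

def example_jacobian(x):
    return A + np.diag(0.3 * x**2)

x, n_iter, n_evals = newton_with_jacobian_reuse(
    example_residual, example_jacobian, np.zeros(2))
print("iterations:", n_iter, "Jacobian evaluations:", n_evals)
```

For this mildly nonlinear example the initial Jacobian is reused for every iteration, which mirrors the behavior reported for the distributed amplifier only qualitatively.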
The nonlinear Jacobian entries occur in the same matrix locations across the entire range of input power considered, and the initial Jacobian matrix is sufficient for simulation convergence across most input power levels as well. This implies that some approximation of the Jacobian may be made without sacrificing the convergence abilities of the simulator.

Figure 4.1.6: Runtime in machine cycles for harmonic balance of the distributed amplifier circuit with respect to input power level. The upper curve corresponds to calculating the Jacobian at every step in the Newton-Raphson solving process, while the lower curve corresponds to using the same Jacobian throughout the solving process. [x-axis: INPUT POWER (dBm).]

4.1.2 Using Block Jacobian Matrix Techniques

The first approximation technique used for simulating the distributed amplifier circuit was the block Jacobian method as described by Chang. The blocking schemes used were 2, 4, and 8 blocks for simulations using eight analysis frequencies. Recall the Jacobian structure as described in Chapter 3. The matrix is ordered such that frequency information is grouped together in large blocks, with the blocks along the diagonal representing the relationship between circuit variables at the same frequency. Off-diagonal blocks represent the relationship between circuit variables at different frequencies.
Since there are eight frequencies considered, a scheme of eight blocks ignores all relationships between variables at different frequencies, while larger blocks include more information about these relationships. For example, a scheme of two blocks with eight analysis frequencies groups dc, the fundamental analysis frequency, and the second and third harmonics together in one block, and the fourth through seventh harmonics in the other block. Any relationship between circuit parameters at frequencies within the same block is used, but no relationship between parameters at frequencies from different blocks is considered in the analysis. For a weakly nonlinear circuit, the blocks may be smaller, since the output spectrum is generally not as rich as the output spectrum of a more strongly nonlinear circuit. However, as input power increases, the Jacobian entries in the off-diagonal regions become larger and may be necessary to obtain convergence. Thus, the block size may need to be increased.

The resulting number of iterations required for convergence for each blocking scheme is shown in Figure 4.1.7. Corresponding runtimes are shown in Figure 4.1.8.

Figure 4.1.7: Number of iterations required for simulation of the distributed amplifier with different blocking schemes. [Curves: NO BLOCKING, 2 BLOCKS, 4 BLOCKS, 8 BLOCKS; x-axis: INPUT POWER (dBm).]

As Figure 4.1.8 indicates, the eight-block method seemed to perform best among the block methods. The discarded Jacobian entries apparently were not significant enough to impact runtime negatively. In fact, the simulation runtimes decreased almost across the board with a decrease in the amount of nonlinear information included in the Jacobian.
Thus, it appears that the most important nonlinear entries are located in the diagonal same-frequency blocks.

Figure 4.1.8: Simulation runtime of the distributed amplifier for different blocking schemes. Runtime is given in machine cycles. [Curves: NO BLOCKING, 2 BLOCKS, 4 BLOCKS, 8 BLOCKS; x-axis: INPUT POWER (dBm).]

At higher input power levels, the improvement seen by using smaller diagonal blocks can be offset somewhat by the need to recalculate and decompose the Jacobian. At the highest power level, the Jacobian was recalculated once for the 4 block scheme. Nevertheless, due to block-wise LU decomposition and the fewer iterations required, the smaller blocking schemes were still the most successful. The runtimes for the eight block method were lower than all others except at the 15 dBm input power level, where the Jacobian was recalculated once due to insufficient reduction of the residual error. The Jacobians created by the other blocking schemes, while they contain more entries which are technically accurate, nevertheless slow the simulation because of these extra entries during LU decomposition and solving. Additionally, the direction of the Newton update is improved by removing the matrix entries in off-diagonal blocks, as evidenced by the lower number of iterations. The runtime advantages gained by doing block-wise LU decomposition indicate that there is little to gain by including non-diagonal blocks which would prevent block-wise decomposition. The method of using off-diagonal blocks may prove useful for more strongly nonlinear circuits.
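The diagonal blocking scheme and the block-wise solve it permits can be illustrated with a short sketch. It assumes a real-valued Jacobian whose unknowns are ordered frequency by frequency, with the same number of unknowns at each frequency; the function names, matrix sizes, and random test matrix are hypothetical and are not taken from the simulator.

```python
import numpy as np

def block_diagonal_mask(n_freqs, vars_per_freq, n_blocks):
    """True where the row and column frequencies fall in the same diagonal block."""
    if n_freqs % n_blocks:
        raise ValueError("n_blocks must divide n_freqs")
    freq_of_var = np.repeat(np.arange(n_freqs), vars_per_freq)
    block_of_var = freq_of_var // (n_freqs // n_blocks)
    return block_of_var[:, None] == block_of_var[None, :]

def solve_blockwise(j_approx, rhs, n_blocks):
    """Solve a block-diagonal system one diagonal block at a time."""
    size = j_approx.shape[0] // n_blocks
    x = np.empty(rhs.shape, dtype=float)
    for k in range(n_blocks):
        s = slice(k * size, (k + 1) * size)
        x[s] = np.linalg.solve(j_approx[s, s], rhs[s])
    return x

# Eight analysis frequencies, three unknowns per frequency (assumed sizes).
rng = np.random.default_rng(0)
n_freqs, vars_per_freq = 8, 3
n = n_freqs * vars_per_freq
j_full = rng.normal(size=(n, n)) + 10.0 * np.eye(n)
rhs = rng.normal(size=n)

# Two-block scheme: dc through third harmonic, fourth through seventh harmonic.
mask = block_diagonal_mask(n_freqs, vars_per_freq, n_blocks=2)
j_approx = np.where(mask, j_full, 0.0)
step = solve_blockwise(j_approx, rhs, n_blocks=2)
```

Each diagonal block is factored independently, so the cost of the solve scales with the cube of the block size rather than the cube of the full matrix size.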
4.1.3 Using the Linear Jacobian along with the Diagonal of the Nonlinear Jacobian

The next approximation technique implemented was the use of only the nonlinear Jacobian contributions which occur along the diagonal of the Jacobian. This is equivalent to using the relaxation technique of Hicks and Khan. This technique has been found to be useful at low input power, but performance rapidly deteriorates with increasing input power, as seen in Figure 4.1.9 and Figure 4.1.10. Far more iterations are required as input power increases, due to the increasing inaccuracy of the approximation. At the 10 dBm level and above, the runtime resulting from this technique is higher than that obtained by using the full Jacobian matrix.

Figure 4.1.9: Number of iterations required for simulation of the distributed amplifier using nonlinear Jacobian contributions on the matrix diagonal only. [x-axis: INPUT POWER (dBm).]

Figure 4.1.10: Simulation runtime required for simulation of the distributed amplifier using nonlinear Jacobian contributions on the matrix diagonal only. [y-axis: MACHINE CYCLES (GCycles); x-axis: INPUT POWER (dBm).]

It is important to note that the largest Jacobian entries are not being used with this technique. Figure 4.1.5 shows that there are many off-diagonal nonlinear Jacobian contributions which are much larger than the nonlinear contributions along the diagonal. However, it has now been shown that these larger elements are not necessary for convergence, even at high input levels.
Also, since the linear Jacobian contributions are present only in the smallest size diagonal blocks, this technique has the added efficiency of being able to use block-wise matrix decomposition. Of course, for this example the price associated with the high number of necessary iterations makes this technique unattractive for high input levels.

4.1.4 Using the Linear Jacobian only

For weakly nonlinear circuits, the nonlinear contributions to the Jacobian are not always necessary for convergence of the Newton-Raphson algorithm. At low input powers especially, the linear contributions alone may be sufficient for convergence in just a few steps. For the distributed amplifier circuit, the linear portion of the Jacobian is sufficient for convergence at all input power levels investigated. The number of reuses increases significantly with increased input power. Figure 4.1.11 shows the number of reuses of the linear Jacobian required for convergence with respect to input power level. The corresponding runtimes are shown in Figure 4.1.12.

Figure 4.1.11: Number of iterations required for simulation of the distributed amplifier when only linear Jacobian contributions are used. [x-axis: INPUT POWER (dBm).]

Once again, the runtimes at the 10 dBm input power level and above are as high as or higher than those using the full Jacobian. Clearly this is a method which works best at low input power. Compared to using only the nonlinear contributions along the diagonal, this technique takes more than twice the number of iterations
to obtain convergence at the highest power level. In this case, because the Jacobian need be calculated only once and reused throughout the simulation, the relaxation technique costs almost nothing extra as compared to using only linear contributions. Both techniques result in a matrix which can be inverted block-wise, and since the nonlinear contributions are calculated only once, the associated cost of calculating these contributions is negligible.

Figure 4.1.12: Simulation runtime (in machine cycles) required for the distributed amplifier when only linear Jacobian contributions are used. [x-axis: INPUT POWER (dBm).]

4.1.5 Using a Threshold Value for Nonlinear Jacobian Contributions

Another technique for approximating the Jacobian matrix is to set a threshold value for nonlinear Jacobian contributions. All nonlinear contributions with a magnitude greater than the threshold value are used in the approximate Jacobian, while the other nonlinear contributions are discarded. From experimentation, it was found that the linear Jacobian contributions are critical for simulation convergence and must not be removed from the matrix regardless of their magnitudes. Of course, the optimal threshold value for the nonlinear Jacobian contributions must be determined, and this value depends on circuit parameters.

It has already been shown in Sections 4.1.2, 4.1.3, and 4.1.4 that removing some of the nonlinear contributions to the Jacobian can have a positive impact on the simulation runtime. At some point, however, removing too much data can cause a significant increase in runtime due to inaccuracy of the Jacobian. The thresholding method attempts to avoid this problem by including only the most significant nonlinear Jacobian contributions.

The distributed amplifier circuit was simulated with several different threshold values at the same input power levels discussed previously. Obviously, different input power levels will generate Jacobian entries of different magnitudes, so it would seem that different threshold levels would apply.

Finding Appropriate Threshold Values

For low input power, in this case 0 and 5 dBm, a threshold of infinity will actually work well for this circuit, as seen when only the linear Jacobian contributions were used. Figure 4.1.13 shows a logarithmic scale plot of the runtime for 0 and 5 dBm input power at different threshold levels as well as the runtime when no thresholding is used. The x-axis is the base 10 logarithm of the threshold value, and the y-axis is the runtime in machine cycles. Clearly there is a range of threshold values which perform best for the low input levels, starting where the base 10 logarithm of the threshold value is about -2.2. This corresponds to a threshold value of about 6e-3. It has already been shown in Section 4.1.4 that the nonlinear contributions to the Jacobian are unnecessary for fast runtimes at low power levels, so it is expected that raising the threshold even higher has little impact on simulation runtime.

No matter which threshold values are used, the best runtime for low input power is still slightly higher than that obtained when the highest level of diagonal blocking
(8 blocks) was used.

Figure 4.1.13: Simulation runtime (in machine cycles) required for the distributed amplifier at low power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value. [Curves: 5 dBm (no thresholding), 0 dBm (no thresholding), 5 dBm, 0 dBm; x-axis: LOG(THRESHOLD VALUE).]

Figure 4.1.14: Number of iterations required for convergence of the Newton-Raphson method for the distributed amplifier at low power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value. [Curves: 5 dBm, 0 dBm; x-axis: LOG(THRESHOLD VALUE).]

As Figure 4.1.5 implies, unless the threshold value is very high, there are nonlinear Jacobian entries very far from the matrix diagonal, which prevents block-wise decomposition of the matrix. The runtime savings seen with the thresholding technique are due to two factors. The number of elements to be factored in the LU decomposition process is reduced, resulting in a slight runtime improvement. However, the main reason for the runtime improvement is the reduction in the number of iterations required for convergence. By choosing only the largest nonlinear contributions to the Jacobian, the thresholding technique emphasizes the strongest relationships between the state variables and the error function. This emphasis biases the direction of the Newton step, resulting in fewer Newton-Raphson iterations, as seen in Figure 4.1.14.
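The thresholding rule of Section 4.1.5 can be sketched in a few lines: every linear contribution is kept, and a nonlinear contribution is kept only if its magnitude exceeds the threshold. The function name and the small matrices below are made up for illustration; only the threshold of about 10^-2.2 and the use of an infinite threshold to recover the linear-only Jacobian come from the text.

```python
import numpy as np

def threshold_jacobian(j_linear, j_nonlinear, threshold):
    """Keep all linear entries; keep nonlinear entries with |entry| > threshold.

    Hypothetical helper; returns the approximate Jacobian and the number of
    nonlinear contributions that were kept.
    """
    keep = np.abs(j_nonlinear) > threshold
    return j_linear + np.where(keep, j_nonlinear, 0.0), int(keep.sum())

# Made-up 3x3 example; the values are illustrative only.
j_linear = np.diag([2.0, 2.0, 2.0])
j_nonlinear = np.array([[8.0e-3, 1.0e-3, 2.0e-4],
                        [7.0e-3, 9.0e-3, 3.0e-3],
                        [1.0e-4, 6.5e-3, 1.2e-2]])

threshold = 10.0 ** -2.2  # about 6.3e-3, the value read from the runtime plots
j_approx, n_kept = threshold_jacobian(j_linear, j_nonlinear, threshold)

# A threshold of infinity discards every nonlinear contribution (Section 4.1.4).
j_linear_only, n_none = threshold_jacobian(j_linear, j_nonlinear, np.inf)
print(n_kept, n_none)
```

Because the surviving entries can lie anywhere in the matrix, the result is generally not block diagonal, which is why the text notes that block-wise decomposition is not available for this technique.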
At higher input power levels, the tradeoff between Jacobian accuracy and simulation runtime is more evident. Figure 4.1.15 shows a logarithmic scale plot of the runtime for 10, 15, and 20 dBm input power at different threshold levels as well as the runtime at these power levels when no thresholding is used. As was the case with low power input, the minimum for all the curves occurs at a threshold level of 6e-3. Beyond this point, however, runtime increases with the threshold level. At these higher input power levels, the nonlinear Jacobian contributions are larger, with more significant entries further away from the matrix diagonal than for lower power levels. As the threshold value increases, simulation runtime approaches that of the technique which uses no nonlinear Jacobian contributions, as the number of nonlinear contributions to the Jacobian is greatly reduced. The number of iterations required for simulation convergence is shown in Figure 4.1.16.

Of the 2484 nonlinear contributions to the Jacobian, 150 are above the optimal threshold level of 6e-3 for an input power level of 20 dBm. Of those, 106 occur in the blocks for which $f_e = f_x$, and 143 occur either in these diagonal blocks or in blocks directly adjacent to the diagonal blocks. Thus, even at high input power levels, the largest Jacobian contributions are located in or near the diagonal blocks. For the 0 dBm input level, 114 of 2484 nonlinear contributions to the Jacobian are above the threshold level, 21 of which do not occur in the diagonal blocks. All of these occur in blocks directly adjacent to the diagonal blocks. Note that at 0 dBm input power, a threshold of 0.01 provides convergence with the same runtime.
The only nonlinear Jacobian contributions which exceed this threshold occur in the diagonal blocks, but not all the contributions in the diagonal blocks are used. Again, because the Jacobian inversion is not done block-wise, the runtime for the threshold technique is still slightly higher than for the 8-block method.

Figure 4.1.15: Simulation runtime (in machine cycles) required for the distributed amplifier at high power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value. [Curves: 20 dBm (no thresholding), 10 dBm and 15 dBm (no thresholding), 20 dBm, 15 dBm, 10 dBm; x-axis: LOG(THRESHOLD VALUE).]

Figure 4.1.16: Number of iterations required for convergence of the Newton-Raphson method for the distributed amplifier at high power when Jacobian entries are subject to threshold levels. The x-axis represents the base 10 logarithm of the threshold value. [Curves: 20 dBm, 15 dBm, 10 dBm; x-axis: LOG(THRESHOLD VALUE).]

4.1.6 Summary

From observing simulator performance for all the approximation methods investigated in this chapter, it appears that the best technique for this circuit is to use the diagonal blocks of the Jacobian only. This technique uses roughly the same nonlinear Jacobian entries as the optimal threshold value, and thus emphasizes the same important Jacobian entries, but it has the added advantage of simplifying the LU decomposition of the matrix.
While the matrix was decomposed only once for most of the simulations, the block Jacobian method still provided slightly faster runtimes than the threshold method by providing an equally good Newton update direction and a faster method for decomposing the matrix. The methods which involve using only linear Jacobian contributions or only the nonlinear contributions which occur on the matrix diagonal are useful only for mildly nonlinear circuits and only at low input power levels. Even in these cases, the diagonal block methods performed at least as well as these techniques. For this circuit, the best Jacobian approximation is the diagonal block method which uses the same number of blocks as analysis frequencies.

4.2 Nonlinear Transmission Lines

A nonlinear transmission line (NLTL) consists of reverse biased diodes distributed along a transmission line at regular intervals. A large single-tone AC voltage is used to excite the circuit at its input, producing a very short duration voltage spike at its output. A unit cell of this type of transmission line is shown in Figure 4.2.1. Several unit cells are connected in series to form a true NLTL.

Figure 4.2.1: Nonlinear transmission line unit cell.

Due to the strong nonlinearity of the circuit elements and the high input voltage levels at which the circuit is driven, this circuit is an excellent test for a harmonic balance simulator. The output frequency spectrum of this type of circuit will be very dense due to the high input power and strong nonlinearity of the circuit. The number of unit cells is also generally large, resulting in a large number of unknown variables. As a result, the Jacobian matrix will be large. Figure 4.2.2 shows the output waveform of a 47 diode NLTL driven at 14 volts AC.
Transmission lines of varying lengths were simulated with approximate Jacobian matrices. The Jacobian approximations used for this circuit include the block methods already discussed as well as blocking techniques which include some off-diagonal blocks. The off-diagonal blocks allow for a more regular treatment of the cross-frequency information than the diagonal block Jacobian methods. The different off-diagonal methods include blocks up to some predetermined number of blocks away from the diagonal blocks. For example, if the off-diagonal level is two, then all blocks $B_{ij}$ where $|i - j| \le 2$ are used. Additionally, the off-diagonal blocking technique is used in place of the threshold technique. It was determined in Section 4.1 that the advantage gained by using the threshold technique was not due to an improvement in the matrix decomposition but rather due to an improvement in the direction of the Newton-Raphson update. The significant nonlinear elements in the off-diagonal blocks of the Jacobian generally occur in blocks close to the matrix diagonal. In this case the lower level off-diagonal block Jacobian techniques may be used. For future reference, the blocking schemes are defined in Table 4.2.1.

Figure 4.2.2: Measured (solid line) and simulated (dashed line) output waveform for the 47 diode NLTL. [Curves: EXPERIMENTAL DATA, HARMONIC BALANCE; x-axis: TIME (ps).]

Table 4.2.1: Blocking approximation schemes. A number by itself refers to the number of diagonal blocks used. A number preceded by the letter "o" indicates an off-diagonal blocking scheme.

Blocking Scheme   Description
1                 One block used, i.e., without approximation.
2                 Two diagonal blocks used.
4                 Four diagonal blocks used.
8                 Eight diagonal blocks used.
16                Sixteen diagonal blocks used.
32                Thirty-two diagonal blocks used.
o1                Level one off-diagonal blocking used.
o2                Level two off-diagonal blocking used.

4.2.1 A 10 diode NLTL

A circuit of ten unit cells was simulated at several different power levels. The output power spectrum is shown in Figure 4.2.3 for input voltages of 1, 3, 6, and 9 volts. Clearly, the output spectrum changes dramatically with an increase in drive level. The spectrum decreases monotonically for the low input voltages, but becomes non-monotonic and flatter as the input level is increased.

Figure 4.2.3: Magnitude of voltage output spectrum of the 10-diode NLTL with 1 (○), 3 (+), 6 (□), and 9 (×) volt input AC voltages. [x-axis: FREQUENCY (GHz).]

The simulator performance was measured with respect to the CPU runtime required for convergence and the number of iterations required. The number of times the Jacobian is recalculated during the simulation is another important performance measurement, and this is generally reflected in the runtime required for convergence. Results are presented with respect to the sparsity of the matrix approximation. Matrix sparsity was defined in Section 3.2.5 as the percentage of the matrix which is being neglected in the approximation. For example, using no approximation would be 0% sparsity, while using the two-block method is 50% sparsity. Thus, matrix sparsity indicates the structure of the Jacobian matrix approximation.

The number of Newton-Raphson iterations and the "Jacobian entry ratio" are reported for each simulation.
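The sparsity figures quoted for the blocking schemes can be checked with a short sketch. It assumes equally sized blocks, so that a scheme of n diagonal blocks neglects a fraction 1 - 1/n of the matrix, and it assumes 32 equally sized frequency blocks for the off-diagonal schemes. Both the function names and the choice of 32 blocks are assumptions made for illustration; the off-diagonal values it produces are not guaranteed to match the sparsity values tabulated in the thesis, since the actual block sizes used by the simulator are not given here.

```python
import numpy as np

def diagonal_block_sparsity(n_blocks):
    """Fraction of the matrix neglected when n_blocks equal diagonal blocks are kept."""
    return 1.0 - 1.0 / n_blocks

def off_diagonal_block_mask(n_freq_blocks, level):
    """Block-level mask keeping every block B_ij with |i - j| <= level."""
    idx = np.arange(n_freq_blocks)
    return np.abs(idx[:, None] - idx[None, :]) <= level

def mask_sparsity(mask):
    """Fraction of blocks neglected by a block-level mask."""
    return 1.0 - mask.mean()

for n_blocks in (1, 2, 4, 8, 16, 32):
    print(n_blocks, diagonal_block_sparsity(n_blocks))

# Off-diagonal levels one and two, assuming 32 frequency blocks.
for level in (1, 2):
    print("o%d" % level, mask_sparsity(off_diagonal_block_mask(32, level)))
```

Level zero reduces to the 32 diagonal block scheme, which gives a quick consistency check between the two families of approximations.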
The Jacobian entry ratio is defined as the ratio between the sum of the magnitudes of the Jacobian entries discarded by each approximation and the sum of the magnitudes of all Jacobian entries before the approximation is applied. In other words, each time the Jacobian is calculated, the sum of the magnitudes of all Jacobian entries is calculated. The Jacobian approximation is then applied, resulting in some matrix entries being discarded. The sum of the magnitudes of all discarded entries is then calculated. The ratio of the discarded sum to the total sum is then the "Jacobian entry ratio." This ratio is an indication of how much information is being discarded for each approximation.

Low Input Voltage

Results for an input voltage of 1 volt are shown in Figure 4.2.4. The Jacobian approximation techniques rarely performed better than using the full Jacobian matrix at this input level. The performance for all approximations was very similar. Though the nonlinearities of this circuit are strong, this drive level is low enough that most of the Jacobian elements of significant size are located close to the matrix diagonal. However, for lower input levels, there seems to be a range of Jacobian sparsity which provides slight performance improvements.

As seen in Figure 4.2.4, 1 volt of excitation is small enough that setting large sections of the Jacobian to zero has little effect on the number of iterations required for simulator convergence. Due to the low level of input power, these sections are close to zero anyway. The only situations where the Jacobian approximations had any effect on the number of iterations were at high levels of sparsity. The first spike in the iterations curve corresponds to using eight diagonal blocks in the Jacobian approximation.
The first dip in the curve corresponds to using level one off-diagonal blocking as shown in Figure 3.2.5. The final two points on the graph correspond to 16 and 32 diagonal blocks. However, all of the Jacobian approximations provided a runtime improvement over not using an approximation. When large sections of the matrix are set to zero, the matrix decomposition time decreases accordingly.

At an input AC voltage level of 3 volts, a richer Jacobian matrix is formed. Thus, the blocking methods result in a larger amount of information being lost than for an input level of 1 volt. As seen in Figure 4.2.5, the blocking methods result in a higher number of iterations only for the highest level of sparsity. Again, the technique which provided the greatest benefit was the level one off-diagonal approximation. Unlike the more weakly nonlinear distributed amplifier circuit, there is a point at which removing Jacobian information in off-diagonal blocks has a negative impact on simulation runtime, even at low input power levels. For the 10-diode line, this level of sparsity appears to be reached by the highest level diagonal blocking approximations. Both the 16 and 32 block approximations resulted in a longer runtime than that obtained by not approximating the Jacobian at all. Removing too much information from the Jacobian results in inaccurate Newton updates to the vector of unknowns and slows convergence. Even at low input power levels, the amount of processing time saved in factoring a simpler Jacobian matrix was offset significantly by the number of iterations required for convergence.

Figure 4.2.4: Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 1 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence. [Horizontal axis: Jacobian matrix sparsity.]

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          2             213          19           N/A
2          0.5        2             210          19           1.4
4          0.75       2             213          19           0.0
8          0.875      2             219          21           -2.8
16         0.9375     2             220          21           -3.3
32         0.96875    2             218          22           -2.3
o1         0.907      2             209          18           1.9
o2         0.8506     2             209          19           1.9
o5         0.6843     2             211          19           0.9
o16        0.2298     2             209          19           1.9

Table 4.2.2: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 1 volt.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          2             253          36           N/A
2          0.5        2             248          35           2.0
4          0.75       2             245          33           3.2
8          0.875      2             244          33           3.6
16         0.9375     2             251          35           0.8
32         0.96875    2             270          40           -6.7
o1         0.907      2             225          31           11.1
o2         0.8506     2             245          36           3.2
o5         0.6843     2             251          36           0.8
o16        0.2298     2             254          36           -0.4

Table 4.2.3: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 3 volts.

Figure 4.2.5: Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 3 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence. [Horizontal axis: Jacobian matrix sparsity.]

High Input Voltage

Once the input AC voltage was increased to 6 volts, the Jacobian matrix became dense enough that significant amounts of information were lost even for the most conservative approximations to the Jacobian. In fact, the only approximations to make even a modest improvement in simulator performance were the level 16 and 4 off-diagonal blocking schemes. From examining Figure 4.2.6, it is apparent that the number of iterations required for convergence is roughly proportional to the amount of information being removed by the different approximation techniques. Also, any advantage in matrix decomposition time gained by using the approximations is lost due to an increase in the number of iterations required for convergence. For some approximations, the number of required Jacobian calculations is larger as well, as seen in Table 4.2.4.
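The Matrix Sparsity column for the block diagonal schemes (1, 2, 4, 8, 16 and 32 blocks) follows a simple pattern, which the minimal sketch below reproduces. It assumes equal-size diagonal blocks over 32 analysis frequencies and treats each frequency pair as one entry; the function names are illustrative and are not taken from the simulator. The off-diagonal (o1, o2, ...) sparsities depend on details of the matrix layout not given here and are not modeled.

```python
def block_diagonal_mask(n_freq, n_blocks):
    """Return an n_freq x n_freq 0/1 mask that keeps an entry only when its row
    and column frequencies fall in the same diagonal block.
    Assumption: the blocks are contiguous and of equal size."""
    if n_freq % n_blocks != 0:
        raise ValueError("n_blocks must divide n_freq")
    size = n_freq // n_blocks
    return [[1 if r // size == c // size else 0 for c in range(n_freq)]
            for r in range(n_freq)]


def sparsity(mask):
    """Fraction of entries that the mask sets to zero."""
    total = sum(len(row) for row in mask)
    kept = sum(sum(row) for row in mask)
    return 1.0 - kept / total


if __name__ == "__main__":
    for n_blocks in (1, 2, 4, 8, 16, 32):
        print(n_blocks, sparsity(block_diagonal_mask(32, n_blocks)))
```

Under these assumptions the sparsity of a k-block diagonal approximation is 1 - 1/k, which matches the tabulated values 0, 0.5, 0.75, 0.875, 0.9375 and 0.96875.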
Figure 4.2.6: Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 6 volts AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence. [Horizontal axis: Jacobian matrix sparsity.]

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          3             342          37           N/A
2          0.5        3             363          45           -6.1
4          0.75       3             396          55           -15.8
8          0.875      3             427          65           -24.9
16         0.9375     17            1933         222          -465
32         0.96875    16            1835         215          -437
o1         0.907      4             549          71           -60.5
o2         0.8506     4             524          68           -53.2
o5         0.6843     3             340          37           0.6
o16        0.2298     3             333          35           2.6

Table 4.2.4: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 6 volts.

Similar results are seen for an input AC voltage level of 9 volts, as shown in Figure 4.2.7 and Table 4.2.5. Again, too much information is being removed from the Jacobian matrix by the different approximation techniques for the techniques to be successful. Only the level 16 off-diagonal approximation was able to provide a slight runtime improvement.

Figures 4.2.4 through 4.2.7 show how the Jacobian matrix becomes more dense as the input drive level is increased. As the voltage is increased, the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries rises more quickly with an increase in the level of matrix sparsity.
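The ratio plotted as the dashed line in these figures can be sketched as follows. This is a minimal illustration that interprets "magnitude" as the sum of absolute values of the entries, which is how the ratio is worded later for the 47 diode line; the 3 x 3 matrix and the mask are made-up values, not simulator output.

```python
def unused_magnitude_ratio(jacobian, mask):
    """Sum of |J[i][j]| over entries discarded by the mask, divided by the
    sum of |J[i][j]| over all entries. Works for real or complex entries."""
    total = 0.0
    unused = 0.0
    for j_row, m_row in zip(jacobian, mask):
        for value, keep in zip(j_row, m_row):
            magnitude = abs(value)
            total += magnitude
            if not keep:
                unused += magnitude
    return unused / total if total else 0.0


if __name__ == "__main__":
    # Hypothetical Jacobian with small off-diagonal coupling.
    jacobian = [[4.0, 0.5, 0.0],
                [-0.5, 3.0, 0.25],
                [0.0, -0.25, 2.0]]
    diagonal_only = [[1, 0, 0],
                     [0, 1, 0],
                     [0, 0, 1]]
    print(unused_magnitude_ratio(jacobian, diagonal_only))
```

A ratio near zero means the approximation discards almost nothing of numerical significance; the ratio grows as off-diagonal coupling strengthens with drive level.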
Figure 4.2.7: Number of iterations and ratio of unused to used Jacobian entries for simulation of the 10 diode NLTL with 9 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding number of iterations required for convergence. [Horizontal axis: Jacobian matrix sparsity.]

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          5             588          59           N/A
2          0.5        5             629          76           -7
4          0.75       14            1621         173          -176
8          0.875      N/C           N/C          N/C          N/C
16         0.9375     N/C           N/C          N/C          N/C
32         0.96875    N/C           N/C          N/C          N/C
o1         0.907      N/C           N/C          N/C          N/C
o2         0.8506     N/C           N/C          N/C          N/C
o5         0.6843     8             951          91           -61.7
o16        0.2298     5             561          56           4.6

Table 4.2.5: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input AC voltage level of 9 volts. N/C denotes no convergence.

It is also interesting to note the relationship between the effectiveness of the approximation techniques and the form of the output voltage spectrum. As shown in Figure 4.2.3, the output spectrum of this circuit is monotonically decreasing for input levels of 1 and 3 volts, while the spectrum is much broader for the higher input levels of 6 and 9 volts. The relationship between the higher order harmonics becomes more important as their relative magnitudes increase.
Information corresponding to this relationship is located away from the diagonal of the Jacobian, requiring more off-diagonal information for an approximation to be useful.

4.2.2 A 47 diode NLTL

A 47 diode NLTL was built and measured by Rodwell's research group at the University of California at Santa Barbara [25]. The circuit was measured with an input signal of 27 dBm at 9 GHz. Harmonic balance simulations of this circuit at both low and high input power levels were performed using TRANSIM, with high power results reported in [26]. Matrix approximations were used with some success at low input power, but the effectiveness of approximate Jacobians deteriorated rapidly with an increase in input power.

Low input voltage

The 47 diode NLTL was simulated using 16 analysis frequencies with input AC voltages of 1, 2, 3, and 4 volts. The associated system of equations to be solved is of rank 1457. At an AC input level of 1 volt, the different approximation techniques performed similarly. All of these simulations required only two Jacobian evaluations to be performed, but simulation runtimes and the number of iterations required for convergence were slightly different. As seen in Figure 4.2.8 and Table 4.2.6, the shortest simulation runtime and the fewest iterations occurred for the 16 block diagonal method, which was the most sparse. The block diagonal methods generally provided faster runtimes and required fewer iterations than off-diagonal methods in this case. Again, the relative uniformity in the performance of the different Jacobian approximations can be traced to the fact that most of the Jacobian information is located in the same-frequency blocks along the matrix diagonal.
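The effect of discarding weak off-diagonal Jacobian terms can be illustrated on a toy problem. The sketch below is not the harmonic balance system: it is a hypothetical two-unknown nonlinear system with a coupling coefficient of 0.1, solved once with the full Jacobian and once with the off-diagonal terms set to zero. All function names and numerical values are assumptions chosen for illustration.

```python
def residual(x):
    """Toy error function: two unknowns with weak cross-coupling (hypothetical)."""
    x0, x1 = x
    return [x0 + 0.1 * x1 + x0 ** 3 - 1.0,
            0.1 * x0 + x1 + x1 ** 3 - 1.0]


def jacobian(x):
    """Exact Jacobian of residual()."""
    x0, x1 = x
    return [[1.0 + 3.0 * x0 ** 2, 0.1],
            [0.1, 1.0 + 3.0 * x1 ** 2]]


def solve_2x2(a, b):
    """Solve a 2 x 2 linear system by Cramer's rule."""
    det = a[0][0] * a[1][1] - a[0][1] * a[1][0]
    return [(b[0] * a[1][1] - a[0][1] * b[1]) / det,
            (a[0][0] * b[1] - a[1][0] * b[0]) / det]


def newton(x, drop_off_diagonal, tol=1e-10, max_iter=50):
    """Newton iteration; optionally zero the off-diagonal Jacobian entries.
    Returns the solution and the number of updates performed."""
    iterations = 0
    f = residual(x)
    while max(abs(v) for v in f) >= tol:
        if iterations == max_iter:
            raise RuntimeError("no convergence")
        j = jacobian(x)
        if drop_off_diagonal:
            j[0][1] = 0.0
            j[1][0] = 0.0
        step = solve_2x2(j, f)
        x = [x[0] - step[0], x[1] - step[1]]
        f = residual(x)
        iterations += 1
    return x, iterations


if __name__ == "__main__":
    print("full Jacobian:       ", newton([0.0, 0.0], drop_off_diagonal=False))
    print("diagonal-only approx:", newton([0.0, 0.0], drop_off_diagonal=True))
```

Both variants reach the same solution because the residual itself is never approximated; only the update direction changes. When the discarded coupling is weak, the approximate iteration remains convergent while each linear solve becomes cheaper, which is the trade-off examined in this chapter.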
As seen in Figure 4.2.8, the next best approximation corresponded to a very small ratio of the sum of the magnitudes of all unused Jacobian entries to the sum of the magnitudes of all Jacobian entries. This is the level three off-diagonal method.

Figure 4.2.8: Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 1 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence. [Horizontal axis: matrix sparsity.]

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          2             1416         26           N/A
2          0.5        2             1424         26           -0.6
4          0.75       2             1364         25           3.7
8          0.875      2             1368         24           3.4
16         0.9375     2             1314         21           7.1
o1         0.907      2             1406         26           0.7
o2         0.8506     2             1370         26           3.2
o3         0.7932     2             1410         26           0.4

Table 4.2.6: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 1 volt.

At the 2 volt AC level, the 16 block diagonal method again proved most effective in reducing simulation runtime, as seen in Figure 4.2.9 and Table 4.2.7. Each method once again required only 2 Jacobian calculations for simulation convergence, but at this level the difference between the 16 block diagonal method and the other approximations was more significant.
Note that the level three off-diagonal method, with a sparsity of about 0.79, is not able to provide a runtime improvement over not using a Jacobian approximation. This level of off-diagonal blocking was the second best approximation at the one volt level. Also, the Jacobian entry ratio for the 16 block diagonal method is more than twice as high as it was for a 1 volt AC input, indicating that the magnitude of the Jacobian entries in off-diagonal blocks is increasing with an increase in drive level.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          2             1845         52           N/A
2          0.5        2             1891         54           -2.5
4          0.75       2             1869         53           -1.3
8          0.875      2             1728         45           6.3
16         0.9375     2             1488         31           19.4
o1         0.907      2             1725         44           6.5
o2         0.8506     2             1824         50           1.1
o3         0.7932     2             1878         52           -1.8

Table 4.2.7: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 2 volts.

Figure 4.2.9: Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 2 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence. [Horizontal axis: matrix sparsity.]

At an AC input voltage of 3 volts, the number of Jacobians required for simulation convergence varies somewhat for the various Jacobian approximations. Again, as seen in Figure 4.2.10 and Table 4.2.8, the 16 block diagonal approximation was the best performer, requiring only 2 Jacobian evaluations throughout the simulation. The 8 and 4 block diagonal techniques along with the level 1 and 3 off-diagonal techniques required 3 Jacobians for convergence, while the level 2 off-diagonal approximation required 4 Jacobian evaluations. The 2 block diagonal approximation required 5 Jacobian evaluations, as did the method of using the full Jacobian matrix.

Figure 4.2.10: Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 3 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence. [Horizontal axis: matrix sparsity.]

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          5             3370         52           N/A
2          0.5        5             3560         55           -5.6
4          0.75       3             2375         53           29.5
8          0.875      3             2360         51           30.0
16         0.9375     2             2087         63           38.1
o1         0.907      3             2171         40           35.6
o2         0.8506     4             2684         43           20.4
o3         0.7932     3             2349         51           30.3

Table 4.2.8: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 3 volts.

When the input AC level is increased to 4 volts, simulation convergence cannot be reached using the full Jacobian matrix with 16 analysis frequencies. The level two and level three off-diagonal approximations also fail to provide convergence.
However, the other Jacobian approximations do provide convergence and the results compare very favorably to simulation with more analysis frequencies. This shows that the approximations can help avoid the need for increasing the number of analysis frequencies or the use of excessive oversampling as discussed in Chapter 3. Runtimes and Jacobian entry ratios are shown for the different Jacobian approximations with an AC input level of 4 volts in Figure 4.2.11 and Table 4.2.9. Note that at the highest level of matrix sparsity, almost half the magnitude of the Jacobian is being neglected.

Figure 4.2.11: Simulation runtime and ratio of unused to used Jacobian entries for the 47 diode NLTL with 4 volt AC input voltage. The dashed line represents the ratio of the magnitude of unused Jacobian entries to the total magnitude of all Jacobian entries, while the solid line represents the corresponding runtime required for convergence. [Horizontal axis: matrix sparsity.]

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          N/C           N/C          N/C          N/C
2          0.5        18            23053        26           N/A
4          0.75       18            13397        25           N/A
8          0.875      7             5399         24           N/A
16         0.9375     3             2730         21           N/A
o1         0.907      6             4752         26           N/A
o2         0.8506     N/C           N/C          N/C          N/C
o3         0.7932     N/C           N/C          N/C          N/C

Table 4.2.9: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 47 diode NLTL circuit with an input AC voltage level of 4 volts. N/C denotes no convergence.

Very high input voltage

In order to obtain convergence at 27 dBm input power, a continuation method was necessary. The input voltage level was increased in 2 volt increments up to the 14 volt level which corresponds to 27 dBm. At the early stages of this process, a small number of analysis frequencies were used. The number of analysis frequencies was increased throughout the continuation process with increases in input voltage, with 40 analysis frequencies necessary for simulation convergence when the circuit was finally driven at 14 volts AC. The measured and simulated output waveforms of this circuit are shown in Figure 4.2.2. The Jacobian approximations discussed for the distributed amplifier and the 10 diode soliton line were unable to provide simulation convergence at this level.

4.2.3 Summary

The performance of the Newton-Raphson based harmonic balance technique for simulating nonlinear transmission lines clearly may be improved by approximating the Jacobian matrix, especially for low input drive levels. Not only is the time required to decompose the matrix reduced, but the number of iterations required to achieve convergence is also reduced. With the appropriate selection of a Jacobian approximation, the direction of the resulting Newton step may be superior to that provided by the full Jacobian matrix. By focusing the Newton update direction upon the diagonal blocks of the matrix, the Jacobian approximations emphasize the most important relationships between the state variables and the error function. As seen with the 47 diode line, an appropriate Jacobian approximation may even extend the ability of the simulator to converge with a given set of analysis frequencies. This can greatly reduce the number of state variables to be solved, resulting in a much smaller system of nonlinear equations with a corresponding reduction in matrix decomposition and solution time.

Chapter 5

Harmonic Balance Simulation Using Inexact Newton Methods with a Preconditioned Jacobian Matrix

This chapter examines the use of inexact Newton methods for harmonic balance simulation of the circuits discussed in Chapter 4. The block Jacobian approximations described in Chapter 4 are used as preconditioners for iterative linear solution of Equation 2.4.1. The quasi-Newton process used here is described in Section 2.4.1 and is the well known GMRES technique.

5.1 Simulation of a Distributed Amplifier

The distributed amplifier shown in Figure 4.1.1 was simulated using an iterative linear solver to determine the quasi-Newton step at each iteration. Thirty-two analysis frequencies were used in the simulations. Convergence was obtained only up to an input power level of 15 dBm with this technique, as opposed to 20 dBm with the exact Newton method. Results were obtained for input power levels of 0, 5, 10, and 15 dBm with an input frequency of 4 GHz.

The unknown circuit quantities are the gate and drain voltages for each of the three transistors. With thirty-two analysis frequencies, the Jacobian will have rank 378. The Jacobian preconditioning techniques used for this circuit are basically the same as the approximations used with the Newton-based harmonic balance simulations described in Chapter 4.
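The Jacobian ranks quoted for the circuits in these chapters (378 for the distributed amplifier, 630 for the 10 diode NLTL and 1457 for the 47 diode NLTL) are all consistent with the simple count sketched below. This accounting is an inference from the quoted numbers, not a formula stated in the text: it assumes one real unknown at DC plus a real and an imaginary part at every other analysis frequency for each state variable, and it assumes one state variable per diode for the two NLTL circuits.

```python
def harmonic_balance_unknowns(n_state_variables, n_frequencies):
    """Number of real unknowns, assuming DC contributes one real value and each
    remaining analysis frequency contributes a real and an imaginary part."""
    per_state_variable = 2 * n_frequencies - 1
    return n_state_variables * per_state_variable


if __name__ == "__main__":
    # Distributed amplifier: gate and drain voltage for each of 3 transistors.
    print(harmonic_balance_unknowns(3 * 2, 32))
    # 10 diode NLTL with 32 analysis frequencies (one unknown per diode assumed).
    print(harmonic_balance_unknowns(10, 32))
    # 47 diode NLTL with 16 analysis frequencies (one unknown per diode assumed).
    print(harmonic_balance_unknowns(47, 16))
```

Because the count grows linearly in both the number of state variables and the number of analysis frequencies, any technique that allows convergence with fewer frequencies directly shrinks the linear systems to be solved.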
5.1.1 Low Input Power

Figure 5.1.1 shows the runtime in cpu seconds and the Jacobian entry ratio versus the sparsity of the Jacobian preconditioner for an input level of 0 dBm. All preconditioning techniques show an improvement in runtime over using no preconditioner at all, i.e., matrix sparsity of zero. The runtime seems to be inversely related to the amount of Jacobian information deleted by each preconditioning technique. Of course, as seen in the figure, the magnitude of the deleted Jacobian entries is quite small relative to the overall magnitude of all the Jacobian entries. While all the preconditioners provided a runtime improvement, the best improvement was a slightly less than 10 percent decrease in runtime. This moderately nonlinear circuit driven at low power does not produce much of a distinction between the different preconditioners, as most of the Jacobian information is located in the diagonal blocks representing same frequency circuit quantities. The runtime variation is due to the effect of the different matrix structures upon decomposition time for the preconditioner. Table 5.1.1 shows that the number of Jacobian evaluations is the same for all preconditioners, while the number of iterations required for convergence is equal to or greater than the case of not preconditioning the Jacobian.

Figure 5.1.1: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 0 dBm.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          3             235          16           N/A
2          0.5        3             223          16           5.1
4          0.75       3             218          16           7.2
8          0.875      3             214          17           8.9
16         0.9375     3             222          19           5.5
32         0.96875    3             222          20           5.5
o1         0.907      3             223          18           5.1
o2         0.8506     3             222          16           5.5

Table 5.1.1: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 0 dBm.

After increasing the input power to 5 dBm, an increase in the Jacobian entries in off-diagonal blocks is seen, resulting in greater performance variation among the different preconditioners. As shown in Figure 5.1.2 and Table 5.1.2, performance is very similar to the 0 dBm input case up to a certain level of matrix sparsity, but runtime is higher for the level two off-diagonal preconditioner than when no preconditioner is used. The best runtime was provided by the level one off-diagonal scheme. With only one level of off-diagonal blocks used, only two Jacobians are needed for convergence as opposed to three for all other preconditioners.

At the lower input power levels, the diagonal block preconditioners seem to provide a somewhat consistent improvement corresponding to the increase in the sparsity of the preconditioners. Also, it is interesting to note that the 16 and 32 block methods provide very similar performance at these levels. At the same time, the different off-diagonal preconditioners seem to be less effective than the block diagonal preconditioners of similar sparsity. This is due to the more difficult decomposition for off-diagonal techniques.
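The decomposition advantage of a block diagonal structure can be estimated with a leading-order operation count. The sketch below takes the O(N^3) dense decomposition cost quoted in the abstract and assumes k equal-size, fully independent diagonal blocks; constants, sparsity within blocks, and all work other than factorization are ignored, and the function names are illustrative.

```python
def dense_factor_cost(n):
    """Leading-order cost model for decomposing a dense n x n matrix."""
    return float(n) ** 3


def block_diagonal_factor_cost(n, n_blocks):
    """Cost of factoring n_blocks independent dense blocks of size n / n_blocks."""
    block_size = n / n_blocks
    return n_blocks * block_size ** 3


if __name__ == "__main__":
    rank = 378  # Jacobian rank quoted for the distributed amplifier
    for n_blocks in (1, 2, 4, 8, 16, 32):
        ratio = block_diagonal_factor_cost(rank, n_blocks) / dense_factor_cost(rank)
        print(n_blocks, ratio)
```

In this model the factorization cost falls as 1/k^2 for k diagonal blocks, because the blocks decouple completely. An off-diagonal (banded block) structure does not decouple in this way, which is consistent with the observation above. Measured runtimes also include Jacobian evaluation and other work, so the tabulated improvements need not follow this ratio.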
Figure 5.1.2: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 5 dBm.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          3             241          26           N/A
2          0.5        3             228          26           5.4
4          0.75       3             222          27           7.9
8          0.875      3             223          35           7.5
16         0.9375     3             219          27           9.1
32         0.96875    3             214          18           11.2
o1         0.907      2             162          36           32.8
o2         0.8506     3             256          27           -6.2

Table 5.1.2: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 5 dBm.

5.1.2 High Input Power

For an input power level of 10 dBm, more variation is seen in the performance of the preconditioners. As seen in Figure 5.1.3 and Table 5.1.3, the best runtime was provided by the 16 block preconditioner, but the 32, 8, and 4 block preconditioners provided a similar runtime improvement. The off-diagonal preconditioners, while they provide a slight runtime improvement of 8.4 percent, are still not as effective as the block diagonal schemes.

Figure 5.1.3: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 10 dBm.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          3             238          26           N/A
2          0.5        3             224          26           5.9
4          0.75       3             213          22           10.5
8          0.875      3             210          21           11.8
16         0.9375     3             207          18           13
32         0.96875    3             214          24           10.1
o1         0.907      3             218          21           8.4
o2         0.8506     3             218          21           8.4

Table 5.1.3: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 10 dBm.

When the input power is increased to 15 dBm, the preconditioners show much more inconsistent performance, as seen in Table 5.1.4 and Figure 5.1.4. The 16 block preconditioner, which was the best performer for an input power level of 10 dBm, was unable to provide a runtime improvement over not using a preconditioner at all. The 32 block diagonal preconditioner performed best in this case, providing a runtime decrease of almost 15 percent. The level one off-diagonal scheme was the worst performer for the 15 dBm input simulations, while the level two off-diagonal scheme provided only a slight runtime improvement. The two preconditioners which resulted in poorer runtimes than those obtained by not using a preconditioner both required an extra Jacobian calculation.
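The way preconditioner quality affects an inner iterative linear solve can be illustrated on a small example. The sketch below is not GMRES, which is the solver used in this chapter: for brevity it substitutes a stationary preconditioned Richardson iteration, x <- x + M^-1 (b - A x), where M keeps only the diagonal blocks of A. The 4 x 4 matrix, right-hand side and function names are made-up values for illustration.

```python
def solve_dense(a, b):
    """Solve a small dense system by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x


def apply_block_preconditioner(a, r, block_size):
    """Return M^-1 r, where M holds only the diagonal blocks of a."""
    z = []
    for start in range(0, len(r), block_size):
        idx = range(start, min(start + block_size, len(r)))
        block = [[a[i][j] for j in idx] for i in idx]
        z.extend(solve_dense(block, [r[i] for i in idx]))
    return z


def richardson(a, b, block_size, tol=1e-10, max_iter=200):
    """Preconditioned Richardson iteration; returns (solution, iterations used)."""
    x = [0.0] * len(b)
    for iteration in range(max_iter):
        r = [bi - sum(aij * xj for aij, xj in zip(row, x))
             for row, bi in zip(a, b)]
        if max(abs(v) for v in r) < tol:
            return x, iteration
        z = apply_block_preconditioner(a, r, block_size)
        x = [xi + zi for xi, zi in zip(x, z)]
    raise RuntimeError("no convergence")


# Hypothetical system: strong coupling inside 2 x 2 blocks, weak coupling between them.
A_EXAMPLE = [[4.0, 1.0, 0.2, 0.0],
             [1.0, 3.0, 0.0, 0.1],
             [0.2, 0.0, 5.0, 1.0],
             [0.0, 0.1, 1.0, 4.0]]
B_EXAMPLE = [1.0, 2.0, 3.0, 4.0]

if __name__ == "__main__":
    for size in (1, 2, 4):
        solution, count = richardson(A_EXAMPLE, B_EXAMPLE, size)
        print("block size", size, "iterations", count)
```

A preconditioner that retains more of the significant coupling needs fewer inner iterations but costs more to factor, while one that discards too much can slow or prevent convergence. This mirrors the trade-off among the blocking schemes reported in the tables.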
Figure 5.1.4: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the distributed amplifier circuit. The input power level is 15 dBm.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          2             175          45           N/A
2          0.5        2             165          45           5.7
4          0.75       2             161          46           8.0
8          0.875      2             158          43           9.7
16         0.9375     3             208          18           -18.9
32         0.96875    2             149          34           14.9
o1         0.907      3             224          16           -28.0
o2         0.8506     2             165          45           5.7

Table 5.1.4: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the distributed amplifier circuit with an input power level of 15 dBm.

5.2 Simulation of a Short NLTL

The 10-diode NLTL described in Chapter 4 was simulated with input AC voltages of 1 through 6 volts at one volt increments. The analysis frequencies chosen for these simulations were DC, the fundamental frequency of 9 GHz, and the next 30 harmonics for a total of 32 analysis frequencies. Thus, the rank of the Jacobian matrix is 630. Each of the blocking preconditioning techniques was used for these simulations.

5.2.1 Low input voltage

For an input voltage of only one volt, all preconditioning methods worked fairly well. All runtimes were shorter than that obtained by using no preconditioner, and as seen in Table 5.2.1, the range of improvement in runtime is from 17.8 percent for the 32 block diagonal preconditioner to 44 percent for the 16 block diagonal preconditioner.
Each of the off-diagonal preconditioners provided about 30 percent runtime improvement versus not using a preconditioner, but they were not as effective as the block diagonal preconditioners with similar levels of sparsity, as seen in Table 5.2.1. This is most likely due to the inherent advantages of decomposing a block-diagonal matrix. As seen in the table, the level two and level three off-diagonal preconditioners provide convergence with the same number of Jacobian evaluations and iterations as the 16 block diagonal preconditioner, yet do not provide quite as great a runtime benefit.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          4             259          19           N/A
2          0.5        4             186          19           28.2
4          0.75       4             158          19           39.0
8          0.875      4             151          24           41.7
16         0.9375     4             145          24           44.0
32         0.96875    6             213          32           17.8
o1         0.907      4             181          23           30.1
o2         0.8506     4             184          19           29.0
o3         0.7932     4             190          19           26.6

Table 5.2.1: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 1 volt.

Figure 5.2.1: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 1 volt.

With an increase in voltage to 2 volts, the preconditioners with the greatest amount of sparsity are no longer able to provide simulation convergence.
Figure 5.2.2 also shows that the ratio of unused to used Jacobian entries rises more quickly with increasing sparsity than it did for 1 volt input voltage. The runtime improvement at this input level ranges from 16 to 40 percent, as seen in Table 5.2.2. In this case the greatest runtime improvement was provided by the preconditioner which uses four blocks along the diagonal. As seen in the figure, this level of sparsity results in a much larger ratio of unused Jacobian entries to the total magnitude of the Jacobian as compared to most of the other preconditioners which provided simulation convergence. It is also interesting to note that the level three off-diagonal preconditioner uses the most Jacobian information of any of the preconditioners, yet still provides a 27 percent improvement in runtime. The level one off-diagonal preconditioner provides a slight runtime improvement, but the improvement is not as significant as it was for a 1 volt input. With an increase to 2 volts, a loss in Jacobian accuracy of the level one off-diagonal preconditioner becomes evident, as this preconditioner requires an additional Jacobian evaluation. The level two and three off-diagonal preconditioners perform much as they did for a 1 volt input, providing runtime decreases of 29.4 and 27.1 percent respectively.

Blocking   Matrix     Jacobian      Runtime      Number of    Runtime
Scheme     Sparsity   Evaluations   (cpu secs)   Iterations   Improvement (%)
1          0          6             402          28           N/A
2          0.5        6             290          28           27.9
4          0.75       6             241          29           40.1
8          0.875      N/C           N/C          N/C          N/C
16         0.9375     N/C           N/C          N/C          N/C
32         0.96875    N/C           N/C          N/C          N/C
o1         0.907      7             339          38           15.7
o2         0.8506     6             284          28           29.4
o3         0.7932     6             293          28           27.1

Table 5.2.2: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 2 volts. N/C denotes no convergence.

Figure 5.2.2: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 2 volts.

For a 3 volt input AC voltage, the trends observed in increasing voltage from 1 to 2 volts continued, as shown in Figure 5.2.3 and Table 5.2.3. The ratio of unused to used Jacobian entries continued to rise more quickly with increasing matrix sparsity than it did for lower input levels. Also, the preconditioners which are most sparse were unable to provide simulation convergence. In this case, the level 1 off-diagonal preconditioner was also unable to provide simulation convergence.

Among the preconditioners which were able to provide convergence, the runtime improvement ranges from 13.8 to 40.9 percent. Again, the 4 block preconditioner seemed to provide the best runtime improvement, and this level of sparsity corresponds to the highest ratio between unused Jacobian entries and the total magnitude of all Jacobian entries. It is interesting to note that the percent improvement provided by this preconditioner was somewhat constant for all three voltage levels.
In fact, when a particular preconditioner was able to provide simulation convergence, the runtime improvement was relatively consistent for all three voltage levels.

Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         6            406         29          N/A
2         0.5       6            287         30          29.3
4         0.75      6            240         31          40.9
8         0.875     N/C          N/C         N/C         N/C
16        0.9375    N/C          N/C         N/C         N/C
32        0.96875   N/C          N/C         N/C         N/C
o1        0.907     N/C          N/C         N/C         N/C
o2        0.8506    6            294         34          27.6
o3        0.7932    7            350         37          13.8

Table 5.2.3: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 3 volts. N/C denotes no convergence.

Figure 5.2.3: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 3 volts.

Another important fact is that the diodes in the circuit are not being driven into forward bias at these input levels, nor is the reverse bias large enough to result in diode breakdown. Instead, the diodes are merely acting as nonlinear capacitances with a nonlinear current-voltage relationship. Nevertheless, even with a low AC input voltage of two or three volts, the most sparse preconditioners were not able to provide simulation convergence.
This indicates that there is enough frequency coupling even at these low input levels to affect the convergence of the simulator.

5.2.2 High input voltage

When the input voltage was increased to 4 volts, the same effects associated with increasing the input voltage level were seen. The ratio of the magnitude of the unused Jacobian entries to the total magnitude of all Jacobian entries continued to rise more quickly with an increase in matrix sparsity. Figure 5.2.4 and Table 5.2.4 show that at 4 volts, the level three off-diagonal preconditioner is no longer the scheme which retains the most Jacobian information. In fact, this is the preconditioner which performs best at this input level, due to a smaller number of Jacobian evaluations than any other preconditioner. The 4 block diagonal preconditioner remains the scheme which uses the least amount of Jacobian information, and it provides a 40.7 percent runtime improvement.

Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         6            403         28          N/A
2         0.5       6            289         30          28.3
4         0.75      6            239         30          40.7
8         0.875     N/C          N/C         N/C         N/C
16        0.9375    N/C          N/C         N/C         N/C
32        0.96875   N/C          N/C         N/C         N/C
o1        0.907     N/C          N/C         N/C         N/C
o2        0.8506    7            333         35          17.4
o3        0.7932    4            185         21          54.1

Table 5.2.4: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 4 volts. N/C denotes no convergence.

Figure 5.2.4: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 4 volts.

At the input level of 5 volts, the off-diagonal preconditioning schemes seem to work best. The best runtime improvement was obtained by the level one off-diagonal preconditioner, but the level two and level three off-diagonal preconditioners also performed well, as seen in Figure 5.2.5 and Table 5.2.5. Note the relatively high values of the Jacobian ratios for these preconditioners. At the higher input levels, these ratios begin to have a relationship with runtime that is directly proportional, as opposed to the inversely proportional relationship for low input voltages. Of course, the notable exception occurs for the level one off-diagonal preconditioning scheme, which also produced the fastest runtime.
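The two quantities plotted against each other in the figures of this chapter, matrix sparsity and the ratio of unused to total Jacobian magnitude, can be illustrated with a short sketch. The Python below is not taken from the simulator. It assumes that a diagonal blocking scheme with b blocks groups consecutive analysis frequencies, and that the ratio is formed from summed magnitudes of frequency-pair blocks; the function names and the all-ones magnitude array are hypothetical. Under this assumption the sparsity of a b block scheme is 1 - 1/b, which agrees with the tabulated values 0.5, 0.75, 0.875, 0.9375 and 0.96875.

```python
def block_diagonal_mask(num_freqs, num_blocks):
    """Return mask[i][j] == True where the (i, j) frequency-pair block is kept.

    Assumption: each diagonal block groups consecutive analysis frequencies.
    """
    if num_freqs % num_blocks != 0:
        raise ValueError("num_blocks must divide num_freqs")
    group = num_freqs // num_blocks  # frequencies per diagonal block
    return [[(i // group) == (j // group) for j in range(num_freqs)]
            for i in range(num_freqs)]


def sparsity(mask):
    """Fraction of frequency-pair blocks that the scheme discards."""
    total = sum(len(row) for row in mask)
    kept = sum(1 for row in mask for keep in row if keep)
    return 1.0 - kept / total


def unused_ratio(magnitudes, mask):
    """Summed magnitude of discarded blocks divided by the summed magnitude
    of all blocks (assumed form of the dashed curve in the figures)."""
    total = sum(sum(row) for row in magnitudes)
    unused = sum(m for mrow, krow in zip(magnitudes, mask)
                 for m, keep in zip(mrow, krow) if not keep)
    return unused / total


# Sparsity of the diagonal blocking schemes for 32 analysis frequencies.
sparsities = {b: sparsity(block_diagonal_mask(32, b))
              for b in (1, 2, 4, 8, 16, 32)}

# Hypothetical 4-frequency example in which every block has magnitude 1.
toy_magnitudes = [[1.0] * 4 for _ in range(4)]
toy_ratio = unused_ratio(toy_magnitudes, block_diagonal_mask(4, 2))
```

The sparsities of the off-diagonal schemes (0.907, 0.8506, 0.7932) depend on block sizes that are not restated in this section, so the sketch does not attempt to reproduce them.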
Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         7            469         31          N/A
2         0.5       7            339         38          27.7
4         0.75      7            282         38          39.9
8         0.875     N/C          N/C         N/C         N/C
16        0.9375    N/C          N/C         N/C         N/C
32        0.96875   N/C          N/C         N/C         N/C
o1        0.907     3            140         30          70.1
o2        0.8506    6            287         32          38.8
o3        0.7932    5            248         29          47.1

Table 5.2.5: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 5 volts. N/C denotes no convergence.

Figure 5.2.5: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 5 volts.

Once the input voltage was raised to a level of 6 volts, it became clear that the iterative linear approach was nearing its limits in its ability to provide simulation convergence. It was very difficult to find a preconditioner that was accurate enough to provide convergence and simple enough to provide runtime improvement. All of the previously used preconditioning schemes either did not provide convergence or provided a much slower runtime than not using a preconditioner at all, as shown in Figure 5.2.6 and Table 5.2.6.

Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         4            482         42          N/A
2         0.5       103          4911        341         -919
4         0.75      72           2860        291         -493
8         0.875     131          4484        404         -830
16        0.9375    N/C          N/C         N/C         N/C
32        0.96875   N/C          N/C         N/C         N/C
o1        0.907     N/C          N/C         N/C         N/C
o2        0.8506    N/C          N/C         N/C         N/C
o3        0.7932    N/C          N/C         N/C         N/C
o8        0.5362    5            290         31          39.8

Table 5.2.6: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 10 diode NLTL circuit with an input voltage of 6 volts. N/C denotes no convergence.

Figure 5.2.6: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 10 diode NLTL. The input AC voltage level is 6 volts.

Higher level off-diagonal schemes were used until one was found which provided convergence at a faster runtime. The level eight off-diagonal scheme provided convergence at a 40 percent faster runtime than not using a preconditioner. The level of sparsity for this preconditioner is very close to the sparsity level of the two block diagonal preconditioner, yet the runtimes obtained by the two schemes are vastly different. Also, the ratio of Jacobian entries increases fairly rapidly in changing from the two block diagonal to the level eight off-diagonal preconditioner. While both of these preconditioners consider some level of frequency coupling, the level eight off-diagonal preconditioner includes all frequency coupling between all circuit quantities within eight harmonics of each other.
The 2 block diagonal preconditioner, however, leaves out a majority of frequency coupling in the middle of the output spectrum.

5.3 Simulation of a Longer NLTL

Next the length of the NLTL was doubled to 20 diodes. The circuit was simulated at the same input voltages as the shorter line. For this circuit, convergence was not obtained for the higher input voltages.

As seen in Table 5.3.1 and Figure 5.3.1, the best performing preconditioners are the 16 and 32 block diagonal schemes. These two preconditioners require only three Jacobian calculations for convergence as compared to four Jacobian calculations for the other preconditioners. Additionally, these are the preconditioners which contain the least amount of Jacobian information.

Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         4            1317        21          N/A
2         0.5       4            733         20          44.3
4         0.75      4            504         20          61.7
8         0.875     4            395         19          70.0
16        0.9375    3            270         20          79.5
32        0.96875   3            272         24          79.3
o1        0.907     4            665         20          49.5
o2        0.8506    4            721         18          45.3
o3        0.7932    4            755         19          42.7

Table 5.3.1: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 20 diode NLTL circuit with an input voltage of 1 volt.

Figure 5.3.1: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 20 diode NLTL. The input AC voltage level is 1 volt.

Table 5.3.2 and Figure 5.3.2 show the performance of the preconditioners when the input to the 20 diode NLTL is 2 volts. In this case, the 16 and 32 diagonal block schemes again performed best. The performance improvement was quite dramatic, with a 95 percent runtime improvement observed for both preconditioners. The number of Jacobian calculations required for convergence was much smaller for these schemes.

Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         30           11493       119         N/A
2         0.5       31           6584        134         42.7
4         0.75      N/C          N/C         N/C         N/C
8         0.875     N/C          N/C         N/C         N/C
16        0.9375    5            491         13          95.7
32        0.96875   6            523         31          95.4
o1        0.907     N/C          N/C         N/C         N/C
o2        0.8506    N/C          N/C         N/C         N/C
o3        0.7932    N/C          N/C         N/C         N/C

Table 5.3.2: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 20 diode NLTL circuit with an input voltage of 2 volts.

Figure 5.3.2: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 20 diode NLTL. The input AC voltage level is 2 volts.
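The runtime improvement column in the tables of this chapter is consistent with a relative reduction in CPU time with respect to blocking scheme 1 (sparsity 0). The following minimal sketch, with a hypothetical function name, states that relation explicitly and checks it against the 2 volt results for the 20 diode NLTL. It is an interpretation of the tabulated numbers, not a formula quoted from the text.

```python
def runtime_improvement(reference_secs, secs):
    """Percent reduction in runtime relative to the reference runtime.

    Negative values mean the scheme was slower than the reference.
    """
    return 100.0 * (reference_secs - secs) / reference_secs


# Runtimes (cpu secs) for the 20 diode NLTL with a 2 volt input.
reference = 11493  # blocking scheme 1
improvements = {
    "2": round(runtime_improvement(reference, 6584), 1),
    "16": round(runtime_improvement(reference, 491), 1),
    "32": round(runtime_improvement(reference, 523), 1),
}
```

A scheme that fails to converge has no runtime, so it is reported as N/C rather than as an improvement value.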
Blocking  Matrix    Jacobian     Runtime     Number of   Runtime
scheme    sparsity  evaluations  (cpu secs)  iterations  improvement (%)
1         0         41           15956       185         N/A
2         0.5       39           8356        174         47.6
4         0.75      N/C          N/C         N/C         N/C
8         0.875     N/C          N/C         N/C         N/C
16        0.9375    N/C          N/C         N/C         N/C
32        0.96875   7            634         40          96.0
o1        0.907     N/C          N/C         N/C         N/C
o2        0.8506    N/C          N/C         N/C         N/C
o3        0.7932    N/C          N/C         N/C         N/C

Table 5.3.3: Number of Jacobian calculations required for convergence and simulation runtime for different blocking schemes for the 20 diode NLTL circuit with an input voltage of 3 volts.

As seen in Table 5.3.3, the 32 diagonal block preconditioner again provided the most dramatic runtime improvement for a 3 volt input, but the 16 diagonal block scheme did not provide simulation convergence. In fact, only the 32 and 2 diagonal block schemes were able to solve the system of equations. The Jacobian ratios for these schemes were much higher than for lower input simulations, as shown in Figure 5.3.3.

Figure 5.3.3: Simulation runtime (solid line) and ratio of the magnitude of all unused Jacobian entries to the magnitude of all Jacobian entries calculated (dashed line) versus matrix sparsity for the 20 diode NLTL. The input AC voltage level is 3 volts.

5.4 Summary

Clearly, the choice of preconditioner for the inexact Newton method can have a profound effect upon simulation performance. At low input drive levels, modest runtime improvements were obtained from preconditioning, and most preconditioners examined here exhibit similar behavior.
This is because at low drive levels, the Jacobian information needed for simulation convergence is located within the diagonal blocks of the matrix. As the drive level was increased, the choice of preconditioners became more crucial.

For the distributed amplifier circuit, the 32 block diagonal preconditioner provided runtime improvements at all input levels. Although this method did not always provide the best runtime compared to other methods, it was more consistent than the other methods and was the best performer at the highest drive level. At low input power levels, the only advantage provided by the preconditioners was that the matrix decomposition was less expensive for the diagonal block techniques than for the off-diagonal techniques. As input power was increased, matrix decomposition time became less important, and the best performing preconditioners were those that provided a better direction for the quasi-Newton update.

For the 10 diode NLTL, however, the 32 block diagonal preconditioner was one of the worst performers, presumably because of the strong nonlinearity of the circuit. Frequency coupling information is completely ignored by this preconditioner. The other diagonal block preconditioners which contain some cross-frequency information provided the best simulation performance for this circuit, especially at low power. The off-diagonal blocking techniques performed well at low power but were not as effective as the other diagonal blocking schemes. This is due to their structure, which is not as efficiently handled as the diagonal block structure. However, at higher input levels, the off-diagonal preconditioners performed best, as the frequency coupling information is more important.
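The difference in how the two families of preconditioners treat frequency coupling can be made concrete by counting which harmonic pairs each one retains. The sketch below is illustrative only. It assumes that a diagonal blocking scheme groups consecutive harmonics into equal blocks and that a level L off-diagonal scheme keeps every pair of harmonics no more than L apart; all names and the choice of 8 harmonics are hypothetical.

```python
def same_block(i, j, num_freqs, num_blocks):
    """True if harmonics i and j fall in the same diagonal block
    (assumes consecutive harmonics are grouped into equal blocks)."""
    group = num_freqs // num_blocks
    return i // group == j // group


def within_band(i, j, level):
    """True if harmonics i and j are no more than `level` apart."""
    return abs(i - j) <= level


def coverage(keeps, num_freqs, distance):
    """Fraction of harmonic pairs separated by `distance` that are kept."""
    pairs = [(i, i + distance) for i in range(num_freqs - distance)]
    kept = sum(1 for i, j in pairs if keeps(i, j))
    return kept / len(pairs)


NUM_FREQS = 8  # hypothetical number of analysis frequencies

# Coupling between neighbouring harmonics (distance 1).
block_cov = coverage(lambda i, j: same_block(i, j, NUM_FREQS, 4), NUM_FREQS, 1)
band_cov = coverage(lambda i, j: within_band(i, j, 1), NUM_FREQS, 1)
```

In this toy case the 4 block diagonal scheme keeps the coupling between harmonics 0 and 1 but drops the coupling between harmonics 1 and 2, so only 4 of the 7 neighbouring pairs survive, while the level one off-diagonal scheme keeps all 7.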
Block diagonal methods are not as effective for frequency coupling because they consider the coupling unevenly.

The 20 diode NLTL was simulated only at low input levels. It is apparent that frequency coupling does not play as large a part in the simulation convergence for this circuit at low input levels, as the 32 and 16 block diagonal preconditioners performed best. The performance improvements provided by the preconditioners in this case were much greater than for smaller circuits.

The inexact Newton method with a preconditioned Jacobian matrix proved most effective at lower input levels and with the largest circuits investigated. While runtime improvements were seen for the distributed amplifier circuit, the exact Newton method examined in Chapter 4 still provided the fastest simulation runtimes. Of course, this is to be expected since the inexact Newton method is known to be best suited for larger circuits. The inexact Newton method also failed to converge for the largest circuit investigated, the 47 diode NLTL discussed in Chapter 4.

Chapter 6

Conclusions and Future Research

Harmonic balance analysis has been shown to be an effective technique for steady-state analysis of a wide variety of microwave circuits. This study has shown that many of the computational drawbacks of the harmonic balance technique can be reduced through judicious choice of numerical techniques.

6.1 Discussion

In Chapter 3, Jacobian matrix approximations for Newton-Raphson based simulation were developed. These approximations are also used as preconditioners for iterative linear solvers used in inexact Newton solvers.
The Jacobian approximations take advantage of the special block structure of the matrix, which allows the information associated with different pairs of frequencies to be considered separately.

It was shown in Chapter 4 that when approximation techniques can be used, the most effective approximations are often those which lead to the use of the least amount of Jacobian information, especially at low drive levels. Using a diagonal blocking scheme with the largest possible number of blocks frequently proved to be the most efficient approximation. Not only does this simplify the matrix decomposition, but it forces the Newton update direction to focus on the relationships between the error function contributions and state variables at the same frequency. These are often the most important relationships between the state variables and the error function. As the drive level increases, off-diagonal blocks of the Jacobian become more important and must be included in order to obtain convergence. For the 47 diode nonlinear transmission line, it was also shown that Jacobian approximations can reduce the number of analysis frequencies required for convergence. This results in fewer state variables and thus a much smaller Jacobian matrix.

In Chapter 5, the preconditioners used for an iterative linear solver were also block-based, again with the most effective preconditioners being those which used the same-frequency diagonal blocks of the Jacobian. In cases of higher drive levels, where this preconditioner did not provide simulation convergence, preconditioners which use some cross-frequency information proved useful.
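The role of a same-frequency block diagonal preconditioner in an iterative linear solve can be sketched on a small example. The code below is a simplified illustration, not the solver used in this work: it applies a preconditioned Richardson iteration rather than a Krylov subspace method, and the 4 by 4 matrix, the right-hand side and all function names are hypothetical. The two 2 by 2 diagonal blocks stand in for same-frequency blocks, and the 0.1 entries stand in for weak cross-frequency coupling.

```python
def solve_dense(a, b):
    """Solve a small dense system by Gaussian elimination with pivoting."""
    n = len(b)
    aug = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, n):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, n + 1):
                aug[r][c] -= factor * aug[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(aug[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (aug[r][n] - s) / aug[r][r]
    return x


def apply_block_preconditioner(jac, residual, block_size):
    """Solve M z = r, where M keeps only the diagonal blocks of jac."""
    z = []
    for start in range(0, len(residual), block_size):
        stop = start + block_size
        block = [row[start:stop] for row in jac[start:stop]]
        z.extend(solve_dense(block, residual[start:stop]))
    return z


def preconditioned_richardson(jac, rhs, block_size, tol=1e-10, max_iter=200):
    """Iterate x <- x + M^-1 (rhs - jac x) until the residual is below tol.

    Returns the solution and the number of updates applied.
    """
    n = len(rhs)
    x = [0.0] * n
    for iteration in range(1, max_iter + 1):
        residual = [rhs[i] - sum(jac[i][j] * x[j] for j in range(n))
                    for i in range(n)]
        if max(abs(r) for r in residual) < tol:
            return x, iteration - 1
        z = apply_block_preconditioner(jac, residual, block_size)
        x = [xi + zi for xi, zi in zip(x, z)]
    raise RuntimeError("no convergence")


# Hypothetical system: strong diagonal blocks, weak off-diagonal coupling.
JAC = [[4.0, 1.0, 0.1, 0.0],
       [1.0, 3.0, 0.0, 0.1],
       [0.1, 0.0, 5.0, 2.0],
       [0.0, 0.1, 2.0, 4.0]]
RHS = [1.0, 2.0, 3.0, 4.0]
solution, updates = preconditioned_richardson(JAC, RHS, block_size=2)
```

Only the small diagonal blocks are ever factored, which is the source of the reduced decomposition cost. The iteration relies on the discarded off-diagonal entries being small relative to the diagonal blocks.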
The iterative linear solver proved not to be as effective for simulating at high drive levels, but when there are a large number of unknown variables at low to moderate input levels, this technique proved superior to the exact Newton method. Table 6.1.1 shows a runtime comparison for the 20 diode NLTL excited at 1, 2, and 3 volts. Thirty-two analysis frequencies were used in the simulation, resulting in a Jacobian matrix of rank 1260. Clearly there is a distinct advantage to using an iterative linear approach for larger circuits, but the ability of this method to provide convergence at high drive levels is suspect. As the drive level increases and the iterative linear approach becomes less effective, the runtime ratio between the two techniques decreases slightly.

Input voltage   Exact Newton runtime   Inexact Newton runtime
1               964                    270
2               1069                   491
3               1223                   634

Table 6.1.1: Runtime required for harmonic balance simulation of a 20 diode NLTL using the exact Newton method and the inexact Newton method. The runtimes given are the best obtained from all available Jacobian approximations and preconditioners.

6.2 Suggestions for Further Research

This study shows the importance of the choice of Jacobian approximations and preconditioners for harmonic balance simulation. For the circuits at hand, the diagonal blocking techniques in which the number of blocks is either the same as or half the number of analysis frequencies have been shown to be the best choice, especially at lower drive levels. As the drive level is increased, the off-diagonal blocks become more helpful. These blocking techniques should be applied to a wider variety of microwave circuits to verify the generality of these conclusions.
One circuit of particular interest is a grid amplifier for quasi-optical power combining applications. This is a distributed circuit with a large number of nonlinear devices, providing a large Jacobian matrix with many nodes per frequency. With two state variables per nonlinear device, the individual blocks of the Jacobian will be larger and more dense than those of the nonlinear transmission lines discussed in this work, but the preconditioning techniques should still be quite effective. Any additional frequency coupling may be handled by using off-diagonal blocks.

Also, modern digitally modulated communications circuits are of interest. Analysis of these circuits requires a very large number of analysis frequencies, producing a correspondingly large Jacobian matrix. Recent publications [21], [22] have shown that Krylov subspace techniques are particularly effective in simulating these circuits, but little is known about how to choose an appropriate preconditioner.

The high-power simulation of the 47 diode NLTL required continuation methods in order for simulation convergence to be achieved. The continuation method implemented in TRANSIM uses a reduced frequency spectrum at low input power levels. As the input level increases, the number of analysis frequencies used also increases. This process could be improved by changing the Jacobian approximation method during continuation. For low input levels, the approximation should be the diagonal block Jacobian with the largest possible number of blocks, i.e., the same number of blocks as analysis frequencies. As the drive level increases along with the number of analysis frequencies, the off-diagonal blocks could then be used.
Eventually, a large number of off-diagonal blocks will become necessary before the simulation is complete.

References

[1] R. Gilmore and M.B. Steer, “Nonlinear circuit analysis using the method of harmonic balance - a review of the art,” International Journal of Microwave and Millimeter-Wave Computer-Aided Engineering, Vol. 1, No. 1, 1991, pp. 22-37.
[2] V. Rizzoli and A. Neri, “State of the Art and Present Trends in Nonlinear Microwave CAD Techniques,” IEEE Transactions on Microwave Theory and Techniques, Vol. 36, February 1988, pp. 343-65.
[3] R. Hicks and P. Khan, “Numerical analysis of nonlinear solid-state device excitation in microwave circuits,” IEEE Transactions on Microwave Theory and Techniques, Vol. 30, March 1982, pp. 251-9.
[4] A. Kerr, “Noise and loss in balanced and subharmonically pumped mixers: part 1 - theory,” IEEE Transactions on Microwave Theory and Techniques, Vol. 27, December 1979, pp. 938-43.
[5] G. Camacho-Penalosa, “Numerical steady-state analysis of nonlinear microwave circuits with periodic excitation,” IEEE Transactions on Microwave Theory and Techniques, Vol. 31, September 1983, pp. 724-30.
[6] A. Kerr, “A technique for determining the local oscillator waveforms in a microwave mixer,” IEEE Transactions on Microwave Theory and Techniques, Vol. 23, October 1975, pp. 828-31.
[7] P.L. Heron and M.B. Steer, “Jacobian Calculation Using the Multidimensional Fast Fourier Transform in the Harmonic Balance Analysis of Nonlinear Circuits,” IEEE Transactions on Microwave Theory and Techniques, Vol. 38, April 1990, pp. 429-31.
[8] C.R. Chang, P.L. Heron, and M.B. Steer, “Harmonic balance and frequency domain simulation of nonlinear microwave circuits using the block Newton method,” IEEE Transactions on Microwave Theory and Techniques, Vol. 38, April 1990, pp. 431-4.
[9] P.L. Heron, C.R. Chang, and M.B. Steer, “Control of Aliasing in the Harmonic Balance Simulation of Nonlinear Microwave Circuits,” 1989 IEEE MTT-S International Symposium Digest, June 1989, pp. 355-358.
[10] V. Rizzoli et al., “The exploitation of sparse matrix techniques in conjunction with the piecewise harmonic-balance method for nonlinear microwave circuit analysis,” 1990 IEEE MTT-S International Symposium Digest, June 1990, pp. 1295-1298.
[11] H. Yeager and R. Dutton, “Improvement in norm-reducing Newton methods for circuit simulation,” IEEE Transactions on Computer Aided Design of Integrated Circuits and Systems, May 1989, pp. 538-546.
[12] V. Rizzoli and A. Neri, “Expanding the power-handling capabilities of harmonic-balance analysis by a parametric formulation of the MESFET model,” Electronics Letters, Vol. 26, August 16, 1990, pp. 1359-1361.
[13] V. Rizzoli et al., “A highly efficient p-n junction model for use in harmonic-balance simulation,” 19th European Microwave Conference, 1989, pp. 979-984.
[14] V. Rizzoli et al., “State-of-the-art harmonic-balance simulation of forced nonlinear microwave circuits by the piecewise technique,” IEEE Transactions on Microwave Theory and Techniques, Vol. 40, January 1992, pp. 12-28.
[15] V. Rizzoli et al., “A hierarchical harmonic-balance technique for the efficient simulation of large-size nonlinear microwave circuits,” 25th European Microwave Conference, 1995, pp. 615-619.
[16] R. Melville, P. Feldmann, J. Roychowdhury, “Efficient multi-tone distortion analysis of analog integrated circuits,” IEEE 1995 Custom Integrated Circuits Conference, pp. 241-244.
[17] P. Feldmann, B. Melville, D. Long, “Efficient frequency domain analysis of large nonlinear analog circuits,” 1996 IEEE MTT-S International Microwave Symposium Digest, June 1996, pp. 461-464.
[18] V. Rizzoli et al., “Harmonic-balance simulation of strongly nonlinear very large-size microwave circuits by inexact Newton methods,” 1996 IEEE MTT-S International Symposium Digest, June 1996, pp. 1357-1360.
[19] R. Freund, G. Golub, and N. Nachtigal, “Iterative solution of linear systems,” Acta Numerica, 1991, pp. 57-100.
[20] Y. Saad and M. Schultz, “GMRES: a generalized minimal residual method for solving nonsymmetric linear systems,” SIAM Journal on Scientific and Statistical Computing, Vol. 7, July 1986, pp. 856-869.
[21] V. Rizzoli et al., “Nonlinear processing of digitally modulated carriers by the inexact-Newton harmonic-balance technique,” Electronics Letters, Vol. 33, October 9th, 1997, pp. 1760-1761.
[22] V. Rizzoli et al., “Fast and robust inexact Newton approach to the harmonic-balance analysis of nonlinear microwave circuits,” IEEE Microwave and Guided Wave Letters, Vol. 7, October 1997, pp. 359-361.
[23] R. Telichevesky, K. Kundert, I. Elfadel, and J. White, “Fast simulation algorithms for RF Circuits,” IEEE 1996 Custom Integrated Circuits Conference, pp. 437-444.
[24] K. Eickhoff and W. Engl, “Levelized incomplete LU factorization and its application to large-scale circuit simulation,” IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems, Vol. 14, June 1995, pp. 720-727.
[25] M. Case, “Nonlinear Transmission Lines for Picosecond Pulse, Impulse and Millimeter-Wave Harmonic Generation,” PhD dissertation, University of California at Santa Clara, 1993.
[26] C. Christoffersen, M. Ozkar, et al., “State Variable-Based Transient Analysis Using Convolution,” accepted for publication.