An Introduction to the Conjugate Gradient Method Without the Agonizing Pain

Edition 1 1/4

Jonathan Richard Shewchuk
August 4, 1994

School of Computer Science
Carnegie Mellon University
Pittsburgh, PA 15213
Abstract

The Conjugate Gradient Method is the most prominent iterative method for solving sparse systems of linear equations. Unfortunately, many textbook treatments of the topic are written with neither illustrations nor intuition, and their victims can be found to this day babbling senselessly in the corners of dusty libraries. For this reason, a deep, geometric understanding of the method has been reserved for the elite brilliant few who have painstakingly decoded the mumblings of their forebears. Nevertheless, the Conjugate Gradient Method is a composite of simple, elegant ideas that almost anyone can understand. Of course, a reader as intelligent as yourself will learn them almost effortlessly.

The idea of quadratic forms is introduced and used to derive the methods of Steepest Descent, Conjugate Directions, and Conjugate Gradients. Eigenvectors are explained and used to examine the convergence of the Jacobi Method, Steepest Descent, and Conjugate Gradients. Other topics include preconditioning and the nonlinear Conjugate Gradient Method. I have taken pains to make this article easy to read. Sixty-six illustrations are provided. Dense prose is avoided. Concepts are explained in several different ways. Most equations are coupled with an intuitive interpretation.

Supported in part by the Natural Sciences and Engineering Research Council of Canada under a 1967 Science and Engineering Scholarship and by the National Science Foundation under Grant ASC-9318163. The views and conclusions contained in this document are those of the author and should not be interpreted as representing the official policies, either express or implied, of NSERC, NSF, or the U.S. Government.

Keywords: conjugate gradient method, preconditioning, convergence analysis, agonizing pain
Contents

1. Introduction
2. Notation
3. The Quadratic Form
4. The Method of Steepest Descent
5. Thinking with Eigenvectors and Eigenvalues
   5.1. Eigen do it if I try
   5.2. Jacobi iterations
   5.3. A Concrete Example
6. Convergence Analysis of Steepest Descent
   6.1. Instant Results
   6.2. General Convergence
7. The Method of Conjugate Directions
   7.1. Conjugacy
   7.2. Gram-Schmidt Conjugation
   7.3. Optimality of the Error Term
8. The Method of Conjugate Gradients
9. Convergence Analysis of Conjugate Gradients
   9.1. Picking Perfect Polynomials
   9.2. Chebyshev Polynomials
10. Complexity
11. Starting and Stopping
   11.1. Starting
   11.2. Stopping
12. Preconditioning
13. Conjugate Gradients on the Normal Equations
14. The Nonlinear Conjugate Gradient Method
   14.1. Outline of the Nonlinear Conjugate Gradient Method
   14.2. General Line Search
   14.3. Preconditioning
A. Notes
B. Canned Algorithms
   B1. Steepest Descent
   B2. Conjugate Gradients
   B3. Preconditioned Conjugate Gradients
   B4. Nonlinear Conjugate Gradients with Newton-Raphson and Fletcher-Reeves
   B5. Preconditioned Nonlinear Conjugate Gradients with Secant and Polak-Ribière
C. Ugly Proofs
   C1. The Solution to $Ax = b$ Minimizes the Quadratic Form
   C2. A Symmetric Matrix Has $n$ Orthogonal Eigenvectors
   C3. Optimality of Chebyshev Polynomials
D. Homework
About this Article

An electronic copy of this article is available by anonymous FTP to WARP.CS.CMU.EDU (IP address 128.2.209.103) under the filename quake-papers/painless-conjugate-gradient.ps. A PostScript file containing full-page copies of the figures herein, suitable for transparencies, is available as quake-papers/painless-conjugate-gradient-pics.ps. Most of the illustrations were created using Mathematica.

© 1994 by Jonathan Richard Shewchuk. This article may be freely duplicated and distributed so long as no consideration is received in return, and this copyright notice remains intact.

This guide was created to help students learn Conjugate Gradient Methods as easily as possible. Please mail me (jrs@cs.cmu.edu) comments, corrections, and any intuitions I might have missed; some of these will be incorporated into a second edition. I am particularly interested in hearing about use of this guide for classroom teaching.

For those who wish to learn more about iterative methods, I recommend William L. Briggs' "A Multigrid Tutorial" [2], one of the best-written mathematical books I have read.

Special thanks to Omar Ghattas, who taught me much of what I know about numerical methods, and provided me with extensive comments on the first draft of this article. Thanks also to James Epperson, David O'Hallaron, James Stichnoth, Nick Trefethen, and Daniel Tunkelang for their comments.
To help you skip chapters, here's a dependence graph of the sections:

[Figure: dependence graph of the sections, with nodes 1 Introduction; 2 Notation; 3 Quadratic Form; 4 Steepest Descent; 5 Eigenvectors; 6 SD Convergence; 7 Conjugate Directions; 8 Conjugate Gradients; 9 CG Convergence; 10 Complexity; 11 Stop & Start; 12 Preconditioning; 13 Normal Equations; 14 Nonlinear CG]

This article is dedicated to every mathematician who uses figures as abundantly as I have herein.
1. Introduction

When I decided to learn the Conjugate Gradient Method (henceforth, CG), I read four different descriptions, which I shall politely not identify. I understood none of them. Most of them simply wrote down the method, then proved its properties without any intuitive explanation or hint of how anybody might have invented CG in the first place. This article was born of my frustration, with the wish that future students of CG will learn a rich and elegant algorithm, rather than a confusing mass of equations.

CG is the most popular iterative method for solving large systems of linear equations. CG is effective for systems of the form

$Ax = b$    (1)

where $x$ is an unknown vector, $b$ is a known vector, and $A$ is a known, square, symmetric, positive-definite (or positive-indefinite) matrix. (Don't worry if you've forgotten what "positive-definite" means; we shall review it.) These systems arise in many important settings, such as finite difference and finite element methods for solving partial differential equations, structural analysis, circuit analysis, and math homework.

Iterative methods like CG are suited for use with sparse matrices. If $A$ is dense, your best course of action is probably to factor $A$ and solve the equation by backsubstitution. The time spent factoring a dense $A$ is roughly equivalent to the time spent solving the system iteratively; and once $A$ is factored, the system can be backsolved quickly for multiple values of $b$. Compare this dense matrix with a sparse matrix of larger size that fills the same amount of memory. The triangular factors of a sparse $A$ usually have many more nonzero elements than $A$ itself. Factoring may be impossible due to limited memory, and will be time-consuming as well; even the backsolving step may be slower than iterative solution. On the other hand, most iterative methods are memory-efficient and run quickly with sparse matrices.

I assume that you have taken a first course in linear algebra, and that you have a solid understanding of matrix multiplication and linear independence, although you probably don't remember what those eigenthingies were all about. From this foundation, I shall build the edifice of CG as clearly as I can.
2. Notation

Let us begin with a few definitions and notes on notation.

With a few exceptions, I shall use capital letters to denote matrices, lowercase letters to denote vectors, and Greek letters to denote scalars. $A$ is an $n \times n$ matrix, and $x$ and $b$ are vectors — that is, $n \times 1$ matrices. Written out fully, Equation 1 is

$\begin{bmatrix} A_{11} & A_{12} & \cdots & A_{1n} \\ A_{21} & A_{22} & \cdots & A_{2n} \\ \vdots & \vdots & & \vdots \\ A_{n1} & A_{n2} & \cdots & A_{nn} \end{bmatrix} \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix} = \begin{bmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{bmatrix}.$

The inner product of two vectors is written $x^T y$, and represents the scalar sum $\sum_{i=1}^{n} x_i y_i$. Note that $x^T y = y^T x$. If $x$ and $y$ are orthogonal, then $x^T y = 0$. In general, expressions that reduce to $1 \times 1$ matrices, such as $x^T y$ and $x^T A x$, are treated as scalar values.
Figure 1: Sample two-dimensional linear system. The solution lies at the intersection of the lines.
A matrix $A$ is positive-definite if, for every nonzero vector $x$,

$x^T A x > 0.$    (2)

This may mean little to you, but don't feel bad; it's not a very intuitive idea, and it's hard to imagine how a matrix that is positive-definite might look differently from one that isn't. We will get a feeling for what positive-definiteness is about when we see how it affects the shape of quadratic forms.

Finally, don't forget the important basic identities $(AB)^T = B^T A^T$ and $(AB)^{-1} = B^{-1} A^{-1}$.
3. The Quadratic Form

A quadratic form is simply a scalar, quadratic function of a vector with the form

$f(x) = \frac{1}{2} x^T A x - b^T x + c$    (3)

where $A$ is a matrix, $x$ and $b$ are vectors, and $c$ is a scalar constant. I shall show shortly that if $A$ is symmetric and positive-definite, $f(x)$ is minimized by the solution to $Ax = b$.

Throughout this paper, I will demonstrate ideas with the simple sample problem

$A = \begin{bmatrix} 3 & 2 \\ 2 & 6 \end{bmatrix}, \quad b = \begin{bmatrix} 2 \\ -8 \end{bmatrix}, \quad c = 0.$    (4)

The system $Ax = b$ is illustrated in Figure 1. In general, the solution $x$ lies at the intersection point of $n$ hyperplanes, each having dimension $n - 1$. For this problem, the solution is $x = [2, -2]^T$. The corresponding quadratic form $f(x)$ appears in Figure 2. A contour plot of $f(x)$ is illustrated in Figure 3.
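As a quick sanity check on the sample problem (my own snippet, not part of the paper), one can verify in Python/NumPy that $[2, -2]^T$ solves the system and sits at the bottom of the bowl:

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])   # the matrix of Equation 4
b = np.array([2.0, -8.0])
c = 0.0

def f(x):
    """The quadratic form of Equation 3: f(x) = 1/2 x^T A x - b^T x + c."""
    return 0.5 * x @ A @ x - b @ x + c

x_star = np.linalg.solve(A, b)               # [2, -2]: the intersection point of Figure 1
print(x_star, f(x_star))                     # f(x_star) = -10
print(f(x_star + np.array([0.1, 0.0])) > f(x_star))   # True: nearby points lie higher
```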
Figure 2: Graph of a quadratic form $f(x)$. The minimum point of this surface is the solution to $Ax = b$.

Figure 3: Contours of the quadratic form. Each ellipsoidal curve has constant $f(x)$.
Figure 4: Gradient $f'(x)$ of the quadratic form. For every $x$, the gradient points in the direction of steepest increase of $f(x)$, and is orthogonal to the contour lines.
Because $A$ is positive-definite, the surface defined by $f(x)$ is shaped like a paraboloid bowl. (I'll have more to say about this in a moment.)

The gradient of a quadratic form is defined to be

$f'(x) = \begin{bmatrix} \frac{\partial}{\partial x_1} f(x) \\ \frac{\partial}{\partial x_2} f(x) \\ \vdots \\ \frac{\partial}{\partial x_n} f(x) \end{bmatrix}.$    (5)

The gradient is a vector field that, for a given point $x$, points in the direction of greatest increase of $f(x)$. Figure 4 illustrates the gradient vectors for Equation 3 with the constants given in Equation 4. At the bottom of the paraboloid bowl, the gradient is zero. One can minimize $f(x)$ by setting $f'(x)$ equal to zero.

With a little bit of tedious math, one can apply Equation 5 to Equation 3, and derive

$f'(x) = \frac{1}{2} A^T x + \frac{1}{2} A x - b.$    (6)

If $A$ is symmetric, this equation reduces to

$f'(x) = Ax - b.$    (7)

Setting the gradient to zero, we obtain Equation 1, the linear system we wish to solve. Therefore, the solution to $Ax = b$ is a critical point of $f(x)$. If $A$ is positive-definite as well as symmetric, then this solution is a minimum of $f(x)$, so $Ax = b$ can be solved by finding an $x$ that minimizes $f(x)$. (If $A$ is not symmetric, then Equation 6 hints that CG will find a solution to the system $\frac{1}{2}(A^T + A)x = b$. Note that $\frac{1}{2}(A^T + A)$ is symmetric.)

Why do symmetric positive-definite matrices have this nice property? Consider the relationship between $f$ at some arbitrary point $p$ and at the solution point $x = A^{-1}b$. From Equation 3 one can show (Appendix C1) that if $A$ is symmetric (be it positive-definite or not),

$f(p) = f(x) + \frac{1}{2}(p - x)^T A (p - x).$    (8)

If $A$ is positive-definite as well, then by Inequality 2, the latter term is positive for all $p \ne x$. It follows that $x$ is a global minimum of $f$.

The fact that $f(x)$ is a paraboloid is our best intuition of what it means for a matrix to be positive-definite. If $A$ is not positive-definite, there are several other possibilities. $A$ could be negative-definite — the result of negating a positive-definite matrix (see Figure 2, but hold it upside-down). $A$ might be singular, in which case no solution is unique; the set of solutions is a line or hyperplane having a uniform value for $f$. If $A$ is none of the above, then $x$ is a saddle point, and techniques like Steepest Descent and CG will likely fail. Figure 5 demonstrates the possibilities. The values of $b$ and $c$ determine where the minimum point of the paraboloid lies, but do not affect the paraboloid's shape.

Figure 5: (a) Quadratic form for a positive-definite matrix. (b) For a negative-definite matrix. (c) For a singular (and positive-indefinite) matrix. A line that runs through the bottom of the valley is the set of solutions. (d) For an indefinite matrix. Because the solution is a saddle point, Steepest Descent and CG will not work. In three dimensions or higher, a singular matrix can also have a saddle.

Why go to the trouble of converting the linear system into a tougher-looking problem? The methods under study — Steepest Descent and CG — were developed and are intuitively understood in terms of minimization problems like Figure 2, not in terms of intersecting hyperplanes such as Figure 1.
4. The Method of Steepest Descent

In the method of Steepest Descent, we start at an arbitrary point $x_{(0)}$ and slide down to the bottom of the paraboloid. We take a series of steps $x_{(1)}, x_{(2)}, \ldots$ until we are satisfied that we are close enough to the solution $x$.

When we take a step, we choose the direction in which $f$ decreases most quickly, which is the direction opposite $f'(x_{(i)})$. According to Equation 7, this direction is $-f'(x_{(i)}) = b - Ax_{(i)}$.

Allow me to introduce a few definitions, which you should memorize. The error $e_{(i)} = x_{(i)} - x$ is a vector that indicates how far we are from the solution. The residual $r_{(i)} = b - Ax_{(i)}$ indicates how far we are from the correct value of $b$. It is easy to see that $r_{(i)} = -Ae_{(i)}$, and you should think of the residual as being the error transformed by $A$ into the same space as $b$. More importantly, $r_{(i)} = -f'(x_{(i)})$, and you should also think of the residual as the direction of steepest descent. For nonlinear problems, discussed in Section 14, only the latter definition applies. So remember, whenever you read "residual", think "direction of steepest descent."

Suppose we start at $x_{(0)} = [-2, -2]^T$. Our first step, along the direction of steepest descent, will fall somewhere on the solid line in Figure 6(a). In other words, we will choose a point

$x_{(1)} = x_{(0)} + \alpha r_{(0)}.$    (9)

The question is, how big a step should we take?

A line search is a procedure that chooses $\alpha$ to minimize $f$ along a line. Figure 6(b) illustrates this task: we are restricted to choosing a point on the intersection of the vertical plane and the paraboloid. Figure 6(c) is the parabola defined by the intersection of these surfaces. What is the value of $\alpha$ at the base of the parabola?

From basic calculus, $\alpha$ minimizes $f$ when the directional derivative $\frac{d}{d\alpha} f(x_{(1)})$ is equal to zero. By the chain rule, $\frac{d}{d\alpha} f(x_{(1)}) = f'(x_{(1)})^T \frac{d}{d\alpha} x_{(1)} = f'(x_{(1)})^T r_{(0)}$. Setting this expression to zero, we find that $\alpha$ should be chosen so that $r_{(0)}$ and $f'(x_{(1)})$ are orthogonal (see Figure 6(d)).

There is an intuitive reason why we should expect these vectors to be orthogonal at the minimum. Figure 7 shows the gradient vectors at various points along the search line. The slope of the parabola (Figure 6(c)) at any point is equal to the magnitude of the projection of the gradient onto the line (Figure 7, dotted arrows). These projections represent the rate of increase of $f$ as one traverses the search line. $f$ is minimized where the projection is zero — where the gradient is orthogonal to the search line.

To determine $\alpha$, note that $f'(x_{(1)}) = -r_{(1)}$, and we have

$r_{(1)}^T r_{(0)} = 0$
$(b - Ax_{(1)})^T r_{(0)} = 0$
$(b - A(x_{(0)} + \alpha r_{(0)}))^T r_{(0)} = 0$
$(b - Ax_{(0)})^T r_{(0)} - \alpha (Ar_{(0)})^T r_{(0)} = 0$
$r_{(0)}^T r_{(0)} = \alpha \, r_{(0)}^T A r_{(0)}$
$\alpha = \dfrac{r_{(0)}^T r_{(0)}}{r_{(0)}^T A r_{(0)}}.$
Figure 6: The method of Steepest Descent. (a) Starting at $[-2, -2]^T$, take a step in the direction of steepest descent of $f$. (b) Find the point on the intersection of these two surfaces that minimizes $f$. (c) This parabola is the intersection of surfaces. The bottommost point is our target. (d) The gradient at the bottommost point is orthogonal to the gradient of the previous step.
Figure 7: The gradient $f'$ is shown at several locations along the search line (solid arrows). Each gradient's projection onto the line is also shown (dotted arrows). The gradient vectors represent the direction of steepest increase of $f$, and the projections represent the rate of increase as one traverses the search line. On the search line, $f$ is minimized where the gradient is orthogonal to the search line.
Figure 8: Here, the method of Steepest Descent starts at $[-2, -2]^T$ and converges at $[2, -2]^T$.
Putting it all together, the method of Steepest Descent is:

$r_{(i)} = b - Ax_{(i)},$    (10)
$\alpha_{(i)} = \dfrac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}},$    (11)
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} r_{(i)}.$    (12)

The example is run until it converges in Figure 8. Note the zigzag path, which appears because each gradient is orthogonal to the previous gradient.

The algorithm, as written above, requires two matrix-vector multiplications per iteration. The computational cost of Steepest Descent is dominated by matrix-vector products; fortunately, one can be eliminated. By premultiplying both sides of Equation 12 by $-A$ and adding $b$, we have

$r_{(i+1)} = r_{(i)} - \alpha_{(i)} A r_{(i)}.$    (13)

Although Equation 10 is still needed to compute $r_{(0)}$, Equation 13 can be used for every iteration thereafter. The product $Ar_{(i)}$, which occurs in both Equations 11 and 13, need only be computed once. The disadvantage of using this recurrence is that the sequence defined by Equation 13 is generated without any feedback from the value of $x_{(i)}$, so that accumulation of floating point roundoff error may cause $x_{(i)}$ to converge to some point near $x$. This effect can be avoided by periodically using Equation 10 to recompute the correct residual.

Before analyzing the convergence of Steepest Descent, I must digress to ensure that you have a solid understanding of eigenvectors.
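As a companion to Equations 10–13, here is a minimal Steepest Descent sketch in Python/NumPy (my own illustration, not the canned code of Appendix B; the tolerance and the every-50-iterations residual refresh are assumptions):

```python
import numpy as np

def steepest_descent(A, b, x, tol=1e-10, max_iter=1000):
    """Steepest Descent for symmetric positive-definite A (Equations 10-13)."""
    r = b - A @ x                          # Equation 10
    delta0 = r @ r
    for i in range(max_iter):
        if r @ r <= tol**2 * delta0:       # stop once the residual is tiny
            break
        Ar = A @ r                         # the single matrix-vector product per iteration
        alpha = (r @ r) / (r @ Ar)         # Equation 11
        x = x + alpha * r                  # Equation 12
        if (i + 1) % 50 == 0:
            r = b - A @ x                  # periodic exact recomputation (Equation 10)
        else:
            r = r - alpha * Ar             # cheap recurrence (Equation 13)
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(steepest_descent(A, b, np.array([-2.0, -2.0])))   # zigzags toward [2, -2]
```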
5. Thinking with Eigenvectors and Eigenvalues

After my one course in linear algebra, I knew eigenvectors and eigenvalues like the back of my head. If your instructor was anything like mine, you recall solving problems involving eigendoohickeys, but you never really understood them. Unfortunately, without an intuitive grasp of them, CG won't make sense either. If you're already eigentalented, feel free to skip this section.

Eigenvectors are used primarily as an analysis tool; Steepest Descent and CG do not calculate the value of any eigenvectors as part of the algorithm.¹

5.1. Eigen do it if I try

An eigenvector $v$ of a matrix $B$ is a nonzero vector that does not rotate when $B$ is applied to it (except perhaps to point in precisely the opposite direction). $v$ may change length or reverse its direction, but it won't turn sideways. In other words, there is some scalar constant $\lambda$ such that $Bv = \lambda v$. The value $\lambda$ is an eigenvalue of $B$. For any constant $\alpha$, the vector $\alpha v$ is also an eigenvector with eigenvalue $\lambda$, because $B(\alpha v) = \alpha Bv = \lambda \alpha v$. In other words, if you scale an eigenvector, it's still an eigenvector.

Why should you care? Iterative methods often depend on applying $B$ to a vector over and over again. When $B$ is repeatedly applied to an eigenvector $v$, one of two things can happen. If $|\lambda| < 1$, then $B^i v = \lambda^i v$ will vanish as $i$ approaches infinity (Figure 9). If $|\lambda| > 1$, then $B^i v$ will grow to infinity (Figure 10). Each time $B$ is applied, the vector grows or shrinks according to the value of $|\lambda|$.
Figure 9: $v$ is an eigenvector of $B$ with a corresponding eigenvalue of $-0.5$. As $i$ increases, $B^i v$ converges to zero.

Figure 10: Here, $v$ has a corresponding eigenvalue of 2. As $i$ increases, $B^i v$ diverges to infinity.

¹However, there are practical applications for eigenvectors. The eigenvectors of the stiffness matrix associated with a discretized structure of uniform density represent the natural modes of vibration of the structure being studied. For instance, the eigenvectors of the stiffness matrix associated with a one-dimensional uniformly-spaced mesh are sine waves, and to express vibrations as a linear combination of these eigenvectors is equivalent to performing a discrete Fourier transform.
If $B$ is symmetric (and often if it is not), then there exists a set of $n$ linearly independent eigenvectors of $B$, denoted $v_1, v_2, \ldots, v_n$. This set is not unique, because each eigenvector can be scaled by an arbitrary nonzero constant. Each eigenvector has a corresponding eigenvalue, denoted $\lambda_1, \lambda_2, \ldots, \lambda_n$. These are uniquely defined for a given matrix. The eigenvalues may or may not be equal to each other; for instance, the eigenvalues of the identity matrix $I$ are all one, and every nonzero vector is an eigenvector of $I$.

What if $B$ is applied to a vector that is not an eigenvector? A very important skill in understanding linear algebra — the skill this section is written to teach — is to think of a vector as a sum of other vectors whose behavior is understood. Consider that the set of eigenvectors $\{v_i\}$ forms a basis for $\mathbb{R}^n$ (because a symmetric $B$ has $n$ eigenvectors that are linearly independent). Any $n$-dimensional vector can be expressed as a linear combination of these eigenvectors, and because matrix-vector multiplication is distributive, one can examine the effect of $B$ on each eigenvector separately.

In Figure 11, a vector $x$ is illustrated as a sum of two eigenvectors $v_1$ and $v_2$. Applying $B$ to $x$ is equivalent to applying $B$ to the eigenvectors, and summing the result. On repeated application, we have $B^i x = B^i v_1 + B^i v_2 = \lambda_1^i v_1 + \lambda_2^i v_2$. If the magnitudes of all the eigenvalues are smaller than one, $B^i x$ will converge to zero, because the eigenvectors that compose $x$ converge to zero when $B$ is repeatedly applied. If one of the eigenvalues has magnitude greater than one, $x$ will diverge to infinity. This is why numerical analysts attach importance to the spectral radius of a matrix:

$\rho(B) = \max |\lambda_i|, \quad \lambda_i$ is an eigenvalue of $B$.

If we want $x$ to converge to zero quickly, $\rho(B)$ should be less than one, and preferably as small as possible.
Figure 11: The vector $x$ (solid arrow) can be expressed as a linear combination of eigenvectors (dashed arrows), whose associated eigenvalues are $\lambda_1 = 0.7$ and $\lambda_2 = -2$. The effect of repeatedly applying $B$ to $x$ is best understood by examining the effect of $B$ on each eigenvector. When $B$ is repeatedly applied, one eigenvector converges to zero while the other diverges; hence, $B^i x$ also diverges.

It's important to mention that there is a family of nonsymmetric matrices that do not have a full set of $n$ independent eigenvectors. These matrices are known as defective, a name that betrays the well-deserved hostility they have earned from frustrated linear algebraists. The details are too complicated to describe in this article, but the behavior of defective matrices can be analyzed in terms of generalized eigenvectors and generalized eigenvalues. The rule that $B^i x$ converges to zero if and only if all the generalized eigenvalues have magnitude smaller than one still holds, but is harder to prove.

Here's a useful fact: the eigenvalues of a positive-definite matrix are all positive. This fact can be proven from the definition of eigenvalue:

$Bv = \lambda v$
$v^T B v = \lambda v^T v.$

By the definition of positive-definite, $v^T B v$ is positive (for nonzero $v$). Hence, $\lambda$ must be positive also.
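A small sketch of mine makes the spectral-radius rule concrete; the matrix $B$ is built (arbitrarily) to have the eigenvalues 0.7 and $-2$ of Figure 11:

```python
import numpy as np

V = np.array([[1.0, 1.0],
              [0.0, 1.0]])                       # columns are eigenvectors v1 and v2
B = V @ np.diag([0.7, -2.0]) @ np.linalg.inv(V)  # B has eigenvalues 0.7 and -2
x = V @ np.array([1.0, 1.0])                     # x = v1 + v2
for i in range(1, 6):
    x = B @ x        # scales the v1 component by 0.7 and the v2 component by -2
    print(i, np.linalg.norm(x))                  # norms grow: the spectral radius is 2
```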
5.2. Jacobi iterations

Of course, a procedure that always converges to zero isn't going to help you attract friends. Consider a more useful procedure: the Jacobi Method for solving $Ax = b$. The matrix $A$ is split into two parts: $D$, whose diagonal elements are identical to those of $A$, and whose off-diagonal elements are zero; and $E$, whose diagonal elements are zero, and whose off-diagonal elements are identical to those of $A$. Thus, $A = D + E$. We derive the Jacobi Method:

$Ax = b$
$Dx = -Ex + b$
$x = -D^{-1}Ex + D^{-1}b$
$x = Bx + z, \quad$ where $\; B = -D^{-1}E, \quad z = D^{-1}b.$    (14)

Because $D$ is diagonal, it is easy to invert. This identity can be converted into an iterative method by forming the recurrence

$x_{(i+1)} = Bx_{(i)} + z.$    (15)

Given a starting vector $x_{(0)}$, this formula generates a sequence of vectors. Our hope is that each successive vector will be closer to the solution $x$ than the last. $x$ is called a stationary point of Equation 15, because if $x_{(i)} = x$, then $x_{(i+1)}$ will also equal $x$.

Now, this derivation may seem quite arbitrary to you, and you're right. We could have formed any number of identities for $x$ instead of Equation 14. In fact, simply by splitting $A$ differently — that is, by choosing a different $D$ and $E$ — we could have derived the Gauss-Seidel method, or the method of Successive Over-Relaxation (SOR). Our hope is that we have chosen a splitting for which $B$ has a small spectral radius. Here, I chose the Jacobi splitting arbitrarily for simplicity.

Suppose we start with some arbitrary vector $x_{(0)}$. For each iteration, we apply $B$ to this vector, then add $z$ to the result. What does each iteration do?

Again, apply the principle of thinking of a vector as a sum of other, well-understood vectors. Express each iterate $x_{(i)}$ as the sum of the exact solution $x$ and the error term $e_{(i)}$. Then, Equation 15 becomes

$x_{(i+1)} = Bx_{(i)} + z = B(x + e_{(i)}) + z = Bx + z + Be_{(i)} = x + Be_{(i)}$    (by Equation 14),

so

$e_{(i+1)} = Be_{(i)}.$    (16)

Each iteration does not affect the "correct part" of $x_{(i)}$ (because $x$ is a stationary point); but each iteration does affect the error term. It is apparent from Equation 16 that if $\rho(B) < 1$, then the error term $e_{(i)}$ will converge to zero as $i$ approaches infinity. Hence, the initial vector $x_{(0)}$ has no effect on the inevitable outcome!

Of course, the choice of $x_{(0)}$ does affect the number of iterations required to converge to $x$ within a given tolerance. However, its effect is less important than that of the spectral radius $\rho(B)$, which determines the speed of convergence. Suppose that $v_j$ is the eigenvector of $B$ with the largest eigenvalue (so that $\rho(B) = \lambda_j$). If the initial error $e_{(0)}$, expressed as a linear combination of eigenvectors, includes a component in the direction of $v_j$, this component will be the slowest to converge.
Figure 12: The eigenvectors of $A$ are directed along the axes of the paraboloid defined by the quadratic form $f(x)$. Each eigenvector is labeled with its associated eigenvalue. Each eigenvalue is proportional to the steepness of the corresponding slope.

$B$ is not generally symmetric (even if $A$ is), and may even be defective. However, the rate of convergence of the Jacobi Method depends largely on $\rho(B)$, which depends on $A$. Unfortunately, Jacobi does not converge for every $A$, or even for every positive-definite $A$.
5.3. A Concrete Example

To demonstrate these ideas, I shall solve the system specified by Equation 4. First, we need a method of finding eigenvalues and eigenvectors. By definition, for any eigenvector $v$ with eigenvalue $\lambda$,

$Av = \lambda v = \lambda I v$
$(\lambda I - A)v = 0.$

Eigenvectors are nonzero, so $\lambda I - A$ must be singular. Then,

$\det(\lambda I - A) = 0.$

The determinant of $\lambda I - A$ is called the characteristic polynomial. It is a degree $n$ polynomial in $\lambda$ whose roots are the set of eigenvalues. The characteristic polynomial of $A$ (from Equation 4) is

$\det \begin{bmatrix} \lambda - 3 & -2 \\ -2 & \lambda - 6 \end{bmatrix} = \lambda^2 - 9\lambda + 14 = (\lambda - 7)(\lambda - 2),$

and the eigenvalues are 7 and 2. To find the eigenvector associated with $\lambda = 7$,

$(\lambda I - A)v = \begin{bmatrix} 4 & -2 \\ -2 & 1 \end{bmatrix} \begin{bmatrix} v_1 \\ v_2 \end{bmatrix} = 0 \quad \Rightarrow \quad 4v_1 - 2v_2 = 0.$
Any solution to this equation is an eigenvector; say, $v = [1, 2]^T$. By the same method, we find that $[-2, 1]^T$ is an eigenvector corresponding to the eigenvalue 2. In Figure 12, we see that these eigenvectors coincide with the axes of the familiar ellipsoid, and that a larger eigenvalue corresponds to a steeper slope. (Negative eigenvalues indicate that $f$ decreases along the axis, as in Figures 5(b) and 5(d).)

Now, let's see the Jacobi Method in action. Using the constants specified by Equation 4, we have

$x_{(i+1)} = -\begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{1}{6} \end{bmatrix} \begin{bmatrix} 0 & 2 \\ 2 & 0 \end{bmatrix} x_{(i)} + \begin{bmatrix} \frac{1}{3} & 0 \\ 0 & \frac{1}{6} \end{bmatrix} \begin{bmatrix} 2 \\ -8 \end{bmatrix} = \begin{bmatrix} 0 & -\frac{2}{3} \\ -\frac{1}{3} & 0 \end{bmatrix} x_{(i)} + \begin{bmatrix} \frac{2}{3} \\ -\frac{4}{3} \end{bmatrix}.$

The eigenvectors of $B$ are $[\sqrt{2}, 1]^T$ with eigenvalue $-\sqrt{2}/3$, and $[-\sqrt{2}, 1]^T$ with eigenvalue $\sqrt{2}/3$. These are graphed in Figure 13(a); note that they do not coincide with the eigenvectors of $A$, and are not related to the axes of the paraboloid.

Figure 13(b) shows the convergence of the Jacobi method. The mysterious path the algorithm takes can be understood by watching the eigenvector components of each successive error term (Figures 13(c), (d), and (e)). Figure 13(f) plots the eigenvector components as arrowheads. These are converging normally at the rate defined by their eigenvalues, as in Figure 11.

I hope that this section has convinced you that eigenvectors are useful tools, and not just bizarre torture devices inflicted on you by your professors for the pleasure of watching you suffer (although the latter is a nice fringe benefit).
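The hand computation is easy to check in NumPy (my own snippet; `numpy.linalg.eigh` handles symmetric matrices):

```python
import numpy as np

A = np.array([[3.0, 2.0], [2.0, 6.0]])
eigenvalues, eigenvectors = np.linalg.eigh(A)   # eigh: for symmetric matrices
print(eigenvalues)    # [2. 7.], the roots of the characteristic polynomial
print(eigenvectors)   # columns parallel to [-2, 1] and [1, 2], scaled to unit length
```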
6. Convergence Analysis of Steepest Descent

6.1. Instant Results

To understand the convergence of Steepest Descent, let's first consider the case where $e_{(i)}$ is an eigenvector with eigenvalue $\lambda_e$. Then, the residual $r_{(i)} = -Ae_{(i)} = -\lambda_e e_{(i)}$ is also an eigenvector. Equation 12 gives

$e_{(i+1)} = e_{(i)} + \dfrac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}} r_{(i)} = e_{(i)} + \dfrac{r_{(i)}^T r_{(i)}}{\lambda_e \, r_{(i)}^T r_{(i)}} (-\lambda_e e_{(i)}) = 0.$

Figure 14 demonstrates why it takes only one step to converge to the exact solution. The point $x_{(i)}$ lies on one of the axes of the ellipsoid, and so the residual points directly to the center of the ellipsoid. Choosing $\alpha_{(i)} = \lambda_e^{-1}$ gives us instant convergence.

For a more general analysis, we must express $e_{(i)}$ as a linear combination of eigenvectors, and we shall furthermore require these eigenvectors to be orthonormal. It is proven in Appendix C2 that if $A$ is
Figure 13: Convergence of the Jacobi Method. (a) The eigenvectors of $B$ are shown with their corresponding eigenvalues. Unlike the eigenvectors of $A$, these eigenvectors are not the axes of the paraboloid. (b) The Jacobi Method starts at $[-2, -2]^T$ and converges at $[2, -2]^T$. (c, d, e) The error vectors $e_{(0)}$, $e_{(1)}$, $e_{(2)}$ (solid arrows) and their eigenvector components (dashed arrows). (f) Arrowheads represent the eigenvector components of the first four error vectors. Each eigenvector component of the error is converging to zero at the expected rate based on its eigenvalue.
Figure 14: Steepest Descent converges to the exact solution on the first iteration if the error term is an eigenvector.

symmetric, there exists a set of $n$ orthogonal eigenvectors of $A$. As we can scale eigenvectors arbitrarily, let us choose so that each eigenvector is of unit length. This choice gives us the useful property that

$v_j^T v_k = \begin{cases} 1, & j = k, \\ 0, & j \ne k. \end{cases}$    (17)

Express the error term as a linear combination of eigenvectors

$e_{(i)} = \sum_{j=1}^{n} \xi_j v_j,$    (18)

where $\xi_j$ is the length of each component of $e_{(i)}$. From Equations 17 and 18 we have the following identities:

$r_{(i)} = -Ae_{(i)} = -\sum_j \xi_j \lambda_j v_j,$    (19)
$\|e_{(i)}\|^2 = e_{(i)}^T e_{(i)} = \sum_j \xi_j^2,$    (20)
$e_{(i)}^T A e_{(i)} = \sum_j \xi_j^2 \lambda_j,$    (21)
$\|r_{(i)}\|^2 = r_{(i)}^T r_{(i)} = \sum_j \xi_j^2 \lambda_j^2,$    (22)
$r_{(i)}^T A r_{(i)} = \sum_j \xi_j^2 \lambda_j^3.$    (23)
Figure 15: Steepest Descent converges to the exact solution on the first iteration if the eigenvalues are all equal.

Equation 19 shows that $r_{(i)}$ too can be expressed as a sum of eigenvector components, and the length of these components are $-\xi_j \lambda_j$. Equations 20 and 22 are just Pythagoras' Law.

Now we can proceed with the analysis. Equation 12 gives

$e_{(i+1)} = e_{(i)} + \dfrac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}} r_{(i)} = e_{(i)} + \dfrac{\sum_j \xi_j^2 \lambda_j^2}{\sum_j \xi_j^2 \lambda_j^3} r_{(i)}.$    (24)

We saw in the last example that, if $e_{(i)}$ has only one eigenvector component, then convergence is achieved in one step by choosing $\alpha_{(i)} = \lambda_e^{-1}$. Now let's examine the case where $e_{(i)}$ is arbitrary, but all the eigenvectors have a common eigenvalue $\lambda$. Equation 24 becomes

$e_{(i+1)} = e_{(i)} + \dfrac{\lambda^2 \sum_j \xi_j^2}{\lambda^3 \sum_j \xi_j^2} (-\lambda e_{(i)}) = 0.$

Figure 15 demonstrates why, once again, there is instant convergence. Because all the eigenvalues are equal, the ellipsoid is spherical; hence, no matter what point we start at, the residual must point to the center of the sphere. As before, choose $\alpha_{(i)} = \lambda^{-1}$.
Figure 16: The energy norm of these two vectors is equal.

However, if there are several unequal, nonzero eigenvalues, then no choice of $\alpha_{(i)}$ will eliminate all the eigenvector components, and our choice becomes a sort of compromise. In fact, the fraction in Equation 24 is best thought of as a weighted average of the values of $\lambda_j^{-1}$. The weights $\xi_j^2$ ensure that longer components of $e_{(i)}$ are given precedence. As a result, on any given iteration, some of the shorter components of $e_{(i)}$ might actually increase in length (though never for long). For this reason, the methods of Steepest Descent and Conjugate Gradients are called roughers. By contrast, the Jacobi Method is a smoother, because every eigenvector component is reduced on every iteration. Steepest Descent and Conjugate Gradients are not smoothers, although they are often erroneously identified as such in the mathematical literature.
6.2. General Convergence

To bound the convergence of Steepest Descent in the general case, we shall define the energy norm $\|e\|_A = (e^T A e)^{1/2}$ (see Figure 16). This norm is easier to work with than the Euclidean norm, and is in some sense a more natural norm; examination of Equation 8 shows that minimizing $\|e_{(i)}\|_A$ is equivalent to minimizing $f(x_{(i)})$. With this norm, we have

$\|e_{(i+1)}\|_A^2 = e_{(i+1)}^T A e_{(i+1)}$
$= (e_{(i)}^T + \alpha_{(i)} r_{(i)}^T) A (e_{(i)} + \alpha_{(i)} r_{(i)})$    (by Equation 12)
$= e_{(i)}^T A e_{(i)} + 2\alpha_{(i)} r_{(i)}^T A e_{(i)} + \alpha_{(i)}^2 r_{(i)}^T A r_{(i)}$    (by symmetry of $A$)
$= \|e_{(i)}\|_A^2 + 2 \dfrac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}} (-r_{(i)}^T r_{(i)}) + \left(\dfrac{r_{(i)}^T r_{(i)}}{r_{(i)}^T A r_{(i)}}\right)^2 r_{(i)}^T A r_{(i)}$
$= \|e_{(i)}\|_A^2 - \dfrac{(r_{(i)}^T r_{(i)})^2}{r_{(i)}^T A r_{(i)}}$
$= \|e_{(i)}\|_A^2 \left(1 - \dfrac{(r_{(i)}^T r_{(i)})^2}{(r_{(i)}^T A r_{(i)})(e_{(i)}^T A e_{(i)})}\right)$
$= \|e_{(i)}\|_A^2 \left(1 - \dfrac{(\sum_j \xi_j^2 \lambda_j^2)^2}{(\sum_j \xi_j^2 \lambda_j^3)(\sum_j \xi_j^2 \lambda_j)}\right)$    (by Identities 21, 22, 23)
$= \|e_{(i)}\|_A^2 \, \omega^2, \quad \text{where} \quad \omega^2 = 1 - \dfrac{(\sum_j \xi_j^2 \lambda_j^2)^2}{(\sum_j \xi_j^2 \lambda_j^3)(\sum_j \xi_j^2 \lambda_j)}.$    (25)

The analysis depends on finding an upper bound for $\omega$. To demonstrate how the weights and eigenvalues affect convergence, I shall derive a result for $n = 2$. Assume that $\lambda_1 \ge \lambda_2$. The spectral condition number of $A$ is defined to be $\kappa = \lambda_1 / \lambda_2 \ge 1$. The slope of $e_{(i)}$ (relative to the coordinate system defined by the eigenvectors), which depends on the starting point, is denoted $\mu = \xi_2 / \xi_1$. We have

$\omega^2 = 1 - \dfrac{(\xi_1^2 \lambda_1^2 + \xi_2^2 \lambda_2^2)^2}{(\xi_1^2 \lambda_1^3 + \xi_2^2 \lambda_2^3)(\xi_1^2 \lambda_1 + \xi_2^2 \lambda_2)} = 1 - \dfrac{(\kappa^2 + \mu^2)^2}{(\kappa^3 + \mu^2)(\kappa + \mu^2)}.$    (26)

The value of $\omega$, which determines the rate of convergence of Steepest Descent, is graphed as a function of $\mu$ and $\kappa$ in Figure 17. The graph confirms my two examples. If $e_{(0)}$ is an eigenvector, then the slope $\mu$ is zero (or infinite); we see from the graph that $\omega$ is zero, so convergence is instant. If the eigenvalues are equal, then the condition number $\kappa$ is one; again, we see that $\omega$ is zero.

Figure 18 illustrates examples from near each of the four corners of Figure 17. These quadratic forms are graphed in the coordinate system defined by their eigenvectors. Figures 18(a) and 18(b) are examples with a large condition number. Steepest Descent can converge quickly if a fortunate starting point is chosen (Figure 18(a)), but is usually at its worst when $\kappa$ is large (Figure 18(b)). The latter figure gives us our best intuition for why a large condition number can be bad: $f(x)$ forms a trough, and Steepest Descent bounces back and forth between the sides of the trough while making little progress along its length. In Figures 18(c) and 18(d), the condition number is small, so the quadratic form is nearly spherical, and convergence is quick regardless of the starting point.

Holding $\kappa$ constant (because $A$ is fixed), a little basic calculus reveals that Equation 26 is maximized when $\mu = \pm\kappa$. In Figure 17, one can see a faint ridge defined by this line. Figure 19 plots worst-case starting points for our sample matrix $A$. These starting points fall on the lines defined by $\xi_2 / \xi_1 = \pm\kappa$. An
Figure 17: Convergence $\omega$ of Steepest Descent as a function of $\mu$ (the slope of $e_{(i)}$) and $\kappa$ (the condition number of $A$). Convergence is fast when $\mu$ or $\kappa$ are small. For a fixed matrix, convergence is worst when $\mu = \pm\kappa$.
Figure 18: These four examples represent points near the corresponding four corners of the graph in Figure 17. (a) Large $\kappa$, small $\mu$. (b) An example of poor convergence. $\kappa$ and $\mu$ are both large. (c) Small $\kappa$ and $\mu$. (d) Small $\kappa$, large $\mu$.
Figure 19: Solid lines represent the starting points that give the worst convergence for Steepest Descent. Dashed lines represent steps toward convergence. If the first iteration starts from a worst-case point, so do all succeeding iterations. Each step taken intersects the paraboloid axes (gray arrows) at precisely a 45° angle. Here, $\kappa = 3.5$.
upper bound for $\omega$ (corresponding to the worst-case starting points) is found by setting $\mu^2 = \kappa^2$:

$\omega^2 \le 1 - \dfrac{4\kappa^4}{\kappa^5 + 2\kappa^4 + \kappa^3} = \dfrac{\kappa^5 - 2\kappa^4 + \kappa^3}{\kappa^5 + 2\kappa^4 + \kappa^3} = \dfrac{(\kappa - 1)^2}{(\kappa + 1)^2},$

$\omega \le \dfrac{\kappa - 1}{\kappa + 1}.$    (27)

Inequality 27 is plotted in Figure 20. The more ill-conditioned the matrix (that is, the larger its condition number $\kappa$), the slower the convergence of Steepest Descent. In Section 9.2, it is proven that Equation 27 is also valid for $n > 2$, if the condition number of a symmetric, positive-definite matrix is defined to be $\kappa = \lambda_{max}/\lambda_{min}$, the ratio of the largest to smallest eigenvalue. The convergence results for Steepest Descent are

$\|e_{(i)}\|_A \le \left(\dfrac{\kappa - 1}{\kappa + 1}\right)^i \|e_{(0)}\|_A,$ and    (28)
$\dfrac{f(x_{(i)}) - f(x)}{f(x_{(0)}) - f(x)} = \dfrac{\frac{1}{2} e_{(i)}^T A e_{(i)}}{\frac{1}{2} e_{(0)}^T A e_{(0)}}$    (by Equation 8)
$\le \left(\dfrac{\kappa - 1}{\kappa + 1}\right)^{2i}.$

Figure 20: Convergence of Steepest Descent (per iteration) worsens as the condition number of the matrix increases.
7. The Method of Conjugate Directions

7.1. Conjugacy

Steepest Descent often finds itself taking steps in the same direction as earlier steps (see Figure 8). Wouldn't it be better if, every time we took a step, we got it right the first time? Here's an idea: let's pick a set of orthogonal search directions $d_{(0)}, d_{(1)}, \ldots, d_{(n-1)}$. In each search direction, we'll take exactly one step, and that step will be just the right length to line up evenly with $x$. After $n$ steps, we'll be done.

Figure 21 illustrates this idea, using the coordinate axes as search directions. The first (horizontal) step leads to the correct $x_1$-coordinate; the second (vertical) step will hit home. Notice that $e_{(1)}$ is orthogonal to $d_{(0)}$. In general, for each step we choose a point

$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)}.$    (29)

To find the value of $\alpha_{(i)}$, use the fact that $e_{(i+1)}$ should be orthogonal to $d_{(i)}$, so that we need never step in
Figure 21: The Method of Orthogonal Directions. Unfortunately, this method only works if you already know the answer.

the direction of $d_{(i)}$ again. Using this condition, we have

$d_{(i)}^T e_{(i+1)} = 0$
$d_{(i)}^T (e_{(i)} + \alpha_{(i)} d_{(i)}) = 0$    (by Equation 29)
$\alpha_{(i)} = -\dfrac{d_{(i)}^T e_{(i)}}{d_{(i)}^T d_{(i)}}.$    (30)

Unfortunately, we haven't accomplished anything, because we can't compute $\alpha_{(i)}$ without knowing $e_{(i)}$; and if we knew $e_{(i)}$, the problem would already be solved.

The solution is to make the search directions $A$-orthogonal instead of orthogonal. Two vectors $d_{(i)}$ and $d_{(j)}$ are $A$-orthogonal, or conjugate, if

$d_{(i)}^T A d_{(j)} = 0.$

Figure 22(a) shows what $A$-orthogonal vectors look like. Imagine if this article were printed on bubble gum, and you grabbed Figure 22(a) by the ends and stretched it until the ellipses appeared circular. The vectors would then appear orthogonal, as in Figure 22(b).

Our new requirement is that $e_{(i+1)}$ be $A$-orthogonal to $d_{(i)}$ (see Figure 23(a)). Not coincidentally, this orthogonality condition is equivalent to finding the minimum point along the search direction $d_{(i)}$, as in
Figure 22: These pairs of vectors are (a) $A$-orthogonal, because (b) these pairs of vectors are orthogonal.

Steepest Descent. To see this, set the directional derivative to zero:

$\dfrac{d}{d\alpha} f(x_{(i+1)}) = 0$
$f'(x_{(i+1)})^T \dfrac{d}{d\alpha} x_{(i+1)} = 0$
$-r_{(i+1)}^T d_{(i)} = 0$
$d_{(i)}^T A e_{(i+1)} = 0.$

Following the derivation of Equation 30, here is the expression for $\alpha_{(i)}$ when the search directions are $A$-orthogonal:

$\alpha_{(i)} = -\dfrac{d_{(i)}^T A e_{(i)}}{d_{(i)}^T A d_{(i)}}$    (31)
$= \dfrac{d_{(i)}^T r_{(i)}}{d_{(i)}^T A d_{(i)}}.$    (32)

Unlike Equation 30, we can calculate this expression. Note that if the search vector were the residual, this formula would be identical to the formula used by Steepest Descent.
Figure 23: The method of Conjugate Directions converges in $n$ steps. (a) The first step is taken along some direction $d_{(0)}$. The minimum point $x_{(1)}$ is chosen by the constraint that $e_{(1)}$ must be $A$-orthogonal to $d_{(0)}$. (b) The initial error $e_{(0)}$ can be expressed as a sum of $A$-orthogonal components (gray arrows). Each step of Conjugate Directions eliminates one of these components.
To prove that this procedure really does compute $x$ in $n$ steps, express the error term as a linear combination of search directions; namely,

$e_{(0)} = \sum_{j=0}^{n-1} \delta_j d_{(j)}.$    (33)

The values of $\delta_j$ can be found by a mathematical trick. Because the search directions are $A$-orthogonal, it is possible to eliminate all the $\delta_j$ values but one from Expression 33 by premultiplying the expression by $d_{(k)}^T A$:

$d_{(k)}^T A e_{(0)} = \sum_j \delta_j d_{(k)}^T A d_{(j)} = \delta_k d_{(k)}^T A d_{(k)}$    (by $A$-orthogonality of $d$ vectors)
$\delta_k = \dfrac{d_{(k)}^T A e_{(0)}}{d_{(k)}^T A d_{(k)}} = \dfrac{d_{(k)}^T A \left(e_{(0)} + \sum_{i=0}^{k-1} \alpha_{(i)} d_{(i)}\right)}{d_{(k)}^T A d_{(k)}}$    (by $A$-orthogonality of $d$ vectors)
$= \dfrac{d_{(k)}^T A e_{(k)}}{d_{(k)}^T A d_{(k)}}$    (by Equation 29).    (34)

By Equations 31 and 34, we find that $\alpha_{(i)} = -\delta_i$. This fact gives us a new way to look at the error term. As the following equation shows, the process of building up $x$ component by component can also be viewed as a process of cutting down the error term component by component (see Figure 23(b)).

$e_{(i)} = e_{(0)} + \sum_{j=0}^{i-1} \alpha_{(j)} d_{(j)} = \sum_{j=0}^{n-1} \delta_j d_{(j)} - \sum_{j=0}^{i-1} \delta_j d_{(j)} = \sum_{j=i}^{n-1} \delta_j d_{(j)}.$    (35)

After $n$ iterations, every component is cut away, and $e_{(n)} = 0$; the proof is complete.
7.2. Gram-Schmidt Conjugation

All that is needed now is a set of $A$-orthogonal search directions $\{d_{(i)}\}$. Fortunately, there is a simple way to generate them, called a conjugate Gram-Schmidt process.

Suppose we have a set of $n$ linearly independent vectors $u_0, u_1, \ldots, u_{n-1}$. The coordinate axes will do in a pinch, although more intelligent choices are possible. To construct $d_{(i)}$, take $u_i$ and subtract out any components that are not $A$-orthogonal to the previous $d$ vectors (see Figure 24). In other words, set $d_{(0)} = u_0$, and for $i > 0$, set

$d_{(i)} = u_i + \sum_{k=0}^{i-1} \beta_{ik} d_{(k)},$    (36)

where the $\beta_{ik}$ are defined for $i > k$. To find their values, use the same trick used to find $\delta_j$:

$d_{(i)}^T A d_{(j)} = u_i^T A d_{(j)} + \sum_{k=0}^{i-1} \beta_{ik} d_{(k)}^T A d_{(j)}$
$0 = u_i^T A d_{(j)} + \beta_{ij} d_{(j)}^T A d_{(j)}, \quad i > j$    (by $A$-orthogonality of $d$ vectors)
$\beta_{ij} = -\dfrac{u_i^T A d_{(j)}}{d_{(j)}^T A d_{(j)}}.$    (37)

Figure 24: Gram-Schmidt conjugation of two vectors. Begin with two linearly independent vectors $u_0$ and $u_1$. Set $d_{(0)} = u_0$. The vector $u_1$ is composed of two components: $u^*$, which is $A$-orthogonal (or conjugate) to $d_{(0)}$, and $u^+$, which is parallel to $d_{(0)}$. After conjugation, only the $A$-orthogonal portion remains, and $d_{(1)} = u^*$.

The difficulty with using Gram-Schmidt conjugation in the method of Conjugate Directions is that all the old search vectors must be kept in memory to construct each new one, and furthermore $O(n^3)$ operations are required to generate the full set. In fact, if the search vectors are constructed by conjugation of the axial unit vectors, Conjugate Directions becomes equivalent to performing Gaussian elimination (see Figure 25). As a result, the method of Conjugate Directions enjoyed little use until the discovery of CG — which is a method of Conjugate Directions — cured these disadvantages.

Figure 25: The method of Conjugate Directions using the axial unit vectors, also known as Gaussian elimination.

An important key to understanding the method of Conjugate Directions (and also CG) is to notice that Figure 25 is just a stretched copy of Figure 21! Remember that when one is performing the method of Conjugate Directions (including CG), one is simultaneously performing the method of Orthogonal Directions in a stretched (scaled) space.
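For concreteness, here is a deliberately naive sketch of Conjugate Directions with Gram-Schmidt conjugation of the axial unit vectors (Equations 29, 32, 36, and 37); it keeps every old search vector, which is exactly the memory and $O(n^3)$ drawback just described (my own illustration, not the paper's code):

```python
import numpy as np

def conjugate_directions(A, b, x):
    """n steps of Conjugate Directions, conjugating the axial unit vectors."""
    n = len(b)
    old_directions = []                          # all old search vectors must be kept
    for i in range(n):
        u = np.eye(n)[i]                         # u_i: the i-th axial unit vector
        d = u.copy()
        for dk in old_directions:                # Equation 36: subtract the parts of u_i
            beta = -(u @ A @ dk) / (dk @ A @ dk) # Equation 37
            d += beta * dk                       #   that are not A-orthogonal to old d's
        r = b - A @ x
        alpha = (d @ r) / (d @ A @ d)            # Equation 32
        x = x + alpha * d                        # Equation 29
        old_directions.append(d)
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_directions(A, b, np.zeros(2)))   # exactly [2, -2] after n = 2 steps
```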
7.3. Optimality of the Error Term

Conjugate Directions has an interesting property: it finds at every step the best solution within the bounds of where it's been allowed to explore. Where has it been allowed to explore? Let $\mathcal{D}_i$ be the $i$-dimensional subspace span$\{d_{(0)}, d_{(1)}, \ldots, d_{(i-1)}\}$; the value $e_{(i)}$ is chosen from $e_{(0)} + \mathcal{D}_i$. What do I mean by "best solution"? I mean that Conjugate Directions chooses the value from $e_{(0)} + \mathcal{D}_i$ that minimizes $\|e_{(i)}\|_A$ (see Figure 26). In fact, some authors derive CG by trying to minimize $\|e_{(i)}\|_A$ within $e_{(0)} + \mathcal{D}_i$.

Figure 26: In this figure, the shaded area is $e_{(0)} + \mathcal{D}_2 = e_{(0)} + \text{span}\{d_{(0)}, d_{(1)}\}$. The ellipsoid is a contour on which the energy norm is constant. After two steps, Conjugate Directions finds $e_{(2)}$, the point on $e_{(0)} + \mathcal{D}_2$ that minimizes $\|e\|_A$.

In the same way that the error term can be expressed as a linear combination of search directions (Equation 35), its energy norm can be expressed as a summation.

$\|e_{(i)}\|_A^2 = \sum_{j=i}^{n-1} \sum_{k=i}^{n-1} \delta_j \delta_k d_{(j)}^T A d_{(k)}$    (by Equation 35)
$= \sum_{j=i}^{n-1} \delta_j^2 \, d_{(j)}^T A d_{(j)}$    (by $A$-orthogonality of $d$ vectors).

Each term in this summation is associated with a search direction that has not yet been traversed. Any other vector $e$ chosen from $e_{(0)} + \mathcal{D}_i$ must have these same terms in its expansion, which proves that $e_{(i)}$ must have the minimum energy norm.
Having proven the optimality with equations, let's turn to the intuition. Perhaps the best way to visualize the workings of Conjugate Directions is by comparing the space we are working in with a "stretched" space, as in Figure 22. Figures 27(a) and 27(c) demonstrate the behavior of Conjugate Directions in $\mathbb{R}^2$ and $\mathbb{R}^3$; lines that appear perpendicular in these illustrations are orthogonal. On the other hand, Figures 27(b) and 27(d) show the same drawings in spaces that are stretched (along the eigenvector axes) so that the ellipsoidal contour lines become spherical. Lines that appear perpendicular in these illustrations are $A$-orthogonal.

In Figure 27(a), the Method of Conjugate Directions begins at $x_{(0)}$, takes a step in the direction of $d_{(0)}$, and stops at the point $x_{(1)}$, where the error vector $e_{(1)}$ is $A$-orthogonal to $d_{(0)}$. Why should we expect this to be the minimum point on $x_{(0)} + \mathcal{D}_1$? The answer is found in Figure 27(b): in this stretched space, $e_{(1)}$ appears perpendicular to $d_{(0)}$ because they are $A$-orthogonal. The error vector $e_{(1)}$ is a radius of the concentric circles that represent contours on which $\|e\|_A$ is constant, so $x_{(0)} + \mathcal{D}_1$ must be tangent at $x_{(1)}$ to the circle that $x_{(1)}$ lies on. Hence, $x_{(1)}$ is the point on $x_{(0)} + \mathcal{D}_1$ that minimizes $\|e_{(1)}\|_A$.

This is not surprising; we have already seen in Section 7.1 that $A$-conjugacy of the search direction and the error term is equivalent to minimizing $f$ (and therefore $\|e\|_A$) along the search direction. However, after Conjugate Directions takes a second step, minimizing $\|e\|_A$ along a second search direction $d_{(1)}$, why should we expect that $\|e\|_A$ will still be minimized in the direction of $d_{(0)}$? After taking $i$ steps, why will $f(x_{(i)})$ be minimized over all of $x_{(0)} + \mathcal{D}_i$?
28JonathanRichardShewchuk
1
1
0
1
1
0
0
1
(a)
1
1
0
0
1
1
0
1
(b)
1
2
0
0
2
0
1
1
0
1
(c)
2
1
1
0
0
1
0
2
0
1
(d)
Figure27:
OptimalityoftheMethodofConjugateDirections.(a)Atwo-dimensionalproblem.Linesthat
appearperpendicularareorthogonal.(b)Thesameproblemina“stretched”space.Linesthatappear
perpendicularare
-orthogonal.(c)Athree-dimensionalproblem.Twoconcentricellipsoidsareshown;
isatthecenterofboth.Theline
0
1
istangenttotheouterellipsoidat
1
.Theplane
0
2
is
tangenttotheinnerellipsoidat
2
.(d)Stretchedviewofthethree-dimensionalproblem.
In Figure 27(b), $d_{(0)}$ and $d_{(1)}$ appear perpendicular because they are $A$-orthogonal. It is clear that $d_{(1)}$ must point to the solution $x$, because $d_{(0)}$ is tangent at $x_{(1)}$ to a circle whose center is $x$. However, a three-dimensional example is more revealing. Figures 27(c) and 27(d) each show two concentric ellipsoids. The point $x_{(1)}$ lies on the outer ellipsoid, and $x_{(2)}$ lies on the inner ellipsoid. Look carefully at these figures: the plane $x_{(0)} + \mathcal{D}_2$ slices through the larger ellipsoid, and is tangent to the smaller ellipsoid at $x_{(2)}$. The point $x$ is at the center of the ellipsoids, underneath the plane.

Looking at Figure 27(c), we can rephrase our question. Suppose you and I are standing at $x_{(1)}$, and want to walk to the point that minimizes $\|e\|_A$ on $x_{(0)} + \mathcal{D}_2$; but we are constrained to walk along the search direction $d_{(1)}$. If $d_{(1)}$ points to the minimum point, we will succeed. Is there any reason to expect that $d_{(1)}$ will point the right way?

Figure 27(d) supplies an answer. Because $d_{(1)}$ is $A$-orthogonal to $d_{(0)}$, they are perpendicular in this diagram. Now, suppose you were staring down at the plane $x_{(0)} + \mathcal{D}_2$ as if it were a sheet of paper; the sight you'd see would be identical to Figure 27(b). The point $x_{(2)}$ would be at the center of the paper, and the point $x$ would lie underneath the paper, directly under the point $x_{(2)}$. Because $d_{(0)}$ and $d_{(1)}$ are perpendicular, $d_{(1)}$ points directly to $x_{(2)}$, which is the point in $x_{(0)} + \mathcal{D}_2$ closest to $x$. The plane $x_{(0)} + \mathcal{D}_2$ is tangent to the sphere on which $x_{(2)}$ lies. If you took a third step, it would be straight down from $x_{(2)}$ to $x$, in a direction $A$-orthogonal to $\mathcal{D}_2$.

Another way to understand what is happening in Figure 27(d) is to imagine yourself standing at the solution point $x$, pulling a string connected to a bead that is constrained to lie in $x_{(0)} + \mathcal{D}_i$. Each time the expanding subspace $\mathcal{D}$ is enlarged by a dimension, the bead becomes free to move a little closer to you. Now if you stretch the space so it looks like Figure 27(c), you have the Method of Conjugate Directions.

Another important property of Conjugate Directions is visible in these illustrations. We have seen that, at each step, the hyperplane $x_{(0)} + \mathcal{D}_i$ is tangent to the ellipsoid on which $x_{(i)}$ lies. Recall from Section 4 that the residual at any point is orthogonal to the ellipsoidal surface at that point. It follows that $r_{(i)}$ is orthogonal to $\mathcal{D}_i$ as well. To show this fact mathematically, premultiply Equation 35 by $-d_{(j)}^T A$:

$-d_{(j)}^T A e_{(i)} = -\sum_{k=i}^{n-1} \delta_k d_{(j)}^T A d_{(k)}$    (38)
$d_{(j)}^T r_{(i)} = 0, \quad j < i$    (by $A$-orthogonality of $d$-vectors).    (39)

We could have derived this identity by another tack. Recall that once we take a step in a search direction, we need never step in that direction again; the error term is evermore $A$-orthogonal to all the old search directions. Because $r_{(i)} = -Ae_{(i)}$, the residual is evermore orthogonal to all the old search directions.

Because the search directions are constructed from the $u$ vectors, the subspace spanned by $u_0, \ldots, u_{i-1}$ is $\mathcal{D}_i$, and the residual $r_{(i)}$ is orthogonal to these previous $u$ vectors as well (see Figure 28). This is proven by taking the inner product of Equation 36 and $r_{(j)}$:

$d_{(i)}^T r_{(j)} = u_i^T r_{(j)} + \sum_{k=0}^{i-1} \beta_{ik} d_{(k)}^T r_{(j)}$    (40)
$0 = u_i^T r_{(j)}, \quad i < j$    (by Equation 39).    (41)

There is one more identity we will use later. From Equation 40 (and Figure 28),

$d_{(i)}^T r_{(i)} = u_i^T r_{(i)}.$    (42)
Figure 28: Because the search directions $d_{(0)}$, $d_{(1)}$ are constructed from the vectors $u_0$, $u_1$, they span the same subspace $\mathcal{D}_2$ (the gray-colored plane). The error term $e_{(2)}$ is $A$-orthogonal to $\mathcal{D}_2$, the residual $r_{(2)}$ is orthogonal to $\mathcal{D}_2$, and a new search direction $d_{(2)}$ is constructed (from $u_2$) to be $A$-orthogonal to $\mathcal{D}_2$. The endpoints of $u_2$ and $d_{(2)}$ lie on a plane parallel to $\mathcal{D}_2$, because $d_{(2)}$ is constructed from $u_2$ by Gram-Schmidt conjugation.
To conclude this section, I note that as with the method of Steepest Descent, the number of matrix-vector products per iteration can be reduced to one by using a recurrence to find the residual:

$r_{(i+1)} = -Ae_{(i+1)} = -A(e_{(i)} + \alpha_{(i)} d_{(i)}) = r_{(i)} - \alpha_{(i)} A d_{(i)}.$    (43)
8. The Method of Conjugate Gradients

It may seem odd that an article about CG doesn't describe CG until page 30, but all the machinery is now in place. In fact, CG is simply the method of Conjugate Directions where the search directions are constructed by conjugation of the residuals (that is, by setting $u_i = r_{(i)}$).

This choice makes sense for many reasons. First, the residuals worked for Steepest Descent, so why not for Conjugate Directions? Second, the residual has the nice property that it's orthogonal to the previous search directions (Equation 39), so it's guaranteed always to produce a new, linearly independent search direction unless the residual is zero, in which case the problem is already solved. As we shall see, there is an even better reason to choose the residual.

Let's consider the implications of this choice. Because the search vectors are built from the residuals, the subspace span$\{r_{(0)}, r_{(1)}, \ldots, r_{(i-1)}\}$ is equal to $\mathcal{D}_i$. As each residual is orthogonal to the previous search directions, it is also orthogonal to the previous residuals (see Figure 29); Equation 41 becomes

$r_{(i)}^T r_{(j)} = 0, \quad i \ne j.$    (44)

Interestingly, Equation 43 shows that each new residual $r_{(i+1)}$ is just a linear combination of the previous residual and $Ad_{(i)}$. Recalling that $d_{(i)} \in \mathcal{D}_{i+1}$, this fact implies that each new subspace $\mathcal{D}_{i+1}$ is formed from the union of the previous subspace $\mathcal{D}_i$ and the subspace $A\mathcal{D}_i$. Hence,

$\mathcal{D}_i = \text{span}\{d_{(0)}, Ad_{(0)}, A^2 d_{(0)}, \ldots, A^{i-1} d_{(0)}\} = \text{span}\{r_{(0)}, Ar_{(0)}, A^2 r_{(0)}, \ldots, A^{i-1} r_{(0)}\}.$
Figure 29: In the method of Conjugate Gradients, each new residual is orthogonal to all the previous residuals and search directions; and each new search direction is constructed (from the residual) to be $A$-orthogonal to all the previous residuals and search directions. The endpoints of $r_{(2)}$ and $d_{(2)}$ lie on a plane parallel to $\mathcal{D}_2$ (the shaded subspace). In CG, $d_{(2)}$ is a linear combination of $r_{(2)}$ and $d_{(1)}$.
This subspace is called a Krylov subspace, a subspace created by repeatedly applying a matrix to a vector. It has a pleasing property: because $A\mathcal{D}_i$ is included in $\mathcal{D}_{i+1}$, the fact that the next residual $r_{(i+1)}$ is orthogonal to $\mathcal{D}_{i+1}$ (Equation 39) implies that $r_{(i+1)}$ is $A$-orthogonal to $\mathcal{D}_i$. Gram-Schmidt conjugation becomes easy, because $r_{(i+1)}$ is already $A$-orthogonal to all of the previous search directions except $d_{(i)}$!

Recall from Equation 37 that the Gram-Schmidt constants are $\beta_{ij} = -r_{(i)}^T A d_{(j)} / (d_{(j)}^T A d_{(j)})$; let us simplify this expression. Taking the inner product of $r_{(i)}$ and Equation 43,

$r_{(i)}^T r_{(j+1)} = r_{(i)}^T r_{(j)} - \alpha_{(j)} r_{(i)}^T A d_{(j)}$
$\alpha_{(j)} r_{(i)}^T A d_{(j)} = r_{(i)}^T r_{(j)} - r_{(i)}^T r_{(j+1)}$

$r_{(i)}^T A d_{(j)} = \begin{cases} \frac{1}{\alpha_{(i)}} r_{(i)}^T r_{(i)}, & i = j, \\ -\frac{1}{\alpha_{(i-1)}} r_{(i)}^T r_{(i)}, & i = j + 1, \\ 0, & \text{otherwise} \end{cases}$    (by Equation 44).

Therefore,

$\beta_{ij} = \begin{cases} \frac{1}{\alpha_{(i-1)}} \dfrac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T A d_{(i-1)}}, & i = j + 1, \\ 0, & i > j + 1 \end{cases}$    (by Equation 37).

As if by magic, most of the $\beta$ terms have disappeared. It is no longer necessary to store old search vectors to ensure the $A$-orthogonality of new search vectors. This major advance is what makes CG as important an algorithm as it is, because both the space complexity and time complexity per iteration are reduced from $O(n^2)$ to $O(m)$, where $m$ is the number of nonzero entries of $A$. Henceforth, I shall use the abbreviation $\beta_{(i)} = \beta_{i,i-1}$. Simplifying further:

$\beta_{(i)} = \dfrac{1}{\alpha_{(i-1)}} \dfrac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T A d_{(i-1)}} = \dfrac{r_{(i)}^T r_{(i)}}{d_{(i-1)}^T r_{(i-1)}}$    (by Equation 32)
$= \dfrac{r_{(i)}^T r_{(i)}}{r_{(i-1)}^T r_{(i-1)}}$    (by Equation 42).
Figure 30: The method of Conjugate Gradients.

Let's put it all together into one piece now. The method of Conjugate Gradients is:

$d_{(0)} = r_{(0)} = b - Ax_{(0)},$    (45)
$\alpha_{(i)} = \dfrac{r_{(i)}^T r_{(i)}}{d_{(i)}^T A d_{(i)}}$    (by Equations 32 and 42),    (46)
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)},$    (47)
$r_{(i+1)} = r_{(i)} - \alpha_{(i)} A d_{(i)},$    (48)
$\beta_{(i+1)} = \dfrac{r_{(i+1)}^T r_{(i+1)}}{r_{(i)}^T r_{(i)}}, \quad d_{(i+1)} = r_{(i+1)} + \beta_{(i+1)} d_{(i)}.$    (49)

The performance of CG on our sample problem is demonstrated in Figure 30. The name "Conjugate Gradients" is a bit of a misnomer, because the gradients are not conjugate, and the conjugate directions are not all gradients. "Conjugated Gradients" would be more accurate.
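Equations 45–49 translate almost line for line into code. The following sketch (mine, not Appendix B's version; the stopping test on $\|r\|$ is an assumption) uses one matrix-vector product per iteration:

```python
import numpy as np

def conjugate_gradients(A, b, x, tol=1e-10):
    """CG for symmetric positive-definite A (Equations 45-49)."""
    r = b - A @ x                         # Equation 45
    d = r.copy()
    delta_new = r @ r
    delta0 = delta_new
    for _ in range(len(b)):               # exact arithmetic finishes in n iterations
        if delta_new <= tol**2 * delta0:
            break
        Ad = A @ d                        # the single matrix-vector product
        alpha = delta_new / (d @ Ad)      # Equation 46
        x = x + alpha * d                 # Equation 47
        r = r - alpha * Ad                # Equation 48
        delta_old = delta_new
        delta_new = r @ r
        d = r + (delta_new / delta_old) * d   # Equation 49
    return x

A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(conjugate_gradients(A, b, np.array([-2.0, -2.0])))   # [2, -2] in two steps
```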
9. Convergence Analysis of Conjugate Gradients

CG is complete after $n$ iterations, so why should we care about convergence analysis? In practice, accumulated floating point roundoff error causes the residual to gradually lose accuracy, and cancellation error causes the search vectors to lose $A$-orthogonality. The former problem could be dealt with as it was for Steepest Descent, but the latter problem is not easily curable. Because of this loss of conjugacy, the mathematical community discarded CG during the 1960s, and interest only resurged when evidence for its effectiveness as an iterative procedure was published in the seventies.

Times have changed, and so has our outlook. Today, convergence analysis is important because CG is commonly used for problems so large it is not feasible to run even $n$ iterations. Convergence analysis is seen less as a ward against floating point error, and more as a proof that CG is useful for problems that are out of the reach of any exact algorithm.

The first iteration of CG is identical to the first iteration of Steepest Descent, so without changes, Section 6.1 describes the conditions under which CG converges on the first iteration.
9.1. Picking Perfect Polynomials

We have seen that at each step of CG, the value $x_{(i)}$ is chosen from $x_{(0)} + \mathcal{D}_i$, where

$\mathcal{D}_i = \text{span}\{r_{(0)}, Ar_{(0)}, A^2 r_{(0)}, \ldots, A^{i-1} r_{(0)}\}.$

Krylov subspaces such as this have another pleasing property. For a fixed $i$, the error term has the form

$e_{(i)} = \left(I + \sum_{j=1}^{i} \psi_j A^j\right) e_{(0)}.$

The coefficients $\psi_j$ are related to the values $\alpha_{(i)}$ and $\beta_{(i)}$, but the precise relationship is not important here. What is important is the proof in Section 7.3 that CG chooses the $\psi_j$ coefficients that minimize $\|e_{(i)}\|_A$.

The expression in parentheses above can be expressed as a polynomial. Let $P_i(\lambda)$ be a polynomial of degree $i$. $P_i$ can take either a scalar or a matrix as its argument, and will evaluate to the same; for instance, if $P_2(\lambda) = 2\lambda^2 + 1$, then $P_2(A) = 2A^2 + I$. This flexible notation comes in handy for eigenvectors, for which you should convince yourself that $P_i(A)v = P_i(\lambda)v$. (Note that $Av = \lambda v$, $A^2 v = \lambda^2 v$, and so on.)

Now we can express the error term as

$e_{(i)} = P_i(A) e_{(0)},$

if we require that $P_i(0) = 1$. CG chooses this polynomial when it chooses the $\psi_j$ coefficients. Let's examine the effect of applying this polynomial to $e_{(0)}$. As in the analysis of Steepest Descent, express $e_{(0)}$ as a linear combination of orthogonal unit eigenvectors,

$e_{(0)} = \sum_{j=1}^{n} \xi_j v_j,$

and we find that

$e_{(i)} = \sum_j \xi_j P_i(\lambda_j) v_j,$
$\|e_{(i)}\|_A^2 = \sum_j \xi_j^2 \, [P_i(\lambda_j)]^2 \, \lambda_j.$
Figure 31: The convergence of CG after $i$ iterations depends on how close a polynomial $P_i$ of degree $i$ can be to zero on each eigenvalue, given the constraint that $P_i(0) = 1$. (a) $P_0$. (b) $P_1$. (c) $P_2$. (d) $P_2$.
CG finds the polynomial that minimizes this expression, but convergence is only as good as the convergence of the worst eigenvector. Letting $\Lambda(A)$ be the set of eigenvalues of $A$, we have

$\|e_{(i)}\|_A^2 \le \min_{P_i} \max_{\lambda \in \Lambda(A)} [P_i(\lambda)]^2 \sum_j \xi_j^2 \lambda_j = \min_{P_i} \max_{\lambda \in \Lambda(A)} [P_i(\lambda)]^2 \, \|e_{(0)}\|_A^2.$    (50)

Figure 31 illustrates, for several values of $i$, the $P_i$ that minimizes this expression for our sample problem with eigenvalues 2 and 7. There is only one polynomial of degree zero that satisfies $P_0(0) = 1$, and that is $P_0(\lambda) = 1$, graphed in Figure 31(a). The optimal polynomial of degree one is $P_1(\lambda) = 1 - 2\lambda/9$, graphed in Figure 31(b). Note that $P_1(2) = 5/9$ and $P_1(7) = -5/9$, and so the energy norm of the error term after one iteration of CG is no greater than 5/9 its initial value. Figure 31(c) shows that, after two iterations, Equation 50 evaluates to zero. This is because a polynomial of degree two can be fit to three points ($P_2(0) = 1$, $P_2(2) = 0$, and $P_2(7) = 0$). In general, a polynomial of degree $i$ can fit $i + 1$ points, and thereby accommodate $i$ separate eigenvalues.

The foregoing discussion reinforces our understanding that CG yields the exact result after $n$ iterations; and furthermore proves that CG is quicker if there are duplicated eigenvalues. Given infinite floating point precision, the number of iterations required to compute an exact solution is at most the number of distinct eigenvalues. (There is one other possibility for early termination: $x_{(0)}$ may already be $A$-orthogonal to some of the eigenvectors of $A$. If eigenvectors are missing from the expansion of $e_{(0)}$, their eigenvalues may be omitted from consideration in Equation 50. Be forewarned, however, that these eigenvectors may be reintroduced by floating point roundoff error.)
Figure 32: Chebyshev polynomials of degree 2, 5, 10, and 49.
We also find that CG converges more quickly when eigenvalues are clustered together (Figure 31(d)) than when they are irregularly distributed between $\lambda_{min}$ and $\lambda_{max}$, because it is easier for CG to choose a polynomial that makes Equation 50 small.

If we know something about the characteristics of the eigenvalues of $A$, it is sometimes possible to suggest a polynomial that leads to a proof of a fast convergence. For the remainder of this analysis, however, I shall assume the most general case: the eigenvalues are evenly distributed between $\lambda_{min}$ and $\lambda_{max}$, the number of distinct eigenvalues is large, and floating point roundoff occurs.

9.2. Chebyshev Polynomials

A useful approach is to minimize Equation 50 over the range $[\lambda_{min}, \lambda_{max}]$ rather than at a finite number of points. The polynomials that accomplish this are based on Chebyshev polynomials.

The Chebyshev polynomial of degree $i$ is

$T_i(\omega) = \frac{1}{2}\left[\left(\omega + \sqrt{\omega^2 - 1}\right)^i + \left(\omega - \sqrt{\omega^2 - 1}\right)^i\right].$

(If this expression doesn't look like a polynomial to you, try working it out for $i$ equal to 1 or 2.) Several Chebyshev polynomials are illustrated in Figure 32. The Chebyshev polynomials have the property that $|T_i(\omega)| \le 1$ (in fact, they oscillate between 1 and $-1$) on the domain $\omega \in [-1, 1]$, and furthermore that $|T_i(\omega)|$ is maximum on the domain $\omega \notin [-1, 1]$ among all such polynomials. Loosely speaking, $|T_i(\omega)|$ increases as quickly as possible outside the boxes in the illustration.
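If you'd rather let a machine do the working out, here is a small evaluation sketch of mine (the complex square root is a trick of my own to make the formula usable for $|\omega| < 1$ as well):

```python
import numpy as np

def chebyshev(i, w):
    """T_i(w) = ((w + sqrt(w^2 - 1))^i + (w - sqrt(w^2 - 1))^i) / 2."""
    s = np.sqrt(complex(w * w - 1, 0))   # complex sqrt keeps the formula valid for |w| < 1
    return (((w + s) ** i + (w - s) ** i) / 2).real

print(chebyshev(2, 0.5))   # -0.5, matching T_2(w) = 2w^2 - 1
print(chebyshev(5, 0.9))   # about -0.63: bounded by 1 on [-1, 1]
print(chebyshev(5, 1.5))   # about 61: grows quickly outside [-1, 1]
```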
Figure 33: The polynomial $P_2(\lambda)$ that minimizes Equation 50 for $\lambda_{min} = 2$ and $\lambda_{max} = 7$ in the general case. This curve is a scaled version of the Chebyshev polynomial of degree 2. The energy norm of the error term after two iterations is no greater than 0.183 times its initial value. Compare with Figure 31(c), where it is known that there are only two eigenvalues.

It is shown in Appendix C3 that Equation 50 is minimized by choosing

$P_i(\lambda) = \dfrac{T_i\!\left(\frac{\lambda_{max} + \lambda_{min} - 2\lambda}{\lambda_{max} - \lambda_{min}}\right)}{T_i\!\left(\frac{\lambda_{max} + \lambda_{min}}{\lambda_{max} - \lambda_{min}}\right)}.$

This polynomial has the oscillating properties of Chebyshev polynomials on the domain $\lambda_{min} \le \lambda \le \lambda_{max}$ (see Figure 33). The denominator enforces our requirement that $P_i(0) = 1$. The numerator has a maximum value of one on the interval between $\lambda_{min}$ and $\lambda_{max}$, so from Equation 50 we have

$\|e_{(i)}\|_A \le T_i\!\left(\dfrac{\lambda_{max} + \lambda_{min}}{\lambda_{max} - \lambda_{min}}\right)^{-1} \|e_{(0)}\|_A = T_i\!\left(\dfrac{\kappa + 1}{\kappa - 1}\right)^{-1} \|e_{(0)}\|_A$
$= 2\left[\left(\dfrac{\sqrt{\kappa} + 1}{\sqrt{\kappa} - 1}\right)^i + \left(\dfrac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^i\right]^{-1} \|e_{(0)}\|_A.$    (51)

The second addend inside the square brackets converges to zero as $i$ grows, so it is more common to express the convergence of CG with the weaker inequality

$\|e_{(i)}\|_A \le 2\left(\dfrac{\sqrt{\kappa} - 1}{\sqrt{\kappa} + 1}\right)^i \|e_{(0)}\|_A.$    (52)
Figure 34: Convergence of Conjugate Gradients (per iteration) as a function of condition number. Compare with Figure 20.

The first step of CG is identical to a step of Steepest Descent. Setting $i = 1$ in Equation 51, we obtain Equation 28, the convergence result for Steepest Descent. This is just the linear polynomial case illustrated in Figure 31(b).

Figure 34 charts the convergence per iteration of CG, ignoring the lost factor of 2. In practice, CG usually converges faster than Equation 52 would suggest, because of good eigenvalue distributions or good starting points. Comparing Equations 52 and 28, it is clear that the convergence of CG is much quicker than that of Steepest Descent (see Figure 35). However, it is not necessarily true that every iteration of CG enjoys faster convergence; for example, the first iteration of CG is an iteration of Steepest Descent. The factor of 2 in Equation 52 allows CG a little slack for these poor iterations.
10. Complexity

The dominating operations during an iteration of either Steepest Descent or CG are matrix-vector products. In general, matrix-vector multiplication requires $O(m)$ operations, where $m$ is the number of non-zero entries in the matrix. For many problems, including those listed in the introduction, $A$ is sparse and $m \in O(n)$.

Suppose we wish to perform enough iterations to reduce the norm of the error by a factor of $\varepsilon$; that is, $\|e_{(i)}\| \le \varepsilon \|e_{(0)}\|$. Equation 28 can be used to show that the maximum number of iterations required to achieve this bound using Steepest Descent is

$i \le \left\lceil \frac{1}{2} \kappa \ln\!\left(\frac{1}{\varepsilon}\right) \right\rceil,$

whereas Equation 52 suggests that the maximum number of iterations CG requires is

$i \le \left\lceil \frac{1}{2} \sqrt{\kappa} \ln\!\left(\frac{2}{\varepsilon}\right) \right\rceil.$
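Plugging numbers into these two bounds (a small illustration of mine, with an arbitrarily chosen $\kappa$ and $\varepsilon$) shows how large the gap is:

```python
import math

kappa, eps = 1.0e4, 1.0e-6
steepest = math.ceil(0.5 * kappa * math.log(1 / eps))        # 69,078 iterations
cg = math.ceil(0.5 * math.sqrt(kappa) * math.log(2 / eps))   # 726 iterations
print(steepest, cg)
```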
Figure 35: Number of iterations of Steepest Descent required to match one iteration of CG.

I conclude that Steepest Descent has a time complexity of $O(m\kappa)$, whereas CG has a time complexity of $O(m\sqrt{\kappa})$. Both algorithms have a space complexity of $O(m)$.

Finite difference and finite element approximations of second-order elliptic boundary value problems posed on $d$-dimensional domains often have $\kappa \in O(n^{2/d})$. Thus, Steepest Descent has a time complexity of $O(n^2)$ for two-dimensional problems, versus $O(n^{3/2})$ for CG; and Steepest Descent has a time complexity of $O(n^{5/3})$ for three-dimensional problems, versus $O(n^{4/3})$ for CG.
11. Starting and Stopping

In the preceding presentation of the Steepest Descent and Conjugate Gradient algorithms, several details have been omitted; particularly, how to choose a starting point, and when to stop.

11.1. Starting

There's not much to say about starting. If you have a rough estimate of the value of $x$, use it as the starting value $x_{(0)}$. If not, set $x_{(0)} = 0$; either Steepest Descent or CG will eventually converge when used to solve linear systems. Nonlinear minimization (coming up in Section 14) is trickier, though, because there may be several local minima, and the choice of starting point will determine which minimum the procedure converges to, or whether it will converge at all.

11.2. Stopping

When Steepest Descent or CG reaches the minimum point, the residual becomes zero, and if Equation 11 or 48 is evaluated an iteration later, a division by zero will result. It seems, then, that one must stop immediately when the residual is zero. To complicate things, though, accumulated roundoff error in the recursive formulation of the residual (Equation 48) may yield a false zero residual; this problem could be resolved by restarting with Equation 45.

Usually, however, one wishes to stop before convergence is complete. Because the error term is not available, it is customary to stop when the norm of the residual falls below a specified value; often, this value is some small fraction of the initial residual ($\|r_{(i)}\| < \varepsilon \|r_{(0)}\|$). See Appendix B for sample code.
12. Preconditioning

Preconditioning is a technique for improving the condition number of a matrix. Suppose that $M$ is a symmetric, positive-definite matrix that approximates $A$, but is easier to invert. We can solve $Ax = b$ indirectly by solving

$M^{-1}Ax = M^{-1}b.$    (53)

If $\kappa(M^{-1}A) \ll \kappa(A)$, or if the eigenvalues of $M^{-1}A$ are better clustered than those of $A$, we can iteratively solve Equation 53 more quickly than the original problem. The catch is that $M^{-1}A$ is not generally symmetric nor definite, even if $M$ and $A$ are.

We can circumvent this difficulty, because for every symmetric, positive-definite $M$ there is a (not necessarily unique) matrix $E$ that has the property that $EE^T = M$. (Such an $E$ can be obtained, for instance, by Cholesky factorization.) The matrices $M^{-1}A$ and $E^{-1}AE^{-T}$ have the same eigenvalues. This is true because if $v$ is an eigenvector of $M^{-1}A$ with eigenvalue $\lambda$, then $E^T v$ is an eigenvector of $E^{-1}AE^{-T}$ with eigenvalue $\lambda$:

$(E^{-1}AE^{-T})(E^T v) = E^T E^{-T} E^{-1} A v = E^T M^{-1} A v = \lambda E^T v.$

The system $Ax = b$ can be transformed into the problem

$E^{-1}AE^{-T}\hat{x} = E^{-1}b, \quad \hat{x} = E^T x,$

which we solve first for $\hat{x}$, then for $x$. Because $E^{-1}AE^{-T}$ is symmetric and positive-definite, $\hat{x}$ can be found by Steepest Descent or CG. The process of using CG to solve this system is called the Transformed Preconditioned Conjugate Gradient Method:

$\hat{d}_{(0)} = \hat{r}_{(0)} = E^{-1}b - E^{-1}AE^{-T}\hat{x}_{(0)},$
$\alpha_{(i)} = \dfrac{\hat{r}_{(i)}^T \hat{r}_{(i)}}{\hat{d}_{(i)}^T E^{-1}AE^{-T} \hat{d}_{(i)}},$
$\hat{x}_{(i+1)} = \hat{x}_{(i)} + \alpha_{(i)} \hat{d}_{(i)},$
$\hat{r}_{(i+1)} = \hat{r}_{(i)} - \alpha_{(i)} E^{-1}AE^{-T} \hat{d}_{(i)},$
$\beta_{(i+1)} = \dfrac{\hat{r}_{(i+1)}^T \hat{r}_{(i+1)}}{\hat{r}_{(i)}^T \hat{r}_{(i)}},$
$\hat{d}_{(i+1)} = \hat{r}_{(i+1)} + \beta_{(i+1)} \hat{d}_{(i)}.$
Figure 36: Contour lines of the quadratic form of the diagonally preconditioned sample problem.

This method has the undesirable characteristic that $E$ must be computed. However, a few careful variable substitutions can eliminate $E$. Setting $\hat{r}_{(i)} = E^{-1} r_{(i)}$ and $\hat{d}_{(i)} = E^T d_{(i)}$, and using the identities $\hat{x}_{(i)} = E^T x_{(i)}$ and $E^{-T}E^{-1} = M^{-1}$, we derive the Untransformed Preconditioned Conjugate Gradient Method:

$r_{(0)} = b - Ax_{(0)},$
$d_{(0)} = M^{-1} r_{(0)},$
$\alpha_{(i)} = \dfrac{r_{(i)}^T M^{-1} r_{(i)}}{d_{(i)}^T A d_{(i)}},$
$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)},$
$r_{(i+1)} = r_{(i)} - \alpha_{(i)} A d_{(i)},$
$\beta_{(i+1)} = \dfrac{r_{(i+1)}^T M^{-1} r_{(i+1)}}{r_{(i)}^T M^{-1} r_{(i)}},$
$d_{(i+1)} = M^{-1} r_{(i+1)} + \beta_{(i+1)} d_{(i)}.$

The matrix $E$ does not appear in these equations; only $M^{-1}$ is needed. By the same means, it is possible to derive a Preconditioned Steepest Descent Method that does not use $E$.
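A sketch of the Untransformed Preconditioned CG method above (mine; the `apply_m_inv` callback name is an invention, standing for whatever computes $M^{-1}v$):

```python
import numpy as np

def preconditioned_cg(A, b, x, apply_m_inv, tol=1e-10, max_iter=1000):
    """Untransformed Preconditioned CG; apply_m_inv(v) must return M^{-1} v."""
    r = b - A @ x
    d = apply_m_inv(r)
    delta_new = r @ d                    # r^T M^{-1} r
    delta0 = delta_new
    for _ in range(max_iter):
        if delta_new <= tol**2 * delta0:
            break
        Ad = A @ d
        alpha = delta_new / (d @ Ad)
        x = x + alpha * d
        r = r - alpha * Ad
        s = apply_m_inv(r)               # one preconditioner application per iteration
        delta_old = delta_new
        delta_new = r @ s
        d = s + (delta_new / delta_old) * d
    return x

# Diagonal (Jacobi) preconditioning of the sample problem:
A = np.array([[3.0, 2.0], [2.0, 6.0]])
b = np.array([2.0, -8.0])
print(preconditioned_cg(A, b, np.zeros(2), lambda v: v / np.diag(A)))
```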
The effectiveness of a preconditioner $M$ is determined by the condition number of $M^{-1}A$, and occasionally by its clustering of eigenvalues. The problem remains of finding a preconditioner that approximates $A$ well enough to improve convergence enough to make up for the cost of computing the product $M^{-1}r_{(i)}$ once per iteration. (It is not necessary to explicitly compute $M$ or $M^{-1}$; it is only necessary to be able to compute the effect of applying $M^{-1}$ to a vector.) Within this constraint, there is a surprisingly rich supply of possibilities, and I can only scratch the surface here.
ConjugateGradientsontheNormalEquations41
Intuitively, preconditioning is an attempt to stretch the quadratic form to make it appear more spherical, so that the eigenvalues are close to each other. A perfect preconditioner is $M = A$; for this preconditioner, $M^{-1}A$ has a condition number of one, and the quadratic form is perfectly spherical, so solution takes only one iteration. Unfortunately, the preconditioning step is solving the system $Mx = b$, so this isn't a useful preconditioner at all.
The simplest preconditioner is a diagonal matrix whose diagonal entries are identical to those of $A$. The process of applying this preconditioner, known as diagonal preconditioning or Jacobi preconditioning, is equivalent to scaling the quadratic form along the coordinate axes. (By comparison, the perfect preconditioner $M = A$ scales the quadratic form along its eigenvector axes.) A diagonal matrix is trivial to invert, but is often only a mediocre preconditioner. The contour lines of our sample problem are shown, after diagonal preconditioning, in Figure 36. Comparing with Figure 3, it is clear that some improvement has occurred. The condition number has improved from 3.5 to roughly 2.8. Of course, this improvement is much more beneficial for systems where $n \gg 2$.
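Because applying $M^{-1}$ for a diagonal $M$ is just elementwise division, a Jacobi preconditioner is a one-liner. Here is a minimal Python sketch (names are illustrative); the returned callable can be handed to the preconditioned CG routine in Appendix B3:

    import numpy as np

    def jacobi_preconditioner(A):
        # M = diag(A); applying M^{-1} divides elementwise by A's diagonal.
        diag = np.diag(A).copy()
        return lambda r: r / diag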
A more elaborate preconditioner is incomplete Cholesky preconditioning. Cholesky factorization is a technique for factoring a matrix $A$ into the form $LL^T$, where $L$ is a lower triangular matrix. Incomplete Cholesky factorization is a variant in which little or no fill is allowed; $A$ is approximated by the product $\hat{L}\hat{L}^T$, where $\hat{L}$ might be restricted to have the same pattern of nonzero elements as $A$; other elements of $L$ are thrown away. To use $\hat{L}\hat{L}^T$ as a preconditioner, the solution to $\hat{L}\hat{L}^T w = z$ is computed by backsubstitution (the inverse of $\hat{L}\hat{L}^T$ is never explicitly computed). Unfortunately, incomplete Cholesky preconditioning is not always stable.
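Assuming the incomplete factor $\hat{L}$ has already been computed (the factorization itself is omitted here), applying the preconditioner is two triangular solves. A sketch in Python, using SciPy's triangular solver:

    from scipy.linalg import solve_triangular

    def apply_ichol(L_hat, z):
        # Solve (L_hat L_hat^T) w = z by two triangular substitutions;
        # the inverse of L_hat L_hat^T is never formed explicitly.
        y = solve_triangular(L_hat, z, lower=True)             # L_hat y = z
        w = solve_triangular(L_hat, y, lower=True, trans='T')  # L_hat^T w = y
        return w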
Many preconditioners, some quite sophisticated, have been developed. Whatever your choice, it is generally accepted that for large-scale applications, CG should nearly always be used with a preconditioner.
13. Conjugate Gradients on the Normal Equations

CG can be used to solve systems where $A$ is not symmetric, not positive-definite, and even not square. A solution to the least squares problem

$\min_x \|Ax - b\|^2 \qquad (54)$

can be found by setting the derivative of Expression 54 to zero:

$A^T A x = A^T b. \qquad (55)$

If $A$ is square and nonsingular, the solution to Equation 55 is the solution to $Ax = b$. If $A$ is not square, and $Ax = b$ is overconstrained (that is, has more linearly independent equations than variables), then there may or may not be a solution to $Ax = b$, but it is always possible to find a value of $x$ that minimizes Expression 54, the sum of the squares of the errors of each linear equation.

$A^T A$ is symmetric and positive (for any $x$, $x^T A^T A x = \|Ax\|^2 \geq 0$). If $Ax = b$ is not underconstrained, then $A^T A$ is nonsingular, and methods like Steepest Descent and CG can be used to solve Equation 55. The only nuisance in doing so is that the condition number of $A^T A$ is the square of that of $A$, so convergence is significantly slower.
An important technical point is that the matrix $A^T A$ is never formed explicitly, because it is less sparse than $A$. Instead, $A^T A$ is multiplied by $d$ by first finding the product $Ad$, and then $A^T (Ad)$. Also, numerical stability is improved if the value $d^T A^T A d$ (in Equation 46) is computed by taking the inner product of $Ad$ with itself.
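Both points are easy to express in code. The following Python sketch (function names are mine, chosen for illustration) shows the matrix-vector product and the stable curvature term:

    import numpy as np

    def normal_matvec(A, d):
        # Multiply A^T A by d without forming A^T A: first Ad, then A^T (Ad).
        Ad = A @ d
        return A.T @ Ad

    def normal_curvature(A, d):
        # d^T A^T A d, computed stably as the inner product of Ad with itself.
        Ad = A @ d
        return Ad @ Ad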
14. The Nonlinear Conjugate Gradient Method

CG can be used not only to find the minimum point of a quadratic form, but to minimize any continuous function $f(x)$ for which the gradient $f'$ can be computed. Applications include a variety of optimization problems, such as engineering design, neural net training, and nonlinear regression.
14.1. Outline of the Nonlinear Conjugate Gradient Method

To derive nonlinear CG, there are three changes to the linear algorithm: the recursive formula for the residual cannot be used, it becomes more complicated to compute the step size $\alpha$, and there are several different choices for $\beta$.

In nonlinear CG, the residual is always set to the negation of the gradient; $r_{(i)} = -f'(x_{(i)})$. The search directions are computed by Gram-Schmidt conjugation of the residuals as with linear CG. Performing a line search along this search direction is much more difficult than in the linear case, and a variety of procedures can be used. As with the linear CG, a value of $\alpha_{(i)}$ that minimizes $f(x_{(i)} + \alpha_{(i)} d_{(i)})$ is found by ensuring that the gradient is orthogonal to the search direction. We can use any algorithm that finds the zeros of the expression $[f'(x_{(i)} + \alpha_{(i)} d_{(i)})]^T d_{(i)}$.
In linear CG, there are several equivalent expressions for the value of $\beta$. In nonlinear CG, these different expressions are no longer equivalent; researchers are still investigating the best choice. Two choices are the Fletcher-Reeves formula, which we used in linear CG for its ease of computation, and the Polak-Ribière formula:

$\beta^{FR}_{(i+1)} = \dfrac{r^T_{(i+1)} r_{(i+1)}}{r^T_{(i)} r_{(i)}}, \qquad \beta^{PR}_{(i+1)} = \dfrac{r^T_{(i+1)} (r_{(i+1)} - r_{(i)})}{r^T_{(i)} r_{(i)}}.$

The Fletcher-Reeves method converges if the starting point is sufficiently close to the desired minimum, whereas the Polak-Ribière method can, in rare cases, cycle infinitely without converging. However, Polak-Ribière often converges much more quickly.

Fortunately, convergence of the Polak-Ribière method can be guaranteed by choosing $\beta = \max\{\beta^{PR}, 0\}$. Using this value is equivalent to restarting CG if $\beta^{PR} < 0$. To restart CG is to forget the past search directions, and start CG anew in the direction of steepest descent.
Here is an outline of the nonlinear CG method:

$d_{(0)} = r_{(0)} = -f'(x_{(0)}),$

Find $\alpha_{(i)}$ that minimizes $f(x_{(i)} + \alpha_{(i)} d_{(i)}),$

$x_{(i+1)} = x_{(i)} + \alpha_{(i)} d_{(i)},$

$r_{(i+1)} = -f'(x_{(i+1)}),$

$\beta_{(i+1)} = \dfrac{r^T_{(i+1)} r_{(i+1)}}{r^T_{(i)} r_{(i)}} \quad \text{or} \quad \beta_{(i+1)} = \max\left\{ \dfrac{r^T_{(i+1)} (r_{(i+1)} - r_{(i)})}{r^T_{(i)} r_{(i)}},\; 0 \right\},$

$d_{(i+1)} = r_{(i+1)} + \beta_{(i+1)} d_{(i)}.$
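A compact Python sketch of this outline follows. It is illustrative only: grad(x) is assumed to return $f'(x)$ as a NumPy array, and line_search(x, d) stands in for any of the line search procedures of Section 14.2.

    import numpy as np

    def nonlinear_cg(grad, x, line_search, n, i_max=100, eps=1e-8,
                     polak_ribiere=True):
        r = -grad(x)
        d = r.copy()
        delta0 = r @ r
        k = 0
        for i in range(i_max):
            if r @ r <= eps**2 * delta0:
                break
            alpha = line_search(x, d)
            x = x + alpha * d
            r_new = -grad(x)
            if polak_ribiere:
                # Polak-Ribiere, clamped at zero so a negative beta restarts CG.
                beta = max((r_new @ (r_new - r)) / (r @ r), 0.0)
            else:
                # Fletcher-Reeves.
                beta = (r_new @ r_new) / (r @ r)
            d = r_new + beta * d
            r = r_new
            k += 1
            if k == n or r @ d <= 0:   # restart in the steepest descent direction
                d = r.copy()
                k = 0
        return x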
Nonlinear CG comes with few of the convergence guarantees of linear CG. The less similar $f$ is to a quadratic function, the more quickly the search directions lose conjugacy. (It will become clear shortly that the term "conjugacy" still has some meaning in nonlinear CG.) Another problem is that a general function $f$ may have many local minima. CG is not guaranteed to converge to the global minimum, and may not even find a local minimum if $f$ has no lower bound.

Figure 37 illustrates nonlinear CG. Figure 37(a) is a function with many local minima. Figure 37(b) demonstrates the convergence of nonlinear CG with the Fletcher-Reeves formula. In this example, CG is not nearly as effective as in the linear case; this function is deceptively difficult to minimize. Figure 37(c) shows a cross-section of the surface, corresponding to the first line search in Figure 37(b). Notice that there are several minima; the line search finds a value of $\alpha$ corresponding to a nearby minimum. Figure 37(d) shows the superior convergence of Polak-Ribière CG.

Because CG can only generate $n$ conjugate vectors in an $n$-dimensional space, it makes sense to restart CG every $n$ iterations, especially if $n$ is small. Figure 38 shows the effect when nonlinear CG is restarted every second iteration. (For this particular example, both the Fletcher-Reeves method and the Polak-Ribière method appear the same.)
14.2. General Line Search

Depending on the value of $f'$, it might be possible to use a fast algorithm to find the zeros of $f'^T d$. For instance, if $f'$ is polynomial in $\alpha$, then an efficient algorithm for polynomial zero-finding can be used. However, we will only consider general-purpose algorithms.

Two iterative methods for zero-finding are the Newton-Raphson method and the Secant method. Both methods require that $f$ be twice continuously differentiable. Newton-Raphson also requires that it be possible to compute the second derivative of $f(x + \alpha d)$ with respect to $\alpha$.

The Newton-Raphson method relies on the Taylor series approximation

$f(x + \alpha d) \approx f(x) + \alpha \left[ \dfrac{d}{d\alpha} f(x + \alpha d) \right]_{\alpha = 0} + \dfrac{\alpha^2}{2} \left[ \dfrac{d^2}{d\alpha^2} f(x + \alpha d) \right]_{\alpha = 0} = f(x) + \alpha [f'(x)]^T d + \dfrac{\alpha^2}{2} d^T f''(x) d, \qquad (56)$

$\dfrac{d}{d\alpha} f(x + \alpha d) \approx [f'(x)]^T d + \alpha\, d^T f''(x) d, \qquad (57)$

where $f''(x)$ is the Hessian matrix
$f''(x) = \begin{bmatrix} \dfrac{\partial^2 f}{\partial x_1 \partial x_1} & \dfrac{\partial^2 f}{\partial x_1 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_1 \partial x_n} \\ \dfrac{\partial^2 f}{\partial x_2 \partial x_1} & \dfrac{\partial^2 f}{\partial x_2 \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_2 \partial x_n} \\ \vdots & \vdots & \ddots & \vdots \\ \dfrac{\partial^2 f}{\partial x_n \partial x_1} & \dfrac{\partial^2 f}{\partial x_n \partial x_2} & \cdots & \dfrac{\partial^2 f}{\partial x_n \partial x_n} \end{bmatrix}.$
Figure 37: Convergence of the nonlinear Conjugate Gradient Method. (a) A complicated function with many local minima and maxima. (b) Convergence path of Fletcher-Reeves CG. Unlike linear CG, convergence does not occur in two steps. (c) Cross-section of the surface corresponding to the first line search. (d) Convergence path of Polak-Ribière CG.
Figure 38: Nonlinear CG can be more effective with periodic restarts.
Figure 39: The Newton-Raphson method for minimizing a one-dimensional function (solid curve). Starting from the point $x$, calculate the first and second derivatives, and use them to construct a quadratic approximation to the function (dashed curve). A new point is chosen at the base of the parabola. This procedure is iterated until convergence is reached.
The function $f(x + \alpha d)$ is approximately minimized by setting Expression 57 to zero, giving

$\alpha = -\dfrac{f'^T d}{d^T f'' d}.$

The truncated Taylor series approximates $f(x + \alpha d)$ with a parabola; we step to the bottom of the parabola (see Figure 39). In fact, if $f$ is a quadratic form, then this parabolic approximation is exact, because $f''$ is just the familiar matrix $A$. In general, the search directions are conjugate if they are $f''$-orthogonal. The meaning of "conjugate" keeps changing, though, because $f''$ varies with $x$. The more quickly $f''$ varies with $x$, the more quickly the search directions lose conjugacy. On the other hand, the closer $x_{(i)}$ is to the solution, the less $f''$ varies from iteration to iteration. The closer the starting point is to the solution, the more similar the convergence of nonlinear CG is to that of linear CG.
To perform an exact line search of a non-quadratic function, repeated steps must be taken along the line until $f'^T d$ is zero; hence, one CG iteration may include many Newton-Raphson iterations. The values of $f'^T d$ and $d^T f'' d$ must be evaluated at each step. These evaluations may be inexpensive if $d^T f'' d$ can be analytically simplified, but if the full matrix $f''$ must be evaluated repeatedly, the algorithm is prohibitively slow. For some applications, it is possible to circumvent this problem by performing an approximate line search that uses only the diagonal elements of $f''$. Of course, there are functions for which it is not possible to evaluate $f''$ at all.
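The Newton-Raphson line search is short enough to sketch directly. In this illustrative Python version (the helper names are mine), grad(x) returns $f'(x)$ and hess(x) returns $f''(x)$; the loop repeatedly steps to the bottom of the local parabola and returns the total step taken along $d$:

    def newton_raphson_search(grad, hess, x, d, j_max=4, tol=1e-10):
        total = 0.0
        for _ in range(j_max):
            # alpha = -(f'(x)^T d) / (d^T f''(x) d), per the derivation above.
            alpha = -(grad(x) @ d) / (d @ (hess(x) @ d))
            x = x + alpha * d
            total += alpha
            if alpha * alpha * (d @ d) <= tol * tol:  # update has become tiny
                break
        return total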
To perform an exact line search without computing $f''$, the Secant method approximates the second derivative of $f(x + \alpha d)$ by evaluating the first derivative at the distinct points $\alpha = 0$ and $\alpha = \sigma$, where $\sigma$ is an arbitrary small nonzero number:

$\dfrac{d^2}{d\alpha^2} f(x + \alpha d) \approx \dfrac{\left[ \frac{d}{d\alpha} f(x + \alpha d) \right]_{\alpha = \sigma} - \left[ \frac{d}{d\alpha} f(x + \alpha d) \right]_{\alpha = 0}}{\sigma} = \dfrac{[f'(x + \sigma d)]^T d - [f'(x)]^T d}{\sigma}, \qquad (58)$

which becomes a better approximation to the second derivative as $\alpha$ and $\sigma$ approach zero. If we substitute Expression 58 for the third term of the Taylor expansion (Equation 56), we have

$\dfrac{d}{d\alpha} f(x + \alpha d) \approx [f'(x)]^T d + \dfrac{\alpha}{\sigma} \left\{ [f'(x + \sigma d)]^T d - [f'(x)]^T d \right\}.$

Minimize $f(x + \alpha d)$ by setting its derivative to zero:

$\alpha = -\sigma \dfrac{[f'(x)]^T d}{[f'(x + \sigma d)]^T d - [f'(x)]^T d}. \qquad (59)$

Like Newton-Raphson, the Secant method also approximates $f(x + \alpha d)$ with a parabola, but instead of choosing the parabola by finding the first and second derivative at a point, it finds the first derivative at two different points (see Figure 40). Typically, we will choose an arbitrary $\sigma$ on the first Secant method iteration; on subsequent iterations we will choose $x + \sigma d$ to be the value of $x + \alpha d$ from the previous Secant method iteration. In other words, if we let $\alpha_{[i]}$ denote the value of $\alpha$ calculated during Secant iteration $i$, then $\sigma_{[i+1]} = -\alpha_{[i]}$.
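A Python sketch of the Secant line search follows (illustrative names again; grad(x) returns $f'(x)$). It mirrors the inner loop of Appendix B5: after the first step, $\sigma$ is the negation of the previous $\alpha$, so each step needs only one new gradient evaluation:

    def secant_search(grad, x, d, sigma0=0.01, j_max=8, tol=1e-10):
        eta_prev = grad(x + sigma0 * d) @ d   # slope at alpha = sigma0
        alpha = -sigma0
        total = 0.0
        for _ in range(j_max):
            eta = grad(x) @ d                 # slope at alpha = 0
            alpha = alpha * eta / (eta_prev - eta)   # Equation 59
            x = x + alpha * d
            total += alpha
            eta_prev = eta
            if alpha * alpha * (d @ d) <= tol * tol:
                break
        return total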
Both the Newton-Raphson and Secant methods should be terminated when $x$ is reasonably close to the exact solution. Demanding too little precision could cause a failure of convergence, but demanding too fine precision makes the computation unnecessarily slow and gains nothing, because conjugacy will break down quickly anyway if $f''$ varies much with $x$. Therefore, a quick but inexact line search is often the better policy (for instance, use only a fixed number of Newton-Raphson or Secant method iterations). Unfortunately, inexact line search may lead to the construction of a search direction that is not a descent direction. A common solution is to test for this eventuality (is $r^T d$ nonpositive?), and restart CG if necessary by setting $d = r$.

Figure 40: The Secant method for minimizing a one-dimensional function (solid curve). Starting from the point $x$, calculate the first derivatives at two different points (here, $\alpha = 0$ and $\alpha = 2$), and use them to construct a quadratic approximation to the function (dashed curve). Notice that both curves have the same slope at $\alpha = 0$, and also at $\alpha = 2$. As in the Newton-Raphson method, a new point is chosen at the base of the parabola, and the procedure is iterated until convergence.

A bigger problem with both methods is that they cannot distinguish minima from maxima. The result of nonlinear CG generally depends strongly on the starting point, and if CG with the Newton-Raphson or Secant method starts near a local maximum, it is likely to converge to that point.
Each method has its own advantages. The Newton-Raphson method has a better convergence rate, and is to be preferred if $d^T f'' d$ can be calculated (or well approximated) quickly (i.e., in $\mathcal{O}(n)$ time). The Secant method only requires first derivatives of $f$, but its success may depend on a good choice of the parameter $\sigma$. It is easy to derive a variety of other methods as well. For instance, by sampling $f$ at three different points, it is possible to generate a parabola that approximates $f(x + \alpha d)$ without the need to calculate even a first derivative of $f$.
14.3. Preconditioning

Nonlinear CG can be preconditioned by choosing a preconditioner $M$ that approximates $f''$ and has the property that $M^{-1} r$ is easy to compute. In linear CG, the preconditioner attempts to transform the quadratic form so that it is similar to a sphere; a nonlinear CG preconditioner performs this transformation for a region near $x_{(i)}$.
Even when it is too expensive to compute the full Hessian $f''$, it is often reasonable to compute its diagonal for use as a preconditioner. However, be forewarned that if $x$ is sufficiently far from a local minimum, the diagonal elements of the Hessian may not all be positive. A preconditioner should be positive-definite, so nonpositive diagonal elements cannot be allowed. A conservative solution is to not precondition (set $M = I$) when the Hessian cannot be guaranteed to be positive-definite. Figure 41 demonstrates the convergence of diagonally preconditioned nonlinear CG, with the Polak-Ribière formula, on the same function illustrated in Figure 37. Here, I have cheated by using the diagonal of $f''$ at the solution point $x$ to precondition every iteration.

Figure 41: The preconditioned nonlinear Conjugate Gradient Method, using the Polak-Ribière formula and a diagonal preconditioner. The space has been "stretched" to show the improvement in circularity of the contour lines around the minimum.
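The conservative fallback is easy to implement. In this minimal Python sketch (names are illustrative), hess_diag is the diagonal of $f''(x)$ as a NumPy array; when any entry is nonpositive, the returned operator is simply the identity, i.e., $M = I$:

    import numpy as np

    def safe_diagonal_preconditioner(hess_diag):
        # Use M = diag(f''(x)) only if every entry is positive;
        # otherwise do not precondition (M = I).
        if np.all(hess_diag > 0):
            return lambda r: r / hess_diag
        return lambda r: r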
A Notes

Conjugate Direction methods were probably first presented by Schmidt [14] in 1908, and were independently reinvented by Fox, Huskey, and Wilkinson [7] in 1948. In the early fifties, the method of Conjugate Gradients was discovered independently by Hestenes [10] and Stiefel [15]; shortly thereafter, they jointly published what is considered the seminal reference on CG [11]. Convergence bounds for CG in terms of Chebyshev polynomials were developed by Kaniel [12]. A more thorough analysis of CG convergence is provided by van der Sluis and van der Vorst [16]. CG was popularized as an iterative method for large, sparse matrices by Reid [13] in 1971.

CG was generalized to nonlinear problems in 1964 by Fletcher and Reeves [6], based on work by Davidon [4] and Fletcher and Powell [5]. Convergence of nonlinear CG with inexact line searches was analyzed by Daniel [3]. The choice of $\beta$ for nonlinear CG is discussed by Gilbert and Nocedal [8].

A history and extensive annotated bibliography of CG to the mid-seventies is provided by Golub and O'Leary [9]. Most research since that time has focused on nonsymmetric systems. A survey of iterative methods for the solution of linear systems is offered by Barrett et al. [1].
B Canned Algorithms

The code given in this section represents efficient implementations of the algorithms discussed in this article.
B1. Steepest Descent

Given the inputs $A$, $b$, a starting value $x$, a maximum number of iterations $i_{\max}$, and an error tolerance $\varepsilon < 1$:

    i ⇐ 0
    r ⇐ b − Ax
    δ ⇐ r^T r
    δ_0 ⇐ δ
    While i < i_max and δ > ε² δ_0 do
        q ⇐ Ar
        α ⇐ δ / (r^T q)
        x ⇐ x + αr
        If i is divisible by 50
            r ⇐ b − Ax
        else
            r ⇐ r − αq
        δ ⇐ r^T r
        i ⇐ i + 1

This algorithm terminates when the maximum number of iterations $i_{\max}$ has been exceeded, or when $\|r_{(i)}\| \leq \varepsilon \|r_{(0)}\|$.

The fast recursive formula for the residual is usually used, but once every 50 iterations, the exact residual is recalculated to remove accumulated floating point error. Of course, the number 50 is arbitrary; for large $n$, $\sqrt{n}$ might be appropriate. If the tolerance is large, the residual need not be corrected at all (in practice, this correction is rarely used). If the tolerance is close to the limits of the floating point precision of the machine, a test should be added after $\delta$ is evaluated to check if $\delta \leq \varepsilon^2 \delta_0$, and if this test holds true, the exact residual should also be recomputed and $\delta$ reevaluated. This prevents the procedure from terminating early due to floating point roundoff error.
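A direct NumPy transcription of this pseudocode might look as follows. This is a sketch, not code from the article; it assumes $A$ is a NumPy array (or sparse matrix supporting the @ operator):

    import numpy as np

    def steepest_descent(A, b, x, i_max, eps):
        i = 0
        r = b - A @ x
        delta = r @ r
        delta0 = delta
        while i < i_max and delta > eps**2 * delta0:
            q = A @ r
            alpha = delta / (r @ q)
            x = x + alpha * r
            if i % 50 == 0:            # periodically recompute the exact residual
                r = b - A @ x
            else:
                r = r - alpha * q
            delta = r @ r
            i += 1
        return x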
B2. Conjugate Gradients

Given the inputs $A$, $b$, a starting value $x$, a maximum number of iterations $i_{\max}$, and an error tolerance $\varepsilon < 1$:

    i ⇐ 0
    r ⇐ b − Ax
    d ⇐ r
    δ_new ⇐ r^T r
    δ_0 ⇐ δ_new
    While i < i_max and δ_new > ε² δ_0 do
        q ⇐ Ad
        α ⇐ δ_new / (d^T q)
        x ⇐ x + αd
        If i is divisible by 50
            r ⇐ b − Ax
        else
            r ⇐ r − αq
        δ_old ⇐ δ_new
        δ_new ⇐ r^T r
        β ⇐ δ_new / δ_old
        d ⇐ r + βd
        i ⇐ i + 1

See the comments at the end of Section B1.
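The corresponding NumPy sketch (same assumptions and caveats as the Steepest Descent version above):

    import numpy as np

    def conjugate_gradients(A, b, x, i_max, eps):
        i = 0
        r = b - A @ x
        d = r.copy()
        delta_new = r @ r
        delta0 = delta_new
        while i < i_max and delta_new > eps**2 * delta0:
            q = A @ d
            alpha = delta_new / (d @ q)
            x = x + alpha * d
            if i % 50 == 0:            # periodically recompute the exact residual
                r = b - A @ x
            else:
                r = r - alpha * q
            delta_old = delta_new
            delta_new = r @ r
            beta = delta_new / delta_old
            d = r + beta * d
            i += 1
        return x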
B3. Preconditioned Conjugate Gradients

Given the inputs $A$, $b$, a starting value $x$, a (perhaps implicitly defined) preconditioner $M$, a maximum number of iterations $i_{\max}$, and an error tolerance $\varepsilon < 1$:

    i ⇐ 0
    r ⇐ b − Ax
    d ⇐ M⁻¹r
    δ_new ⇐ r^T d
    δ_0 ⇐ δ_new
    While i < i_max and δ_new > ε² δ_0 do
        q ⇐ Ad
        α ⇐ δ_new / (d^T q)
        x ⇐ x + αd
        If i is divisible by 50
            r ⇐ b − Ax
        else
            r ⇐ r − αq
        s ⇐ M⁻¹r
        δ_old ⇐ δ_new
        δ_new ⇐ r^T s
        β ⇐ δ_new / δ_old
        d ⇐ s + βd
        i ⇐ i + 1

The statement "s ⇐ M⁻¹r" implies that one should apply the preconditioner, which may not actually be in the form of a matrix.

See also the comments at the end of Section B1.
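In code, the cleanest way to honor that remark is to pass the preconditioner as a callable rather than a matrix. A NumPy sketch (illustrative names; apply_Minv(r) computes $M^{-1}r$, e.g. the Jacobi or incomplete Cholesky operators sketched in Section 12):

    import numpy as np

    def preconditioned_cg(A, b, x, apply_Minv, i_max, eps):
        i = 0
        r = b - A @ x
        d = apply_Minv(r)
        delta_new = r @ d
        delta0 = delta_new
        while i < i_max and delta_new > eps**2 * delta0:
            q = A @ d
            alpha = delta_new / (d @ q)
            x = x + alpha * d
            if i % 50 == 0:
                r = b - A @ x
            else:
                r = r - alpha * q
            s = apply_Minv(r)
            delta_old = delta_new
            delta_new = r @ s
            beta = delta_new / delta_old
            d = s + beta * d
            i += 1
        return x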
B4. Nonlinear Conjugate Gradients with Newton-Raphson and Fletcher-Reeves

Given a function $f$, a starting value $x$, a maximum number of CG iterations $i_{\max}$, a CG error tolerance $\varepsilon < 1$, a maximum number of Newton-Raphson iterations $j_{\max}$, and a Newton-Raphson error tolerance $\varepsilon < 1$:

    i ⇐ 0
    k ⇐ 0
    r ⇐ −f′(x)
    d ⇐ r
    δ_new ⇐ r^T r
    δ_0 ⇐ δ_new
    While i < i_max and δ_new > ε² δ_0 do
        j ⇐ 0
        δ_d ⇐ d^T d
        Do
            α ⇐ −[f′(x)]^T d / (d^T f″(x) d)
            x ⇐ x + αd
            j ⇐ j + 1
        while j < j_max and α² δ_d > ε²
        r ⇐ −f′(x)
        δ_old ⇐ δ_new
        δ_new ⇐ r^T r
        β ⇐ δ_new / δ_old
        d ⇐ r + βd
        k ⇐ k + 1
        If k = n or r^T d ≤ 0
            d ⇐ r
            k ⇐ 0
        i ⇐ i + 1

This algorithm terminates when the maximum number of iterations $i_{\max}$ has been exceeded, or when $\|r_{(i)}\| \leq \varepsilon \|r_{(0)}\|$.

Each Newton-Raphson iteration adds $\alpha d$ to $x$; the iterations are terminated when each update $\alpha d$ falls below a given tolerance ($\|\alpha d\| \leq \varepsilon$), or when the number of iterations exceeds $j_{\max}$. A fast inexact line search can be accomplished by using a small $j_{\max}$ and/or by approximating the Hessian $f''(x)$ with its diagonal.

Nonlinear CG is restarted (by setting $d \Leftarrow r$) whenever a search direction is computed that is not a descent direction. It is also restarted once every $n$ iterations, to improve convergence for small $n$.

The calculation of $\alpha$ may result in a divide-by-zero error. This may occur because the starting point $x_{(0)}$ is not sufficiently close to the desired minimum, or because $f$ is not twice continuously differentiable. In the former case, the solution is to choose a better starting point or a more sophisticated line search. In the latter case, CG might not be the most appropriate minimization algorithm.
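A NumPy sketch of this routine follows (illustrative names; grad(x) returns $f'(x)$ and hess(x) returns $f''(x)$). It transcribes the pseudocode above, including the do-while line search and the two restart conditions:

    import numpy as np

    def nonlinear_cg_nr_fr(grad, hess, x, n, i_max, eps, j_max, eps_nr):
        i = k = 0
        r = -grad(x)
        d = r.copy()
        delta_new = r @ r
        delta0 = delta_new
        while i < i_max and delta_new > eps**2 * delta0:
            j = 0
            delta_d = d @ d
            while True:                          # Newton-Raphson line search
                alpha = -(grad(x) @ d) / (d @ (hess(x) @ d))
                x = x + alpha * d
                j += 1
                if j >= j_max or alpha**2 * delta_d <= eps_nr**2:
                    break
            r = -grad(x)
            delta_old = delta_new
            delta_new = r @ r
            beta = delta_new / delta_old         # Fletcher-Reeves
            d = r + beta * d
            k += 1
            if k == n or r @ d <= 0:             # restart
                d = r.copy()
                k = 0
            i += 1
        return x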
B5. Preconditioned Nonlinear Conjugate Gradients with Secant and Polak-Ribière

Given a function $f$, a starting value $x$, a maximum number of CG iterations $i_{\max}$, a CG error tolerance $\varepsilon < 1$, a Secant method step parameter $\sigma_0$, a maximum number of Secant method iterations $j_{\max}$, and a Secant method error tolerance $\varepsilon < 1$:

    i ⇐ 0
    k ⇐ 0
    r ⇐ −f′(x)
    Calculate a preconditioner M ≈ f″(x)
    s ⇐ M⁻¹r
    d ⇐ s
    δ_new ⇐ r^T d
    δ_0 ⇐ δ_new
    While i < i_max and δ_new > ε² δ_0 do
        j ⇐ 0
        δ_d ⇐ d^T d
        α ⇐ −σ_0
        η_prev ⇐ [f′(x + σ_0 d)]^T d
        Do
            η ⇐ [f′(x)]^T d
            α ⇐ α η / (η_prev − η)
            x ⇐ x + αd
            η_prev ⇐ η
            j ⇐ j + 1
        while j < j_max and α² δ_d > ε²
        r ⇐ −f′(x)
        δ_old ⇐ δ_new
        δ_mid ⇐ r^T s
        Calculate a preconditioner M ≈ f″(x)
        s ⇐ M⁻¹r
        δ_new ⇐ r^T s
        β ⇐ (δ_new − δ_mid) / δ_old
        k ⇐ k + 1
        If k = n or β ≤ 0
            d ⇐ s
            k ⇐ 0
        else
            d ⇐ s + βd
        i ⇐ i + 1

This algorithm terminates when the maximum number of iterations $i_{\max}$ has been exceeded, or when $\|r_{(i)}\| \leq \varepsilon \|r_{(0)}\|$.
Each Secant method iteration adds $\alpha d$ to $x$; the iterations are terminated when each update $\alpha d$ falls below a given tolerance ($\|\alpha d\| \leq \varepsilon$), or when the number of iterations exceeds $j_{\max}$. A fast inexact line search can be accomplished by using a small $j_{\max}$. The parameter $\sigma_0$ determines the value of $\sigma$ in Equation 59 for the first step of each Secant method minimization. Unfortunately, it may be necessary to adjust this parameter to achieve convergence.
The Polak-Ribière $\beta$ parameter is

$\beta_{(i+1)} = \dfrac{r^T_{(i+1)} M^{-1} (r_{(i+1)} - r_{(i)})}{r^T_{(i)} M^{-1} r_{(i)}}.$

Care must be taken that the preconditioner $M$ is always positive-definite. The preconditioner is not necessarily in the form of a matrix.
Nonlinear CG is restarted (by setting $d \Leftarrow s$) whenever the Polak-Ribière $\beta$ parameter is negative. It is also restarted once every $n$ iterations, to improve convergence for small $n$.

Nonlinear CG presents several choices: Preconditioned or not, Newton-Raphson method or Secant method or another method, Fletcher-Reeves or Polak-Ribière. It should be possible to construct any of the variations from the versions presented above. (Polak-Ribière is almost always to be preferred.)
C Ugly Proofs

C1. The Solution to $Ax = b$ Minimizes the Quadratic Form

Suppose $A$ is symmetric. Let $x$ be a point that satisfies $Ax = b$ and minimizes the quadratic form (Equation 3), and let $e$ be an error term. Then

$f(x + e) = \frac{1}{2}(x + e)^T A (x + e) - b^T (x + e) + c$ (by Equation 3)

$= \frac{1}{2} x^T A x + e^T A x + \frac{1}{2} e^T A e - b^T x - b^T e + c$ (by symmetry of $A$)

$= \frac{1}{2} x^T A x - b^T x + c + e^T b - b^T e + \frac{1}{2} e^T A e$

$= f(x) + \frac{1}{2} e^T A e.$

If $A$ is positive-definite, then the latter term is positive for all $e \neq 0$; therefore $x$ minimizes $f$.
C2. A Symmetric Matrix Has $n$ Orthogonal Eigenvectors

Any matrix has at least one eigenvector. To see this, note that $\det(A - \lambda I)$ is a polynomial in $\lambda$, and therefore has at least one zero (possibly complex); call it $\lambda_A$. The matrix $A - \lambda_A I$ has determinant zero and is therefore singular, so there must be a nonzero vector $v$ for which $(A - \lambda_A I)v = 0$. The vector $v$ is an eigenvector, because $Av = \lambda_A v$.

Any symmetric matrix has a full complement of $n$ orthogonal eigenvectors. To prove this, I will demonstrate for the case of a $4 \times 4$ matrix $A$ and let you make the obvious generalization to matrices of arbitrary size. It has already been proven that $A$ has at least one eigenvector $v$ with eigenvalue $\lambda_A$. Let $x_1 = v / \|v\|$, which has unit length. Choose three arbitrary vectors $x_2, x_3, x_4$ so that the $x_i$ are mutually orthogonal and all of unit length (such vectors can be found by Gram-Schmidt orthogonalization). Let $X = [x_1\; x_2\; x_3\; x_4]$. Because the $x_i$ are orthonormal, $X^T X = I$, and so $X^T = X^{-1}$. Note also that for $i \neq 1$, $x_i^T A x_1 = x_i^T \lambda_A x_1 = 0$. Thus,

$X^T A X = X^T \begin{bmatrix} \lambda_A x_1 & A x_2 & A x_3 & A x_4 \end{bmatrix} = \begin{bmatrix} \lambda_A & 0 & 0 & 0 \\ 0 & & & \\ 0 & & B & \\ 0 & & & \end{bmatrix},$

where $B$ is a $3 \times 3$ symmetric matrix. $B$ must have some eigenvector $w$ with eigenvalue $\lambda_B$. Let $\hat{w}$ be a vector of length 4 whose first element is zero, and whose remaining elements are the elements of $w$. Clearly,

$X^{-1} A X \hat{w} = X^T A X \hat{w} = \begin{bmatrix} \lambda_A & 0 & 0 & 0 \\ 0 & & & \\ 0 & & B & \\ 0 & & & \end{bmatrix} \hat{w} = \lambda_B \hat{w}.$

In other words, $A X \hat{w} = \lambda_B X \hat{w}$, and thus $X \hat{w}$ is an eigenvector of $A$. Furthermore, $x_1^T X \hat{w} = [1\; 0\; 0\; 0]\, \hat{w} = 0$, so $x_1$ and $X \hat{w}$ are orthogonal. Thus, $A$ has at least two orthogonal eigenvectors!

The more general statement that a symmetric matrix has $n$ orthogonal eigenvectors is proven by induction. For the example above, assume any $3 \times 3$ matrix (such as $B$) has 3 orthogonal eigenvectors; call them $\hat{w}_1$, $\hat{w}_2$, and $\hat{w}_3$. Then $X \hat{w}_1$, $X \hat{w}_2$, and $X \hat{w}_3$ are eigenvectors of $A$, and they are orthogonal because $X$ has orthonormal columns, and therefore maps orthogonal vectors to orthogonal vectors. Thus, $A$ has 4 orthogonal eigenvectors.
C3. Optimality of Chebyshev Polynomials

Chebyshev polynomials are optimal for minimization of expressions like Expression 50 because they increase in magnitude more quickly outside the range $[-1, 1]$ than any other polynomial that is restricted to have magnitude no greater than one inside the range $[-1, 1]$.

The Chebyshev polynomial of degree $i$,

$T_i(\omega) = \frac{1}{2}\left[ \left( \omega + \sqrt{\omega^2 - 1} \right)^i + \left( \omega - \sqrt{\omega^2 - 1} \right)^i \right],$

can also be expressed on the region $[-1, 1]$ as

$T_i(\omega) = \cos(i \cos^{-1} \omega), \qquad -1 \leq \omega \leq 1.$

From this expression (and from Figure 32), one can deduce that the Chebyshev polynomials have the property

$|T_i(\omega)| \leq 1, \qquad -1 \leq \omega \leq 1,$

and oscillate rapidly between $-1$ and $1$:

$T_i\left( \cos \frac{k\pi}{i} \right) = (-1)^k, \qquad k = 0, 1, \ldots, i.$

Notice that the $i$ zeros of $T_i$ must fall between the $i + 1$ extrema of $T_i$ in the range $[-1, 1]$. For an example, see the five zeros of $T_5(\omega)$ in Figure 32.

Similarly, the function

$P_i(\lambda) = \dfrac{T_i\left( \frac{\lambda_{\max} + \lambda_{\min} - 2\lambda}{\lambda_{\max} - \lambda_{\min}} \right)}{T_i\left( \frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}} \right)}$

oscillates in the range $\pm T_i\left( \frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}} \right)^{-1}$ on the domain $[\lambda_{\min}, \lambda_{\max}]$. $P_i$ also satisfies the requirement that $P_i(0) = 1$.

The proof relies on a cute trick. There is no polynomial $Q_i$ of degree $i$ such that $Q_i(0) = 1$ and $Q_i$ is better than $P_i$ on the range $[\lambda_{\min}, \lambda_{\max}]$. To prove this, suppose that there is such a polynomial; then, $|Q_i(\lambda)| < T_i\left( \frac{\lambda_{\max} + \lambda_{\min}}{\lambda_{\max} - \lambda_{\min}} \right)^{-1}$ on the range $[\lambda_{\min}, \lambda_{\max}]$. It follows that the polynomial $P_i - Q_i$ has a zero at $\lambda = 0$, and also has $i$ zeros on the range $[\lambda_{\min}, \lambda_{\max}]$. Hence, $P_i - Q_i$ is a polynomial of degree $i$ with at least $i + 1$ zeros, which is impossible. By contradiction, I conclude that the Chebyshev polynomial of degree $i$ optimizes Expression 50.
D Homework

For the following questions, assume (unless otherwise stated) that you are using exact arithmetic; there is no floating point roundoff error.

1. (Easy.) Prove that the preconditioned Conjugate Gradient method is not affected if the preconditioning matrix is scaled. In other words, for any nonzero constant $a$, show that the sequence of steps $x_{(0)}, x_{(1)}, \ldots$ taken using the preconditioner $aM$ is identical to the steps taken using the preconditioner $M$.

2. (Hard, but interesting.) Suppose you wish to solve $Ax = b$ for a symmetric, positive-definite matrix $A$. Unfortunately, the trauma of your linear algebra course has caused you to repress all memories of the Conjugate Gradient algorithm. Seeing you in distress, the Good Eigenfairy materializes and grants you a list of the $d$ distinct eigenvalues (but not the eigenvectors) of $A$. However, you do not know how many times each eigenvalue is repeated.

Clever person that you are, you mumbled the following algorithm in your sleep this morning:

    Choose an arbitrary starting point x_(0)
    For i ⇐ 0 to d − 1
        Remove an arbitrary eigenvalue from the list and call it λ
        x_(i+1) ⇐ x_(i) − λ⁻¹(A x_(i) − b)

No eigenvalue is used twice; on termination, the list is empty.

(a) Show that upon termination of this algorithm, $x_{(d)}$ is the solution to $Ax = b$. Hint: Read Section 6.1 carefully. Graph the convergence of a two-dimensional example; draw the eigenvector axes, and try selecting each eigenvalue first. (It is most important to intuitively explain what the algorithm is doing; if you can, also give equations that prove it.)

(b) Although this routine finds the exact solution after $d$ iterations, you would like each intermediate iterate $x_{(i)}$ to be as close to the solution as possible. Give a crude rule of thumb for how you should choose an eigenvalue from the list on each iteration. (In other words, in what order should the eigenvalues be used?)

(c) What could go terribly wrong with this algorithm if floating point roundoff occurs?

(d) Give a rule of thumb for how you should choose an eigenvalue from the list on each iteration to prevent floating point roundoff error from escalating. Hint: The answer is not the same as that of question (b).
References

[1] Richard Barrett, Michael Berry, Tony Chan, James Demmel, June Donato, Jack Dongarra, Victor Eijkhout, Roldan Pozo, Charles Romine, and Henk van der Vorst, Templates for the Solution of Linear Systems: Building Blocks for Iterative Methods, SIAM, Philadelphia, Pennsylvania, 1993.

[2] William L. Briggs, A Multigrid Tutorial, SIAM, Philadelphia, Pennsylvania, 1987.

[3] James W. Daniel, Convergence of the Conjugate Gradient Method with Computationally Convenient Modifications, Numerische Mathematik 10 (1967), 125–131.

[4] W. C. Davidon, Variable Metric Method for Minimization, Tech. Report ANL-5990, Argonne National Laboratory, Argonne, Illinois, 1959.

[5] R. Fletcher and M. J. D. Powell, A Rapidly Convergent Descent Method for Minimization, Computer Journal 6 (1963), 163–168.

[6] R. Fletcher and C. M. Reeves, Function Minimization by Conjugate Gradients, Computer Journal 7 (1964), 149–154.

[7] L. Fox, H. D. Huskey, and J. H. Wilkinson, Notes on the Solution of Algebraic Linear Simultaneous Equations, Quarterly Journal of Mechanics and Applied Mathematics 1 (1948), 149–173.

[8] Jean Charles Gilbert and Jorge Nocedal, Global Convergence Properties of Conjugate Gradient Methods for Optimization, SIAM Journal on Optimization 2 (1992), no. 1, 21–42.

[9] Gene H. Golub and Dianne P. O'Leary, Some History of the Conjugate Gradient and Lanczos Algorithms: 1948–1976, SIAM Review 31 (1989), no. 1, 50–102.

[10] Magnus R. Hestenes, Iterative Methods for Solving Linear Equations, Journal of Optimization Theory and Applications 11 (1973), no. 4, 323–334. Originally published in 1951 as NAML Report No. 52-9, National Bureau of Standards, Washington, D.C.

[11] Magnus R. Hestenes and Eduard Stiefel, Methods of Conjugate Gradients for Solving Linear Systems, Journal of Research of the National Bureau of Standards 49 (1952), 409–436.

[12] Shmuel Kaniel, Estimates for Some Computational Techniques in Linear Algebra, Mathematics of Computation 20 (1966), 369–378.

[13] John K. Reid, On the Method of Conjugate Gradients for the Solution of Large Sparse Systems of Linear Equations, Large Sparse Sets of Linear Equations (John K. Reid, ed.), Academic Press, London and New York, 1971, pp. 231–254.

[14] E. Schmidt, Title unknown, Rendiconti del Circolo Matematico di Palermo 25 (1908), 53–77.

[15] Eduard Stiefel, Über einige Methoden der Relaxationsrechnung, Zeitschrift für Angewandte Mathematik und Physik 3 (1952), no. 1, 1–33.

[16] A. van der Sluis and H. A. van der Vorst, The Rate of Convergence of Conjugate Gradients, Numerische Mathematik 48 (1986), no. 5, 543–560.