close

Вход

Забыли?

вход по аккаунту

?

Coevolving game st r at egies: How t o win and how t o lose

код для вставки
Coevolving game st r at egies:
How t o win and how t o lose
Dr Evan J . Hughes
Senior Lect ur er
Radar Syst ems Gr oup
Dept ar t ment of Aer ospace, Power , and Sensor s,
Cr anf ield Univer sit y,
Royal Milit ar y College of Science,
Shr ivenham, Swindon, England.
ej hughes@iee.or g
Abst r act
• This t ut or ial descr ibes t he applicat ion of co-evolut ion t o t wo
player games in or der t o discover ef f ect ive value f unct ions.
• The t ut or ial will descr ibe t he r equir ement s f or co-evolving
value f unct ions and t ypical co-evolut ionar y algor it hm
st r uct ur es, wit h an emphasis on implement at ion issues.
• The main case st udy will look at t he ef f ect s of dif f er ent
value f unct ion st r uct ur es f or t he game of checker s, and give
examples of st r uct ur es t hat have been used f or ot her
games.
• The t ut or ial will examine value f unct ion st r uct ur es t hat look
ver y pr omising init ially, but ar e ext r emely dif f icult t o evolve
- how t o lose.
• We will t hen f ocus on st r uct ur es t hat wor k, and descr ibe
why t hey wor k - how t o evolve t o win.
2
Out line
•
•
•
•
•
Classic Player Algor it hms
Value Funct ions
Typical Evolut ionar y Algor it hm St r uct ur es
Value f unct ions - Winner s and Loser s
Case st udy of Checker s
3
Player Algor it hms
Dir ect game playing
• A f unct ion is used t o decide which move
should be made next , given t he cur r ent
boar d st at e, and possibly some hist or y of
t he play-t o-dat e.
• Genet ic pr ogr amming has been used t o
cr eat e f unct ions (e.g. Ahlschwede wit h
Mancala), but t hey would have t o be
incr edibly complex t o achieve st r ong play.
5
Minimax game playing
• Minimax is an exhaust ive sear ch st r at egy t hat
expands t he game t r ee f r om t he cur r ent boar d
posit ion f or a f ixed number of piece moves (ply).
• The st r at egy assumes t he opponent is going t o
make t he wor st possible move f or t he player and is
t her ef or e pessimist ic.
• As t he game t r ee can r ar ely be expanded t o an
act ual win/ dr aw/ lose decision, a f unct ion has t o be
used t o appr oximat e t he ant icipat ed payof f .
6
Mont e-Car lo Met hods
• For complex games such as Go, t he game
t r ee is t oo vast and minimax is less
pr act ical, so t he t r ee is j ust sampled.
• The Mont e-Car lo appr oach is t o make many
t housands of r andom move sequences,
st ar t ing wit h each of t he possible moves
f r om t he cur r ent boar d posit ion.
• The expect ed value of each of t he cur r ent
moves is t hen appr oximat ed and t he
highest appar ent payof f t aken.
7
Player algor it hms - Summar y
• Dir ect game playing is only r eally suit able f or t he
simplest of games (t ic-t ac-t oe et c.) and wher e
high speed decisions ar e r equir ed.
• Minimiax is t he wor khor se of t he games indust r y.
The play per f or mance is gover ned by t he qualit y
of t he evaluat ion f unct ion, and t he dept h of t r ee
t hat is sear ched.
• Mont e-Car lo met hods ar e ver y pr ocessor
int ensive, but can cr eat e good play on complex
games.
• All met hods can be enhanced wit h opening book
move and end-game dat abases.
8
Value Funct ions f or MiniMax
Hand-Coded Heur ist ic f unct ions
• Classically t he payof f f or each boar d conf igur at ion
has been decided using hand-t uned heur ist ics.
• St r uct ur e is of t en a weight ed sum wit h one player
posit ive, t he opponent negat ive and a �dr aw’
r epr esent ed by zer o.
• Key st r at egies f or many games ar e mobilit y and piece
dif f er ent ial i.e. who has t he lar gest quant it y of
usef ul pieces t hat can be ut ilised at shor t not ice.
• Classic examples:
– Chess - Deep blue
– Checker s - Chinook
10
Genet ic Pr ogr ammes
• Genet ic pr ogr amming has been used t o evolve f ast
evaluat ion f unct ions.
• The chr omosomes ar e of t en t r ee st r uct ur es
compr ised of f unct ion nodes and inf or mat ion
leaves, e.g. nodes such as +-* / <>= I f _t hen_else and
leaves being const ant values, or st at es of given
boar d locat ions.
• Lar ge t r ees ar e of t en gener at ed and car ef ul
choice of node and leaf t ypes is impor t ant .
• Poor choice of leaf r epr esent at ions can make it
dif f icult t o r ecognise complex piece r elat ionships.
11
Piece Count er s
• Based par t ly on heur ist ic knowledge t hat
t he piece dif f er ence is f undament al.
• Weight ed piece count er s give each boar d
locat ion a weight .
• The weight s may be t ime dependent , or
change wit h who is pr edict ed t o win (e.g
aggr essive play if winning, def ensive if
losing).
• Evaluat ion is of t en f ast , but play can be
�mechanical’.
12
Ar t if icial Neur al Net wor ks
• ANNs allow non-linear piece-count er value
f unct ions t o be const r uct ed.
• Bot h Evolut ionar y and Reinf or cement
lear ning met hods have been used t o
successf ully t r ain net wor ks.
• The non-linear it y of t en allows mor e
�inspir ing’ play t o be expr essed, when
compar ed t o linear piece count er s.
• For lar ge net wor ks, t he pr ocessing bur den
can be ver y high.
13
Boar d Repr esent at ions
• I n many games, dif f er ent piece t ypes exist wit h
dif f er ent char act er ist ics.
• Ther e ar e many dif f er ent ways t o r epr esent t he
r elat ive mer it of t he dif f er ent pieces wit hin t he
evaluat ion, f or example, in checker s, a king is
of t en included as equivalent t o 1.3 men. A Queen
in chess is of t en r epr esent ed as wor t h 9 pawns.
• I n essence, t he evaluat ion is a mult i-obj ect ive
pr oblem!
14
Mat hemat ical Descr ipt ion
• The Ut ilit y f unct ion is of t en def ined as 1 f or a
win, 0 f or a dr aw and -1 f or a loss.
• At each boar d conf igur at ion, t he Expect ed Value
of t he ut ilit y f unct ion is t he pr obabilit y weight ed
aver age of t he possible ut ilit y values,
– eg. E=0.6(1) + 0.3 (0) + 0.1 (-1) = 0.5
– Est imat ing pr obabilit ies is dif f icult (eg wit h
Mont e Car lo)
• Mat er ial Value is of t en what is calculat ed
– M = 0.7(sum(b)-sum(w)) + 0.2(sum(back_r ank))
+...
15
Evolut ionar y St r uct ur es
Nat ur e of Evaluat or Evolut ion
• The obj ect ive is of t en calculat ed t hr ough
coevolut ion as no specif ic t r aining dat a is available.
• The evaluat ion is usually t he aggr egat e of a small
number of games and is t her ef or e slow.
• The coevolut ionar y pr ocess leads t o a noisy
obj ect ive f unct ion.
• Of t en par allel EA algor it hms ar e r equir ed.
• Gener alisat ion of player s is desir ed, r at her t han
lear ning t o def eat a single opponent .
• A small amount of r andomness is usually added t o
t he evaluat or t o pr event st agnat ion t hr ough
excessive number s of dr aws.
17
EAs f or Noisy Co-evolut ion
• Simple Evolut ionar y Pr ogr ammes ar e usef ul
as t hey wor k well on r ugged/ noisy f it ness
landscapes. The added advant age is t hey
ar e simple.
• Evolut ionar y st r at egies have also been
used, alt hough t he self adapt ion of t he
mut at ion sizes can be upset by t he noise.
• Gener ally, st eady st at e wit h binar y
t our nament or populat ions t hat r eplace only
a small pr opor t ion ar e mor e consist ent .
18
Obj ect ive Funct ions
• Examples of t r aining evaluat or s t o r e-cr eat e
decisions based on opening book moves of t en lead
t o non-gener alised player s - be car ef ul.
• Of t en say 5 games ar e played and an aggr egat e
scor e given - mor e games = less noise = slower
evolut ion.
• Of t en win=1, dr aw =0, loss = -1 or similar
• Alt er nat ives based on number of moves t o win/ loss
can be ef f ect ive at impr oving gr adient , but can
pot ent ially modif y play behaviour .
19
Value Funct ions - How t o Cr eat e
Winner s and Loser s
Value f unct ion Requir ement s - Winner s
• The f ast er it can be pr ocessed, t he deeper
t he minimax sear ch can be
• The mor e non-linear t he r epr esent at ion,
t he mor e scope t her e is f or int er est ing
play.
• BUT
– The f unct ion must be evolvable ….
21
Heur ist ics - Winner s
• The mor e heur ist ic and exper t knowledge
t hat can be incor por at ed indir ect ly, t he
bet t er t he play is likely t o be.
• Pr ime example of key heur ist ic knowledge
is adding piece dif f er ence t o t he f inal
out put t hr ough a single evolved weight .
• Piece dif f er ence can be evolved dir ect ly in
a weight ed piece count er , but it is not
easy.
22
Obser vabilit y - Loser s
• The evaluat or can only be t r ained t o elicit an
appr opr iat e r esponse t o a given st at e if it has
eit her :
– visit ed t he st at e r egular ly dur ing evolut ion
– visit ed a similar st at e and bot h st at es elicit a
ver y dominant r esponse (eg. Def init e win/ loss)
• Any def iciencies in t he t r aining st at es will lead t o
some par amet er s having a ver y low evolut ionar y
dr ive t o set t le at specif ic values.
• These par amet er s may dr if t er r at ically, or
conver ge t o an ar bit r ar y value.
23
Obj ect ive Gr adient - Loser s
• I f minor changes in t he evaluat or do not elicit
changes in t he obj ect ive f unct ion t hat st and pr oud
of t he noise of t en enough, t he evolut ionar y pr ocess
will be ver y slow and er r at ic, if it ever st ar t s at all.
• Of t en t he pr oblem can be over come by
�boot st r apping’ t he evaluat or wit h some heur ist ic
knowledge - eg. 1 gene t o cont r ol t he degr ee of
piece dif f er ence added in.
• I f t he gr adient is low, all init ial r andom evaluat or s
ar e awf ul and most games ar e ver y shor t - t his
compounds t he st ar t up pr oblem as many gene values
t hen have low obser vabilit y.
24
Examples of Winner s
• Genet ic pr ogr ammes which have piece dif f er ence
and ot her key heur ist ics available dir ect ly as leaf
nodes.
• Neur al net wor ks wit h heur ist ic values (e.g. piece
dif f er ence) f ed t o out put neur on dir ect ly t hr ough
a single weight .
• Gener ally, t he shor t er t he chr omosome, t he
smaller t he decision space, t he mor e successf ul
t he evolut ion can be.
• Pat t er n mat ching st r uct ur es such as convolut ional
neur al net wor ks can be ef f ect ive at cover ing lar ge
boar ds wit h f ew genes.
25
Examples of Loser s
• Genet ic pr ogr amming wher e t he only leaf nodes
t hat r ef er t o t he boar d ar e t he value of individual
t iles: t o r e-cr eat e piece dif f er ence r equir es a
lar ge complex chr omosome.
• ANNs wit hout any heur ist ic advant ages: even
t hough each hidden neur on can be evolved t o be a
piece count er , it is not easy. A second hidden
layer will of t en st op evolut ion unless heur ist ic
knowledge is applied.
26
Case St udy - Checker s
Case St udy - Checker s
• Two t r ials:
– Simple weight ed piece count er -st r at egy
evolut ion pr oblem
– Complex ar t if icial neur al net wor k
28
St r at egy evolut ion pr oblem
• Many at t empt s at evolving player s f or
games have used piece dif f er ence dir ect ly,
or enf or ced symmet r y in t he evaluat ion
f unct ion.
• How simple is it t o evolve t he ant isymmet r y r equir ed in a simple weight ed
piece count er pr ocess?
• How well can a simple weight ed piece
count er play?
29
Boar d st r uct ur e f or checker s
1
5
2
6
9
13
10
17
12
16
19
23
26
30
8
15
22
4
11
18
25
29
7
14
21
3
20
24
27
31
28
32
Black moves first, only diagonal moves. If a jump available, it
must be taken. Draw called after 100 moves each player.
30
Checker s and minimax
• A minimax st r at egy wit h D-E pr uning was used, and
r un t o 4 ply in t he EA. No ply ext ension was used
if t he sear ch t r ee ended on a f or ced move.
• The evaluat ion f unct ion consist ed of f our weight s
f or each boar d locat ion, depending on whet her a
black or whit e man or king was pr esent . The sum
of t he act ive weight s was passed t hr ough t anh() t o
limit t o В±1.
• A small amount of r andomness was added t o help
pr event play becoming st uck in r epet it ive loops.
31
EA st r uct ur e - piece count er
• A par allel ES(100+100) was used wit h each set of
weight s playing a t ot al of 5 games against r andom
player s. A f ar ming model was used wit h t he
populat ion spr ead acr oss 6 DEC Alpha pr ocessor s.
• The chr omosome consist ed of 120 weight s
(28+32+32+28 allowing f or t he f ar r ow wher e men
ar e pr omot ed t o kings)
• A win scor ed 1, dr aw scor ed 0 and loss scor ed -2.
• Games wer e played as black or whit e at r andom.
• A t ot al of 7388 gener at ions wer e used t o evolve
t he set of weight s.
32
Weight Dist r ibut ions af t er 100 Gener at ions
weights, view from black squares - men (white dashed)
1
weight value
0.5
0
-0.5
-1
5
10
15
square number
20
25
30
weights, view from black squares - kings (white dashed)
1
weight value
0.5
0
-0.5
-1
5
10
15
square number
20
25
30
33
Rot at ed-r ever sed weight dist r ibut ions af t er 100 gener at ions
weights, view from black squares - men (white dashed)
0.8
weight value
0.6
0.4
0.2
0
0
5
15
square number
20
25
30
weights, view from black squares - kings (white dashed)
1
weight value
10
0.5
0
-0.5
0
5
10
15
20
square number
25
30
35
34
Rot at ed-r ever sed weight dist r ibut ions af t er 7388 gener at ions
weights, view from black squares - men (white dashed)
1.6
weight value
1.4
1.2
1
0.8
0.6
0
5
15
square number
20
25
30
weights, view from black squares - kings (white dashed)
1.5
weight value
10
1
0.5
0
0
5
10
15
20
square number
25
30
35
35
Summar y of weight evolut ion exper iment
• Af t er 100 gener at ions, A r ot at ional ant i-symmet r y
is evolving wit h t he whit e weight s becoming
negat ive.
• Af t er 7388 gener at ions, t he magnit ude of t he
equivalent weight ing of black and whit e squar es is
ver y close.
• The weight s of t he kings have not evolved as well
as t he men, pr obably as t hey ar e seldom in play.
• The back r ank of each player is f avour ed, wit h
play quit e unif or m acr oss ot her ar eas.
• Tr y evolut ion wit h r ot at ional ant i-symmet r y
enf or ced
• A second EA t hat enf or ced r ot at ional ant isymmet r y was used wit h 60 weight s and r an f or
16593 gener at ions
36
Weight s wit h enf or ced symmet r y
weights, view from black squares - men
1.6
weight value
1.4
1.2
1
0.8
0
5
10
20
25
30
weights, view from black squares - kings
2
weight value
15
square number
1.5
1
0.5
0
5
10
15
20
square number
25
30
35
37
King Weight s Relat ive t o Men
King/Man, view from black squares
1.8
1.6
weight ratio
1.4
1.2
1
0.8
0.6
0.4
0
5
10
15
square number
20
25
30
38
Piece locat ion f or black men
Piece location map for black men
1
2
3
4
5
6
7
8
1
2
3
4
5
6
7
8
39
Piece locat ion f or black kings
Piece location map for black kings
1
2
3
4
5
6
7
8
1
2
3
4
5
6
7
8
40
Compar ison wit h or iginal t r ial
weights, view from black squares - men (trial 1 dashed)
1.6
weight value
1.4
1.2
1
0.8
0.6
5
10
15
square number
20
25
weights, view from black squares - kings (trial 1 dashed)
2
weight value
1.5
1
0.5
0
5
10
15
square number
20
'DVKHG OLQH LV PD[LPXP RI EODFNZKLWH IRU RULJLQDO WULDO
25
41
Summar y of 2 nd weight evolut ion exper iment
• Wit h ant i-symmet r y enf or ced, t he playing abilit y
evolved much f ast er .
• The weight s f or t he kings st ill seem f ar less
evolved t han f or t he men.
• The aver age king weight wit h r espect t o a man is
1.23, alt hough it appear s t o incr ease as play moves
t owar ds t he opponent s ar ea.
• The values of t he weight s f r om bot h exper iment s
have a similar pr of ile and amplit ude.
42
Playing Abilit y
• Played 1000 games t o 8 ply against simple piece
dif f er ence
– We won 68%, dr ew 30%, lost 2%
• Against Xchecker s
– We won 22%, dr ew 66%, lost 12%
– Our play appear s t o impr ove wit h incr easing ply.
• Played on Micr osof t Gaming Zone, 30 games t o 10
ply and r eached a r anking of 2108. Player also
compet ed in Evolut ionar y Checker s compet it ion at
WCCI 2002 and declar ed winner .
43
I nit ial Conclusions
• Piece dif f er ence can be evolved, but it may
t ake a while t o get good r esult s
• I t is clear t hat t he back-r ank is f avour ed in
t he weight s, but ot her st r at egies ar e
dif f icult t o see.
• I f r ot at ional ant i-symmet r y is enf or ced, play
is simpler t o evolve, and capable of
signif icant r esult s f or such a simple
algor it hm.
44
Complex Ar t if icial Net wor k
• Fogel’s Blondie24 - lar ge ar t if icial neur al
net wor k, 5048 r eal-valued genes t o evolve.
• Used ES wit h populat ion of 15 and 840
gener at ions t o evolve a player .
• Has been compar ed f avour ably against
Chinook and per f or ms well on Micr osof t
Gaming Zone, r eaching Exper t level.
45
Net wor k st r uct ur e
Reproduced from: “Anaconda Defeats Hoyle 6-0: A Case Study Competing an Evolved
Checkers Program against Commercially Available Software”
Kumar Chellapilla and David B. Fogel, IEEE 2000
46
Our t r ials - Br unet t e24
• Similar net wor k st r uct ur e t o Blondie24
evolved f or same 840 gener at ions wit h
populat ion of 15.
• Small dif f er ences t o Blondie:
– Play at black/ whit e was r andom, r at her
t han biased t o black,
– small amount of r andomness was added
t o evaluat ion out put .
– No ply ext ension used when f inal move is
a capt ur e.
47
Playabilit y of Br unet t e24
• Played 6 ply successf ully on MSN gaming zone, but
not enough games t o gener at e a r eliable r anking.
• Played t wice against Blondie24:
– f ir st game Blondie24 black, Br unet t e24 whit e.
Blondie scor ed nar r ow win.
– Second game colour s r ever sed, played t o dr aw,
Br unet t e24 a king up.
• Result s inconclusive, but pr omising given t hat
Blondie24 ef f ect ively plays deeper sear ch.
48
•
•
•
•
•
•
Evolut ionar y st at e of Br unet t e24
Af t er 840 gener at ions, wit h a populat ion of 15, many of t he
games wer e playing t o a dr aw, so evolut ion was slowing
r apidly.
The mean weight f or t he piece dif f er ence and t he r andom
number af t er 840 gener at ions wer e 0.83 and 0.17
r espect ively. A king was wor t h 1.20 men.
Af t er 1882 gener at ions, t he weight s wer e 0.86 and 0.01
r espect ively. A king was wor t h 1.36 men
The ext r a 1042 gener at ions appear t o have a signif icant
f ine-t uning ef f ect on t he weight s
The 1882 player plays bet t er t han t he 840 player , alt hough
mor e t r ials ar e r equir ed t o see if t he dif f er ence is
signif icant .
�boot st r apping’ ef f ect of piece dif f er ence is signif icant - if
t he gene is set t o zer o, t he gr adient is poor and evolut ion
never st ar t s.
49
Key Conclusions
• Evolut ion is ver y capable of discover ing pr imar y
st r at egies t o games - as long as t he evolving
st r uct ur e is simple enough, or has simple pat hways
t hat can act as a �boot st r ap’ t o pr ovide suf f icient
init ial gr adient t o t he obj ect ive sur f ace, when
compar ed t o t he noise induced by coevolut ion.
• Genes r elat ed t o r egions/ pieces not of t en in play
ar e less obser vable and t her ef or e dif f icult t o
evolve.
• Many evolved player s do incor por at e human
exper t ise t hat was cr it ical t o t heir achieved
success.
50
51
Документ
Категория
Без категории
Просмотров
5
Размер файла
665 Кб
Теги
1/--страниц
Пожаловаться на содержимое документа