close

Вход

Забыли?

вход по аккаунту

?

Презентация

код для вставкиСкачать
CONTEXT-FREE GRAMMARS
c
Chuen-Liang Chen
Department of Computer Science
and Information Engineering
National Taiwan University
Taipei, TAIWAN
Chuen-Liang Chen, NTUCS&IE / ‹#›
Parsing
пЃ¬ function: checking syntactically validity of the input string
producing structure of the corresponding parse tree
пЃ¬ callee: scanner (when need a token)
semantic routine (when match a production rule)
пЃ¬ theoretical basis: context-free grammar
пЃ¬ executor: parser, syntax analyzer
пЃµ top-down parsing
– beginning at the start symbol, expanding nonterminals in depthfirst manner (predictive in nature)
c
– left-most derivation
– pre-order traversal of parse tree
– e.g. LL(k) [read from Left; Left-most derivation; k lookaheads],
recursive descent parsing
пЃµ bottom-up parsing
– beginning from terminal string, determining the production used
to generate leaves
– right-most derivation in reverse order
– post-order traversal of parse tree
– e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]
Chuen-Liang Chen, NTUCS&IE / ‹#›
Definitions about context-free grammar (1/2)
пЃ¬ context-free grammar -- G = (Vt, Vn, S, P)
пЃµ Vt -- set of terminal symbols
пЃµ Vn -- set of nonterminal symbols
– a, b, c, ...  Vt
– A, B, C, ...  Vn – U, V, W, ...  V = Vt  Vn
– u, v, w, ...  Vt* – a, b, g, ...  V*
пЃµ S -- start symbol, goal symbol; S пѓЋ Vn
c
пЃµ P -- set of production rules of the form : A п‚® a
пЃ¬ derivation by production rule A п‚®пЂ gпЂ пЃµ one step derivation : aпЂ A bпЂ пѓћ aпЂ gпЂ пЂ bпЂ пЃµ left-most derivation : u A b пѓћlm u g b
пЃµ right-most derivation : a A v пѓћrm a gпЂ пЂ v
пЃµ one or more steps derivation : пѓћ+ пѓћ+lm пѓћ+rm
пЃµ zero or more steps derivation : пѓћ* пѓћ*lm пѓћ*rm
Chuen-Liang Chen, NTUCS&IE / ‹#›
Definitions about context-free grammar (2/2)
пЃ¬ set of sentential forms -- SF(G) = { b | S пѓћ* b }
пЃµ left-most sentential form -- the b so that S пѓћ*lm b
пЃµ right-most sentential form -- the b so that S пѓћ*rm b
пЃ¬ context-free language -- L(G) = SF(G) пѓ‡ Vt*
пЃ¬ parse tree, derivation tree -пЃµ graphic representation of derivations
c
пЃµ root -- start symbol
пЃµ leaf nodes -- grammar symbols or l
пЃµ interior nodes -- nonterminals
пЃµ offspring of a nonterminal -- a production
пЃ¬ for a given sentential form -пЃµ phrase -- a sequence of symbols derived from a single nonterminal
пЃµ simple phrase, prime phrase -- minimal phrase
пЃµ handle -- left-most simple phrase
Chuen-Liang Chen, NTUCS&IE / ‹#›
Example of context-free grammar
пЃ¬ grammar G0 --
E
п‚® Prefix ( E ) | V Tail
Prefix п‚® F | l
Tail
п‚®+E|l
пЃ¬ left-most derivation -E пѓћlm Prefix ( E )
пѓћlm F ( E )
пѓћlm F ( V Tail )
пѓћlm F ( V + E )
пѓћlm F ( V + V Tail )
пѓћlm F ( V + V )
пЃ¬ right-most derivation -E пѓћrm Prefix ( E )
c пѓћrm Prefix ( V Tail )
пѓћrm Prefix ( V + E )
пѓћrm Prefix ( V + V Tail )
пѓћrm Prefix ( V + V )
пѓћrm F ( V + V )
пЃ¬ right-most sentential forms -- 1. E 2. Prefix ( E ) 3. Prefix ( V Tail )
4. Prefix ( V + E ) 5. Prefix ( V + V Tail ) 6. Prefix ( V + V ) 7. F ( V + V )
8. and so on
пЃ¬ L(G0) пѓ‰ { F ( V + V ) }
Chuen-Liang Chen, NTUCS&IE / ‹#›
Example of left-most derivation
пЃ¬ parse trees of left-most derivations
E
E
Prefix (
E
E
E
)
Prefix (
E
E
)
F
Prefix (
F
E
V
)
+
F
Tail
l
пЃµ
E
V
)
Tail
E
Prefix (
E
V
F
c
Tail
Prefix (
E
V
)
E
Tail
+
Prefix (
E
V
F
Tail
E
V
)
Tail
+
E
blue symbols : left-most sentential forms
Chuen-Liang Chen, NTUCS&IE / ‹#›
Parsing
пЃ¬ function: checking syntactically validity of the input string
producing structure of the corresponding parse tree
пЃ¬ callee: scanner (when need a token)
semantic routine (when match a production rule)
пЃ¬ theoretical basis: context-free grammar
пЃ¬ executor: parser, syntax analyzer
пЃµ top-down parsing
– beginning at the start symbol, expanding nonterminals in depthfirst manner (predictive in nature)
c
– left-most derivation
– pre-order traversal of parse tree
– e.g. LL(k) [read from Left; Left-most derivation; k lookaheads],
recursive descent parsing
пЃµ bottom-up parsing
– beginning from terminal string, determining the production used
to generate leaves
– right-most derivation in reverse order
– post-order traversal of parse tree
– e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]
Chuen-Liang Chen, NTUCS&IE / ‹#›
Example of top-down parsing
пЃ¬ trace of top-down parsing (left-most derivation)
E
E
E
Prefix (
E
E
)
Prefix (
E
)
F
Prefix (
F
E
V
пЃµ
E
V
)
Tail
E
c
Tail
Prefix (
E
V
Prefix (
F
)
+
E
F
Tail
E
V
)
Tail
+
E
Prefix (
E
F
E
V
)
Tail
V Tail
+
E
l
orange : just derived (predicted) blue : just read (matched)
black : derived or read
green : un-processed (parse stack)
Chuen-Liang Chen, NTUCS&IE / ‹#›
Definitions about context-free grammar (2/2)
пЃ¬ set of sentential forms -- SF(G) = { b | S пѓћ* b }
пЃµ left-most sentential form -- the b so that S пѓћ*lm b
пЃµ right-most sentential form -- the b so that S пѓћ*rm b
пЃ¬ context-free language -- L(G) = SF(G) пѓ‡ Vt*
пЃ¬ parse tree, derivation tree -пЃµ graphic representation of derivations
c
пЃµ root -- start symbol
пЃµ leaf nodes -- grammar symbols or l
пЃµ interior nodes -- nonterminals
пЃµ offspring of a nonterminal -- a production
пЃ¬ for a given sentential form -пЃµ phrase -- a sequence of symbols derived from a single nonterminal
пЃµ simple phrase, prime phrase -- minimal phrase
пЃµ handle -- left-most simple phrase
Chuen-Liang Chen, NTUCS&IE / ‹#›
Example of right-most derivation (1/2)
пЃ¬ parse trees of right-most derivations and corresponding sentential
form, phrases, simple phrases, handle
пЃµ blue symbols : sentential form
пЃµ
: phrase
пЃµ
: simple phrase
пЃµ
: handle
E
c
E
Prefix (
E
)
E
Prefix (
E
V
E
Prefix ( E )
E
)
Tail
Prefix ( V Tail )
Prefix (
E
V
)
Tail
+
E
Prefix ( V + E )
Chuen-Liang Chen, NTUCS&IE / ‹#›
Example of right-most derivation (2/2)
E
Prefix (
E
E
V
)
Prefix (
Tail
+
E
V
)
Tail
Prefix ( V + V Tail )
Prefix (
Tail
c +
E
V
E
F
E
V
E
V
)
Tail
+
Tail
l
Prefix ( V + V l )
E
V
Tail
l
F ( V + V l )
Chuen-Liang Chen, NTUCS&IE / ‹#›
Parsing
пЃ¬ function: checking syntactically validity of the input string
producing structure of the corresponding parse tree
пЃ¬ callee: scanner (when need a token)
semantic routine (when match a production rule)
пЃ¬ theoretical basis: context-free grammar
пЃ¬ executor: parser, syntax analyzer
пЃµ top-down parsing
– beginning at the start symbol, expanding nonterminals in depthfirst manner (predictive in nature)
c
– left-most derivation
– pre-order traversal of parse tree
– e.g. LL(k) [read from Left; Left-most derivation; k lookaheads],
recursive descent parsing
пЃµ bottom-up parsing
– beginning from terminal string, determining the production used
to generate leaves
– right-most derivation in reverse order
– post-order traversal of parse tree
– e.g. LR(k) [read from Left; Right-most derivation; k lookaheads]
Chuen-Liang Chen, NTUCS&IE / ‹#›
Example of bottom-up parsing
пЃ¬ trace of bottom-up parsing (inverse order of right-most derivation)
F ( V + V l )
Prefix ( V + V l )
Prefix ( V + V Tail )
Prefix ( V + E )
F
F
l
E
F
V Tail
l
Prefix ( V Tail )
F
+
E
V
пЃµ
Prefix (
E
)
E
)
c
F
V
Tail
+
Tail
Prefix (
F
E
V
V
Tail
+
Tail
E
V
l
l
blue : just read (shifted) orange : just derived (reduced to)
pink : not read
green : derived or read (parse stack)
Tail
l
Chuen-Liang Chen, NTUCS&IE / ‹#›
Examples - 排骨麵特餐
пЃ¬ example 1
пЃµ
排骨麵特餐  冰紅茶 排骨麵 柳丁切片
жЋ’йЄЁйєµ
п‚® з‚ёжЋ’йЄЁ ж№Їйєµ
пЃµ
lookahead is unnecessary
c
пЃ¬ example 2
пЃµ
排骨麵特餐 п‚®пЂ пЂ е†°зґ…иЊ¶ жЋ’йЄЁйєµ service 柳丁切片
жЋ’йЄЁйєµ
п‚®пЂ пЂ з‚ёжЋ’йЄЁ ж№Їйєµ | ж№Їйєµ з‚ёжЋ’йЄЁ
service
 芋仔冰 | 別想了(l)
пЃµ
lookahed is required
Chuen-Liang Chen, NTUCS&IE / ‹#›
Ambiguity of grammar
пЃ¬ a string with two different parse trees (i.e., two different structures)
пЃ¬ example : <exp> п‚® <exp> - <exp>
<exp> п‚® id
<exp>
<exp>
<exp> - <exp>c <exp> - <exp>
<exp> - <exp> id
id
id
id <exp> - <exp>
id
id
пЃ¬ for an unambiguous grammar, parse trees of leftmost derivation and
right-most derivation are the same
Chuen-Liang Chen, NTUCS&IE / ‹#›
First set and Follow set (1/2)
пЃ¬ First(a) = { a пѓЋ Vt | a пѓћ* a b } пѓ€ ( if a пѓћ* l then {l} else пѓ† )
пЃµ
set of all terminals that can begin a sentential form derived from a
пЃµ
Firstk(a) -- set of k-symbol terminal strings that can begin a
sentential form derived from a
пЃµ
QUIZ: for what?
c
пЃ¬ Follow(A) = { a пѓЋ Vt | S пѓћ+ a A a b } пѓ€ ( if S пѓћ+ a A then {l} else пѓ† )
пЃµ
set of all terminals that may follow A in some sentential form
пЃµ
Followk(A) -- set of k-symbol terminal strings that may follow A in
some sentential form
пЃµ
QUIZ: for what?
Chuen-Liang Chen, NTUCS&IE / ‹#›
First set and Follow set (2/2)
пЃ¬ example 1 -E
п‚® Prefix ( E )
E
п‚® V Tail
Prefix п‚® F | l
Tail п‚® + E | l
пЃ¬ example 2 -S
п‚®aSe|B
B
п‚®bBe|C
C
п‚®cCe|d
пЃ¬ example 3 -S
п‚®ABc
A
п‚®a|l
B
п‚®b|l
E
Prefix
Tail
First_set
{ V, F, ( }
{ +, l }
Follow_set
{ l, ) }
{ F, l }
{(}
S
B
C
c
First_set
{ a, b, c, d } { b, c, d }
Follow_set
{ e, l }
{ e, l }
{ l, ) }
{ c, d }
{ e, l }
S
A
B
First_set
{ a, b, c }
Follow_set
{l}
{ a, l }
{ b, c }
{ b, l }
{c}
Chuen-Liang Chen, NTUCS&IE / ‹#›
Algorithms for First & Follow sets (1/6)
typedef int symbol;
/* a symbol in the grammar */
/* The symbolic constants used
* below, NUM_TERMINALS,
* NUM_NONTERMINALS, and
* NUM_PRODUCTIONS are
* determined by the grammar.
* MAX_RHS_LENGTH should
* simply be "big enough."
*/
#define VOCABULARY
(NUM_NONTERMINALS +
NUM_TERMINALS)
typedef struct gram {
symbol terminals[NUM_TERMINALS];
symbol nonterminals[NUM_NONTERMINALS];
symbol start_symbol;
int num_productions;
struct prod {
symbol lhs;
c int rhs_length;
symbol rhs[MAX_RHS_LENGTH];
} productions[NUM_PRODUCTIONS];
symbol vocabulary[VOCABULARY];
} grammar;
typedef struct prod production;
typedef symbol terminal;
typedef symbol nonterminal;
Chuen-Liang Chen, NTUCS&IE / ‹#›
Algorithms for First & Follow sets (2/6)
typedef short boolean;
typedef boolean marked_vocabulary[VOCABULARY];
/*
* Mark those vocabulary symbols found to derive l (directly or indirectly).
*/
marked_vocabulary mark_lambda(const grammar g)
{
static marked_vocabulary derives_lambda;
c
boolean changes;
/* any changes during last iteration? */
boolean rhs_derives_lambda; /* does the RHS derive l? */
symbol v;
/* a word in the vocabulary */
production p;
/* a production in the grammar */
int i, j;
/* loop variables */
for (v = 0; v < VOCABULARY; v++)
derives_lambda[v] = FALSE;
/* initially, nothing is marked */
Chuen-Liang Chen, NTUCS&IE / ‹#›
Algorithms for First & Follow sets (3/6)
do {
changes = FALSE;
for (i = 0; i < g.num_productions; i++) {
p = g.productions[i];
if (! derives_lambda[p.lhs]) {
if (p.rhs_length == 0) {
/* derives l directly */
changes = derives_lambda[p.lhs] = TRUE;
continue;
}
c
/* does each part of RHS derive l? */
rhs_derives_lambda = derives_lambda[p.rhs[0]];
for (j = 1; j < p.rhs_length, j++)
rhs_derives_lambda = rhs_derives_lambda && derives_lambda[p.rhs[j]];
if (rhs_derives_lambda)
changes = derives_lambda[p.lhs] = TRUE;
}
}
} while (changes);
return derives_lambda;
}
Chuen-Liang Chen, NTUCS&IE / ‹#›
Algorithms for First & Follow sets (4/6)
typedef set_of_terminal_or_lambda termset;
termset follow_set[NUM_NONTERMINAL];
termset first_set[SYMBOL];
marked_vocabulary derives_lambda = mark_lambda(g);
/* mark_lambda(g) as defined above */
termset compute_first(string_of_symbols alpha)
{
int
i, k;
termset result;
k = length(alpha);
c
if (k == 0)
result = SET_OF( l );
else {
result = first_set[alpha[0]] - SET_OF( l ) ;
for (i = 1; i < k && lпЂ пѓЋпЂ first_set[alpha[i-1] ]; i++)
result = result пѓ€ ( first_set[alpha[i]] - SET_OF( l ) );
if (i == k && lпЂ пѓЋ first_set[alpha[k - 1]])
result = result пѓ€ SET_OF( l );
}
return result;
}
Chuen-Liang Chen, NTUCS&IE / ‹#›
Algorithms for First & Follow sets (5/6)
extern grammar g;
void fill_first_set(void)
{
nonterminal A;
terminal
a;
production p;
boolean
changes;
int
i, j;
for (i = 0; i < NUM_NONTERMINAL;
i++) {
A = g.nonterminals[i];
if (derives_lambda[A])
first_set[A] = SET_OF( l );
else
first_set[A] = пѓ†пЂ ;
}
пЃ¬ QUIZ: termination?
пЃ¬ QUIZ: correctness?
}
for (i = 0; i < NUM_TERMINAL; i++) {
a = g.terminals[i];
first_set[a] = SET_OF( a );
for (j = 0; j < NUM_NONTERMINAL; j++) {
A = g.nonterminals[j];
if (there exists a production Aп‚®ab)
first_set[A] = first_set[A] пѓ€ SET_OF( a );
}
}
do {c
changes = FALSE;
for (i = 0; i < g.num_productions; i++) {
p = g.productions[i];
first_set[p.lhs] = first_set[p.lhs] пѓ€
compute_first(p.rhs);
if ( first_set changed )
changes = TRUE;
}
} while (changes);
Chuen-Liang Chen, NTUCS&IE / ‹#›
Algorithms for First & Follow sets (6/6)
void fill_follow_set(void)
{
nonterminal A, B;
int
i;
boolean changes;
do {
changes = FALSE;
for (each production A п‚®пЂ a B b ) {
/*
* I.e. for each production and each
* occurrence of a nonterminal in its
for (i = 0; i < NUM_NONTERMINAL; i++) {
* right-hand side.
A = g.nonterminals[i];
*/
c
follow_set[A] = пѓ†;
follow_set[B] = follow_set[B] пѓ€
}
(compute_first(b) - SET_OF( l ));
follow_set[g.start_symbol] = SET_OF( l );
if ( lпЂ пѓЋ compute_first(b) )
follow_set[B] = follow_set[B] пѓ€
follow_set[A];
if ( follow_set[B] changed )
changes = TRUE;
}
} while (changes);
}
пЃ¬ QUIZ: termination?
пЃ¬ QUIZ: correctness?
Chuen-Liang Chen, NTUCS&IE / ‹#›
Tracing examples
пЃ¬ example 1 -E
п‚® Prefixп‚Ѓ( Eп‚‚)п‚Њ
E
п‚® V Tailп‚ѓп‚Ќ
Prefix п‚® Fп‚Ћ| lп‚Џ
Tail п‚® + Eп‚„п‚ђ | lп‚‘
пЃ¬ example 2 -S п‚® a Sп‚Ѓeп‚Њ | Bп‚‚п‚Ќп‚’
B п‚® b Bп‚ѓeп‚Ћ| Cп‚„п‚Џ
C п‚® c Cп‚…eп‚ђ | dп‚‘
пЃ¬ example 3 -S п‚® Aп‚ЃBп‚‚cп‚Њ
A п‚® aп‚Ќ | lп‚Ћ
B п‚® bп‚Џ | lп‚ђ
E п‚Ќ п‚Њп‚Њ
First_set
{ V, F, ( }
Follow_set
{ l, ) }
п‚‚
п‚„п‚„
Prefix
п‚Ћп‚Џ
Tailп‚ђ п‚‘
{ F, l }
{(}
{ +, l }
п‚Ѓ
п‚ѓп‚ѓ
{ l, ) }
c
S
п‚Њп‚Ќп‚’п‚’
First_set
Follow_set
B п‚Ћп‚Џ п‚Џ
{ a, b, c, d } { b, c, d }
{ l, e }
п‚Ѓ
Sп‚Њ п‚Њ п‚Њ
First_set
{ a, b, c }
Follow_set
{l}
{ e, l }
п‚‚п‚‚
п‚ѓ
A п‚Ќ п‚Ћ
C п‚ђп‚‘
{ c, d }
{ e, l }
п‚„п‚„
п‚…
B п‚Џп‚ђ
{ a, l }
{ b, c }
{ b, l }
{c}
п‚Ѓп‚Ѓ
п‚‚
Chuen-Liang Chen, NTUCS&IE / ‹#›
From extended BNF to CFG
пЃ¬ <statement list> п‚® <statement> { <statement> }
пѓЄ
<statement list> п‚® <statement> <statement tail>
c
<statement tail> п‚® <statement> <statement tail>
<statement tail> п‚® l
пЃ¬ QUIZ: how, systematically?
Chuen-Liang Chen, NTUCS&IE / ‹#›
Other types of grammars
пЃ¬ regular grammar -пЃµ QUIZ: how?
A п‚® a B or C п‚® l
пЃ¬ context-free grammar --
Aп‚®a
пЃ¬ context-sensitive grammar -- a A b п‚® a d b
пЃ¬ type 0 grammar --
aп‚®
cb
пЃ¬ regular grammar : too simple, e.g., { [ i ] i | i п‚і 1 }
пЃµ QUIZ: how to specify { [ i ] i | i п‚і 1 } by context-free grammar?
пЃ¬ context-sensitive, type 0 : without sufficient parser
пЃ¬ context-free grammar : a balance between generality and practicality
Chuen-Liang Chen, NTUCS&IE / ‹#›
Документ
Категория
Презентации
Просмотров
13
Размер файла
318 Кб
Теги
1/--страниц
Пожаловаться на содержимое документа