Context Free Grammars
Context Free Grammars (CFG or simply grammar)
Many programming language constructs have an inherently recursive structure that can be defined by context-free grammars.
e.g., conditional statement defined by a rule such as;
if S1 and S2 are statements and E is an expression, then
“if E then S1 else S2” is a statement
This form of conditional statement cannot be specified using the notation of regular expressions.
CFG consists of terminals, nonterminals, a start symbol, and productions.
1. Terminals are basic symbols from which strings are formed.
e.g., in programming language; if, then, and else is a terminal.
2. Non-terminals are syntactic variables that denote set of strings.
e.g., in production: stmt → if expr then stmt else stmt, expr and stmt are nonterminals.
The non terminals define sets of strings that define the language generated by the grammar.
They also impose hierarchical structure on the language that is useful for both syntax analysis and translation.
3. In a grammar, one non terminal is distinguished as the start symbol, and the set of strings it denotes is the language defined by the grammar.
4. The production of a grammar specifies the manner in which the terminals and nonterminals are combined to form strings.
Each production consists of a nonterminal, followed by an arrow, followed by a string of nonterminals and terminals.
In the following grammar find terminals, nonterminals, start symbols, and productions:
expr → expr op expr
expr → ( expr )
expr → - expr
expr → id
op → +
op → -
op → *
op → /
op → ^
Notational Conventions:
To avoid having to state that “these are terminals” and “these are nonterminals”, the following notational conventions with regard to grammars are used:
1. These symbols are terminals:
i) lower-case letters early in the alphabet such as a,b,c
ii) operator symbols such as +, -, etc.,
iii) punctuation symbols such as parenthesis, comma, etc.,
iv) boldface strings such as id or if
2. These symbols are nonterminals:
i) upper-case letters early in the alphabet such as A, B, C.
ii) The letter S, when it appears, is usually the start symbol.
iii) Lower-case italic names such as expr or stmt.
3. Upper-case letters late in the alphabet, such as X, Y, Z represent grammar symbols, i.e., either nonterminals or terminals.
4. Lower-case Greek letters, α, β, γ, for example, represent strings of grammar symbols.
5. If A → α1, A → α2, …, A → αk are all productions with A on the left (A-productions), then we can write as; A → α1| α2 | … | αk. (α1, α2, … , αk, the alternatives for A).
e.g., A sample grammar:
E → E A E | ( E ) | - E | id
A → + | - | * | / | ^
Derivations:
Derivation gives a precise description of the top-down constructions of a parse tree.
In derivation, a production is treated as a rewriting rule in which the nonterminal on the left is replaced by the string on the right side of the production.
e.g. E → E + E | E * E | - E | ( E ) | id
tell us we can replace one instance of an E in any string of grammar symbols by E + E or E * E or – E or ( E ) or id
E => - E read as “E derives – E”
Take single E and repeatedly apply productions in any order to obtain a sequence of replacements. e.g., E => - E => - ( E ) => - ( id )
Ambiguity: A grammar that produces more than one parse tree for some sentence is said to be ambiguous.
An ambiguous grammar is one that produces more than one leftmost or more than one rightmost derivation for the same sentence.
No comments:
Post a Comment