Coding a Grammar with ANTLR (Mutual-Left Recursion) - antlr3

I have this grammar for code in ANTLR.
grammar Booleanos;
//lexico
AND : 'AND' | 'and' ;
OR : 'OR' | 'or' ;
NOT : 'NOT' | 'not';
TRUE : 'TRUE' | 'true' ;
FALSE : 'FALSE' | 'false' ;
LPAREN : '(' ;
RPAREN : ')' ;
//sintactico
start : bexpr;
bexpr : bexpr OR bterm | bterm;
bterm : bterm AND bfactor | bfactor;
bfactor : NOT bfactor | LPAREN bexpr RPAREN | TRUE | FALSE;
But I get a mutual left-recursion error on bexpr and bterm. How can I get rid of these errors? I can't compile. Thanks for your help.

ANTLR3 cannot handle left recursion implicitly, hence the errors. ANTLR4 handles direct left recursion (recursion that does not spread over multiple rules) automatically, so consider upgrading if you can.
However, left recursion is not that difficult to resolve. The simplest way is probably to use ANTLRWorks 1.5, which has a menu entry for removing left recursion.

You can try to rewrite your bexpr and bterm rules like this:
bexpr : bterm (OR bterm)*;
bterm : bfactor (AND bfactor)*;
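To see why the rewritten rules no longer recurse on the left, it can help to look at a hand-written recursive-descent equivalent: each `( ... )*` becomes a plain loop. The sketch below is not generated by ANTLR; the names `tokenize`, `BoolParser` and `evaluate` are made up for illustration, and the tokenizer is a simplification (it ignores case entirely, which is looser than the lexer rules in the question).

```python
import re

def tokenize(text):
    # Assumption: the only tokens are words and parentheses.
    return re.findall(r"\(|\)|[A-Za-z]+", text)

class BoolParser:
    def __init__(self, tokens):
        self.tokens = tokens
        self.pos = 0

    def peek(self):
        # Upper-case the token so 'and' and 'AND' compare equal.
        return self.tokens[self.pos].upper() if self.pos < len(self.tokens) else None

    def take(self, expected=None):
        tok = self.peek()
        if tok is None or (expected is not None and tok != expected):
            raise SyntaxError(f"expected {expected}, got {tok}")
        self.pos += 1
        return tok

    def bexpr(self):
        # bexpr : bterm (OR bterm)* ;   the loop replaces the left recursion
        value = self.bterm()
        while self.peek() == "OR":
            self.take("OR")
            rhs = self.bterm()      # always consume the right-hand side
            value = value or rhs
        return value

    def bterm(self):
        # bterm : bfactor (AND bfactor)* ;
        value = self.bfactor()
        while self.peek() == "AND":
            self.take("AND")
            rhs = self.bfactor()
            value = value and rhs
        return value

    def bfactor(self):
        # bfactor : NOT bfactor | LPAREN bexpr RPAREN | TRUE | FALSE ;
        tok = self.take()
        if tok == "NOT":
            return not self.bfactor()
        if tok == "(":
            value = self.bexpr()
            self.take(")")
            return value
        if tok == "TRUE":
            return True
        if tok == "FALSE":
            return False
        raise SyntaxError(f"unexpected token {tok}")

def evaluate(text):
    parser = BoolParser(tokenize(text))
    result = parser.bexpr()
    if parser.peek() is not None:
        raise SyntaxError(f"trailing input at token {parser.pos}")
    return result
```

A left-recursive `bexpr` written this way would call itself before consuming any token and never terminate, which is the problem ANTLR3 reports.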

Related

Antlr3 rule rewriting in Antlr4

I was upgrading my ANTLR3 grammar to ANTLR4 but found that rule rewriting is not supported in ANTLR4. I would appreciate any advice on making the grammar below work in ANTLR4.
fragment date
: DATE (MINUS DATE)* -> ^(TO DATE+)
;
fragment simpleExpression
: expr (OR expr)* -> expr+
;
fragment simpleExpressionWithLiteral
: exprWithLiteral (OR exprWithLiteral)* -> exprWithLiteral+
;
fragment conditionalExpression
: orExpression -> ^(COND orExpression)?
;
fragment orExpression
: andExpression (OR^ andExpression)*
;
fragment andExpression
: atom (AND^ atom)*
;
fragment atom
: exprWithLiteral
| NOT exprWithLiteral -> ^(NOT exprWithLiteral)
| NOT LPAREN orExpression RPAREN-> ^(NOT orExpression)
| LPAREN orExpression RPAREN -> orExpression
;
fragment exprWithLiteral
: expr
| StringLiteral
;
fragment expr
: WORD
| NUMBER
;
The part after -> is not rule rewriting but tree rewriting. ANTLR3 produced an AST which you could manually reshape using this tree rewriting syntax. ANTLR4 no longer produces ASTs but parse trees, which you cannot reshape (as they represent the path taken through the grammar).
So the simple solution is to remove the -> and everything after it in each alternative. The inline ^ operators (as in OR^) are not supported by ANTLR4 either and have to go as well. For example:
fragment date
: DATE (MINUS DATE)* -> ^(TO DATE+)
;
becomes
fragment date
: DATE (MINUS DATE)*
;
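If later code depended on the shape that the rewrites produced, one option is to rebuild that shape yourself after parsing. The sketch below only illustrates the idea: it uses plain `(kind, text)` tuples as stand-ins for parse-tree children and does not use the ANTLR runtime, and the function names are hypothetical. In an ANTLR4 project this kind of logic would typically live in a generated visitor or listener.

```python
def rewrite_date(children):
    # ANTLR3 version: date : DATE (MINUS DATE)* -> ^(TO DATE+)
    # Keep only the DATE tokens and hang them under an imaginary TO root.
    dates = [text for kind, text in children if kind == "DATE"]
    return ("TO", *dates)

def rewrite_negated_group(children):
    # ANTLR3 version: NOT LPAREN orExpression RPAREN -> ^(NOT orExpression)
    # Drop the punctuation and make NOT the root of what remains.
    kept = [child for child in children
            if child[0] not in ("NOT", "LPAREN", "RPAREN")]
    return ("NOT", *kept)
```

For example, `rewrite_date` applied to the children of `2001 - 2005` keeps the two dates and drops the MINUS token, which is what the ANTLR3 rewrite did.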

Grammar LaTeX like with mixed whitespace utf and commands

I've tried to implement a LaTeX-like grammar that would allow me to parse this kind of sentence:
\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
As you can see, \title{ } can contain several kinds of items:
a UTF-8 string without quotes and with whitespace, which I'd like to keep in one token;
a variable call such as \variable_name;
a \keyword followed by parentheses or braces, for instance \draw( utf8 \var \if{ } ... ) or \if{ idem }.
These items can be nested.
I took inspiration from the XML parser presented in the ANTLR 4 book and tried to use modes. I have a problem with the recognition of the closing braces or closing parentheses. I also have a problem with some whitespace, for instance the space that follows \variable_name (I get: extraneous input ' ').
Here is my lexer grammar:
lexer grammar OEFLexer;
// Default mode rules (the SEA)
SEA_WS : (' '|'\t'|'\r'? '\n')+ ;
TITLE : '\\title';
OB : '{';
OP : '(';
BSLASH : '\\' -> mode(CALLREFERENCE) ;
TEXT : ~[\\({]+; // clump all text together
// ----------------- Everything Callreference ---------------------
mode CALLREFERENCE;
CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ; // back to SEA mode
CB : '}' -> mode(DEFAULT_MODE) ; // back to SEA mode
CP : ')' -> mode(DEFAULT_MODE) ; // back to SEA mode
DRAW : 'draw' OP;
IF : 'if' OB;
ID : [a-zA-Z]+ ; // match/send ID in tag to parser
Here is my parser grammar:
parser grammar OEFParser;
options { tokenVocab=OEFLexer; }
document: TITLE OB ( callreference | string )* CB;
string : TEXT;
var : ID;
commandDraw : DRAW ( callreference | string )* CP ;
commandIf : IF ( callreference | string )* CB ;
callreference : BSLASH ID | BSLASH commandDraw CP | BSLASH commandIf CP;
When I try to parse the \title code mentioned at the beginning, I get:
line 1:25 extraneous input ' ' expecting {'\', TEXT, '}'}
line 1:37 extraneous input ' ' expecting {'\', TEXT, ')'}
line 1:45 mismatched input 'expression' expecting {'\', TEXT, '}'}
line 1:75 extraneous input '<EOF>' expecting {'\', TEXT, ')'}
Grun also generated a parse tree for this input (figure not included).
Thanks for helping me tackle this issue.
Chris
The problem is the space after expression (the one between expression and kjlkjé):
\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
                                                        ^
which causes the mode to go back to the DEFAULT_MODE:
CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ;
Something that you don't want because you're (obviously) still in the CALLREFERENCE context.
One way to handle this is to use the -> pushMode(...) and -> popMode directives, which cause a stack of CALLREFERENCE modes to be created. Whenever you stumble upon a \... ( or \... { you push a new CALLREFERENCE onto this stack, and then pop one off when you see a ) or }.
A quick lexer grammar demo:
lexer grammar OEFLexer;
TITLE : '\\title' S? OB -> pushMode(CALLREFERENCE);
fragment OB : '{';
fragment OP : '(';
fragment S : [ \t\r\n]+;
mode CALLREFERENCE;
CB : '}' -> popMode;
CP : ')' -> popMode;
DRAW : '\\draw' S? OP -> pushMode(CALLREFERENCE);
IF : '\\if' S? OB -> pushMode(CALLREFERENCE);
BSLASH : '\\';
ID : [a-zA-Z]+;
CR_OTHER : .;
and the parser grammar:
parser grammar OEFParser;
options { tokenVocab=OEFLexer; }
document
: TITLE ( callreference | string )* CB EOF
;
string
: CR_OTHER+
| ID
;
commandDraw
: DRAW ( callreference | string )* CP
;
commandIf
: IF ( callreference | string )* CB
;
callreference
: BSLASH ID
| commandDraw
| commandIf
;
Parsing your example input with these grammars results in a parse tree (figure not included).
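The push/pop behaviour described above can be pictured with a small simulation. This is only a sketch of the idea, not ANTLR's implementation: the `RULES` table and `tokenize` function are made-up names, the regexes are hand translations of the lexer rules in the demo, and each token records the stack depth purely for illustration.

```python
import re

# Per mode: ordered (token name, regex, action) triples.
# "push" mimics -> pushMode(CALLREFERENCE), "pop" mimics -> popMode.
RULES = {
    "DEFAULT_MODE": [
        ("TITLE", r"\\title[ \t\r\n]*\{", "push"),
    ],
    "CALLREFERENCE": [
        ("CB", r"\}", "pop"),
        ("CP", r"\)", "pop"),
        ("DRAW", r"\\draw[ \t\r\n]*\(", "push"),
        ("IF", r"\\if[ \t\r\n]*\{", "push"),
        ("BSLASH", r"\\", None),
        ("ID", r"[a-zA-Z]+", None),
        ("CR_OTHER", r".", None),
    ],
}

def tokenize(text):
    stack = ["DEFAULT_MODE"]  # top of the stack is the current mode
    pos, tokens = 0, []
    while pos < len(text):
        best = None
        for name, pattern, action in RULES[stack[-1]]:
            m = re.compile(pattern, re.DOTALL).match(text, pos)
            # Longest match wins; on a tie the rule listed first wins.
            if m and (best is None or len(m.group()) > len(best[1])):
                best = (name, m.group(), action)
        if best is None:
            raise ValueError(f"no rule matches at offset {pos} in mode {stack[-1]}")
        name, lexeme, action = best
        tokens.append((name, lexeme, len(stack)))  # depth before the action runs
        if action == "push":
            stack.append("CALLREFERENCE")
        elif action == "pop":
            stack.pop()
        pos += len(lexeme)
    return tokens
```

With an input such as `\title{a \VAR b \draw( 2\if{x y} ) c }`, the space between `x` and `y` comes out as an ordinary CR_OTHER token and the depth only changes at `\title{`, `\draw(`, `\if{`, `}` and `)`, which is the point of replacing the space-triggered mode switch.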

Xtext grammar QualifiedName ambiguity

I have the following problem. Part of my grammar looks like this
RExpr
: SetOp
;
SetOp returns RExpr
: PrimaryExpr (({Union.left=current} '+'|{Difference.left=current} '-'|{Intersection.left=current} '&') right = PrimaryExpr)*
;
PrimaryExpr returns RExpr
: '(' RExpr ')'
| (this = 'this.')? slot = [Slot | QualifiedName]
| (this = 'this' | ensName = [Ensemble | QualifiedName])
| 'All'
;
When generating the Xtext artifacts, ANTLR says that due to an ambiguity it disables an option (3). The ambiguity arises because slot and ensemble share QualifiedName. How do I refactor this kind of case? I guess a syntactic predicate won't help here, since it would force only one of the two (Slot/Ensemble) to ever be resolved.
Thanks.
Xtext can't choose between your two references slot and ensemble.
You can merge these references into one reference by adding this rule to your grammar:
SlotOrEnsemble:
Slot | Ensemble
;
Then your PrimaryExpr rule will look something like this:
PrimaryExpr returns RExpr
: '(' RExpr ')'
| ((this = 'this.')? ref= [SlotOrEnsemble | QualifiedName])
| this = 'this'
| 'All'
;

stuck making simple grammar for filter language

I have tried and got close, but I keep getting stuck. The input language looks like this:
('aaa' eq '42') and ('bbb' gt 'zzz') or (....) and (....)
i.e. a set of clauses of the form left op right, joined by 'and' or 'or'. There can be one or more clauses.
This seemed simple to me, but I am sure I have started making it too complicated:
grammar filter;
options {
language=CSharp2;
output=AST;
}
tokens {
ROOT;
}
OPEN_PAREN
: '(';
CLOSE_PAREN
: ')';
SINGLE_QUOTE
: '\'' ;
AND : 'and';
OR : 'or';
GT : 'gt';
GE : 'ge';
EQ : 'eq';
LT : 'lt';
LE : 'le';
fragment
ID : ('a'..'z' | 'A'..'Z' )+;
STRING : SINGLE_QUOTE ID SINGLE_QUOTE;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = Hidden; } ;
//public root : filter -> ^(ROOT filter);
public filter
: clause^
| lc=clause join rc=clause ->^(join $lc $rc)
;
left : STRING;
right : STRING;
clause
: OPEN_PAREN left op right CLOSE_PAREN //-> ^(op left right)
;
join : AND
| OR
;
op : GT|GE|LT|LE|EQ;
When I run this in C# I get 'more than one node as root'.
Also, I am not sure how I can handle N joins.

ANTLR: field access and evaluation

I'm trying to write a piece of grammar to express field access for a hierarchical structure, something like a.b.c where c is a field of a.b and b is a field of a.
To evaluate the value of a.b.c.d.e we need to evaluate the value of a.b.c.d and then get the value of e.
To evaluate the value of a.b.c.d we need to evaluate the value of a.b.c and then get the value of d, and so on...
If you have a tree like this (the arrow means "lhs is parent of rhs"):
Node(e) -> Node(d) -> Node(c) -> Node(b) -> Node(a)
the evaluation is quite simple. Using recursion, we just need to resolve the value of the child and then access the correct field.
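The recursive evaluation described here can be sketched in a few lines. The `Node` class and `evaluate` function are hypothetical names, and the sketch assumes values are nested dictionaries, which the question does not specify.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class Node:
    name: str
    child: Optional["Node"] = None  # the expression this field is read from

def evaluate(node, env):
    # Base case: the innermost node is a plain variable lookup.
    if node.child is None:
        return env[node.name]
    # Otherwise resolve the child first, then read this node's field from it.
    return evaluate(node.child, env)[node.name]
```

For a.b.c the tree is `Node("c", Node("b", Node("a")))`: evaluating it first resolves a, then reads b from the result, then c.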
The problem is: I have these 3 rules in my ANTLR grammar file:
tokens {
LBRACE = '{' ;
RBRACE = '}' ;
LBRACK = '[' ;
RBRACK = ']' ;
DOT = '.' ;
....
}
reference
: DOLLAR LBRACE selector RBRACE -> ^(NODE_VAR_REFERENCE selector)
;
selector
: IDENT access -> ^(IDENT access)
;
access
: DOT IDENT access? -> ^(IDENT<node=com.at.cson.ast.FieldAccessTree> access?)
| LBRACK IDENT RBRACK access? -> ^(IDENT<node=com.at.cson.ast.FieldAccessTree> access?)
| LBRACK INTEGER RBRACK access? -> ^(INTEGER<node=com.at.cson.ast.ArrayAccessTree> access?)
;
As expected, my tree has this form:
ReferenceTree
  IdentTree[a]
    FieldAccessTree[b]
      FieldAccessTree[c]
        FieldAccessTree[d]
          FieldAccessTree[e]
The evaluation is not as easy as in the other case, because I need to get the value of the current node and then pass it to the child, and so on...
Is there any way to reverse the order of the tree using ANTLR, or do I need to do it manually?
You can only do this by using the inline tree operator ^ (see footnote 1) instead of a rewrite rule.
A demo:
grammar T;
options {
output=AST;
}
tokens {
ROOT;
LBRACK = '[' ;
RBRACK = ']' ;
DOT = '.' ;
}
parse
: selector+ EOF -> ^(ROOT selector+)
;
selector
: IDENT (access^)*
;
access
: DOT IDENT -> IDENT
| LBRACK IDENT RBRACK -> IDENT
| LBRACK INTEGER RBRACK -> INTEGER
;
IDENT : 'a'..'z'+;
INTEGER : '0'..'9'+;
SPACE : ' ' {skip();};
Parsing the input:
a.b.c a[1][2][3]
will produce an AST in which the order of the accesses is reversed (figure not included).
1. For more info about inline tree operators and rewrite rules, see: How to output the AST built using ANTLR?
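One way to picture what `IDENT (access^)*` does: every access that is matched becomes the new root, with the tree built so far as its child. The sketch below imitates that with nested tuples; `build_access_tree` and `render` are made-up helper names and this is not how ANTLR represents its trees internally.

```python
def build_access_tree(first, accesses):
    # Start with the leading IDENT as a leaf.
    tree = (first, None)
    for access in accesses:
        # Like access^: the new node becomes the root of everything seen so far.
        tree = (access, tree)
    return tree

def render(tree):
    # Print the chain root first, using the question's arrow notation.
    name, child = tree
    return str(name) if child is None else f"{name} -> {render(child)}"
```

Under that reading, `a.b.c` ends up with c at the root and a as the innermost leaf, which is the parent-first shape the question asked for.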
