Stuck making a simple grammar for a filter language (ANTLR3)

I have tried and gotten close, but I keep getting stuck. The input language looks like this:
('aaa' eq '42') and ('bbb' gt 'zzz') or (....) and (....)
i.e. a set of clauses of the form left op right, joined by 'and' or 'or'. There can be one or more clauses.
This seemed simple to me, but I am sure I have started to overcomplicate it:
grammar filter;
options {
language=CSharp2;
output=AST;
}
tokens {
ROOT;
}
OPEN_PAREN
: '(';
CLOSE_PAREN
: ')';
SINGLE_QUOTE
: '\'' ;
AND : 'and';
OR : 'or';
GT : 'gt';
GE : 'ge';
EQ : 'eq';
LT : 'lt';
LE : 'le';
fragment
ID : ('a'..'z' | 'A'..'Z' )+;
STRING : SINGLE_QUOTE ID SINGLE_QUOTE;
WHITESPACE : ( '\t' | ' ' | '\r' | '\n'| '\u000C' )+ { $channel = Hidden; } ;
//public root : filter -> ^(ROOT filter);
public filter
: clause^
| lc=clause join rc=clause ->^(join $lc $rc)
;
left : STRING;
right : STRING;
clause
: OPEN_PAREN left op right CLOSE_PAREN //-> ^(op left right)
;
join : AND
| OR
;
op : GT|GE|LT|LE|EQ;
When I run this in C# I get 'more than one node as root'.
Also, I am not sure how to handle the N joins.
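To make the intended shape concrete, here is a minimal Python sketch (not ANTLR, and not a fix for the grammar above) of "one clause, followed by any number of join-plus-clause pairs". It is the same loop shape as the andExpression : atom (AND^ atom)* rule in the related question further down. The function names and the tuple layout of the tree are assumptions made for illustration, and the sketch gives 'and' and 'or' equal precedence, folding them left to right.

```python
import re

# One token per match: a parenthesis, a single-quoted string, or a lowercase word.
TOKEN = re.compile(r"\s*(\(|\)|'[^']*'|[a-z]+)")
OPS = {"gt", "ge", "eq", "lt", "le"}
JOINS = {"and", "or"}


def tokenize(text):
    tokens, pos = [], 0
    text = text.strip()
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at offset {pos}")
        tokens.append(m.group(1))
        pos = m.end()
    return tokens


def parse_filter(text):
    """Parse clause (join clause)* into nested tuples (hypothetical tree layout)."""
    tokens = tokenize(text)
    pos = 0

    def expect(predicate, what):
        nonlocal pos
        if pos >= len(tokens) or not predicate(tokens[pos]):
            raise SyntaxError(f"expected {what} at token {pos}")
        pos += 1
        return tokens[pos - 1]

    def is_string(tok):
        return len(tok) >= 2 and tok[0] == "'" and tok[-1] == "'"

    def clause():
        expect(lambda t: t == "(", "'('")
        left = expect(is_string, "a quoted string")
        op = expect(lambda t: t in OPS, "an operator")
        right = expect(is_string, "a quoted string")
        expect(lambda t: t == ")", "')'")
        return (op, left[1:-1], right[1:-1])

    tree = clause()                      # at least one clause
    while pos < len(tokens):             # then zero or more "join clause" pairs
        join = expect(lambda t: t in JOINS, "'and' or 'or'")
        tree = (join, tree, clause())    # the join becomes the new root
    return tree


print(parse_filter("('aaa' eq '42') and ('bbb' gt 'zzz') or ('c' lt 'd')"))
```

Each join wraps the tree built so far, so a single clause stays a bare clause and N clauses produce N-1 nested join nodes.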

Related

ANTLR3 rule rewriting in ANTLR4

I was upgrading my ANTLR3 grammar to ANTLR4 but found that rule rewriting is not supported in ANTLR4. I would appreciate any advice on making the grammar below work in ANTLR4.
fragment date
: DATE (MINUS DATE)* -> ^(TO DATE+)
;
fragment simpleExpression
: expr (OR expr)* -> expr+
;
fragment simpleExpressionWithLiteral
: exprWithLiteral (OR exprWithLiteral)* -> exprWithLiteral+
;
fragment conditionalExpression
: orExpression -> ^(COND orExpression)?
;
fragment orExpression
: andExpression (OR^ andExpression)*
;
fragment andExpression
: atom (AND^ atom)*
;
fragment atom
: exprWithLiteral
| NOT exprWithLiteral -> ^(NOT exprWithLiteral)
| NOT LPAREN orExpression RPAREN-> ^(NOT orExpression)
| LPAREN orExpression RPAREN -> orExpression
;
fragment exprWithLiteral
: expr
| StringLiteral
;
fragment expr
: WORD
| NUMBER
;
The part after -> is not rule rewriting but tree rewriting. ANTLR3 produced an AST, which you could reshape using this tree-rewriting syntax. ANTLR4 no longer produces ASTs but parse trees, which you cannot reshape (they represent the path taken through the grammar).
So the simple solution is to remove everything from -> to the end of the alternative. The inline tree operators (the ^ in OR^ and AND^) are AST syntax as well and have to go too. Example:
fragment date
: DATE (MINUS DATE)* -> ^(TO DATE+)
;
becomes
fragment date
: DATE (MINUS DATE)*
;
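For a grammar with many rewrites, the manual edit above could be scripted. The following is a rough Python sketch under stated assumptions: it works line by line, it assumes -> never appears inside a string literal, and it assumes a ^ directly after an identifier is always an inline tree operator. The function name strip_tree_syntax is made up for this example. It must only be run on the ANTLR3 source, because ANTLR4 lexer commands such as -> skip also use an arrow.

```python
import re

# "->" and everything after it, up to the end of the line.
REWRITE = re.compile(r"\s*->.*$")
# A "^" that directly follows an identifier character, as in OR^ or AND^.
INLINE_ROOT = re.compile(r"(?<=\w)\^")


def strip_tree_syntax(grammar: str) -> str:
    """Remove ANTLR3 tree rewrites and inline ^ operators (naive, line-based)."""
    cleaned = []
    for line in grammar.splitlines():
        line = REWRITE.sub("", line)      # drop "-> ^(TO DATE+)" and similar
        line = INLINE_ROOT.sub("", line)  # turn "OR^" into "OR"
        cleaned.append(line)
    return "\n".join(cleaned)


antlr3_rules = """fragment date
: DATE (MINUS DATE)* -> ^(TO DATE+)
;
fragment orExpression
: andExpression (OR^ andExpression)*
;"""

print(strip_tree_syntax(antlr3_rules))
```

The output still needs a manual review, since the shape of the resulting parse tree differs from the AST the rewrites used to build.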

LaTeX-like grammar with mixed whitespace, UTF-8 text and commands

I've tried to implement a LaTeX-like grammar that would allow me to parse this kind of sentence:
\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
As you can see, \title{ } can contain several kinds of items:
a UTF-8 string without quotes and with whitespace, which I'd like to keep in one token
a variable call of the form \variable_name
a \keyword followed by parentheses or by braces, for instance \draw( utf8 \var \if{ } ... ) or \if{ idem }.
These items can be nested.
I took inspiration from the XML parser presented in the ANTLR 4 book and tried to use modes. I have a problem recognizing the closing braces and closing parentheses. I also have a problem with some whitespace, for instance the space that follows \variable_name (I get: extraneous input ' ').
Here is my lexer grammar:
lexer grammar OEFLexer;
// Default mode rules (the SEA)
SEA_WS : (' '|'\t'|'\r'? '\n')+ ;
TITLE : '\\title';
OB : '{';
OP : '(';
BSLASH : '\\' -> mode(CALLREFERENCE) ;
TEXT : ~[\\({]+; // clump all text together
// ----------------- Everything Callreference ---------------------
mode CALLREFERENCE;
CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ; // back to SEA mode
CB : '}' -> mode(DEFAULT_MODE) ; // back to SEA mode
CP : ')' -> mode(DEFAULT_MODE) ; // back to SEA mode
DRAW : 'draw' OP;
IF : 'if' OB;
ID : [a-zA-Z]+ ; // match/send ID in tag to parser
Here is my parser grammar:
parser grammar OEFParser;
options { tokenVocab=OEFLexer; }
document: TITLE OB ( callreference | string )* CB;
string : TEXT;
var : ID;
commandDraw : DRAW ( callreference | string )* CP ;
commandIf : IF ( callreference | string )* CB ;
callreference : BSLASH ID | BSLASH commandDraw CP | BSLASH commandIf CP;
When I try to parse the \title code mentioned at the beginning, I get:
line 1:25 extraneous input ' ' expecting {'\', TEXT, '}'}
line 1:37 extraneous input ' ' expecting {'\', TEXT, ')'}
line 1:45 mismatched input 'expression' expecting {'\', TEXT, '}'}
line 1:75 extraneous input '<EOF>' expecting {'\', TEXT, ')'}
along with this parse tree generated by Grun: [image]
Thanks for helping me tackle this issue.
Chris
The problem is the space after expression:
\title{Un pré é"'§è" \VAR state \draw( 200\if{expression kjlkjé} ) bis tèr }
                                                        ^
which causes the mode to go back to the DEFAULT_MODE:
CLOSECALLVAR : ' ' -> mode(DEFAULT_MODE) ;
You don't want that, because you're (obviously) still in the CALLREFERENCE context.
One way to handle this is to use the -> pushMode(...) and -> popMode directives, which cause a stack of CALLREFERENCE modes to be created. Whenever you stumble upon a \... ( or \... { you push a new CALLREFERENCE onto this stack, and you pop one off when you see a ) or }.
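The stack behaviour can be illustrated outside ANTLR with a small Python sketch. It only tracks the mode stack: openers push, closers pop, and every other character (including a space) leaves the stack alone. The function name lex_with_modes and the (token name, stack depth) output format are assumptions for this illustration, not part of ANTLR.

```python
import re

# \title{ and \if{ open with a brace, \draw( with a parenthesis; optional
# whitespace is allowed before the opening delimiter.
OPENERS = re.compile(r"\\(title|if)\s*\{|\\(draw)\s*\(")
CLOSERS = {"}": "CB", ")": "CP"}


def lex_with_modes(text):
    """Return (tokens, remaining_stack); tokens are (name, stack depth) pairs."""
    stack, tokens, pos = [], [], 0
    while pos < len(text):
        opener = OPENERS.match(text, pos)
        if opener:
            stack.append("CALLREFERENCE")            # like -> pushMode(CALLREFERENCE)
            name = (opener.group(1) or opener.group(2)).upper()
            tokens.append((name, len(stack)))
            pos = opener.end()
        elif text[pos] in CLOSERS and stack:
            tokens.append((CLOSERS[text[pos]], len(stack)))
            stack.pop()                              # like -> popMode
            pos += 1
        else:
            pos += 1                                 # text or space: mode unchanged
    return tokens, stack


tokens, stack = lex_with_modes(r"\title{a \draw( 1\if{x y} ) b }")
print(tokens)
print(stack)
```

An empty stack at the end means every opener found its closer; the spaces inside \if{x y} never touch the stack.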
A quick lexer grammar demo:
lexer grammar OEFLexer;
TITLE : '\\title' S? OB -> pushMode(CALLREFERENCE);
fragment OB : '{';
fragment OP : '(';
fragment S : [ \t\r\n]+;
mode CALLREFERENCE;
CB : '}' -> popMode;
CP : ')' -> popMode;
DRAW : '\\draw' S? OP -> pushMode(CALLREFERENCE);
IF : '\\if' S? OB -> pushMode(CALLREFERENCE);
BSLASH : '\\';
ID : [a-zA-Z]+;
CR_OTHER : .;
and the parser grammar:
parser grammar OEFParser;
options { tokenVocab=OEFLexer; }
document
: TITLE ( callreference | string )* CB EOF
;
string
: CR_OTHER+
| ID
;
commandDraw
: DRAW ( callreference | string )* CP
;
commandIf
: IF ( callreference | string )* CB
;
callreference
: BSLASH ID
| commandDraw
| commandIf
;
Parsing your example input will result in the following parse tree: [image]

ANTLR3: Java heap space when testing parser

I'm trying to build a simple config-file reader to read files of this format:
A .-
B -...
C -.-.
D -..
E .
This is the grammar I have so far:
grammar def;
@header {
package mypackage.parser;
}
@lexer::header { package mypackage.parser; }
file
: line+;
line : ID WS* CODE NEWLINE;
ID : ('A'..'Z')*
;
CODE : ('-'|'.')*;
COMMENT
: '//' ~('\n'|'\r')* '\r'? '\n' {$channel=HIDDEN;}
| '/*' ( options {greedy=false;} : . )* '*/' {$channel=HIDDEN;}
;
WS : ( ' '
| '\t'
) {$channel=HIDDEN;}
;
NEWLINE:'\r'? '\n' ;
And this is my test rig (JUnit 4):
@Test
public void BasicGrammarCheckGood() {
String CorrectlyFormedLine="A .-;\n";
ANTLRStringStream input;
defLexer lexer;
defParser parser;
input = new ANTLRStringStream(CorrectlyFormedLine);
lexer = new defLexer(input);
CommonTokenStream tokens = new CommonTokenStream(lexer);
parser = new defParser(tokens);
try {
parser.line();
}
catch(RecognitionException re) { fail(re.getMessage()); }
}
If I run this test with a correctly formatted string, the code exits without any exception or output.
However, if I feed the parser an invalid string such as "xA .-;\n", the code spins for a while and then exits with "Java heap space".
(If I start my test with the top-level rule 'file', I get the same result, with the additional (repeated) output "line 1:0 mismatched input '' expecting CODE".)
What's going wrong here? I never seem to get the RecognitionException for the invalid input.
EDIT: Here's my grammar file (fragment) after following the advice given here; this avoids the 'Java heap space' issue.
file
: line+ EOF;
line : ID WS* CODE NEWLINE;
ID : ('A'..'Z')('A'..'Z')*
;
CODE : ('-'|'.')('-'|'.')*;
Some of your lexer rules match zero characters (an empty string):
ID : ('A'..'Z')*
;
CODE : ('-'|'.')*;
There are, of course, infinitely many empty strings in your input, so your lexer keeps producing tokens, which results in a heap space error after a while.
Always let lexer rules match at least one character.
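The effect is easy to reproduce with a toy tokenizer. The Python sketch below is not how ANTLR's lexer works internally; it only shows why a rule that can match the empty string never advances through the input. The function lex and its max_tokens cap are assumptions for this demonstration (without the cap, the first call would keep appending tokens until memory runs out).

```python
import re


def lex(text, rules, max_tokens=5):
    """Try each rule in order at the current position; stop after max_tokens."""
    pos, tokens = 0, []
    while pos < len(text) and len(tokens) < max_tokens:
        for name, pattern in rules:
            m = re.compile(pattern).match(text, pos)
            if m:
                tokens.append((name, m.group()))
                pos = m.end()          # an empty match leaves pos unchanged
                break
        else:
            raise ValueError(f"no rule matches {text[pos]!r} at position {pos}")
    return tokens


# '*' rules can match nothing, '+' rules need at least one character.
empty_ok = [("ID", r"[A-Z]*"), ("CODE", r"[-.]*")]
at_least_one = [("ID", r"[A-Z]+"), ("CODE", r"[-.]+"), ("WS", r"[ \t]+")]

print(lex("xA", empty_ok))          # five empty ID tokens, still stuck before 'x'
print(lex("A .-", at_least_one))    # ID, WS and CODE tokens covering the input

try:
    lex("xA", at_least_one)
except ValueError as error:
    print(error)                    # the bad character is reported instead
```

With the '+' rules the invalid character surfaces as an error immediately, which is the behaviour the question expected.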
EDIT
Two (small) remarks:
since you put the WS token on the hidden channel, you don't need to mention it in your parser rules, so line becomes line : ID CODE NEWLINE;
something like ('A'..'Z')('A'..'Z')* can be written as ('A'..'Z')+

ANTLR: field access and evaluation

I'm trying to write a piece of grammar to express field access for a hierarchical structure, something like a.b.c where c is a field of a.b and b is a field of a.
To evaluate the value of a.b.c.d.e we need to evaluate the value of a.b.c.d and then get the value of e.
To evaluate the value of a.b.c.d we need to evaluate the value of a.b.c and then get the value of d, and so on...
If you have a tree like this (the arrow means "lhs is parent of rhs"):
Node(e) -> Node(d) -> Node(c) -> Node(b) -> Node(a)
the evaluation is quite simple. Using recursion, we just need to resolve the value of the child and then access the correct field.
The problem is that I have these 3 rules in my ANTLR grammar file:
tokens {
LBRACE = '{' ;
RBRACE = '}' ;
LBRACK = '[' ;
RBRACK = ']' ;
DOT = '.' ;
....
}
reference
: DOLLAR LBRACE selector RBRACE -> ^(NODE_VAR_REFERENCE selector)
;
selector
: IDENT access -> ^(IDENT access)
;
access
: DOT IDENT access? -> ^(IDENT<node=com.at.cson.ast.FieldAccessTree> access?)
| LBRACK IDENT RBRACK access? -> ^(IDENT<node=com.at.cson.ast.FieldAccessTree> access?)
| LBRACK INTEGER RBRACK access? -> ^(INTEGER<node=com.at.cson.ast.ArrayAccessTree> access?)
;
As expected, my tree has this form:
ReferenceTree
IdentTree[a]
FieldAccessTree[b]
FieldAccessTree[c]
FieldAccessTree[d]
FieldAccessTree[e]
The evaluation is not as easy as in the other case, because I need to get the value of the current node, then pass it to the child, and so on...
Is there any way to reverse the order of the tree using ANTLR, or do I need to do it manually?
You can only do this by using the inline tree operator [1], ^, instead of a rewrite rule.
A demo:
grammar T;
options {
output=AST;
}
tokens {
ROOT;
LBRACK = '[' ;
RBRACK = ']' ;
DOT = '.' ;
}
parse
: selector+ EOF -> ^(ROOT selector+)
;
selector
: IDENT (access^)*
;
access
: DOT IDENT -> IDENT
| LBRACK IDENT RBRACK -> IDENT
| LBRACK INTEGER RBRACK -> INTEGER
;
IDENT : 'a'..'z'+;
INTEGER : '0'..'9'+;
SPACE : ' ' {skip();};
Parsing the input:
a.b.c a[1][2][3]
will produce the following AST: [image]
[1] For more info about inline tree operators and rewrite rules, see: How to output the AST built using ANTLR?
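The nesting the question asks for, and the recursive evaluation it enables, can be sketched in a few lines of Python. This is only an illustration of the tree shape, with each later access becoming the parent of everything parsed so far; the tuple representation (field, child) and the names build and evaluate are assumptions, not the classes from the question.

```python
from functools import reduce


def build(path):
    """Fold a path like ["a", "b", "c"] into ("c", ("b", ("a", None)))."""
    first, *rest = path
    # Each new field wraps the tree built so far, so the last field ends up as root.
    return reduce(lambda child, field: (field, child), rest, (first, None))


def evaluate(node, env):
    """Resolve the child first, then access this node's field or index on it."""
    field, child = node
    if child is None:
        return env[field]              # leaf: look the variable up directly
    return evaluate(child, env)[field]


tree = build(["a", "b", "c"])
print(tree)
print(evaluate(tree, {"a": {"b": {"c": 42}}}))
print(evaluate(build(["a", 1]), {"a": [10, 20]}))
```

Because the root is the last access, evaluation is plain recursion: no value has to be passed down from parent to child.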

Inner join the same column of a table to multiple tables

I have a main table Fruit, and I'd like to join it to tables ApplePrice, PearPrice, and BananaPrice.
Fruit
Id  Type    Date
--------------------
1   Apple   1/1
2   Apple   1/3
3   Banana  1/5
4   Pear    1/7
Common columns of the [Apple/Pear/Banana]Price tables (each table has many more specific fields):
Date  Price  F1  F2  ...
-----------------------
1/1   p1
1/2   p2
....
To get the price of each piece of Fruit, I join the Fruit table with each price table separately, then concatenate the results together.
If the Price tables can't be merged into one, do you have a better approach to this problem? For example, construct one Linq query that returns all the information instead of concatenating results from multiple queries.
Appreciate your ideas.
You need to use join ... into followed by DefaultIfEmpty, which is the LINQ equivalent of a SQL LEFT JOIN.
from fruit in fruits
join ap in applePrices
on (fruit.Type + fruit.Date.ToShortDateString()) equals ("Apple" + ap.Date.ToShortDateString())
into aps
from applePrice in aps.DefaultIfEmpty()
This will give:
Fruit | Apple Price | Banana Price | Pear Price
--------+-------------+--------------+------------
Apple | applePrice | null | null
Apple | applePrice | null | null
Banana | null | bananaPrice | null
Pear | null | null | pearPrice
Then select the valid fruit price with the expression below:
applePrice != null
? applePrice.Price
: bananaPrice != null
? bananaPrice.Price
: pearPrice != null
? pearPrice.Price
: 0 // Default value here if all 3 are null
And use LINQ to select the wanted fields.
The complete result is below; I've used an anonymous type to hold my values:
var prices = from fruit in fruits
join ap in applePrices
on (fruit.Type + fruit.Date.ToShortDateString()) equals ("Apple" + ap.Date.ToShortDateString())
into aps
from applePrice in aps.DefaultIfEmpty()
join bp in bananaPrices
on (fruit.Type + fruit.Date.ToShortDateString()) equals ("Banana" + bp.Date.ToShortDateString())
into bps
from bananaPrice in bps.DefaultIfEmpty()
join pp in pearPrices
on (fruit.Type + fruit.Date.ToShortDateString()) equals ("Pear" + pp.Date.ToShortDateString())
into pps
from pearPrice in pps.DefaultIfEmpty()
select new
{
Id = fruit.Id,
Type = fruit.Type,
Date = fruit.Date,
Price =
applePrice != null
? applePrice.Price
: bananaPrice != null
? bananaPrice.Price
: pearPrice != null
? pearPrice.Price
: 0
};
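The logic of the query above (a left join against each price table, then the first non-null price, else 0) can be written out step by step. The Python sketch below is only an illustration of that logic, not a replacement for the LINQ query. The sample prices are made up, and it assumes at most one price per date in each table, whereas the LINQ join would return one row per matching price.

```python
# Rows from the Fruit table in the question.
fruits = [
    {"id": 1, "type": "Apple", "date": "1/1"},
    {"id": 2, "type": "Apple", "date": "1/3"},
    {"id": 3, "type": "Banana", "date": "1/5"},
    {"id": 4, "type": "Pear", "date": "1/7"},
]

# Hypothetical price tables keyed by date; there is no apple price for 1/3.
PRICE_TABLES = {
    "Apple": {"1/1": 1.20},
    "Banana": {"1/5": 0.50},
    "Pear": {"1/7": 2.00},
}


def left_join(fruit, fruit_type, table):
    """Price from one table, or None when nothing matches (like DefaultIfEmpty)."""
    if fruit["type"] != fruit_type:    # mirrors the Type + Date composite key
        return None
    return table.get(fruit["date"])


def with_prices(rows):
    result = []
    for fruit in rows:
        candidates = [left_join(fruit, t, table) for t, table in PRICE_TABLES.items()]
        # Same idea as the chained ?: expression: first non-null price, else 0.
        price = next((p for p in candidates if p is not None), 0)
        result.append({**fruit, "price": price})
    return result


for row in with_prices(fruits):
    print(row)
```

The second apple keeps its row with a price of 0, which is what distinguishes the left join from an inner join.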
var prices = from T in bank.students
join O in bank.dovres
on (T.code) equals
(O.codestu)
into aps
from applePrice in aps.DefaultIfEmpty()
join Y in bank.rotbes
on (T.code) equals (Y.codestu)
into bps
from bananaPrice in bps.DefaultIfEmpty()
select new
{
Id = T.code,
Type =T.name,
Date = T.family,
father=T.fathername,
T.adi_date, T.faal_date,
// Persian literals: " ماه و " = " months and ", " روز" = " days"
hj = applePrice != null
? applePrice.sal + " ماه و " + applePrice.mah + " روز"
: "",
hj1 = bananaPrice != null
? bananaPrice.sal + " ماه و " + bananaPrice.mah + " روز"
: "",
};
dataGridView1.DataSource = prices;
dataGridView1.Columns[0].HeaderText = "کد "; // "Code"
dataGridView1.Columns[1].HeaderText = "نام"; // "Name"
dataGridView1.Columns[2].HeaderText = "نام خانوادگی"; // "Family name"
dataGridView1.Columns[3].HeaderText = "نام پدر"; // "Father's name"
dataGridView1.Columns[4].HeaderText = " عضویت عادی"; // "Regular membership"
dataGridView1.Columns[5].HeaderText = "عضویت فعال"; // "Active membership"
dataGridView1.Columns[6].HeaderText = "کسری بسیج"; // "Basij deduction"
dataGridView1.Columns[7].HeaderText = "کسری جبهه"; // "Front deduction"
The same join ... into ... DefaultIfEmpty pattern works here for joining three or more tables.
