Why does Antlr think there is a missing bracket? - antlr3

I've created a grammar to parse simple LDAP query syntax. The grammar is:
expression : LEFT_PAREN! ('&' | '||' | '!')^ (atom | expression)* RIGHT_PAREN! EOF ;
atom : LEFT_PAREN! left '='^ right RIGHT_PAREN! ;
left : ITEM;
right : ITEM;
ITEM : ALPHANUMERIC+;
LEFT_PAREN : '(';
RIGHT_PAREN : ')';
fragment ALPHANUMERIC
: ('a'..'z' | 'A'..'Z' | '0'..'9');
WHITESPACE : (' ' | '\t' | '\r' | '\n') { skip(); };
Now this grammar works fine for:
(!(attr=hello2))
(&(attr=hello2)(attr2=12))
(||(attr=hello2)(attr2=12))
However, when I try and run:
(||(attr=hello2)(!(attr2=12)))
It fails with: line 1:29 extraneous input ')' expecting EOF
If I remove the EOF from the expression rule, everything passes, but then a wrong number of brackets is no longer caught as a syntax error. (This is being parsed into a tree, hence the ^ and ! after tokens.) What have I missed?

As others have already mentioned, your expression rule has to end with an EOF, but a nested expression cannot end with an EOF, of course.
Remove the EOF from expression, and create a proper "entry point" for your parser that ends with the EOF.
file: T.g
grammar T;
options {
output=AST;
}
parse
: expression EOF!
;
expression
: '('! ('&' | '||' | '!')^ (atom | expression)* ')'!
;
atom
: '('! ITEM '='^ ITEM ')'!
;
ITEM
: ALPHANUMERIC+
;
fragment ALPHANUMERIC
: ('a'..'z' | 'A'..'Z' | '0'..'9')
;
WHITESPACE
: (' ' | '\t' | '\r' | '\n') { skip(); }
;
file: Main.java
import org.antlr.runtime.*;
import org.antlr.runtime.tree.*;
import org.antlr.stringtemplate.*;
public class Main {
public static void main(String[] args) throws Exception {
String source = "(||(attr=hello2)(!(attr2=12)))";
TLexer lexer = new TLexer(new ANTLRStringStream(source));
TParser parser = new TParser(new CommonTokenStream(lexer));
CommonTree tree = (CommonTree)parser.parse().getTree();
DOTTreeGenerator gen = new DOTTreeGenerator();
StringTemplate st = gen.toDOT(tree);
System.out.println(st);
}
}
To run the demo, do:
*nix/MacOS:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .:antlr-3.3.jar Main
Windows:
java -cp antlr-3.3.jar org.antlr.Tool T.g
javac -cp antlr-3.3.jar *.java
java -cp .;antlr-3.3.jar Main
which produces the DOT code representing the AST.
[Image omitted: the AST rendered from the DOT output, created using graphviz-dev.appspot.com]

In your definition of expression, there can be parentheses containing a nested expression, but the nested expression has to end in EOF. In your sample input, the nested expression doesn't end in EOF.
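The same idea can be sketched outside ANTLR as a small hand-written recursive-descent recognizer in Go. This is only an illustration under stated assumptions: the names parser, Parse, atom and expression are hypothetical, whitespace skipping and tree building are left out, and the generated ANTLR parser works differently internally. What it shows is that the recursive expression function never checks for end of input; only the entry point Parse does.

```go
package main

import (
	"fmt"
	"strings"
)

// parser is a hand-written stand-in for the generated TParser (hypothetical names).
type parser struct {
	src string
	pos int
}

// peek reports whether the remaining input starts with s, without consuming it.
func (p *parser) peek(s string) bool {
	return strings.HasPrefix(p.src[p.pos:], s)
}

// accept consumes s if it comes next.
func (p *parser) accept(s string) bool {
	if p.peek(s) {
		p.pos += len(s)
		return true
	}
	return false
}

// item consumes ALPHANUMERIC+ and reports whether anything matched.
func (p *parser) item() bool {
	start := p.pos
	for p.pos < len(p.src) && isAlnum(p.src[p.pos]) {
		p.pos++
	}
	return p.pos > start
}

func isAlnum(c byte) bool {
	return c >= 'a' && c <= 'z' || c >= 'A' && c <= 'Z' || c >= '0' && c <= '9'
}

// atom : '(' ITEM '=' ITEM ')'
func (p *parser) atom() error {
	if p.accept("(") && p.item() && p.accept("=") && p.item() && p.accept(")") {
		return nil
	}
	return fmt.Errorf("position %d: malformed atom", p.pos)
}

// expression : '(' ('&' | '||' | '!') (atom | expression)* ')'   (no EOF here)
func (p *parser) expression() error {
	if !p.accept("(") || !(p.accept("&") || p.accept("||") || p.accept("!")) {
		return fmt.Errorf("position %d: expected '(' and an operator", p.pos)
	}
	for p.peek("(") {
		save := p.pos
		p.accept("(")
		nested := p.peek("&") || p.peek("||") || p.peek("!")
		p.pos = save // look one token past '(' and then re-parse from '('
		child := p.atom
		if nested {
			child = p.expression
		}
		if err := child(); err != nil {
			return err
		}
	}
	if !p.accept(")") {
		return fmt.Errorf("position %d: expected ')'", p.pos)
	}
	return nil
}

// Parse is the entry point: parse : expression EOF
func Parse(input string) error {
	p := &parser{src: input}
	if err := p.expression(); err != nil {
		return err
	}
	if p.pos != len(p.src) {
		return fmt.Errorf("position %d: extraneous input, expecting EOF", p.pos)
	}
	return nil
}

func main() {
	fmt.Println(Parse("(||(attr=hello2)(!(attr2=12)))"), Parse("(&(attr=hello2)(attr2=12)))"))
}
```

With the end-of-input check confined to Parse, nested expressions are accepted, while a surplus or missing closing bracket is still reported as an error.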

Related

Gocc to ignore things in lexical parser

Is there a way to tell gocc to ignore things in the lexical parser? E.g., for
2022-01-18 11:33:21.9885 [21] These are strings that I need to ignore, until - MYKW - Start Active One: 1/18/2022 11:33:21 AM
I want to tell gocc to ignore everything from [21] all the way to until. Here is what I've been trying:
/* Lexical part */
_letter : 'A'-'Z' | 'a'-'z' | '_' ;
_digit : '0'-'9' ;
_timestamp1 : _digit | ' ' | ':' | '-' | '.' ;
_timestamp2 : _digit | ' ' | ':' | '/' | 'A' | 'P' | 'M' ;
_ignore : '[' { . } ' ' '-' ' ' 'M' 'Y' 'K' 'W' ' ' '-' ' ' ;
_lineend : [ '\r' ] '\n' ;
timestamp : _timestamp1 { _timestamp1 } _ignore ;
taskLogStart : 'S' 't' 'a' 'r' 't' ' ' ;
jobName : { . } _timestamp2 { _timestamp2 } _lineend ;
/* Syntax part */
Log
: timestamp taskLogStart jobName ;
However, the parser failed at:
error: expected timestamp; got: unknown/invalid token "2022-01-18 11:33:21.9885 [21] T"
The reason I think it should work is that the following ignore rules work perfectly fine for skipping comments:
!lineComment : '/' '/' { . } '\n' ;
!blockComment : '/' '*' { . | '*' } '*' '/' ;
and I'm just applying the above rule into my normal text parsing.
It doesn't work that way.
The EBNF looks very much like regular expressions, but it does not work like regular expressions at all. What I mean is this.
Take the line:
2022-01-18 11:33:21.9885 [21] These are strings that I need to ignore, until - MYKW - Start Active One: 1/18/2022 11:33:21 AM
With a regular expression, it can be matched simply with:
([0-9.: -]+).*? - MYKW - Start ([^:]+):.*$
However, that cannot be translated directly into an EBNF definition, because the regular expression relies on the context between its elements to pinpoint a match (e.g., the .*? rule is a local rule that only works based on the context it is in), whereas gocc generates an LR parser, which works from a context-free grammar.
Basically, a context-free grammar means that each lexical symbol is a global rule that is not affected by the context it is in: every time, the { . } match is tried against all existing lexical symbols, and neither the preceding context nor the symbol following it is involved in the next match. I cannot quite describe it better than that, but it is the reason why the OP's attempt fails.
For a real sample of how the '{.}' can be used, see
How to describe this event log in formal BNF?
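For comparison, the regular expression quoted in this answer can be tried directly with Go's standard regexp package. This is a minimal sketch, not a gocc solution: the function name extract is hypothetical, and the trailing space captured by the first group is trimmed as an extra step.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// logLine is the regular expression from the answer, compiled once.
var logLine = regexp.MustCompile(`([0-9.: -]+).*? - MYKW - Start ([^:]+):.*$`)

// extract returns the leading timestamp and the job name of a log line,
// or ok=false when the line does not match.
func extract(line string) (timestamp, job string, ok bool) {
	m := logLine.FindStringSubmatch(line)
	if m == nil {
		return "", "", false
	}
	// Group 1 also swallows the blank before "[21]", so trim it.
	return strings.TrimSpace(m[1]), m[2], true
}

func main() {
	line := "2022-01-18 11:33:21.9885 [21] These are strings that I need to ignore, until - MYKW - Start Active One: 1/18/2022 11:33:21 AM"
	ts, job, ok := extract(line)
	fmt.Printf("%q %q %v\n", ts, job, ok)
}
```

The lazy .*? only works here because the regex engine looks at what follows it, which is exactly the kind of context the answer says the EBNF lexical rules do not take into account.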

Need an example for how Go syntax for assignment operator uses the grammar rules specified using EBNF

As mentioned in the docs, syntax in Go is specified using Extended Backus-Naur Form (EBNF):
Production = production_name "=" [ Expression ] "." .
Expression = Alternative { "|" Alternative } .
Alternative = Term { Term } .
Term = production_name | token [ "…" token ] | Group | Option | Repetition .
Group = "(" Expression ")" .
Option = "[" Expression "]" .
Repetition = "{" Expression "}" .
I am trying to understand how Go syntax grammar is defined, how to breakdown/derive/understand the expression i++ and i+=1 using these grammar rules. How would these production rules be substituted step by step for the purpose of illustration?
The expression i++ uses the grammar rule for IncDec statements:
IncDecStmt = Expression ( "++" | "--" ) .
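Here, production_name is IncDecStmt. Its right-hand side is a single Alternative made of two Terms: the production_name Expression, followed by the Group ( "++" | "--" ), whose two alternatives are the tokens "++" and "--". For i++, Expression derives the identifier i and the Group selects "++".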
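One way to check which production a statement matches is to let the standard library's go/parser parse it and look at the resulting AST node. The sketch below assumes a hypothetical helper stmtKind that wraps the statement in a dummy function. For i += 1 the relevant spec productions are Assignment = ExpressionList assign_op ExpressionList . and assign_op = [ add_op | mul_op ] "=" . where add_op contributes the "+".

```go
package main

import (
	"fmt"
	"go/ast"
	"go/parser"
	"go/token"
)

// stmtKind wraps one statement in a dummy function, parses it, and names the
// production the parser matched (the helper and its labels are hypothetical).
func stmtKind(stmt string) (string, error) {
	src := "package p\nfunc f() {\n" + stmt + "\n}\n"
	file, err := parser.ParseFile(token.NewFileSet(), "demo.go", src, 0)
	if err != nil {
		return "", err
	}
	body := file.Decls[0].(*ast.FuncDecl).Body.List
	switch s := body[0].(type) {
	case *ast.IncDecStmt:
		return "IncDecStmt " + s.Tok.String(), nil
	case *ast.AssignStmt:
		return "Assignment " + s.Tok.String(), nil
	default:
		return fmt.Sprintf("%T", s), nil
	}
}

func main() {
	for _, stmt := range []string{"i++", "i += 1"} {
		kind, err := stmtKind(stmt)
		fmt.Println(stmt, "=>", kind, err)
	}
}
```

The parser does not type-check, so the undeclared i is not a problem for this purpose.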

Is incorrect terminology used in the description of a compile error about 'for' syntax in golang?

I tried to use something like
for i := 0; i < len(bytes); ++i {
...
}
That is not correct, and I got an error:
syntax error: unexpected ++, expecting expression
I thought it was because ++i is not an expression.
Then I found out that, according to the documentation, i++ (which does work in a for loop) is not an expression either.
I also found that in some cases (now I think in all cases) a statement cannot be used in place of an expression.
Coming back to the error, it says the for loop requires an expression, which confused me. I checked one more part of the documentation, and it turns out that for requires a statement:
For statements with for clause
A "for" statement with a ForClause is also controlled by its
condition, but additionally it may specify an init and a post
statement
I started with this question (which I liked more than the final one, because I thought it was about my unfamiliarity with the language):
Is the for loop syntax a special case where statements are accepted as expressions, or are there other such cases in golang?
While writing the question and checking the documentation, I ended up with these questions:
Is incorrect terminology used in the error description, which should be fixed to avoid confusion? Or is it normal in some cases to use terms such as statement and expression interchangeably?
The Go Programming Language Specification
Primary expressions
Primary expressions are the operands for unary and binary expressions.
PrimaryExpr =
Operand |
Conversion |
PrimaryExpr Selector |
PrimaryExpr Index |
PrimaryExpr Slice |
PrimaryExpr TypeAssertion |
PrimaryExpr Arguments .
Selector = "." identifier .
Index = "[" Expression "]" .
Slice = "[" [ Expression ] ":" [ Expression ] "]" |
"[" [ Expression ] ":" Expression ":" Expression "]" .
TypeAssertion = "." "(" Type ")" .
Arguments = "(" [ ( ExpressionList | Type [ "," ExpressionList ] ) [ "..." ] [ "," ] ] ")" .
Operators and punctuation
The following character sequences represent operators:
++
--
Operators
Operators combine operands into expressions.
Expression = UnaryExpr | Expression binary_op Expression .
UnaryExpr = PrimaryExpr | unary_op UnaryExpr .
binary_op = "||" | "&&" | rel_op | add_op | mul_op .
rel_op = "==" | "!=" | "<" | "<=" | ">" | ">=" .
add_op = "+" | "-" | "|" | "^" .
mul_op = "*" | "/" | "%" | "<<" | ">>" | "&" | "&^" .
unary_op = "+" | "-" | "!" | "^" | "*" | "&" | "<-" .
Operator precedence
The ++ and -- operators form statements, not expressions.
IncDec statements
The "++" and "--" statements increment or decrement their operands by
the untyped constant 1. As with an assignment, the operand must be
addressable or a map index expression.
IncDecStmt = Expression ( "++" | "--" ) .
++ and -- are operators. The ++ and -- operators form statements, not expressions.
IncDecStmt = Expression ( "++" | "--" ) .
When the compiler encounters a ++ operator, it expects it to be immediately preceded by an expression.
For example,
package main
func main() {
// syntax error: unexpected ++, expecting expression
for i := 0; i < 1; ++i {}
}
Playground: https://play.golang.org/p/y2d9ijeMdw
Output:
main.go:6:21: syntax error: unexpected ++, expecting expression
The compiler complains about the syntax. It found a ++ operator without an immediately preceding expression: syntax error: unexpected ++, expecting expression.
The Go Spec says the post statement of a for clause accepts (among other things) an IncDec statement.
The IncDec statement is defined as: IncDecStmt = Expression ( "++" | "--" ) .
The parser finds an IncDec statement but an empty expression and thus spits out the error "expecting expression".
Edit: this probably fails because the fallback node to parse for a SimpleStmt is an expression. The IncDecStmt failed, so it moves on to the default. The error accurately reflects the latest error that is bubbled up.
While the error message is correct, it is a little bit misleading. However, fixing it would involve passing more context about the current tree being parsed, e.g.: bad ForClause: bad PostStmt: bad SimpleStmt: expected expression.
There's still the problem that the expected expression is the last error encountered. Before that, it failed to parse the IncDecStmt but that error is swallowed because it falls back on an expression. The same applies at higher levels of the tree.
Even without that problem it would be rather heavy-handed and probably even more confusing than the current error messages. You may want to ask for input from the Go folks though.
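The behaviour can be reproduced without the compiler by feeding the loop to the standard library's go/parser, which is a separate implementation from the compiler's own parser, so the wording of its error may differ from the message quoted above. A minimal sketch, with checkPost as a hypothetical helper name:

```go
package main

import (
	"fmt"
	"go/parser"
	"go/token"
)

// checkPost parses a for loop whose post statement is the given text and
// returns the syntax error reported by go/parser, or nil if it parses.
func checkPost(post string) error {
	src := "package p\nfunc f() {\n\tfor i := 0; i < 1; " + post + " {\n\t}\n}\n"
	_, err := parser.ParseFile(token.NewFileSet(), "demo.go", src, 0)
	return err
}

func main() {
	for _, post := range []string{"i++", "i += 1", "++i"} {
		fmt.Printf("%-7s %v\n", post, checkPost(post))
	}
}
```

The two statement forms are accepted as post statements, while ++i is rejected because ++ is not a unary operator that can start an expression.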

“IF ELSE” statement inside basic calculator

I’m trying to implement my own calculator with “IF ELSE” statements.
Here is the basic calculator example:
/* description: Parses and executes mathematical expressions. */
/* lexical grammar */
%lex
%%
\s+ /* skip whitespace */
[0-9]+("."[0-9]+)?\b return 'NUMBER'
"*" return '*'
"/" return '/'
"-" return '-'
"+" return '+'
"^" return '^'
"(" return '('
")" return ')'
"PI" return 'PI'
"E" return 'E'
<<EOF>> return 'EOF'
. return 'INVALID'
/lex
/* operator associations and precedence */
%left '+' '-'
%left '*' '/'
%left '^'
%left UMINUS
%start expressions
%% /* language grammar */
expressions
: e EOF
{return $1;}
;
e
: e '+' e
{$$ = $1+$3;}
| e '-' e
{$$ = $1-$3;}
| e '*' e
{$$ = $1*$3;}
| e '/' e
{$$ = $1/$3;}
| e '^' e
{$$ = Math.pow($1, $3);}
| '-' e %prec UMINUS
{$$ = -$2;}
| '(' e ')'
{$$ = $2;}
| NUMBER
{$$ = Number(yytext);}
| E
{$$ = Math.E;}
| PI
{$$ = Math.PI;}
;
What I don’t understand is how to proceed once I add “IF” statements like this:
IfStatement
: "IF" "(" Expression ")" Statement
{
$$ = new IfStatementNode($3, $5, null, createSourceLocation(null, #1, #5));
}
| "IF" "(" Expression ")" Statement "ELSE" Statement
{
$$ = new IfStatementNode($3, $5, $7, createSourceLocation(null, #1, #7));
}
;
The parser is generated without problems.
So how can I use a statement like this: IF(5>2)THEN (5+2) ELSE (5*2)?
The calculator’s functionality works well of course, but “IF” doesn’t.
It seems that you are looking for two sorts of constructs: an IF statement and an IF expression. Fortunately, your example uses the THEN keyword to distinguish them. Your IF expression production would be something like:
IfExpression
: "IF" "(" Expression ")" "THEN" "(" Expression ")"
{
$$ = new IfExpressionNode(/* pass arguments as desired */);
}
| "IF" "(" Expression ")" "THEN" "(" Expression ")" "ELSE" "(" Expression ")"
{
$$ = new IfExpressionNode(/* arguments */);
}
;
You don't show how your two pieces of grammar are bound together, so it's hard to answer. Have you also looked at other questions, such as Reforming the grammar to remove shift reduce conflict in if-then-else?
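Since the question does not show the node classes, here is a rough sketch in Go of what a node like IfExpressionNode might do once the grammar actions have built it. Everything in it is an assumption made for illustration: the type names, the convention that a comparison yields 1 or 0, and the choice that an IF without ELSE evaluates to 0 when the condition is false. The tree is built by hand rather than by a generated parser.

```go
package main

import "fmt"

// Node is anything the calculator can evaluate to a number.
type Node interface{ Eval() float64 }

// Number is a literal such as 5.
type Number float64

func (n Number) Eval() float64 { return float64(n) }

// Binary covers arithmetic and comparison; comparisons yield 1 (true) or 0 (false).
type Binary struct {
	Op          byte
	Left, Right Node
}

func (b Binary) Eval() float64 {
	l, r := b.Left.Eval(), b.Right.Eval()
	switch b.Op {
	case '+':
		return l + r
	case '*':
		return l * r
	case '>':
		if l > r {
			return 1
		}
		return 0
	}
	panic("unsupported operator " + string(b.Op))
}

// IfExpression plays the role of the answer's IfExpressionNode (hypothetical shape).
type IfExpression struct{ Cond, Then, Else Node }

func (e IfExpression) Eval() float64 {
	if e.Cond.Eval() != 0 {
		return e.Then.Eval()
	}
	if e.Else == nil {
		return 0 // assumption: IF without ELSE yields 0 when the condition is false
	}
	return e.Else.Eval()
}

// example builds the tree for IF(5>2)THEN (5+2) ELSE (5*2) by hand.
func example() Node {
	return IfExpression{
		Cond: Binary{'>', Number(5), Number(2)},
		Then: Binary{'+', Number(5), Number(2)},
		Else: Binary{'*', Number(5), Number(2)},
	}
}

func main() {
	fmt.Println(example().Eval()) // prints 7
}
```

The point of treating IF as an expression is that it produces a value, so it can appear anywhere the calculator accepts a number.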

ANTLR comment problem

I am trying to write a comment matching rule in ANTLR, which is currently the following:
LINE_COMMENT
: '--' (options{greedy=false;}: .)* NEWLINE {Skip();}
;
NEWLINE : '\r'|'\n'|'\r\n' {Skip();};
This code works fine except when a comment is the last thing in a file, in which case it throws a NoViableAlt exception. How can I fix this?
Why not:
LINE_COMMENT : '--' (~ NEWLINE)* ;
fragment NEWLINE : '\r' '\n'? | '\n' ;
If you haven't come across this yet, lexical rules (all uppercase) can only consist of constants and tokens, not other lexemes. You need a parser rule for that.
I'd go for:
LINE_COMMENT
: '--' ~( '\r' | '\n' )* {Skip();}
;
NEWLINE
: ( '\r'? '\n' | '\r' ) {Skip();}
;
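The reason the last rule copes with a comment at the end of the file is that it only consumes characters that are not line breaks and never requires a newline to follow. The same logic written by hand in Go looks roughly like this (a sketch with a hypothetical function name; unlike the grammar above it leaves the newlines themselves in place):

```go
package main

import "fmt"

// skipLineComments removes "--" comments from src. A comment runs up to,
// but not including, the next '\r' or '\n', or to the end of the input.
func skipLineComments(src string) string {
	var out []byte
	for i := 0; i < len(src); {
		if i+1 < len(src) && src[i] == '-' && src[i+1] == '-' {
			i += 2
			// Equivalent of ~('\r' | '\n')*: stop at a line break or at EOF.
			for i < len(src) && src[i] != '\r' && src[i] != '\n' {
				i++
			}
			continue
		}
		out = append(out, src[i])
		i++
	}
	return string(out)
}

func main() {
	fmt.Printf("%q\n", skipLineComments("a -- first\nb -- last, no newline"))
}
```

Because the inner loop stops at the end of the input as readily as at a line break, a trailing comment without a final newline needs no special case.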
