XText Validator shows Parse Error in wrong line - validation

I am currently developing a small DSL with the following (shortened) grammar:
grammar mydsl with org.eclipse.xtext.common.Terminals hidden(WS, SL_COMMENT)
generate mydsl "uri::mydsl"
CommandSet:
(commands+=Command)*
;
Command:
(commandName=CommandName LBRACKET (args=ArgumentList)? RBRACKET EOL ) |
;
terminal LBRACKET:
'('
;
terminal RBRACKET:
')'
;
terminal EOL:
';'
;
As you can see, I use a semicolon as an EOL separator, and it works just fine for me. The problem occurs with the built-in syntax validator when working with the DSL in Eclipse. When I leave out a semicolon, the validator reports the syntax error on the wrong line.
Is there an error with my grammar? Thanks ;)

Here is a small DSL loosely based on your example. The key change is that linebreaks are no longer treated as "hidden" (i.e. the parser no longer ignores them); only spaces and tabs are. Note the new terminals MY_WS and MY_NL as well as the modified hidden statement in the grammar header (I also added some comments at the relevant places). This approach is only meant to give you the general idea, and you can experiment with it to achieve what you want. Note that once linebreaks are no longer hidden, you need to account for them in your grammar rules.
grammar org.xtext.example.mydsl.MyDsl
with org.eclipse.xtext.common.Terminals
hidden( MY_WS, SL_COMMENT ) // ---> hide whitespaces and comments only, not linebreaks!
generate mydsl "uri::mydsl"
CommandSet:
(commands+=Command)*
;
CommandName:
name=ID
;
ArgumentList:
arguments+=STRING (',' arguments+=STRING)* // ---> assign every argument, not only the first one
;
Command:
(commandName=CommandName LBRACKET (args=ArgumentList)? RBRACKET EOL);
terminal LBRACKET:
'('
;
terminal RBRACKET:
')'
;
terminal EOL:
';' MY_NL? // ---> now an optional linebreak at the end!
;
terminal MY_WS: (' '|'\t')+; // ---> whitespace characters (formerly part of WS)
terminal MY_NL: ('\r'|'\n')+; // ---> linebreak characters (no longer hidden)
Here is an image demonstrating the resulting behavior.

Related

Why does CMD ignore the character `;`?

I'm just wondering. When I type ; in cmd, it just ignores it.
I can type ;;;;;;;;;;;;;;; and it does the same thing, but if I type ;a it reports an error.
Why is that?
; is a delimiter.
Delimiters separate one parameter from the next - they split the command line up into words.
More info on https://ss64.com/nt/syntax-esc.html
The semicolon is not ignored by cmd.exe; rather, it is specifically recognised as a token separator. Token separators are used to separate a command from its arguments and arguments from each other. Here are all such characters:
SPACE (code 0x20)
TAB (horizontal tabulator, code 0x09)
, (comma, code 0x2C)
; (semicolon, code 0x3B)
= (equal-to sign, code 0x3D)
VTAB (vertical tabulator, code 0x0B)
FF (form-feed or page-break, code 0x0C)
NBSP (non-breaking space, code 0xFF)
Note that multiple consecutive token separators are collapsed to a single one.
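To make the collapsing behaviour concrete, here is a minimal Python sketch that splits a line on the separators listed above. It is only an illustration of the idea, not a reimplementation of cmd.exe: quoting, escaping with ^ and redirection are all ignored, and the name split_tokens is made up for this example.

```python
import re

# The token separators listed above: space, tab, comma, semicolon, equals,
# vertical tab, form feed and the character with code 0xFF.
SEPARATOR_RUN = re.compile(r"[ \t,;=\x0b\x0c\xff]+")

def split_tokens(line: str) -> list[str]:
    """Split a line into tokens; runs of separators collapse into one."""
    return [token for token in SEPARATOR_RUN.split(line) if token]

print(split_tokens(";;;;;;;"))            # []
print(split_tokens(";a"))                 # ['a']
print(split_tokens("copy a.txt;b.txt"))   # ['copy', 'a.txt', 'b.txt']
```

A line made only of semicolons leaves no tokens at all, so there is nothing to execute. With ;a one token remains, and that token is then looked up as a command name, which is where the error comes from.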
Command Prompt does not ignore the character ";". It is a delimiter, and cmd recognizes it as such, so it does not "ignore" the character but reads it much like a space. That is why nothing happens when you type it on its own.

Strange behaviour for comments in Antlr4 grammar

Adding a comment line under ID is fine, but adding one under WS causes an error to be raised. The entire file Hello.g4 is listed below.
/**
* Define a grammar called Hello
*/
grammar Hello;
r : 'hello' ID ; // match keyword hello followed by an identifier
ID : [a-z]+ ; // match lower-case identifiers
/**********************************************************************************************/
WS : [ \t\r\n]+ -> skip ; // skip spaces, tabs, newlines
/**********************************************************************************************/
The output I get in the console is as follows:
ANTLR Tool v4.4 (/tmp/antlr-4.4-complete.jar)
Hello.g4 -o /home/me/workspace/TestComment/target/generated-sources/antlr4 -listener -no-visitor -encoding UTF-8
error(50): Hello.g4:13:0: syntax error: '<EOF>' came as a complete surprise to me
1 error(s)
BUILD FAIL
Total time: 168 millisecond(s)
Running Eclipse Version: Neon.3 Release (4.6.3), Default ANTLR4 project.
Why should ANTLR4 care about a trailing comment line?
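The ANTLR 4 grammar syntax treats JavaDoc comments (those opening with /**) as optional documentation that is allowed as a header and on each rule. Your comment line starts with /**, so it is read as such a comment. No rule follows the last comment line, so it is interpreted as an invalid beginning of a rule.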
Change your comment line to /*----*/ to avoid the error.

Unexpected behavior with ANTLR3

I am experiencing an unexpected behavior with ANTLR3. This is my grammar:
grammar Onto;
// parser rules
predicate
: VERB
;
// lexer rules
VERB
: 'VB' WS
;
PREPOSITION
: 'TO' WS
;
WS
: (' ' | '\t' | '\r'| '\n')
;
When I parse the string "VB TO", ANTLR3 exits without flagging an error. This is unexpected because the given string does not match any rule in the grammar.
However, when I retry the same input after removing the PREPOSITION rule from the grammar, ANTLR3 flags the following error, which is the expected result:
line 1:3 no viable alternative at character 'T'
line 1:4 no viable alternative at character 'O'
This is a classic mistake. Your main rule has no EOF at the end, so the parser is free to match only part of your input and still treat that as valid. In your case it matches VERB and then expects nothing more. The PREPOSITION rule is what lets "TO" through: the lexer matches it and returns a PREPOSITION token to the parser. But since the parser is already satisfied after VERB, it considers the parse successful.
Without the PREPOSITION lexer rule, however, the lexer returns an error token as it cannot match that input, which is what the error above is about.
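The effect can be reproduced outside ANTLR. The following Python sketch is a hand-written stand-in, not ANTLR-generated code: tokenize, parse_predicate and the two regular expressions are hypothetical names, the input is assumed to end with a newline so that 'TO' WS can match, and the lexer raises on the first bad character instead of recovering the way ANTLR3 does.

```python
import re

# Stand-ins for the lexer rules VERB : 'VB' WS and PREPOSITION : 'TO' WS.
WITH_PREPOSITION = re.compile(r"(?P<VERB>VB[ \t\r\n])|(?P<PREPOSITION>TO[ \t\r\n])")
WITHOUT_PREPOSITION = re.compile(r"(?P<VERB>VB[ \t\r\n])")

def tokenize(text, lexer_rules):
    """Return token names for text, followed by a final 'EOF' token."""
    tokens, pos = [], 0
    while pos < len(text):
        match = lexer_rules.match(text, pos)
        if match is None:
            raise ValueError(f"no viable alternative at character {text[pos]!r}")
        tokens.append(match.lastgroup)
        pos = match.end()
    tokens.append("EOF")
    return tokens

def parse_predicate(tokens, require_eof):
    """predicate : VERB ;  or, with require_eof,  predicate : VERB EOF ;"""
    if tokens[0] != "VERB":
        return False
    if require_eof:
        return tokens[1] == "EOF"
    return True  # whatever follows VERB is simply never looked at

tokens = tokenize("VB TO\n", WITH_PREPOSITION)
print(tokens)                                      # ['VERB', 'PREPOSITION', 'EOF']
print(parse_predicate(tokens, require_eof=False))  # True
print(parse_predicate(tokens, require_eof=True))   # False
```

With the PREPOSITION rule present, the lexer has no complaint and the rule without EOF stops after VERB, so nothing is reported. Requiring EOF makes the leftover PREPOSITION token a parse failure, and calling tokenize with WITHOUT_PREPOSITION fails at the character 'T', mirroring the lexer error from the question.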

XTEXT: Controlling when whitespace is allowed

I have a custom scripting language for which I am attempting to use Xtext for syntax checking. It boils down to single-line commands in the format
COMMAND:PARAMETERS
For the most part, Xtext is working great. The only problem I have run into so far is how to handle wanted (or unwanted) whitespace. The language cannot have a space at the beginning of a line, and there cannot be a space following the colon. I also need to allow whitespace in the parameters, since a parameter could be a string of text or something similar.
I have used a datatype to allow white space in the parameter:
UNQUOTED_STRING:
(ID | INT | WS | '.' )+
;
This works, but has the side effect of allowing spaces throughout the line.
Does anyone know a way to limit where white spaces are allowed?
Thanks in advance for any advice!
You can disallow whitespace globally for your grammar by using an empty set of hidden tokens, e.g.
grammar org.xyz.MyDsl with org.eclipse.xtext.common.Terminals hidden()
Then you can enable it at specific rules, e.g.
XParameter hidden(WS):
'x' '=' value=ID
;
Note that this would allow linebreaks as well. If you don't want that, you can either pass a custom terminal rule or override the default WS rule.
Here is a more complete example (not perfect):
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals hidden()
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
(commands+=Command '\r'? '\n')+
;
Command:
SampleCommand
;
SampleCommand:
command='get' ':' parameter=Parameter
;
Parameter:
'{' x=XParameter '}'
;
XParameter hidden(WS):
'x' '=' value=ID
;
This will parse commands such as:
get:{x=TEST}
get:{ x = TEST}
But will reject the following (the first line starts with a space, the second has a space after the colon):
 get:{x=TEST}
get: {x=TEST}
Hope that gives you an idea. If it works better for your grammar, you can also do this the other way around and disable whitespace only for certain rules, e.g.
CommandList hidden():
(commands+=Command '\r'? '\n')+
;

GoldParser: Accept programs not ending with an empty line

I'm rewriting a GoldParser grammar for VBScript. In VBScript, statements are terminated using either a newline or ':'. Therefore I use the following terminal:
NewLine = {All Newline}
| ':'
Because every statement has to end with the NewLine terminal, only programs ending with an empty line are accepted. How can I extend the NewLine terminal to also accept programs not ending with an empty line? I tried the following:
NewLine = {All Newline}
| ':'
| {EOF}
This does not work because the {EOF} (End of File) group does not exist.
EOF is a special token and I'm not aware of any syntax allowing you to use it in a production rule. It is emitted when the tokenizer receives no more data, and as such it is not a control character you could use in a terminal definition either.
That being said, there are several ways to parse the (strictly speaking invalid) input. The simplest may be to just append a newline at the end of the string or text being tokenized. While this will not make it parse correctly in the GOLD Builder test window, it will make your code process the data as expected, and it will not add complexity to the grammar.
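As a rough sketch of that workaround, written in Python purely for illustration: the helper name ensure_trailing_newline is hypothetical, and the call that hands the text to your GOLD engine is deliberately left out because it depends on the engine you use.

```python
def ensure_trailing_newline(source: str) -> str:
    """Return source unchanged if it already ends with a line terminator,
    otherwise append one so the last statement is terminated."""
    if source.endswith(("\n", "\r")):
        return source
    return source + "\n"

script = 'x = 1 : MsgBox "done"'
prepared = ensure_trailing_newline(script)
print(repr(prepared))  # 'x = 1 : MsgBox "done"\n'
# 'prepared' is what you would pass to the tokenizer instead of 'script'.
```

Text that already ends with a newline is passed through untouched, so applying the helper more than once does no harm.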

Resources