I have a custom scripting language, that I am attempting to use XTEXT for syntax checking. It boils down to single line commands in the format
COMMAND:PARAMETERS
For the most part, xtext is working great. The only problem I have currently run into is how to handle wanted (or unwanted) white spaces. The language cannot have a space to begin a line, and there cannot be a space following the colon. As well, I need to allow white space in the parameters, as it could be a string of text, or something similar.
I have used a datatype to allow white space in the parameter:
UNQUOTED_STRING:
(ID | INT | WS | '.' )+
;
This works, but has the side effect of allowing spaces throughout the line.
Does anyone know a way to limit where white spaces are allowed?
Thanks in advance for any advice!
You can disallow whitespace globally for your grammar by using an empty set of hidden tokens, e.g.
grammar org.xyz.MyDsl with org.eclipse.xtext.common.Terminals hidden()
Then you can enable it at specific rules, e.g.
XParameter hidden(WS):
'x' '=' value=ID
;
Note that this would allow linebreaks as well. If you don't want that you can either pass a custom terminal rule or overwrite the default WSrule.
Here is a more complete example (not perfect):
grammar org.xtext.example.mydsl.MyDsl with org.eclipse.xtext.common.Terminals hidden()
generate myDsl "http://www.xtext.org/example/mydsl/MyDsl"
Model:
(commands+=Command '\r'? '\n')+
;
Command:
SampleCommand
;
SampleCommand:
command='get' ':' parameter=Parameter
;
Parameter:
'{' x=XParameter '}'
;
XParameter hidden(WS):
'x' '=' value=ID
;
This will parse commands such as:
get:{x=TEST}
get:{ x = TEST}
But will reject:
get:{x=TEST}
get: {x=TEST}
Hope that gives you an idea. You can also do this the other way around by limiting the whitespace only for certain rules, e.g.
CommandList hidden():
(commands+=Command '\r'? '\n')+
;
If that works better for your grammar.
Related
Bracket expressions within case patterns seem to disallow [() &;]. However I can't seem to find any such restrictions (or escaping workarounds) in the POSIX shell spec, or in the bash manual for that matter.
case '&' in
# *[&]*) echo y ;; # won't parse
*[\&]*) echo y ;; # will parse & work
esac
# similar for ';', ' ', '(', ')'
# not a problem for ${var#[&; ()]}
This is in a sh shell script function that can't afford to call external utilities (but I'm curious about bash too). So... is there any spec that describes backslash-ing these characters within a bracket expression pattern?
No, I don't think it is explicitly documented anywhere.
But it can be deduced that Token Recognition Rule 6 is applied while the pattern list is being parsed. That is, unless quoted, control operators, redirection operators, and end of input are recognized as operators, and delimit a pattern. The shell expects | (indicates that another pattern follows) or ) (marks the end of the pattern list) to do that; and anything else causes a parse error.
As square brackets have no special meaning to the parser during tokenization, whether an operator occurs between them is irrelevant. And ${var#[&; ()]} is a different case; covered in Token Recognition Rule 5 and Parameter Expansion.
I am currently developing a small dsl with the following (shortend) grammar:
grammar mydsl with org.eclipse.xtext.common.Terminals hidden(WS, SL_COMMENT)
generate mydsl "uri::mydsl"
CommandSet:
(commands+=Command)*
;
Command:
(commandName=CommandName LBRACKET (args=ArgumentList)? RBRACKET EOL ) |
;
terminal LBRACKET:
'('
;
terminal RBRACKET:
')'
;
terminal EOL:
';'
;
As you can see, I use a semicolon as a EOL seperator and it works just fine for me. The problem occurs with the built-in syntax validator when working with the dsl in eclipse. When I miss a semicolon, the validator throws an syntax error in the wrong line:
Is there an error with my grammar? Thanks ;)
Here is a small DSL loosely based on your example. Basically, I do not consider linebreaks as "hidden" any longer (i.e. they will no longer be ignored by the parser), only the whitespaces. Note new terminals MY_WS and MY_NL as well as modified hidden statement in the grammar header (I also added some comments at relevant places). This approach just gives you some general idea and you can experiment with it to achieve what you want. Note, that if linebreaks are no longer hidden, you will need to take account of them in your grammar rules.
grammar org.xtext.example.mydsl.MyDsl
with org.eclipse.xtext.common.Terminals
hidden( MY_WS, SL_COMMENT ) // ---> hide whitespaces and comments only, not linebreaks!
generate mydsl "uri::mydsl"
CommandSet:
(commands+=Command)*
;
CommandName:
name=ID
;
ArgumentList:
arguments += STRING (',' STRING)*
;
Command:
(commandName=CommandName LBRACKET (args=ArgumentList)? RBRACKET EOL);
terminal LBRACKET:
'('
;
terminal RBRACKET:
')'
;
terminal EOL:
';' MY_NL? // ---> now an optional linebreak at the end!
;
terminal MY_WS: (' '|'\t')+; // ---> whitespace characters (formerly part of WS)
terminal MY_NL: ('\r'|'\n')+; // ---> linebreak characters (no longer hidden)
Here is an image demonstrating the resulting behavior.
I've got some source code like the following where I call a function in C:
void myFunction (
&((int) table[1, 0]),
&((int) table[2, 0]),
&((int) table[3, 0])
);
...the only problem is that the function has >300 parameters (it's an auto-generated wrapper for initialising and calling a whole module; it was given to me and I cannot change it). And as you can see: I began accessing the array with a 1 instead of a 0... Great times, modifying all the 300 parameters, i.e. decrasing 300 x the x-coordinate of the array, by hand.
The solution I am looking for is how I could force sed to to do the work for me ;)
EDIT: Please note that the syntax above for accessing a two-dimensional array in C is wrong anyway! Of course it should be [1][0]... (so don't just copy-and-paste ;))
Basically, the command I came up with, was the following:
sed -r 's/(.*)(table\[)([0-9]+)(,)(.*)/echo "\1\2$((\3-1))\4\5"/ge' inputfile.c > outputfile.c
Well, this does not look very intuitive on the first sight - and I was missing good explanations for nearly every example I found.
So I will try to give a detailed explanation on this:
sed
--> basic command
-r
--> most examples you find are using -e; however, the -r parameter (only works with GNU sed) enables extended regular expressions and brings support for the + in a regex. It basically means "one or more matches".
's/input/output/ge'
--> this is the basic replacement syntax. It basically means "replace 'input' by 'output'". The /g is a "global" flag, i.e. sed will replace all occurences and not only the first one. You can add an additional e to execute the result in the bash. This is what we want to do here to handle the calculation.
(.*)
--> this matches "everthing" from the last match to the next match
(table\[)
--> the \ is to escape the bracket. This part of the expression will match Strings like table[
([0-9]+)
--> this one matches numbers with at least one digit, however, it can also match higher numbers with more than only one digit.
(,)
--> this simply matches the comma ,
(.*)
--> and again: the rest of the line
And now the interesting part:
echo "\1\2$((\3-1))\4\5"
the echo is a bash command
the \n (you can use every value from \1 up to \9) is some kind of "variable" for the inputs: \1 will contain the first match, \2 the seconds match, ... --> this helps you to preserve parts of the input string
the $((1+1)) is a simple bash syntax to calculate the value of the term inside the double brackets (in the complete sed command above, the \3 will of course be automatically replaced by the 3rd match, i.e. the 1st part inside the brackets to access the table's cells)
please note that we use quotation marks around the echo content to also be able to process lines with characters like & which would otherwise not work
The already mentioned e of \ge at the end will trigger the execution of the result in the bash. E.g. the first two lines of the example source code in the question would produce the following bash statements:
echo "void myFunction ("
echo " &((int) table[$((1-1)), 0]),"
which is being executed and results in the following output:
void myFunction (
&((int) table[0, 0]),
...which is exatcly what I wanted :)
BTW:
text > output.c
is simple bash syntax to output text (or in this case the sed-processed source code) to a file called output.c.
Good links about this topic are:
sed basics
regular expressions basics
Ahh and one more thing: You can also use sed in the git-Bash on Windows - if you are "forced" to use Windows at work like me ;)
PS: In the meantime I could have easily done this by hand but using sed was a lot more fun ;)
Here's another way you could do it, using Perl:
perl -pe 's/(table\[)(\d+)(,)/$1.($2-1).$3/e' file.c
This uses the e modifier to execute an expression in the replacement. The capture groups are concatenated together but the middle group has 1 subtracted from its value.
This will output to standard output so you can check that it does what you want. When you're happy, you can add the -i switch to overwrite the original file.
I'm rewriting a GoldParser Grammar for VBScript. In VBScript Statements are terminated using either a newline or ':'. Therefore i use the following terminal:
NewLine = {All Newline}
| ':'
Because every statement has to end with the Newline terminal, only programs ending with an empty line are accepted. How can i extend the newline terminal to also accept programs not ending with an empty line? I tried the following:
NewLine = {All Newline}
| ':'
| {EOF}
This does not work because the {EOF} (End of File) group does not exist.
EOF is a special token and I'm not aware of any syntax allowing you to use it in a production rule. It is emitted when the tokenizer receives no more data, and as such it is not a control character you could use in a terminal definition either.
That being said, you have different possibilities to parse the (strictly speaking invalid) input. The simplest may be to just append a newline at the end of the string or text being tokenized. While this will not make it parse correctly in the GOLD Builder test window, it will make your code process the data as expected and it will not add complexity to the grammar.
The following script is showing me "unexpected end of file" error. I have no clue why am I facing this error. My all the quotes are closed properly.
#!/usr/bin/sh
insertsql(){
#sqlite3 /mnt/rd/stats_flow_db.sqlite <<EOF
echo "insert into flow values($1,$2,$3,$4,$5,$6,$7,$8,$9,$10,$11,$12,$13,$14,$15,$16,$17,$18)"
#.quit
}
for i in {1..100}
do
src_ip = "10.1.2."+$i
echo $src_ip
src_ip_octets = ${src_ip//,/}
src_ip_int = $src_ip_octets[0]*1<<24+$src_ip_octets[1]*1<<16+$src_ip_octets[2]*1<<8+$src_ip_octets[3]
dst_ip = "10.1.1."+$i
dst_ip_octets = ${dst_ip//,/}
dst_ip_int = $dst_ip_octets[0]*1<<24+$dst_ip_octets[1]*1<<16+$dst_ip_octets[2]*1<<8+$dst_ip_octets[3]
insertsql(1, 10000, $dst_ip, 20000, $src_ip, "2012-08-02,12:30:25.0","2012-08-02,12:45:25.0",0,0,0,"flow_a010105_a010104_47173_5005_1_50183d19.rrd",0,12,$src_ip_int,$dst_ip_int,3,50000000,80000000)
done
That error is caused by <<. When encountering that, the script tries to read until it finds a line which has exactly (starting in the first column) what is found after the <<. As that is never found, the script searches to the end and then complains that the file ended unexpectedly.
That will not be your only problem, however. I see at least the following other problems:
You can only use $1 to $9 for positional parameters. If you want to go beyond that, the use of the shift command is required or, if your version of the shell supports it, use braces around the variable name; e.g. ${10}, ${11}...
Variable assignments must not have whitespace arount the equal sign
To call your insertsql you must not use ( and ); you'd define a new function that way.
The cass to your insertsql function must pass the parameters whitespace separated, not comma separated.
A couple of problems:
There should be no space between equal sign and two sides of an assignment: e.g.,: dst_ip="10.1.1.$i"
String concatenation is not done using plus sign e.g., dst_ip="10.1.1.$i"
There is no shift operator in bash, no <<: $dst_ip_octets[0]*1<<24 can be done with expr $dst_ip_octets[0] * 16777216 `
Functions are called just like shell scripts, arguments are separated by space and no parenthesis: insertsql 1 10000 ...
That is because you don't follow shell syntax.
To ser variable you are not allowed to use space around = and to concatenate two parts of string you shouldn't use +. So the string
src_ip = "10.1.2."+$i
become
src_ip="10.1.2.$i"
Why you're using the string
src_ip_octets = ${src_ip//,/}
I don't know. There is absolutely no commas in you variable. So even to delete all commas it should look like (the last / is not required in case you're just deleting symbols):
src_ip_octets=${src_ip//,}
The next string has a lot of symbols that shell intepreter at its own way and that's why you get the error about unexpected end of file (especially due to heredoc <<)
src_ip_int = $src_ip_octets[0]*1<<24+$src_ip_octets[1]*1<<16+$src_ip_octets[2]*1<<8+$src_ip_octets[3]
So I don't know what exactly did you mean, though it seems to me it should be something like
src_ip_int=$(( ${src_ip_octets%%*.}+$(echo $src_ip_octets|sed 's/[0-9]\+\.\(\[0-9]\+\)\..*/\1/')+$(echo $src_ip_octets|sed 's/\([0-9]\+\.\)\{2\}\(\[0-9]\+\)\..*/\1/')+${src_ip_octets##*.} ))
The same stuff is with the next strings.
You can't do this:
dst_ip_int = $dst_ip_octets[0]*1<<24+$dst_ip_octets[1]*1<<16+$dst_ip_octets[2]*1<<8+$dst_ip_octets[3]
The shell doesn't do math. This isn't C. If you want to do this sort of calculation, you'll need to use something like bc, dc or some other tool that can do the sort of math you're attempting here.
Most of those operators are actually shell metacharacters that mean something entirely different. For example, << is input redirection, and [ and ] are used for filename globbing.