Unordered result because of an ambiguous grammar using Flex and Bison - compilation

I'm trying to create a section for variable declarations (somewhat similar to HTML) using Flex and Bison. My grammar is accepted (no lexical or syntax errors), but the displayed result isn't in the order I expect.
example.txt:
<SUB VARIABLE>
< a AS INT />;
<string AS STR />;
< x | y AS FLT />;
<bool AS BOL />;
<char AS CHR />;
</SUB VARIABLE>
the result I get (the incorrect one):
a ---> 1
x ---> 2
y ---> 2
string ---> 4
char ---> 3
bool ---> 5
the result I want to display (the correct one):
a ---> 1
string ---> 4
x ---> 2
y ---> 2
bool ---> 5
char ---> 3
Here's my code:
synt.y:
DECLARATION: DECLARATION '<' SUB VARIABLE '>' SUITE
|
;
SUITE: '<' idf SUITE_VAR {inserer($2,getType());}
| '<' '/' SUB VARIABLE '>'
;
SUITE_VAR: '|' idf SUITE_VAR {inserer($2, getType());}
| AS INT '/' '>' ';' SUITE {setType(1);}
| AS FLT '/' '>' ';' SUITE {setType(2);}
| AS CHR '/' '>' ';' SUITE {setType(3);}
| AS STR '/' '>' ';' SUITE {setType(4);}
| AS BOL '/' '>' ';' SUITE {setType(5);}
;
My grammar may be ambiguous, I tried many other grammars but I had the same problem.
Could you please tell me how I should write my grammar to have an ordered result?
Thanks a lot.

This is not a mistake in your grammar. The issue is probably in the semantics of your inserer function.
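To see why the storage side is worth checking, here is a minimal Python sketch of the order in which the grammar's actions fire. It relies on the fact that Bison runs an end-of-rule action only when that rule is reduced, so with this right-recursive grammar the nested SUITE is reduced before the enclosing action runs. The function name reduction_order and the decls structure are hypothetical stand-ins for illustration, not part of the original code.

```python
def reduction_order(decls):
    """decls: list of (names, type_code) pairs in source order."""
    calls = []                 # (name, type) pairs in the order inserer would be called
    current = {"type": None}   # stands in for setType()/getType()

    def suite(i):
        if i == len(decls):    # SUITE: '<' '/' SUB VARIABLE '>'
            return
        suite_var(i, 1)        # SUITE_VAR is fully reduced first
        calls.append((decls[i][0][0], current["type"]))  # inserer($2, getType())

    def suite_var(i, k):
        names, code = decls[i]
        if k < len(names):     # SUITE_VAR: '|' idf SUITE_VAR
            suite_var(i, k + 1)
            calls.append((names[k], current["type"]))
        else:                  # SUITE_VAR: AS <type> '/' '>' ';' SUITE
            suite(i + 1)       # the rest of the section is reduced first
            current["type"] = code   # then setType(code) runs

    suite(0)
    return calls


if __name__ == "__main__":
    example = [(["a"], 1), (["string"], 4), (["x", "y"], 2), (["bool"], 5), (["char"], 3)]
    for name, type_code in reduction_order(example):
        print(name, "--->", type_code)
```

Under these assumptions each identifier gets the right type, and inserer is called for char first and a last. The order in which the table is finally displayed therefore depends on how inserer stores its entries and how they are printed.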

Bug in a simple parser specification in F#

I wonder where the parser specification below went wrong. The parser aims to parse and evaluate an expression like 2+3*4 to 14. It is to be run with FsLexYacc.
%{
%}
%token <int> CSTINT
%token PLUS MINUS MUL
%token LPAR RPAR
%token EOF
%left MINUS PLUS /* lowest precedence */
%left TIMES DIV /* highest precedence */
%start Main
%type int Main
%%
Main:
Expr EOF { $1 }
;
Expr:
| CSTINT { $1 }
| MINUS CSTINT { - $2 }
| LPAR Expr RPAR { $2 }
| Expr MUL Expr { $1 * $3 }
| Expr PLUS Expr { $1+$3 }
| Expr MINUS Expr { $1-$3 }
;
I got the error
ExprPar.fsy(18,0): error: Unexpected character '%'%
Line 18 is the line just before "Main". Where is the bug?
I believe the type specified by %type should be in angle brackets:
%type <int> Main

Grepping particular pattern using sed command

I have a file (say a.txt) with the contents shown below. I want to grep only the error names that appear before the colon (:), such as DK2.a.Iq_abc_vu and LAP.ABCD.1, and only for errors whose violation count is nonzero; 11xAB2_B_1 should be skipped because it shows 0 violations. The format of a.txt is the same across other files. There is one special case, shown at the end of the file: when the word "Violation" appears in place of an error name, the names to report are "text_abcd" and "text_jkl" from the lines that follow, not "Violation" itself. How can I grep these errors to get the output shown below?
$ cat a.txt
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space after IP abutment empty space must on IP boundary corner
interacting ........................................ 1 violation found.
interacting ........................................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space after TV boundary corner having some thing
interacting ........................................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error coming some thing violations. This error can be removed by providing spacing
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 10 violation found.
net_abcd:net_abcd .............................. 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
not_inside ......................................... 0 violations found.
Violation
text_abcd:text_pqrs .......................... 2 violations found.
text_jkl:jkl_jkl ............................. 2 violations found.
Desired output:
DK2.a.Iq_abc_vu
DM3.a.7.abc_vu
LAP.ABCD.1
text_abcd
text_jkl
Assuming the answer doesn't have to be based on sed ...
We can use egrep to keep only those lines that meet one of the following criteria:
line contains a colon with a leading and trailing space ( : ) or the word violation (case insensitive)
from the resulting lines we then discard lines that contain 0 violations
At this point we have:
$ egrep -i " : |violation" a.txt | egrep -v " 0 violations"
DK2.a.Iq_abc_vu : To avoid > 500 um x 500.0 um Metal empty space after IP abutment empty space must on IP boundary corner
interacting ........................................ 1 violation found.
interacting ........................................ 1 violation found.
DM3.a.7.abc_vu : To avoid > 100.0 um x 100.0 um Metal empty space after TV boundary corner having some thing
interacting ........................................ 2 violations found.
LAP.ABCD.1 : Voltage high this is one type of error coming some thing violations. This error can be removed by providing spacing
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 1 violation found.
net_abcd:net_abcd .............................. 10 violation found.
net_abcd:net_abcd .............................. 1 violation found.
11xAB2_B_1 : 10xAB area >= 100um2
Violation
text_abcd:text_pqrs .......................... 2 violations found.
text_jkl:jkl_jkl ............................. 2 violations found.
Now we can use awk to keep track of 2 types of errors:
(1) line contains [space]:[space], so we store the error name and if the next line contains the string violation we print the error name and then clear the error name (to keep from printing it twice)
(2) line starts with ^Violation in which case we'll obtain/print the error name from each follow-on line that contains the string violation (error name is the portion of the line before the :)
The awk code to implement this looks like:
awk '
/ : / { errname = $1 ; next }
/^Violation/ { errname = $1 ; next }
/violation/ { if ( errname == "Violation" ) { split($1,a,":") ; print a[1] ; next }
if ( errname != "" ) { print errname ; errname="" ; next }
}
'
Pulling the egrep and awk snippets together gives us:
$ egrep -i " : |violation" a.txt | egrep -v " 0 violations" | awk '
/ : / { errname = $1 ; next }
/^Violation/ { errname = $1 ; next }
/violation/ { if ( errname == "Violation" ) { split($1,a,":") ; print a[1] ; next }
if ( errname != "" ) { print errname ; errname="" ; next }
}
'
With the following results:
DK2.a.Iq_abc_vu
DM3.a.7.abc_vu
LAP.ABCD.1
text_abcd
text_jkl
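For comparison, the same filtering rules can be sketched in Python. This is an illustrative port of the egrep and awk steps above, not a drop-in replacement; the function name extract_errors is hypothetical, and the sketch assumes the file format shown in the question.

```python
import sys


def extract_errors(lines):
    """Return the names of errors that have at least one nonzero violation line."""
    names = []
    current = ""  # error name waiting for its first violation line
    for line in lines:
        if " 0 violations" in line:
            continue  # same role as: egrep -v " 0 violations"
        if " : " in line:
            current = line.split()[0]           # name before " : "
        elif line.startswith("Violation"):
            current = "Violation"               # special section marker
        elif "violation" in line:
            if current == "Violation":
                # name is the part of the first field before ':'
                names.append(line.split()[0].split(":")[0])
            elif current:
                names.append(current)
                current = ""                    # avoid printing it twice
    return names


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as handle:
        print("\n".join(extract_errors(handle)))
```

Saved under a name of your choice, it would be run with the input file as its only argument, for example a.txt.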

Insert formatted text into VIM for every letter of alphabet

In VIM, for every letter of the English alphabet, I want to insert a line in the following format:
fragment {upper(letter)}: '{lower(letter)}' | '{upper(letter)}';
So, for example, for the letter a, it would be:
fragment A: 'a' | 'A';
Writing 26 lines like this is tedious, and you shouldn't repeat yourself. How can I do that?
In vim:
for i in range(65,90) " ASCII codes
let c = nr2char(i) " Character
echo "fragment" c ": '"tolower(c)"' | '" c "';"
endfor
Or as a oneliner:
:for i in range(65,90) | let c = nr2char(i) | echo "fragment" c ": '"tolower(c)"' | '" c "';" | endfor
fragment A : ' a ' | ' A ';
fragment B : ' b ' | ' B ';
fragment C : ' c ' | ' C ';
...
fragment X : ' x ' | ' X ';
fragment Y : ' y ' | ' Y ';
fragment Z : ' z ' | ' Z ';
Use :redir @a before running the loop (and :redir END afterwards) to capture that output in register a.
Here's one way.
First, I'm gonna create the text in bash with a single command, then I'll tell VIM to insert the output of that command into the file.
I need to iterate through the letters of the English alphabet and, for every letter, echo one line in the specified format. So at first, let's just echo each letter on its own line (using a for loop):
❯ alphabets="abcdefghijklmnopqrstuvwxyz"
❯ for ((i=0; i<${#alphabets}; i++)); do echo "${alphabets:$i:1}"; done
a
b
...
z
The way this works is:
${#alphabets} is equal to the length of the variable alphabets.
${alphabets:$i:1} extracts the letter at position i from the variable alphabets (zero-based).
Now we need to convert these letters to upper case. Here's one way we can achieve this:
❯ echo "a" | tr a-z A-Z
A
Now if we apply this to the for loop we had, we get this:
❯ for ((i=0; i<${#alphabets}; i++)); do echo "${alphabets:$i:1}" | tr a-z A-Z; done
A
B
...
Z
From here, it's quite easy to produce the text we wanted:
❯ for ((i=0; i<${#alphabets}; i++)); do c="${alphabets:$i:1}"; cap=$(echo "${c}" | tr a-z A-Z); echo "fragment ${cap}: '${c}' | '${cap}';"; done
fragment A: 'a' | 'A';
fragment B: 'b' | 'B';
...
fragment Z: 'z' | 'Z';
Now that we generated the text, we can simply use :r !command to insert the text into vim:
:r !alphabets="abcdefghijklmnopqrstuvwxyz"; for ((i=0; i<${\#alphabets}; i++)); do c="${alphabets:$i:1}"; cap=$(echo "${c}" | tr a-z A-Z); echo "fragment ${cap}: '${c}' | '${cap}';"; done
Note that # is a special character in vim and should be escaped using \.
Here's another one-liner that does the same thing, and I believe is more intuitive:
for c in {a..z}; do u=$(echo ${c} | tr a-z A-Z); echo "fragment ${u}: '${c}' | '${u}';"; done
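If Python is available, the same lines can be produced without tr. This is a minimal sketch; the function name fragment_lines is hypothetical.

```python
import string


def fragment_lines():
    """Build one ANTLR-style fragment rule per letter of the English alphabet."""
    return [
        f"fragment {c.upper()}: '{c}' | '{c.upper()}';"
        for c in string.ascii_lowercase
    ]


if __name__ == "__main__":
    print("\n".join(fragment_lines()))
```

Assuming the script is saved as fragments.py (a made-up name), :r !python3 fragments.py would insert its output into the buffer in the same way as the shell loops above.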

Processing lines of file in Ruby

I have some file like this
file alldataset; append next;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
and I am trying to write a ruby program to push any line that comes after a semi colon to a new line. In addition, if a line has a 'do', indent from the 'do' so that the following line is indented by two blanks and any inner 'do' be indented by 4 blanks and so on.
I am very new to Ruby and my code so far is quite far from what I want. This is what I have
def indent(text, num)
" "*num+" " + text
end
doc = File.open('newtext.txt')
doc.to_a.each do |line|
if line.downcase =~ /^(file).+(;)/i
puts line+"\n"
end
if line.downcase.include?('do')
puts indent(line, 2)
end
end
This is the desired output
file alldataset;
append next;
if file.first? do
line + "\n";
if !file.last? do
line.indent(2);
end;
end;
Any help would be appreciated.
As you are interested in parsing, here is a quickly made example, just to give you a taste. I have learned Lex/Yacc, Flex/Bison, ANTLR v3 and ANTLR v4. I strongly recommend ANTLR4 which is so powerful. References :
the ANTLR site
The ANTLR mega tutorial
the expert book
StackOverflow -> Tags -> antlr
The following grammar can parse only the input example you have provided.
File Question.g4 :
grammar Question;
/* Simple grammar example to parse the following code :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
*/
start
@init {System.out.println("Question last update 1048");}
: file* EOF
;
file
: FILE ID ';' statement_p*
;
statement_p
: statement
{System.out.println("Statement found : " + $statement.text);}
;
statement
: 'append' ID ';'
| if_statement
| other_statement
| 'end' ';'
;
if_statement
: 'if' expression 'do' expression ';'
;
other_statement
: ID ';'
;
expression
: receiver=( ID | FILE ) '.' method_call # Send
| expression '+' expression # Addition
| '!' expression # Negation
| atom # An_atom
;
method_call
: method_name=ID arguments?
;
arguments
: '(' ( argument ( ',' argument )* )? ')'
;
argument
: ID | NUMBER
;
atom
: ID
| FILE
| STRING
;
FILE : 'file' ;
ID : LETTER ( LETTER | DIGIT | '_' )* ( '?' | '!' )? ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
STRING : '"' .*? '"' ;
NL : ( [\r\n] | '\r\n' ) -> skip ;
WS : [ \t]+ -> channel(HIDDEN) ;
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
File input.txt :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
Execution :
$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question start -tokens -diagnostics input.txt
[@0,0:0=' ',<WS>,channel=1,1:0]
[@1,1:4='file',<'file'>,1:1]
[@2,5:5=' ',<WS>,channel=1,1:5]
[@3,6:15='alldataset',<ID>,1:6]
[@4,16:16=';',<';'>,1:16]
[@5,17:17=' ',<WS>,channel=1,1:17]
[@6,18:23='append',<'append'>,1:18]
[@7,24:24=' ',<WS>,channel=1,1:24]
[@8,25:28='next',<ID>,1:25]
[@9,29:29=';',<';'>,1:29]
[@10,30:30=' ',<WS>,channel=1,1:30]
[@11,31:33='xyz',<ID>,1:31]
[@12,34:34=';',<';'>,1:34]
[@13,36:36=' ',<WS>,channel=1,2:0]
[@14,37:38='if',<'if'>,2:1]
[@15,39:39=' ',<WS>,channel=1,2:3]
[@16,40:43='file',<'file'>,2:4]
[@17,44:44='.',<'.'>,2:8]
[@18,45:50='first?',<ID>,2:9]
[@19,51:51=' ',<WS>,channel=1,2:15]
[@20,52:53='do',<'do'>,2:16]
[@21,54:54=' ',<WS>,channel=1,2:18]
[@22,55:58='line',<ID>,2:19]
[@23,59:59=' ',<WS>,channel=1,2:23]
[@24,60:60='+',<'+'>,2:24]
[@25,61:61=' ',<WS>,channel=1,2:25]
[@26,62:65='"\n"',<STRING>,2:26]
[@27,66:66=';',<';'>,2:30]
...
[@59,133:132='<EOF>',<EOF>,7:0]
Question last update 1048
Statement found : append next;
Statement found : xyz;
Statement found : if file.first? do line + "\n";
Statement found : if !file.last? do line.indent(2);
Statement found : end;
Statement found : end;
Statement found : xyz;
One advantage of ANTLR4 over previous versions or other parser generators is that the code is no longer scattered among the parser rules, but gathered in a separate listener. This is where you do the actual processing, such as producing a new reformatted file. It would be too long to show a complete example. Today you can write the listener in C++, C#, Python and others. As I don't know Java, I use a setup based on JRuby; see my forum answer.
In Ruby there are many ways to do things. So my solution is one among others.
File t.rb :
def print_indented(p_file, p_indent, p_text)
p_file.print p_indent
p_file.puts p_text
end
# recursively split the line at semicolon, as long as the rest is not empty
def partition_on_semicolon(p_line, p_answer, p_level)
puts "in partition_on_semicolon for level #{p_level} p_line=#{p_line} / p_answer=#{p_answer}"
first_segment, semi, rest = p_line.partition(';')
p_answer << first_segment + semi
partition_on_semicolon(rest.lstrip, p_answer, p_level + 1) unless rest.empty?
end
lines = IO.readlines('input.txt')
# Compute initial indentation, the indentation of the first line.
# This is to preserve the spaces which are in the input.
m = lines.first.match(/^( *)(.*)/)
initial_indent = ' ' * m[1].length
# initial_indent = '' # uncomment if the initial indentation needs not to be preserved
puts "initial_indent=<#{initial_indent}> length=#{initial_indent.length}"
level = 1
indentation = ' '
File.open('newtext.txt', 'w') do | output_file |
lines.each do | line |
line = line.chomp
line = line.lstrip # remove leading spaces
puts "---<#{line}>"
next_indent = initial_indent + indentation * (level - 1)
case
when line =~ /^file/ && line.count(';') > 1
level = 1 # restore, remove this if files can be indented
next_indent = initial_indent + indentation * (level - 1)
# split in count fragments
fragments = []
partition_on_semicolon(line, fragments, 1)
puts '---fragments :'
puts fragments.join('/')
print_indented(output_file, next_indent, fragments.first)
fragments[1..-1].each do | fragment |
print_indented(output_file, next_indent + indentation, fragment)
end
level += 1
when line.include?(' do ')
fragment1, _fdo, fragment2 = line.partition(' do ')
print_indented(output_file, next_indent, "#{fragment1} do")
print_indented(output_file, next_indent + indentation, fragment2)
level += 1
else
level -= 1 if line =~ /end;/
print_indented(output_file, next_indent, line)
end
end
end
File input.txt :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
Execution :
$ ruby -w t.rb
initial_indent=< > length=1
---<file alldataset; append next; xyz;>
in partition_on_semicolon for level 1 p_line=file alldataset; append next; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=append next; xyz; / p_answer=["file alldataset;"]
in partition_on_semicolon for level 3 p_line=xyz; / p_answer=["file alldataset;", "append next;"]
---fragments :
file alldataset;/append next;/xyz;
---<if file.first? do line + "\n";>
---<if !file.last? do line.indent(2);>
---<end;>
---<end;>
---<file file2; xyz;>
in partition_on_semicolon for level 1 p_line=file file2; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=xyz; / p_answer=["file file2;"]
---fragments :
file file2;/xyz;
---<>
Output file newtext.txt :
file alldataset;
append next;
xyz;
if file.first? do
line + "\n";
if !file.last? do
line.indent(2);
end;
end;
file file2;
xyz;
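For comparison, here is a shorter sketch of the same idea in Python, aimed at the layout requested in the question (one statement per line, two more spaces of indentation after each do, one level less at each end;). It is not a port of the Ruby script above, the function name reformat is hypothetical, and it assumes that no semicolon or " do " appears inside a string literal.

```python
def reformat(text, indent="  "):
    """Put each ';'-terminated statement on its own line and indent 'do' bodies."""
    out = []
    level = 0
    for raw in text.splitlines():
        for stmt in (piece.strip() for piece in raw.split(";")):
            if not stmt:
                continue                      # nothing after the last ';'
            stmt += ";"
            if stmt == "end;":
                level = max(level - 1, 0)     # close the innermost 'do'
            head, do, body = stmt.partition(" do ")
            if do:
                out.append(indent * level + head + " do")
                level += 1                    # body and following lines go deeper
                out.append(indent * level + body)
            else:
                out.append(indent * level + stmt)
    return "\n".join(out)
```

Calling reformat on the contents of the input file and writing the returned string to newtext.txt would mirror what the Ruby script does with its output file.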

Why does this //ID pass but //DEF fails?

I'm experimenting with XPath using the grammar provided in the test suite. The path //ID is matched, but //DEF is not found: an IllegalArgumentException is thrown with the message "DEF at index 2 isn't a valid token name". Why is //ID matched, but //DEF not?
String exprGrammar = "grammar Expr;\n" +
"prog: func+ ;\n" +
"func: DEF ID '(' arg (',' arg)* ')' body ;\n" +
"body: '{' stat+ '}' ;\n" +
"arg : ID ;\n" +
"stat: expr ';' # printExpr\n" +
" | ID '=' expr ';' # assign\n" +
" | 'return' expr ';' # ret\n" +
" | ';' # blank\n" +
" ;\n" +
"expr: expr ('*'|'/') expr # MulDiv\n" +
" | expr ('+'|'-') expr # AddSub\n" +
" | primary # prim\n" +
" ;\n" +
"primary" +
" : INT # int\n" +
" | ID # id\n" +
" | '(' expr ')' # parens\n" +
" ;" +
"\n" +
"MUL : '*' ; // assigns token name to '*' used above in grammar\n" +
"DIV : '/' ;\n" +
"ADD : '+' ;\n" +
"SUB : '-' ;\n" +
"RETURN : 'return' ;\n" +
"DEF: 'def';\n" +
"ID : [a-zA-Z]+ ; // match identifiers\n" +
"INT : [0-9]+ ; // match integers\n" +
"NEWLINE:'\\r'? '\\n' -> skip; // return newlines to parser (is end-statement signal)\n" +
"WS : [ \\t]+ -> skip ; // toss out whitespace\n";
String SAMPLE_PROGRAM =
"def f(x,y) { x = 3+4; y; ; }\n" +
"def g(x) { return 1+2*x; }\n";
Grammar g2 = new Grammar(exprGrammar);
LexerInterpreter g2LexerInterpreter = g2.createLexerInterpreter(new ANTLRInputStream(SAMPLE_PROGRAM));
CommonTokenStream tokens = new CommonTokenStream(g2LexerInterpreter);
ParserInterpreter parser = g2.createParserInterpreter(tokens);
parser.setBuildParseTree(true);
ParseTree tree = parser.parse(g2.rules.get("prog").index);
String xpath = "//DEF";
for (ParseTree t : XPath.findAll(tree, xpath, parser) ) {
System.out.println(t.getSourceInterval());
}
When I run your code, the following gets printed:
0..0
18..18
In other words:
;)
This XPath tree pattern matching is all rather new, so my guess is that you've stumbled upon a bug that has since been fixed. I'm using ANTLR version 4.2.2.
