I have a file like this:
file alldataset; append next;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
and I am trying to write a Ruby program that moves whatever follows a semicolon onto a new line. In addition, if a line contains a 'do', the text after the 'do' should go on the following line, indented by two blanks; an inner 'do' should be indented by 4 blanks, and so on.
I am very new to Ruby and my code so far is quite far from what I want. This is what I have:
def indent(text, num)
" "*num+" " + text
end
doc = File.open('newtext.txt')
doc.to_a.each do |line|
if line.downcase =~ /^(file).+(;)/i
puts line+"\n"
end
if line.downcase.include?('do')
puts indent(line, 2)
end
end
This is the desired output
file alldataset;
append next;
if file.first? do
line + "\n";
if !file.last? do
line.indent(2);
end;
end;
Any help would be appreciated.
Since you are interested in parsing, here is a quickly made example, just to give you a taste. I have learned Lex/Yacc, Flex/Bison, ANTLR v3 and ANTLR v4, and I strongly recommend ANTLR4, which is very powerful. References:
the ANTLR site
The ANTLR mega tutorial
the expert book
StackOverflow -> Tags -> antlr
The following grammar can parse only the input example you have provided.
File Question.g4 :
grammar Question;
/* Simple grammar example to parse the following code :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
*/
start
@init {System.out.println("Question last update 1048");}
: file* EOF
;
file
: FILE ID ';' statement_p*
;
statement_p
: statement
{System.out.println("Statement found : " + $statement.text);}
;
statement
: 'append' ID ';'
| if_statement
| other_statement
| 'end' ';'
;
if_statement
: 'if' expression 'do' expression ';'
;
other_statement
: ID ';'
;
expression
: receiver=( ID | FILE ) '.' method_call # Send
| expression '+' expression # Addition
| '!' expression # Negation
| atom # An_atom
;
method_call
: method_name=ID arguments?
;
arguments
: '(' ( argument ( ',' argument )* )? ')'
;
argument
: ID | NUMBER
;
atom
: ID
| FILE
| STRING
;
FILE : 'file' ;
ID : LETTER ( LETTER | DIGIT | '_' )* ( '?' | '!' )? ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
STRING : '"' .*? '"' ;
NL : ( [\r\n] | '\r\n' ) -> skip ;
WS : [ \t]+ -> channel(HIDDEN) ;
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
File input.txt :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
Execution :
$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question start -tokens -diagnostics input.txt
[@0,0:0=' ',<WS>,channel=1,1:0]
[@1,1:4='file',<'file'>,1:1]
[@2,5:5=' ',<WS>,channel=1,1:5]
[@3,6:15='alldataset',<ID>,1:6]
[@4,16:16=';',<';'>,1:16]
[@5,17:17=' ',<WS>,channel=1,1:17]
[@6,18:23='append',<'append'>,1:18]
[@7,24:24=' ',<WS>,channel=1,1:24]
[@8,25:28='next',<ID>,1:25]
[@9,29:29=';',<';'>,1:29]
[@10,30:30=' ',<WS>,channel=1,1:30]
[@11,31:33='xyz',<ID>,1:31]
[@12,34:34=';',<';'>,1:34]
[@13,36:36=' ',<WS>,channel=1,2:0]
[@14,37:38='if',<'if'>,2:1]
[@15,39:39=' ',<WS>,channel=1,2:3]
[@16,40:43='file',<'file'>,2:4]
[@17,44:44='.',<'.'>,2:8]
[@18,45:50='first?',<ID>,2:9]
[@19,51:51=' ',<WS>,channel=1,2:15]
[@20,52:53='do',<'do'>,2:16]
[@21,54:54=' ',<WS>,channel=1,2:18]
[@22,55:58='line',<ID>,2:19]
[@23,59:59=' ',<WS>,channel=1,2:23]
[@24,60:60='+',<'+'>,2:24]
[@25,61:61=' ',<WS>,channel=1,2:25]
[@26,62:65='"\n"',<STRING>,2:26]
[@27,66:66=';',<';'>,2:30]
...
[@59,133:132='<EOF>',<EOF>,7:0]
Question last update 1048
Statement found : append next;
Statement found : xyz;
Statement found : if file.first? do line + "\n";
Statement found : if !file.last? do line.indent(2);
Statement found : end;
Statement found : end;
Statement found : xyz;
One advantage of ANTLR4 over previous versions and other parser generators is that the processing code is no longer scattered among the parser rules but gathered in a separate listener. This is where you do the actual work, such as producing a new reformatted file. A complete example would be too long to show here. Today you can write the listener in C++, C#, Python and other languages. As I don't know Java, I have built some machinery using JRuby; see my forum answer.
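To give a rough feel for the listener idea, here is a minimal Ruby sketch. It is an illustration only: `Node`, `walk` and `IndentListener` are hypothetical names made up for this example, and the tree is built by hand. It does not use the ANTLR runtime or its generated classes. It only shows how a walker can call enter/exit methods while all the formatting logic stays in one separate class.

```ruby
# Illustration only: a hand-built tree and walker, NOT the ANTLR runtime API.
Node = Struct.new(:rule, :text, :children)

# Visit the tree depth-first and call enter_<rule> / exit_<rule>
# on the listener whenever it defines such a method.
def walk(node, listener)
  enter = "enter_#{node.rule}"
  leave = "exit_#{node.rule}"
  listener.public_send(enter, node) if listener.respond_to?(enter)
  node.children.each { |child| walk(child, listener) }
  listener.public_send(leave, node) if listener.respond_to?(leave)
end

# All processing lives here; the tree itself carries no code.
class IndentListener
  attr_reader :lines

  def initialize
    @depth = 0
    @lines = []
  end

  def enter_file(node)
    @lines << node.text
  end

  def enter_statement(node)
    @lines << ('  ' * @depth) + node.text
  end

  def enter_if_statement(node)
    @lines << ('  ' * @depth) + node.text
    @depth += 1
  end

  def exit_if_statement(_node)
    @depth -= 1
    @lines << ('  ' * @depth) + 'end;'
  end
end

# Hand-built tree standing in for what a parser would produce.
tree = Node.new(:file, 'file alldataset;', [
  Node.new(:statement, 'append next;', []),
  Node.new(:if_statement, 'if file.first? do', [
    Node.new(:statement, 'line + "\n";', [])
  ])
])

listener = IndentListener.new
walk(tree, listener)
puts listener.lines
```

With a real ANTLR4 grammar the walker and the enter/exit method names are generated for you; only the listener body would be yours to write.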
In Ruby there are many ways to do things, so my solution is only one among others.
File t.rb :
def print_indented(p_file, p_indent, p_text)
  p_file.print p_indent
  p_file.puts p_text
end
# recursively split the line at semicolon, as long as the rest is not empty
def partition_on_semicolon(p_line, p_answer, p_level)
  puts "in partition_on_semicolon for level #{p_level} p_line=#{p_line} / p_answer=#{p_answer}"
  first_segment, semi, rest = p_line.partition(';')
  p_answer << first_segment + semi
  partition_on_semicolon(rest.lstrip, p_answer, p_level + 1) unless rest.empty?
end
lines = IO.readlines('input.txt')
# Compute initial indentation, the indentation of the first line.
# This is to preserve the spaces which are in the input.
m = lines.first.match(/^( *)(.*)/)
initial_indent = ' ' * m[1].length
# initial_indent = '' # uncomment if the initial indentation need not be preserved
puts "initial_indent=<#{initial_indent}> length=#{initial_indent.length}"
level = 1
indentation = '  ' # two blanks per nesting level, as asked in the question
File.open('newtext.txt', 'w') do |output_file|
  lines.each do |line|
    line = line.chomp
    line = line.lstrip # remove leading spaces
    puts "---<#{line}>"
    next_indent = initial_indent + indentation * (level - 1)
    case
    when line =~ /^file/ && line.count(';') > 1
      level = 1 # restore, remove this if files can be indented
      next_indent = initial_indent + indentation * (level - 1)
      # split into one fragment per semicolon
      fragments = []
      partition_on_semicolon(line, fragments, 1)
      puts '---fragments :'
      puts fragments.join('/')
      print_indented(output_file, next_indent, fragments.first)
      fragments[1..-1].each do |fragment|
        print_indented(output_file, next_indent + indentation, fragment)
      end
      level += 1
    when line.include?(' do ')
      # text before ' do ' stays on this line, the rest goes one level deeper
      fragment1, _fdo, fragment2 = line.partition(' do ')
      print_indented(output_file, next_indent, "#{fragment1} do")
      print_indented(output_file, next_indent + indentation, fragment2)
      level += 1
    else
      level -= 1 if line =~ /end;/
      print_indented(output_file, next_indent, line)
    end
  end
end
File input.txt :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
Execution :
$ ruby -w t.rb
initial_indent=< > length=1
---<file alldataset; append next; xyz;>
in partition_on_semicolon for level 1 p_line=file alldataset; append next; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=append next; xyz; / p_answer=["file alldataset;"]
in partition_on_semicolon for level 3 p_line=xyz; / p_answer=["file alldataset;", "append next;"]
---fragments :
file alldataset;/append next;/xyz;
---<if file.first? do line + "\n";>
---<if !file.last? do line.indent(2);>
---<end;>
---<end;>
---<file file2; xyz;>
in partition_on_semicolon for level 1 p_line=file file2; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=xyz; / p_answer=["file file2;"]
---fragments :
file file2;/xyz;
---<>
Output file newtext.txt :
file alldataset;
append next;
xyz;
if file.first? do
line + "\n";
if !file.last? do
line.indent(2);
end;
end;
file file2;
xyz;
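For comparison, here is a more compact sketch of the same idea that works on a string and keeps a single depth counter. It is only one possible variant under stated assumptions: `reindent` is a hypothetical helper that is not part of t.rb, it assumes that neither ';' nor a standalone `do` ever appears inside a string literal, and it assumes every `do` block is closed by its own `end;`. Unlike t.rb, it does not preserve the initial indentation and does not indent the statements that follow `file`.

```ruby
# Hypothetical helper (not part of t.rb): put every statement on its own line
# and indent the body of each `do` block by one more +unit+.
# Assumes no ';' or standalone `do` appears inside a string literal.
def reindent(source, unit = '  ')
  depth = 0
  out = []
  source.each_line do |line|
    # Each match is one statement, including its terminating ';' if present.
    line.scan(/[^;]+;?/) do |raw|
      stmt = raw.strip
      next if stmt.empty?
      head, keyword, body = stmt.partition(/\bdo\b/)
      if keyword.empty?
        # `end;` closes the innermost block before it is printed.
        depth -= 1 if stmt.match?(/\Aend\b/) && depth > 0
        out << (unit * depth) + stmt
      else
        # Keep the condition and `do` together, push the body one level deeper.
        out << (unit * depth) + head.rstrip + ' do'
        depth += 1
        body = body.strip
        out << (unit * depth) + body unless body.empty?
      end
    end
  end
  out.join("\n")
end

# The single-quoted heredoc keeps the backslash in "\n" literal, as in the file.
sample = <<~'TEXT'
  file alldataset; append next;
  if file.first? do line + "\n";
  if !file.last? do line.indent(2);
  end;
  end;
TEXT

puts reindent(sample)
```

The depth counter replaces the `level` bookkeeping of t.rb; a real parser would still be the safer choice as soon as strings may contain semicolons.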
I have problems with my grammar code in ANTLR 3.5. My input file is:
define tcpChannel ChannelName
define
listener
ListnerProperty
end
listener ;
define
execution
request with format RequestFormat,
response with format ResponseFormat,
error with format ErrorFormat ,
call servicename.executionname
end define execution ;
end
define channel ;
My lexer code is as follows:
lexer grammar ChannelLexer;
// ***************** lexer rules:
Define
:
'define'
;
Tcpchannel
:
'tcphannel'
;
Listener
:
'Listener'
;
End
:
'end'
;
Execution
:
' execution '
;
Request
:
' request '
;
With
:
' with '
;
Format
:
' format '
;
Response
:
' response '
;
Error
:
' error '
;
Call
:
' call '
;
Channel
:
' channel '
;
Dot
:
'.'
;
SColon
:
';'
;
Comma
:
','
;
Value
:
(
'a'..'z'
|'A'..'Z'
|'_'
)
(
'a'..'z'
|'A'..'Z'
|'_'
|Digit
)*
;
fragment
String
:
(
'"'
(
~(
'"'
| '\\'
)
| '\\'
(
'\\'
| '"'
)
)*
'"'
| '\''
(
~(
'\''
| '\\'
)
| '\\'
(
'\\'
| '\''
)
)*
'\''
)
{
setText(getText().substring(1, getText().length() - 1).replaceAll("\\\\(.)",
"$1"));
}
;
fragment
Digit
:
'0'..'9'
;
Space
:
(
' '
| '\t'
| '\r'
| '\n'
| '\u000C'
)
{
skip();
}
;
My parser code is:
parser grammar ChannelParser;
options
{
// antlr will generate java lexer and parser
language = Java;
// generated parser should create abstract syntax tree
output = AST;
}
// ***************** parser rules:
//our grammar accepts only salutation followed by an end symbol
expression
:
tcpChannelDefinition listenerDefinition executionDefintion endchannel
;
tcpChannelDefinition
:
Define Tcpchannel channelName
;
channelName
:
i= Value
{
$i.setText("CHANNEL_NAME#" + $i.text);
}
;
listenerDefinition
:
Define Listener listenerProperty endListener
;
listenerProperty
:
i=Value
{
$i.setText("PROPERTY_VALUE#" + $i.text);
}
;
endListener
:
End Listener SColon
;
executionDefintion
:
Define Execution execution
;
execution
:
Request With Format requestValue Comma
Response With Format responseValue Comma
Error With Format errorValue Comma
Call servicename Dot executionname
;
requestValue
:
i=Value
{
$i.setText("REQUEST_FORMAT#" + $i.text);
}
;
responseValue
:
i=Value
{
$i.setText("RESPONSE_FORMAT#" + $i.text);
}
;
errorValue
:
i=Value
{
$i.setText("ERROR_FORMAT#" + $i.text);
}
;
servicename
:
i=Value
{
$i.setText("SERVICE_NAME#" + $i.text);
}
;
executionname
:
i=Value
{
$i.setText("OPERATION_NAME#" + $i.text);
}
;
endexecution
:
End Define Execution SColon
;
endchannel
:
End Channel SColon
;
I'm getting errors like "missing Tcpchannel at 'tcpChannel'" and "extraneous input 'ChannelName' expecting Define". How can I correct them? Please help.
I'm experimenting with XPath using the grammar provided in the test suite, and I have a problem: the path //ID is matched, but //DEF is not found. An IllegalArgumentException is thrown: "DEF at index 2 isn't a valid token name". Why is //ID matched, but //DEF not?
String exprGrammar = "grammar Expr;\n" +
"prog: func+ ;\n" +
"func: DEF ID '(' arg (',' arg)* ')' body ;\n" +
"body: '{' stat+ '}' ;\n" +
"arg : ID ;\n" +
"stat: expr ';' # printExpr\n" +
" | ID '=' expr ';' # assign\n" +
" | 'return' expr ';' # ret\n" +
" | ';' # blank\n" +
" ;\n" +
"expr: expr ('*'|'/') expr # MulDiv\n" +
" | expr ('+'|'-') expr # AddSub\n" +
" | primary # prim\n" +
" ;\n" +
"primary" +
" : INT # int\n" +
" | ID # id\n" +
" | '(' expr ')' # parens\n" +
" ;" +
"\n" +
"MUL : '*' ; // assigns token name to '*' used above in grammar\n" +
"DIV : '/' ;\n" +
"ADD : '+' ;\n" +
"SUB : '-' ;\n" +
"RETURN : 'return' ;\n" +
"DEF: 'def';\n" +
"ID : [a-zA-Z]+ ; // match identifiers\n" +
"INT : [0-9]+ ; // match integers\n" +
"NEWLINE:'\\r'? '\\n' -> skip; // return newlines to parser (is end-statement signal)\n" +
"WS : [ \\t]+ -> skip ; // toss out whitespace\n";
String SAMPLE_PROGRAM =
"def f(x,y) { x = 3+4; y; ; }\n" +
"def g(x) { return 1+2*x; }\n";
Grammar g2 = new Grammar(exprGrammar);
LexerInterpreter g2LexerInterpreter = g2.createLexerInterpreter(new ANTLRInputStream(SAMPLE_PROGRAM));
CommonTokenStream tokens = new CommonTokenStream(g2LexerInterpreter);
ParserInterpreter parser = g2.createParserInterpreter(tokens);
parser.setBuildParseTree(true);
ParseTree tree = parser.parse(g2.rules.get("prog").index);
String xpath = "//DEF";
for (ParseTree t : XPath.findAll(tree, xpath, parser) ) {
System.out.println(t.getSourceInterval());
}
When I run your code, the following gets printed:
0..0
18..18
In other words, both def tokens (token indexes 0 and 18) are found, so it works for me ;)
This XPath tree pattern matching is all rather new, so my guess is that you've stumbled upon a bug that has since been fixed. I'm using ANTLR version 4.2.2.
This is my file myComp.l:
%{
#include <stdlib.h>
#include <stdio.h>
#include "y.tab.h"
int yyerror(char *);
%}
%%
[a-z] {
yylval = *yytext - 'a';
return VAR;
}
[0-9]+ {
yylval = atoi(yytext);
return INT;
}
[-+()=/*\n] { return *yytext; }
[ \t] ;
. { yyerror("Input non valido"); }
%%
int yywrap(void) {
return 1;
}
And this is the file myComp.y:
%{ /* Prologue */
#define YYSTYPE int
#include <math.h>
#include <stdio.h>
int yyerror(char *);
int yylex(void) ;
int sym[26];
%}
/* Definitions */
%token INT VAR
%left '+' '-'
%left '*' '/'
%%
program:
program statement '\n'
|
;
statement:
expr { printf("%d\n", $1); }
| VAR '=' expr { sym[$1] = $3; }
;
expr:
INT
| VAR { $$ = sym[$1]; }
| expr '+' expr { $$ = $1 + $3; }
| expr '-' expr { $$ = $1 - $3; }
| expr '*' expr { $$ = $1 * $3; }
| expr '/' expr { $$ = $1 / $3; }
| '(' expr ')' { $$ = $2; }
;
%%
int yyerror(char *s) {
fprintf(stderr, "%s\n", s);
return 1;
}
int main( void ) {
yyparse();
return 0;
}
I used these commands to compile:
flex myComp.l
bison -y myComp.y
gcc -o myComp y.tab.c
but I get this error:
/tmp/ccaHRWZu.o: In function `yyparse':
y.tab.c:(.text+0x24a): undefined reference to `yylex'
collect2: ld returned 1 exit status
All the programs I installed are up to date. I can't understand where the problem is. What can I do to resolve this error? Please help me fix it. Thanks, all.
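The linker cannot find yylex because it is defined in the C file that flex generates, and you are not compiling that file. With flex myComp.l and no -o option, that file is called lex.yy.c. Your scanner also includes y.tab.h, so run bison with -d to have that header generated. The linker flag -lfl links the flex library, which only supplies default yywrap and main functions; since you define both yourself, it is optional here and can be dropped if the library is not installed.
Compile with:
flex myComp.l
bison -y -d myComp.y
gcc -o myComp y.tab.c lex.yy.c -lfl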