Processing lines of a file in Ruby

I have a file like this:
file alldataset; append next;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
I am trying to write a Ruby program that moves anything that comes after a semicolon onto a new line. In addition, if a line has a 'do', the text after the 'do' should go on the next line, indented by two blanks; the body of any inner 'do' should be indented by 4 blanks, and so on.
I am very new to Ruby and my code so far is quite far from what I want. This is what I have:
def indent(text, num)
" "*num+" " + text
end
doc = File.open('newtext.txt')
doc.to_a.each do |line|
if line.downcase =~ /^(file).+(;)/i
puts line+"\n"
end
if line.downcase.include?('do')
puts indent(line, 2)
end
end
This is the desired output:
file alldataset;
append next;
if file.first? do
  line + "\n";
  if !file.last? do
    line.indent(2);
  end;
end;
Any help would be appreciated.

Since you are interested in parsing, here is a quickly made example, just to give you a taste. I have used Lex/Yacc, Flex/Bison, ANTLR v3 and ANTLR v4, and I strongly recommend ANTLR 4, which is very powerful. References:
the ANTLR site
The ANTLR mega tutorial
the expert book
StackOverflow -> Tags -> antlr
The following grammar can parse only the input example you have provided.
File Question.g4 :
grammar Question;
/* Simple grammar example to parse the following code :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
*/
start
@init {System.out.println("Question last update 1048");}
: file* EOF
;
file
: FILE ID ';' statement_p*
;
statement_p
: statement
{System.out.println("Statement found : " + $statement.text);}
;
statement
: 'append' ID ';'
| if_statement
| other_statement
| 'end' ';'
;
if_statement
: 'if' expression 'do' expression ';'
;
other_statement
: ID ';'
;
expression
: receiver=( ID | FILE ) '.' method_call # Send
| expression '+' expression # Addition
| '!' expression # Negation
| atom # An_atom
;
method_call
: method_name=ID arguments?
;
arguments
: '(' ( argument ( ',' argument )* )? ')'
;
argument
: ID | NUMBER
;
atom
: ID
| FILE
| STRING
;
FILE : 'file' ;
ID : LETTER ( LETTER | DIGIT | '_' )* ( '?' | '!' )? ;
NUMBER : DIGIT+ ( ',' DIGIT+ )? ( '.' DIGIT+ )? ;
STRING : '"' .*? '"' ;
NL : ( [\r\n] | '\r\n' ) -> skip ;
WS : [ \t]+ -> channel(HIDDEN) ;
fragment DIGIT : [0-9] ;
fragment LETTER : [a-zA-Z] ;
File input.txt :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
Execution :
$ export CLASSPATH=".:/usr/local/lib/antlr-4.6-complete.jar"
$ alias
alias a4='java -jar /usr/local/lib/antlr-4.6-complete.jar'
alias grun='java org.antlr.v4.gui.TestRig'
$ a4 Question.g4
$ javac Q*.java
$ grun Question start -tokens -diagnostics input.txt
[#0,0:0=' ',<WS>,channel=1,1:0]
[#1,1:4='file',<'file'>,1:1]
[#2,5:5=' ',<WS>,channel=1,1:5]
[#3,6:15='alldataset',<ID>,1:6]
[#4,16:16=';',<';'>,1:16]
[#5,17:17=' ',<WS>,channel=1,1:17]
[#6,18:23='append',<'append'>,1:18]
[#7,24:24=' ',<WS>,channel=1,1:24]
[#8,25:28='next',<ID>,1:25]
[#9,29:29=';',<';'>,1:29]
[#10,30:30=' ',<WS>,channel=1,1:30]
[#11,31:33='xyz',<ID>,1:31]
[#12,34:34=';',<';'>,1:34]
[#13,36:36=' ',<WS>,channel=1,2:0]
[#14,37:38='if',<'if'>,2:1]
[#15,39:39=' ',<WS>,channel=1,2:3]
[#16,40:43='file',<'file'>,2:4]
[#17,44:44='.',<'.'>,2:8]
[#18,45:50='first?',<ID>,2:9]
[#19,51:51=' ',<WS>,channel=1,2:15]
[#20,52:53='do',<'do'>,2:16]
[#21,54:54=' ',<WS>,channel=1,2:18]
[#22,55:58='line',<ID>,2:19]
[#23,59:59=' ',<WS>,channel=1,2:23]
[#24,60:60='+',<'+'>,2:24]
[#25,61:61=' ',<WS>,channel=1,2:25]
[#26,62:65='"\n"',<STRING>,2:26]
[#27,66:66=';',<';'>,2:30]
...
[#59,133:132='<EOF>',<EOF>,7:0]
Question last update 1048
Statement found : append next;
Statement found : xyz;
Statement found : if file.first? do line + "\n";
Statement found : if !file.last? do line.indent(2);
Statement found : end;
Statement found : end;
Statement found : xyz;
One advantage of ANTLR 4 over previous versions and other parser generators is that the processing code no longer has to be scattered among the parser rules; it can be gathered in a separate listener. That is where you do the actual processing, such as producing a new reformatted file. A complete example would be too long to show here. Today you can write the listener in C++, C#, Python and other target languages. As I don't know Java, I use a setup based on JRuby; see my forum answer.

In Ruby there are many ways to do things, so my solution is just one among others.
File t.rb :
def print_indented(p_file, p_indent, p_text)
p_file.print p_indent
p_file.puts p_text
end
# recursively split the line at semicolon, as long as the rest is not empty
def partition_on_semicolon(p_line, p_answer, p_level)
puts "in partition_on_semicolon for level #{p_level} p_line=#{p_line} / p_answer=#{p_answer}"
first_segment, semi, rest = p_line.partition(';')
p_answer << first_segment + semi
partition_on_semicolon(rest.lstrip, p_answer, p_level + 1) unless rest.empty?
end
lines = IO.readlines('input.txt')
# Compute initial indentation, the indentation of the first line.
# This is to preserve the spaces which are in the input.
m = lines.first.match(/^( *)(.*)/)
initial_indent = ' ' * m[1].length
# initial_indent = '' # uncomment if the initial indentation need not be preserved
puts "initial_indent=<#{initial_indent}> length=#{initial_indent.length}"
level = 1
indentation = '  ' # two blanks per level, as requested in the question
File.open('newtext.txt', 'w') do | output_file |
lines.each do | line |
line = line.chomp
line = line.lstrip # remove leading spaces; the indentation is rebuilt from level
puts "---<#{line}>"
next_indent = initial_indent + indentation * (level - 1)
case
when line =~ /^file/ && line.count(';') > 1
level = 1 # restore, remove this if files can be indented
next_indent = initial_indent + indentation * (level - 1)
# split in count fragments
fragments = []
partition_on_semicolon(line, fragments, 1)
puts '---fragments :'
puts fragments.join('/')
print_indented(output_file, next_indent, fragments.first)
fragments[1..-1].each do | fragment |
print_indented(output_file, next_indent + indentation, fragment)
end
level += 1
when line.include?(' do ')
fragment1, _fdo, fragment2 = line.partition(' do ')
print_indented(output_file, next_indent, "#{fragment1} do")
print_indented(output_file, next_indent + indentation, fragment2)
level += 1
else
level -= 1 if line =~ /end;/
print_indented(output_file, next_indent, line)
end
end
end
File input.txt :
file alldataset; append next; xyz;
if file.first? do line + "\n";
if !file.last? do line.indent(2);
end;
end;
file file2; xyz;
Execution :
$ ruby -w t.rb
initial_indent=< > length=1
---<file alldataset; append next; xyz;>
in partition_on_semicolon for level 1 p_line=file alldataset; append next; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=append next; xyz; / p_answer=["file alldataset;"]
in partition_on_semicolon for level 3 p_line=xyz; / p_answer=["file alldataset;", "append next;"]
---fragments :
file alldataset;/append next;/xyz;
---<if file.first? do line + "\n";>
---<if !file.last? do line.indent(2);>
---<end;>
---<end;>
---<file file2; xyz;>
in partition_on_semicolon for level 1 p_line=file file2; xyz; / p_answer=[]
in partition_on_semicolon for level 2 p_line=xyz; / p_answer=["file file2;"]
---fragments :
file file2;/xyz;
---<>
Output file newtext.txt :
file alldataset;
append next;
xyz;
if file.first? do
line + "\n";
if !file.last? do
line.indent(2);
end;
end;
file file2;
xyz;
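For comparison, here is a minimal sketch of a different approach (not the code above). Each input line is split after every semicolon with a single String#split call using a zero-width lookbehind, which replaces the recursion in partition_on_semicolon; each statement is then broken after every ' do ', and a depth counter goes up after a line ending in do and down before a line starting with end, so each end lines up with the line that opened its block. It assumes that ' do ' only appears as the block keyword, that every block is closed by end, that blank lines can be dropped, and that no ';' or ' do ' occurs inside a string literal. The names reformat_statements and split_on_do are hypothetical.

```ruby
# Sketch: one statement per line, two blanks of indentation per open 'do' block.
# Assumptions: ' do ' only appears as the block keyword, blocks close with 'end',
# and no ';' or ' do ' occurs inside a string literal. Blank lines are dropped.
INDENT = '  ' # two blanks per nesting level, as requested in the question

# Break a statement after every ' do ', e.g. "if a do x;" -> ["if a do", "x;"].
def split_on_do(stmt)
  head, sep, rest = stmt.partition(' do ')
  sep.empty? ? [head] : ["#{head} do", *split_on_do(rest)]
end

def reformat_statements(text)
  depth = 0
  out = []
  text.each_line do |raw|
    # (?<=;) splits right after each ';' (keeping it); \s* eats the blanks after it.
    raw.strip.split(/(?<=;)\s*/).each do |stmt|
      split_on_do(stmt).each do |piece|
        next if piece.empty?
        depth -= 1 if depth > 0 && piece.match?(/\Aend\b/) # close before printing
        out << INDENT * depth + piece
        depth += 1 if piece.end_with?(' do')                # open after printing
      end
    end
  end
  out.join("\n")
end

# Usage, with the file names from the answer above:
#   File.write('newtext.txt', reformat_statements(File.read('input.txt')) + "\n")
```

Unlike t.rb, this sketch does not treat file lines specially; everything outside a do block stays at column 0.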

Related

Split numbers into groups and wrap them in single quotes

I have around 65000 product codes in a text file. I want to split those numbers into groups of 999 each, and then write each group with every number wrapped in single quotes and separated by commas.
Could you please suggest how I can achieve this with a Unix script?
87453454
65778445
.
.
.
.
... up to 65000 product codes
I need to arrange them in the following pattern:
'87453454','65778445',
With awk:
awk '
++c == 1 { out = "\047" $0 "\047"; next }
{ out = out ",\047" $0 "\047" }
c == 999 { print out; c = 0 }
END { if (c) print out }
' file
Or, with GNU sed:
sed "
:a
\$bb
N
0~999{
:b
s/\n/','/g
s/^/'/
s/$/'/
b
}
ba" file
With Perl:
perl -ne '
sub pq { chomp; print "\x27$_\x27" } pq;
for (1 .. 998) {
if (defined($_ = <>)) {
print ",";
pq
}
}
print "\n"
' < file
Credit to Mauke, perl#libera.chat
65000 isn't that many lines for awk, so you can just do it all in one shot:
mawk 'BEGIN { FS = RS; RS = "^$"; OFS = (_="\47")(",")_
} gsub(/^|[^0-9]*$/,_, $!(NF = NF))'
'66771756','69562431','22026341','58085790','22563930',
'63801696','24044132','94255986','56451624','46154427'
That puts them all on one line. To make groups of 999, try:
jot -r 50 10000000 99999999 |
# change "5" to "999" here
rs -C= 0 5 |
mawk 'sub(".*", "\47&\47", $!(NF -= _==$NF ))' FS== OFS='\47,\47'
'36452530','29776340','31198057','36015730','30143632'
'49664844','83535994','86871984','44613227','12309645'
'58002568','31342035','72695499','54546650','21800933'
'38059391','36935562','98323086','91089765','65672096'
'17634208','14009291','39114390','35338398','43676356'
'14973124','19782405','96782582','27689803','27438921'
'79540212','49141859','25714405','42248622','25589123'
'11466085','87022819','65726165','86718075','56989625'
'12900115','82979216','65469187','63769703','86494457'
'26544666','89342693','64603075','26102683','70528492'
_==$NF checks whether the rightmost column is empty, i.e. whether there is a trailing separator that needs to be trimmed.
If your input file only contains short codes as shown in your example, you could use the following hack:
xargs -L 999 bash -c "printf \'%s\', \"\$@\"; echo" . <inputFile >outputFile
Alternatively, you can use this sed command:
sed -Ene"s/(.*)/'\1',/;H" -e{'0~999','$'}'{z;x;s/\n//g;p}' <inputFile >outputFile
s/(.*)/'\1',/ wraps each line in '...',
but does not print it (-n)
instead, H appends the modified line to the so-called hold space, basically a helper variable storing a single string.
(This also adds a line break as a separator, but we remove that later).
Every 999 lines (0~999) and at the end of the input file ($) ...
... the hold space is then printed and cleared (z;x;...;p)
while deleting all delimiter-linebreaks (s/\n//g) mentioned earlier.
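The same grouping can be sketched in Ruby with Enumerable#each_slice, which yields consecutive chunks of at most n elements. This version writes no trailing comma at the end of each line (like the awk answer, unlike the sed one); the method name quote_groups and the file names in the usage line are assumptions.

```ruby
# Sketch: wrap each code in single quotes, 999 codes per output line.
GROUP_SIZE = 999

def quote_groups(lines, size = GROUP_SIZE)
  codes = lines.map(&:strip).reject(&:empty?) # drop line endings and blank lines
  # each_slice yields consecutive arrays of at most `size` codes.
  codes.each_slice(size).map { |group| group.map { |code| "'#{code}'" }.join(',') }
end

# Usage (hypothetical file names):
#   File.write('outputFile', quote_groups(File.readlines('inputFile')).join("\n") + "\n")
```

With 65000 codes this produces 66 lines: 65 full groups of 999 and a last line holding the remaining 65 codes.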

Insert formatted text into VIM for every letter of alphabet

In VIM, for every letter of the English alphabet, I want to insert a line in the following format:
fragment {upper(letter)}: '{lower(letter)}' | '{upper(letter)}';
So, for example, for the letter a, it would be:
fragment A: 'a' | 'A';
Writing 26 lines like this is tedious, and you shouldn't repeat yourself. How can I do that?
In vim:
for i in range(65,90) " ASCII codes
  let c = nr2char(i) " Character
  echo "fragment " . c . ": '" . tolower(c) . "' | '" . c . "';"
endfor
Or as a one-liner:
:for i in range(65,90) | let c = nr2char(i) | echo "fragment " . c . ": '" . tolower(c) . "' | '" . c . "';" | endfor
fragment A: 'a' | 'A';
fragment B: 'b' | 'B';
fragment C: 'c' | 'C';
...
fragment X: 'x' | 'X';
fragment Y: 'y' | 'Y';
fragment Z: 'z' | 'Z';
Use :redir @a before the loop (and :redir END after it) to copy that output to register a.
Here's one way.
First, I'm going to create the text in bash with a single command; then I'll tell Vim to insert the output of that command into the file.
I need to iterate through the English alphabet and, for every letter, echo one line in the specified format. So first, let's just echo each letter on its own line (using a for loop):
❯ alphabets="abcdefghijklmnopqrstuvwxyz"
❯ for ((i=0; i<${#alphabets}; i++)); do echo "${alphabets:$i:1}"; done
a
b
...
z
The way this works is:
${#alphabets} is equal to the length of the variable alphabets.
${alphabets:$i:1} extracts the letter at position i from the variable alphabets (zero-based).
Now we need to convert these letters to upper case. Here's one way we can achieve this:
❯ echo "a" | tr a-z A-Z
A
Now if we apply this to the for loop we had, we get this:
❯ for ((i=0; i<${#alphabets}; i++)); do echo "${alphabets:$i:1}" | tr a-z A-Z; done
A
B
...
Z
From here, it's quite easy to produce the text we wanted:
❯ for ((i=0; i<${#alphabets}; i++)); do c="${alphabets:$i:1}"; cap=$(echo "${c}" | tr a-z A-Z); echo "fragment ${cap}: '${c}' | '${cap}';"; done
fragment A: 'a' | 'A';
fragment B: 'b' | 'B';
...
fragment Z: 'z' | 'Z';
Now that we generated the text, we can simply use :r !command to insert the text into vim:
:r !alphabets="abcdefghijklmnopqrstuvwxyz"; for ((i=0; i<${\#alphabets}; i++)); do c="${alphabets:$i:1}"; cap=$(echo "${c}" | tr a-z A-Z); echo "fragment ${cap}: '${c}' | '${cap}';"; done
Note that # is a special character on the Vim command line (it expands to the alternate file name), so it must be escaped with \.
Here's another one-liner that does the same thing, and I believe is more intuitive:
for c in {a..z}; do u=$(echo ${c} | tr a-z A-Z); echo "fragment ${u}: '${c}' | '${u}';"; done
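Since Ruby string ranges support 'a'..'z', the same lines can also be generated with a short Ruby script; its output could then be read into the buffer with :r !ruby followed by the script name (fragments.rb would be a hypothetical name for it).

```ruby
# Sketch: print one fragment rule per letter, e.g. "fragment A: 'a' | 'A';".
def fragment_lines
  ('a'..'z').map do |c|
    up = c.upcase
    "fragment #{up}: '#{c}' | '#{up}';"
  end
end

puts fragment_lines # prints each array element on its own line
```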

Create asterisk border around output in terminal?

I need to create a border around the output of a command in terminal so that if, for example, the output of a command is this:
Apple
Paper Clip
Water
It will become this:
/==========\
|Apple     |
|Paper Clip|
|Water     |
\==========/
awk seems like the least insane way to go about this:
command | expand | awk 'length($0) > length(longest) { longest = $0 } { lines[NR] = $0 } END { gsub(/./, "=", longest); print "/=" longest "=\\"; n = length(longest); for(i = 1; i <= NR; ++i) { printf("| %s %*s\n", lines[i], n - length(lines[i]) + 1, "|"); } print "\\=" longest "=/" }'
expand replaces tabs that may be in the output with the appropriate number of spaces to keep the look of it the same (this is to make sure that every byte of output is rendered with the same width). The awk code works as follows:
length($0) > length(longest) { # Remember the longest line
longest = $0
}
{ # also remember all lines in order
lines[NR] = $0
}
END { # when you have everything:
gsub(/./, "=", longest) # build a line of = as long as the longest
# line
print "/=" longest "=\\" # use it to print the top bit
n = length(longest) # format the content with left and right
for(i = 1; i <= NR; ++i) { # delimiters; spacing through printf
printf("| %s %*s\n", lines[i], n - length(lines[i]) + 1, "|")
}
print "\\=" longest "=/" # print bottom bit.
}
The most insane way to do it, and I dare you to dispute this, is with sed:
#!/bin/sed -f
# assemble lines in the hold buffer, preceded by the left delimiter
s/^/| /
1h
1!H
$!d
# make a copy of it in the pattern space
x
h
# isolate the longest line (or rather: a line of = as long as the longest
# line)
s/[^\n]/=/g
:a
/^\(=*\)\n\1/ {
s//\1/
ba
}
//! {
s/\n=*//
ta
}
# build top bit, print it
s,.*,/&\\,
p
# build measuring stick
s,.\(.*\).,=\1,
# for all lines in the output:
:lineloop
# fetch the line
G
s/^\(=*\n\)\([^\n]*\).*/\1\2/
# replace it with = to get a second measuring stick
s/[^\n]/=/g
# fetch another copy of the line
G
s/^\(=*\n=*\n\)\([^\n]*\).*/\1\2/
# inner loop:
:spaceloop
# while the line measuring stick is not as long as the overall measuring
# stick
/^\(=*\)\n\1/! {
# append a = to it and a space to the line for output
s/\n/\n=/
s/$/ /
b spaceloop
}
# once that is done, append the second delimiter
s/$/|/
# remove one measuring stick
s/=*\n//
# put the second behind the actual line
s/\(.*\)\n\(.*\)/\2\n\1/
# print the line
P
# remove it. Only the measuring stick remains and can be reused for the
# next line
s/.*\n//
# do this while there are more lines to be processed
x
/\n/ {
s/[^\n]*\n//
x
b lineloop
}
# then build the bottom bit and print it.
x
s/=/\\/
s/$/\//
Put that in a file foo.sed and use it as command | expand | sed -f foo.sed. But only do it once to confirm that it works; you don't want to run something like that in production.
Not in the language you were looking for, but succinct and readable:
#!/usr/bin/env ruby
input = STDIN.read.split("\n")
width = input.map(&:size).max + 2
bar = '='*(width-2)
puts '/' + bar + '\\'
input.each {|i| puts "|"+i+" "*(width-i.size-2)+"|" }
puts '\\'+ bar + '/'
You can save it in a file, chmod +x it, and pipe your input into it.
If you "need" to have it in a one-liner:
echo -e "Apple\nPaper Clip\nWater" |
ruby -e 'i=STDIN.read.split("\n");w=i.map(&:size).max+2;b="="*(w-2);i.map! {|j| "|"+j+" "*(w-j.size-2)+"|" };i.unshift "/"+b+"\\"; i<<"\\"+b+"/";puts i'
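Unlike the awk and sed answers, the Ruby version is not preceded by expand, so a tab in the input would count as one character and misalign the right border. A minimal sketch of a tab expander, assuming tab stops every 8 columns (the expand default) and ASCII text; the method name expand_tabs is hypothetical:

```ruby
# Sketch: replace tabs with spaces up to the next tab stop, like `expand`.
# Assumes ASCII text, so one character == one column.
def expand_tabs(line, tab_width = 8)
  line.gsub(/([^\t\n]*)\t/) do
    chunk = Regexp.last_match(1)
    # Every earlier chunk on the line has already been padded to a multiple of
    # tab_width, so chunk.length % tab_width is the column within this tab stop.
    chunk + ' ' * (tab_width - chunk.length % tab_width)
  end
end
```

In the script above it would be applied to each input line, e.g. input = STDIN.read.split("\n").map { |l| expand_tabs(l) }.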

Mismatched input errors in antlr 3.5

I have problems with my grammar in ANTLR 3.5. My input file is:
define tcpChannel ChannelName
define
listener
ListnerProperty
end
listener ;
define
execution
request with format RequestFormat,
response with format ResponseFormat,
error with format ErrorFormat ,
call servicename.executionname
end define execution ;
end
define channel ;
My lexer code is as follows:
lexer grammar ChannelLexer;
// ***************** lexer rules:
Define
:
'define'
;
Tcpchannel
:
'tcphannel'
;
Listener
:
'Listener'
;
End
:
'end'
;
Execution
:
' execution '
;
Request
:
' request '
;
With
:
' with '
;
Format
:
' format '
;
Response
:
' response '
;
Error
:
' error '
;
Call
:
' call '
;
Channel
:
' channel '
;
Dot
:
'.'
;
SColon
:
';'
;
Comma
:
','
;
Value
:
(
'a'..'z'
|'A'..'Z'
|'_'
)
(
'a'..'z'
|'A'..'Z'
|'_'
|Digit
)*
;
fragment
String
:
(
'"'
(
~(
'"'
| '\\'
)
| '\\'
(
'\\'
| '"'
)
)*
'"'
| '\''
(
~(
'\''
| '\\'
)
| '\\'
(
'\\'
| '\''
)
)*
'\''
)
{
setText(getText().substring(1, getText().length() - 1).replaceAll("\\\\(.)",
"$1"));
}
;
fragment
Digit
:
'0'..'9'
;
Space
:
(
' '
| '\t'
| '\r'
| '\n'
| '\u000C'
)
{
skip();
}
;
My parser code is:
parser grammar ChannelParser;
options
{
// antlr will generate java lexer and parser
language = Java;
// generated parser should create abstract syntax tree
output = AST;
}
// ***************** parser rules:
//our grammar accepts only salutation followed by an end symbol
expression
:
tcpChannelDefinition listenerDefinition executionDefintion endchannel
;
tcpChannelDefinition
:
Define Tcpchannel channelName
;
channelName
:
i= Value
{
$i.setText("CHANNEL_NAME#" + $i.text);
}
;
listenerDefinition
:
Define Listener listenerProperty endListener
;
listenerProperty
:
i=Value
{
$i.setText("PROPERTY_VALUE#" + $i.text);
}
;
endListener
:
End Listener SColon
;
executionDefintion
:
Define Execution execution
;
execution
:
Request With Format requestValue Comma
Response With Format responseValue Comma
Error With Format errorValue Comma
Call servicename Dot executionname
;
requestValue
:
i=Value
{
$i.setText("REQUEST_FORMAT#" + $i.text);
}
;
responseValue
:
i=Value
{
$i.setText("RESPONSE_FORMAT#" + $i.text);
}
;
errorValue
:
i=Value
{
$i.setText("ERROR_FORMAT#" + $i.text);
}
;
servicename
:
i=Value
{
$i.setText("SERVICE_NAME#" + $i.text);
}
;
executionname
:
i=Value
{
$i.setText("OPERATION_NAME#" + $i.text);
}
;
endexecution
:
End Define Execution SColon
;
endchannel
:
End Channel SColon
;
I'm getting errors like missing Tcpchannel at 'tcpChannel' and extraneous input 'ChannelName' expecting Define. How can I correct them?

Bash using first lines counted output to add whitespace to following lines

I have this code:
printf '$request1 = "select * from whatever where this = that and active = 1 order by something asc";\n' |
perl -pe 's/select/SELECT/gi ; s/from/\n FROM/gi ; s/where/\n WHERE/gi ; s/and/\n AND/gi ; s/order by/\n ORDER BY/gi ; s/asc/ASC/gi ; s/desc/DESC/gi ;' |
awk '{gsub(/\r/,"");printf "%s\n%d",$0,length($0)}'
It currently produces output like this:
$request1 = "SELECT *
22 FROM whatever
17 WHERE this = that
24 AND active = 1
21 ORDER BY something ASC";
I would like to take the count of the first line (22) and add that amount of whitespace to each additional line.
Assuming that you don't want to print the numbers, change your AWK command to:
awk 'NR == 1 {pad = length($0); print} NR > 1 {gsub(/\r/,""); printf "%*s%s\n", pad, " ", $0}'
Output:
$request1 = "SELECT *
                       FROM whatever
                       WHERE this = that
                       AND active = 1
                       ORDER BY something ASC";
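The same padding step can be sketched in Ruby: measure the first line once, then prefix every later line with that many spaces, as the awk answer does with printf "%*s". It assumes ASCII input without tabs, so String#length equals the displayed width; the method name pad_after_first is hypothetical.

```ruby
# Sketch: indent every line after the first by as many spaces as the first
# line is long. Assumes ASCII text without tabs.
def pad_after_first(text)
  lines = text.split("\n").map { |l| l.delete("\r") } # also drop CR from CRLF input
  return '' if lines.empty?

  pad = ' ' * lines.first.length
  ([lines.first] + lines.drop(1).map { |l| pad + l }).join("\n")
end
```

At the end of a Ruby script that the perl output is piped into, puts pad_after_first(STDIN.read) would print the padded text.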
