What I expected is output like the one below:
[before character h is null, so it is padded with '#'. After character h are "e","l","l".]
[before character e is "h". After character e are "l","l","o".]
[before character l are "h" and "e". After character l are "l" and "o".]
[before character l are "h" and "e". After character l are "l" and "o".]
[before character l are "h","e","l". After character l is "o".]
[before character o are "e","l","l". After character o is null, so it is padded with '#'.]
# # # h e l l
# # h e l l o
# h e l l o #
h e l l o # #
e l l o # # #
# # # w o n d
# # w o n d e
# w o n d e r
w o n d e r f
o n d e r f u
n d e r f u l
d e r f u l #
e r f u l # #
r f u l # # #
Input file:
h e l l o
w o n d e r f u l
Code:
awk -v s1="# # #" 'BEGIN{v=length(s1)}
{
    $0 = s1 $0 s1; num = split($0, A, "")
    for (i = v+1; i <= num-v; i++) {
        q = i-v; p = i+v
        while (q <= p) {
            Q = Q ? Q OFS A[q] : A[q]; q++
        }
        print Q; Q = ""
    }
}' InputFile
But the result I got is:
# # # h e l
# # h e l l
# # h e l l
# h e l l o
# h e l l o #
h e l l o #
e l l o # #
e l l o # #
l l o # # #
# # # w o n
# # w o n d
# # w o n d
# w o n d e
# w o n d e
w o n d e r
o n d e r
o n d e r f
n d e r f
n d e r f u
d e r f u
d e r f u l
e r f u l #
e r f u l #
r f u l # #
r f u l # #
f u l # # #
How can I solve this? Please guide me. Thanks.
Add gsub(/ /,"") to the top of #fedorqui's answer to your previous question, change ## to ### and change 5 to 7 and you get:
$ cat tst.awk
{
gsub(/ /,"")
n=length($0)
$0 = "###" $0 "###"
gsub(/./, "& ")
for (i=1; i<=2*n; i+=2)
print substr($0, i, 7*2-1)
print ""
}
$ awk -f tst.awk file
# # # h e l l
# # h e l l o
# h e l l o #
h e l l o # #
e l l o # # #
# # # w o n d
# # w o n d e
# w o n d e r
w o n d e r f
o n d e r f u
n d e r f u l
d e r f u l #
e r f u l # #
r f u l # # #
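If you would rather keep the split-based approach from your own attempt, the same fix works there too. Here is a rough, untested sketch of that variant; like your original, it relies on split($0, A, "") splitting into one character per element, which is gawk behaviour:
awk -v s1="###" 'BEGIN{v=length(s1)}
{
    gsub(/ /, "")                      # drop the spaces so the split really is one letter per element
    $0 = s1 $0 s1                      # pad both ends
    num = split($0, A, "")
    for (i = v+1; i <= num-v; i++) {   # one window per real character
        Q = ""
        for (q = i-v; q <= i+v; q++)   # v characters before, the character itself, v after
            Q = (Q == "" ? A[q] : Q OFS A[q])
        print Q
    }
}' InputFile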
I'm trying to output something that resembles ls output. The ls command outputs like this:
file1.txt file3.txt file5.txt
file2.txt file4.txt
But I want this sample list:
a b c d e f g h i j k l m n o p q r s t u v w x y z
to appear as:
a e i m q u y
b f j n r v z
c g k o s w
d h l p t x
In that case it gave 7 columns, which is fine, but I want up to 8 columns max. Next, the following list:
a b c d e f g h i j k l m n o p q r s t u v w
will have to show as:
a d g j m p s v
b e h k n q t w
c f i l o r u
And "a b c d e f g h" will have to show as is because it is already 8 columns in 1 line, but:
a b c d e f g h i
will show as:
a c e g i
b d f h
And:
a b c d e f g h i j
will show as:
a c e g i
b d f h j
One way:
#!/usr/bin/env tclsh
proc columnize {lst {columns 8}} {
    set len [llength $lst]
    set nrows [expr {int(ceil($len / (0.0 + $columns)))}]
    set cols [list]
    for {set n 0} {$n < $len} {incr n $nrows} {
        lappend cols [lrange $lst $n [expr {$n + $nrows - 1}]]
    }
    for {set n 0} {$n < $nrows} {incr n} {
        set row [list]
        foreach col $cols {
            lappend row [lindex $col $n]
        }
        puts [join $row " "]
    }
}
columnize {a b c d e f g h i j k l m n o p q r s t u v w x y z}
puts ----
columnize {a b c d e f g h i j k l m n o p q r s t u v w}
puts ----
columnize {a b c d e f g h}
puts ----
columnize {a b c d e f g h i}
puts ----
columnize {a b c d e f g h i j}
The columnize function first figures out how many rows are needed by dividing the length of the list by the number of columns requested and rounding up, then splits the list into chunks of that length, one per column, and finally iterates through those sublists, extracting the current row's element from each column and printing the row as a space-separated list.
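For what it's worth, the script should print the layouts requested above; the expected output (reconstructed by hand, so treat it as illustrative) is roughly:
a e i m q u y
b f j n r v z
c g k o s w
d h l p t x
----
a d g j m p s v
b e h k n q t w
c f i l o r u
----
a b c d e f g h
----
a c e g i
b d f h
----
a c e g i
b d f h j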
With respect to the following tree:
What is the correct inorder traversal?
1-1: U S T X C P Y R B A I G J F N H V T E D L
1-2: U S T X C P Y R B A D E I G J F N H V T L
What is the correct postorder traversal?
2-1: U T S X P R Y C B D I J G N V T H F E L A
2-2: U T S X P R Y C B I J G N V T H F E D L A
I evaluated both pairs. But some are saying 1-1 and 2-1 are correct, while others say 1-2 and 2-2 are correct. I'm confused. Which ones are actually correct?
Inorder:
B U S T X C P Y R A D E I G J F N H V T L
Postorder (2-2 is correct):
U T S X P R Y C B I J G N V T H F E D L A
I have a shell script which processes a CSV file. One step in particular adds a column and puts the default value "null" in it. I got the expected change; it's just that the new column gets added on the next line instead of on the same line.
Can anyone suggest what's wrong in the code that is causing this unexpected change?
CODE:
awk 'BEGIN{FS=",";OFS=";"} {$(NF+1) = NR==1 ? "NewColm" : "NULL"} 1' source.csv > final.csv
Input CSV:
OldColm1,OldColm2,OldColm3,OldColm4,OldColm5,OldColm6
Value1,Value2,Value3,Value4,Value5,Value6
Output CSV:
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6
;NewColm
Value1;Value2;Value3;Value4;Value5;Value6
;NULL
Expected CSV:
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
As explained in the comments, this was caused by the lines being separated by \r\n instead of \n.
The od program can be used to illustrate this:
cat source_dos.csv
OldColm1,OldColm2,OldColm3,OldColm4,OldColm5,OldColm6
Value1,Value2,Value3,Value4,Value5,Value6
od -c source_dos.csv
0000000 O l d C o l m 1 , O l d C o l m
0000020 2 , O l d C o l m 3 , O l d C o
0000040 l m 4 , O l d C o l m 5 , O l d
0000060 C o l m 6 \r \n V a l u e 1 , V a
0000100 l u e 2 , V a l u e 3 , V a l u
0000120 e 4 , V a l u e 5 , V a l u e 6
0000140 \r \n
0000142
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
;NewColm;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6
;NULL1;Value2;Value3;Value4;Value5;Value6
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 \r ; N e w C o l m \n V
0000100 a l u e 1 ; V a l u e 2 ; V a l
0000120 u e 3 ; V a l u e 4 ; V a l u e
0000140 5 ; V a l u e 6 \r ; N U L L \n
0000157
A workaround provided in the comments is to convert the input from DOS-like (\r\n) to UNIX-like (\n) line endings:
cp source_dos.csv source_unix.csv && dos2unix source_unix.csv
dos2unix: converting file source_unix.csv to Unix format ...
od -c source_unix.csv
0000000 O l d C o l m 1 , O l d C o l m
0000020 2 , O l d C o l m 3 , O l d C o
0000040 l m 4 , O l d C o l m 5 , O l d
0000060 C o l m 6 \n V a l u e 1 , V a l
0000100 u e 2 , V a l u e 3 , V a l u e
0000120 4 , V a l u e 5 , V a l u e 6 \n
0000140
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
An awk-only solution to deal with this would be to adjust the record separator RS accordingly.
RS, as well as its counterpart the output record separator ORS, default to \n.
That's why in the \r\n input case, the \r remains part of the last input column and your new column gets 'stuck' in between this \r and the \n added as ORS.
Changing RS solves this:
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
Note that this will still create UNIX-like (\n) output:
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
To generate DOS-like (\r\n) output, just adjust ORS, too:
awk 'BEGIN{RS="\r\n";ORS=RS;FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r\n";ORS=RS;FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \r \n V
0000100 a l u e 1 ; V a l u e 2 ; V a l
0000120 u e 3 ; V a l u e 4 ; V a l u e
0000140 5 ; V a l u e 6 ; N U L L \r \n
0000157
Note however that this will fail for UNIX-like (\n) input:
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6
Value1;Value2;Value3;Value4;Value5;Value6
;NewColm
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 \n V a l u e 1 ; V a l
0000100 u e 2 ; V a l u e 3 ; V a l u e
0000120 4 ; V a l u e 5 ; V a l u e 6 \n
0000140 ; N e w C o l m \n
0000151
Why I think this is better than using dos2unix:
Using a regular expression (RE) as RS, one can make it work for both \n- and \r\n-separated input without needing to know which of the two it is:
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
In both cases, UNIX-like (\n) output will be generated:
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
To make the output type match the input type, ORS can be set per record to RT, the text that actually matched the RS RE:
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \r \n V
0000100 a l u e 1 ; V a l u e 2 ; V a l
0000120 u e 3 ; V a l u e 4 ; V a l u e
0000140 5 ; V a l u e 6 ; N U L L \r \n
0000157
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
Note that both the use of an RE as RS and the RT built-in variable are GNU awk (gawk) extensions and might not be supported by all awk implementations.
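For awks without those extensions, a portable alternative is to strip a trailing \r from each record yourself before appending the new column (a minimal sketch; it handles both \n and \r\n input and always emits UNIX-like (\n) output):
awk 'BEGIN{FS=",";OFS=";"}
{sub(/\r$/,"")}                        # drop a trailing CR, if any; a no-op for UNIX input
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv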
I am starting with an array of letters:
letters = %w[c s t p b l f g d m
y o u i h t r a e l
o t l a e m r s n i
m a y l p x s e k d]
Passing them to the method below finds all four-letter combinations and returns an array like ["cstp", "cstb", "cstl"] (this is a shortened example).
def combinations(letters)
  combos = letters.combination(4)
  combos.collect do |letter_set|
    letter_set.join(",").gsub("," ,"")
  end
end
I am trying to figure out how to pass the return value of combinations into start_with_letter_c. Do I have to pass a block like &block? I tried various things but keep getting "wrong number of arguments" errors.
def start_with_letter_c(pass the return value)
  combinations.select {|word| word.match(/^ca/) }
end
Here you go, no errors:
letters = %w[c s t p b l f g d m
y o u i h t r a e l
o t l a e m r s n i
m a y l p x s e k d]
def combinations(letters)
  combos = letters.combination(4)
  combos.collect do |letter_set|
    letter_set.join(",").gsub("," ,"")
  end
end
def start_with_letter_c(combinations)
  combinations.select {|word| word.match(/^ca/) }
end
start_with_letter_c(combinations(letters))
# => ["cael", "caeo", "caet", "cael", "ca ...and so on
I would write something like this:
letters = %w[c s t p b l f g d m
y o u i h t r a e l
o t l a e m r s n i
m a y l p x s e k d]
def combinations(letters)
  letters.combination(4).map(&:join)
end
def start_with_letter_c(combinations)
  combinations.select { |word| word.start_with?('ca') }
end
start_with_letter_c(combinations(letters))
I'm trying to understand how Ruby's stdout actually works, since I'm struggling with the output of some code.
Within my script I'm using a Unix sort, which works fine from the terminal, but this is what I get from Ruby. Suppose you have this in your file (TSV):
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
My Ruby code is this:
@raw_file = File.open(ARGV[0], "r") unless File.open(ARGV[0], "r").nil?
tmp_raw = File.new("#{@pwd}/tmp_raw", "w")
`cut -f1,6,3,4,2,5,9,12 #{@raw_file.path} | sort -k1,1 -k8,8 > #{tmp_raw.path}`
This is what I get (misplaced separators):
a b c d e f i
1a b c d e f g h i l m
1
What's happening here?
When running from the terminal I get no misplaced separators.
Instead of writing to a temporary file, passing the file via an argument, etc., you can use Ruby's Open3 module to build the pipeline in a more Ruby-friendly manner (instead of relying on the underlying shell):
require 'open3'
raw_file = File.open(ARGV[0], "r")
commands = [
["cut", "-f1,6,3,4,2,5,9,12"],
["sort", "-k1,1", "-k8,8"],
]
result = Open3.pipeline_r(*commands, in: raw_file) do |out|
break out.read
end
puts result
Shell escaping problems, for example, become a thing of the past, and no temporary files are necessary, since pipes are used.
I would, however, advise doing this kind of processing in Ruby itself, instead of calling external utilities; you're getting no benefit from using Ruby here, you're just doing shell stuff.
As Linuxios says, your code never uses STDOUT, so your question doesn't make a lot of sense.
Here's a simple example showing how to do this all in Ruby.
Starting with an input file called "test.txt":
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
a b c d e f g h i l m
This code:
File.open('test_out.txt', 'w') do |test_out|
  File.foreach('test.txt') do |line_in|
    chars = line_in.split
    test_out.puts chars.values_at(0, 5, 2, 3, 1, 4, 8, 10).sort_by{ |*c| [c[0], c[7]] }.join("\t")
  end
end
Creates this output in 'test_out.txt':
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
a b c d e f i m
Read about values_at and sort_by.