AWK: How to use OFS ignoring blank and commented out lines - bash

I'm trying to rewrite a file on the fly, like this:
10.213.20.173, mem_chld, p3b-aggr-103, c3.xlarge, db, mysql
#10.213.20.191, mem_leaf, p3b-leaf-101, r3.xlarge, db, mysql
10.213.20.192, mem_leaf, p3b-leaf-102, r3.xlarge, db, mysql
10.213.20.190, mem_leaf, p3b-leaf-103, r3.xlarge, db, mysql
.....
from the original comma-separated fields to colon-separated ones. So I used this:
awk -F', ' 'BEGIN{OFS=":";} { $1=$1; print }'
which is pretty much working but that file also has some blank and commented out lines, which I also want to exclude. My attempt with:
awk -F', ' '!/^(#|$)/ {OFS=":";} { $1=$1; print }'
did not work as I expected. How can I do that? Best!

Using awk:
$ awk -F', ' 'BEGIN{OFS=":"} !/^#/ && NF{$1=$1; print}' file
10.213.20.173:mem_chld:p3b-aggr-103:c3.xlarge:db:mysql
10.213.20.192:mem_leaf:p3b-leaf-102:r3.xlarge:db:mysql
10.213.20.190:mem_leaf:p3b-leaf-103:r3.xlarge:db:mysql
alternatively you can set OFS like:
awk -F', ' -v OFS=':' '!/^#/ && NF{$1=$1; print}' file
or even
awk -F', ' '!/^#/ && NF{$1=$1; print}' OFS=':' file
As Ed Morton suggested in the comments, for an edge case where you might have space before the # it is best to use the following:
awk -F', ' 'BEGIN{OFS=":"} !/^[[:space:]]*#/ && NF{$1=$1; print}' file
Explanation:
$1=$1 rebuilds the $0 variable. It takes all the fields and concatenates them, separated by OFS which we have set to : instead of space which is the default.
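To see why the $1=$1 assignment matters, here is a minimal sketch that runs one line from the question through awk with and without it. The shell variable names (line, untouched, rebuilt) are only illustrative.

```shell
line='10.213.20.173, mem_chld, p3b-aggr-103'

# No field is modified, so $0 is printed exactly as it was read; OFS is never used.
untouched=$(printf '%s\n' "$line" | awk -F', ' -v OFS=':' '{ print }')

# Assigning to any field (even to itself) makes awk rebuild $0, joining fields with OFS.
rebuilt=$(printf '%s\n' "$line" | awk -F', ' -v OFS=':' '{ $1 = $1; print }')

printf '%s\n%s\n' "$untouched" "$rebuilt"
```

The first result should still contain the original ", " separators, while the second should be joined with colons.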

What about:
awk -F', ' -v OFS=':' '/^[^#]/ {$1=$1; print}' datafile
This will ignore both empty lines and lines starting with a # sign.
If comments might be preceded by some spaces, you would prefer:
awk -F', ' -v OFS=':' '!/^[ \t]*(#.*)?$/ {$1=$1; print}' datafile
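As a rough comparison of the two patterns, the sketch below feeds both a made-up sample containing a blank line and an indented comment. The sample rows and variable names are assumptions for illustration, not taken from the original file.

```shell
sample=$(printf '%s\n' \
  '10.0.0.1, a, b' \
  '' \
  '   # indented comment' \
  '#10.0.0.2, c, d' \
  '10.0.0.3, e, f')

# Matches any line whose first character is not "#": this skips empty lines,
# but an indented comment starts with a space and slips through.
simple=$(printf '%s\n' "$sample" | awk -F', ' -v OFS=':' '/^[^#]/ {$1=$1; print}')

# Skips lines that are only whitespace, or whitespace followed by a comment.
strict=$(printf '%s\n' "$sample" | awk -F', ' -v OFS=':' '!/^[ \t]*(#.*)?$/ {$1=$1; print}')

printf 'simple:\n%s\nstrict:\n%s\n' "$simple" "$strict"
```

With this input, the simpler pattern should still print the indented comment line, while the stricter one should keep only the two data rows.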

awk -F', ' -v OFS=: '/^[ \t]*(#|$)/{next}{$1=$1}1' file
Output:
10.213.20.173:mem_chld:p3b-aggr-103:c3.xlarge:db:mysql
10.213.20.192:mem_leaf:p3b-leaf-102:r3.xlarge:db:mysql
10.213.20.190:mem_leaf:p3b-leaf-103:r3.xlarge:db:mysql

Related

Shell script to add values to a specific column

I have semicolon-separated columns, and I would like to add some characters to a specific column.
aaa;111;bbb
ccc;222;ddd
eee;333;fff
to the second column I want to add '#', so the output should be;
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
I tried
awk -F';' -OFS=';' '{ $2 = "#" $2}1' file
It adds the character but replaces all the semicolons with spaces.
You could use sed to do your job:
# replaces just the first occurrence of ';', note the absence of `g` that
# would have made it a global replacement
sed 's/;/;#/' file > file.out
or, to do it in place:
sed -i 's/;/;#/' file
Or, use awk:
awk -F';' '{$2 = "#"$2}1' OFS=';' file
All the above commands result in the same output for your example file:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
@atb: Try:
1st:
awk -F";" '{print $1 FS "#" $2 FS $3}' Input_file
The above works only when your Input_file has exactly 3 fields.
2nd:
awk -F";" -vfield=2 '{$field="#"$field} 1' OFS=";" Input_file
With the above code you can put any field number in the field variable to suit your needs.
Here I set the field separator to ";" and pass a variable named field that holds the field number. The script prepends "#" to that field's value. The trailing 1 is a condition that is always true with no action attached, so the default action (printing the current line) runs.
You just misunderstood how to set variables. Change -OFS to -v OFS:
awk -F';' -v OFS=';' '{ $2 = "#" $2 }1' file
but in reality you should set them both to the same value at one time:
awk 'BEGIN{FS=OFS=";"} { $2 = "#" $2 }1' file
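The symptom in the question (semicolons turning into spaces) is what you would expect whenever OFS is left at its default. A small sketch, using one row from the question and illustrative variable names:

```shell
row='aaa;111;bbb'

# FS is ";" but OFS keeps its default (a single space), so the rebuilt
# record is joined with spaces instead of semicolons.
default_ofs=$(printf '%s\n' "$row" | awk -F';' '{ $2 = "#" $2 } 1')

# Setting FS and OFS together keeps the delimiter intact after the rebuild.
same_ofs=$(printf '%s\n' "$row" | awk 'BEGIN { FS = OFS = ";" } { $2 = "#" $2 } 1')

printf '%s\n%s\n' "$default_ofs" "$same_ofs"
```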

Convert slurm accounting output

I'm looking for a way to get the elapsed time output to always include days. At the moment I can't see a way to define an output format that does this, so I'm looking at using cut, awk, sed or similar command(s) after the output has been generated.
So any ideas how I can change output such as:
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40
968491|interactive-a|bob|COMPLETED|12:49:20
970801|interactive-a|sam|COMPLETED|07:00:46
912973|interactive-a|tom|COMPLETED|41-02:34:41
971356|interactive-a|mat|COMPLETED|04:36:35
971912|interactive-a|mat|COMPLETED|02:12:02
972668|interactive-a|mat|COMPLETED|00:09:06
Into this format (the last column has 0- added where needed)
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40|
968491|interactive-a|bob|COMPLETED|0-12:49:20|
970801|interactive-a|sam|COMPLETED|0-07:00:46|
912973|interactive-a|tom|COMPLETED|41-02:34:41|
971356|interactive-a|mat|COMPLETED|0-04:36:35|
971912|interactive-a|mat|COMPLETED|0-02:12:02|
972668|interactive-a|mat|COMPLETED|0-00:09:06|
Thanks
$ sed 's/|\([0-9:]\{1,\}\)$/|0-\1/' file
JobID|Partition|User|State|Elapsed|
902464|interactive-a|bob|COMPLETED|10-00:10:40
968491|interactive-a|bob|COMPLETED|0-12:49:20
970801|interactive-a|sam|COMPLETED|0-07:00:46
912973|interactive-a|tom|COMPLETED|41-02:34:41
971356|interactive-a|mat|COMPLETED|0-04:36:35
971912|interactive-a|mat|COMPLETED|0-02:12:02
972668|interactive-a|mat|COMPLETED|0-00:09:06
In awk:
$ awk -F\| '$5 ~ /-|E/ || ($5 = "0-" $5) && gsub(/ /,"|")' file
-F\| set FS to |
$5 ~ /-|E/ matches and prints records with - OR E in fifth field
|| logical OR, ie. if previous didn't match, then:
($5 = "0-" $5) prepend 0- to fifth field
&& gsub(/ /,"|") AND replace those space-replaced field separators with |s.
The gsub above could be dropped if -v OFS="|" were used:
$ awk -v OFS=\| -F\| '$5 ~ /-|E/ || ($5 = "0-" $5)' file
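If the short-circuit expression feels too terse, the same idea can be spelled out with an explicit condition. This is only a sketch of an alternative, assuming the header is always the first line and that any Elapsed value containing "-" already has a day part. The three sample rows are copied from the question.

```shell
data=$(printf '%s\n' \
  'JobID|Partition|User|State|Elapsed|' \
  '902464|interactive-a|bob|COMPLETED|10-00:10:40' \
  '968491|interactive-a|bob|COMPLETED|12:49:20')

# Skip the header (NR == 1) and rows whose fifth field already contains "-";
# prepend "0-" to the rest. The trailing 1 prints every record.
padded=$(printf '%s\n' "$data" |
  awk 'BEGIN { FS = OFS = "|" } NR > 1 && $5 !~ /-/ { $5 = "0-" $5 } 1')

printf '%s\n' "$padded"
```

Because FS and OFS are both "|", the modified rows are rebuilt with the original delimiter and no gsub is needed.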

grep/awk specific lines based on specific fields; using ksh variable with awk

I have this input file: file_in.txt (delimited by pipe)
3345:tyg|rty|27|0|0|ty6|{89|io|}62|0
3346:tyg|rtyuio|63|0|1|ty6|{89|gh|}45|0
3347:tyu|ray|24|0|0|ty6|{89|uh|}27|0
3348:tyg|rtoy|93|0|1|ty6|{89|yh|}1|0
3349:tyo|rtert|28|0|0|ty6|{89|gh|}27|0
I want to get only those lines which have 9th field value as }27 using '|' as delimiter so that my output should be:
3347:tyu|ray|24|0|0|ty6|{89|uh|}27|0
3349:tyo|rtert|28|0|0|ty6|{89|gh|}27|0
Below command works fine:
awk -F"|" '{ if ($9 == "}27") print $0 }' file_in.txt
But I want to use a shell variable instead of "}27" for which I tried this:
taskid="}27"
awk -v tid="$taskid" -F"|" '{ if ($9 == "}tid") print $0 }' file_in.txt
Please help me figure out where I am going wrong with this command.
Any other command suggestions to achieve the same are appreciated.
This should work:
taskid="}27"
awk -F'|' -v tid="$taskid" '$9 == tid' file
Output:
3347:tyu|ray|24|0|0|ty6|{89|uh|}27|0
3349:tyo|rtert|28|0|0|ty6|{89|gh|}27|0
Assuming your shell variable $taskid has the value 27, you want to use one of these forms:
build the string with the closing brace in the shell
awk -v tid="}$taskid" -F"|" '$9 == tid' file
or do it in awk, where string concatenation is just placing strings side-by-side with optional whitespace in between
awk -v tid="$taskid" -F"|" '$9 == "}" tid' file
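To make the difference concrete, the sketch below compares the quoted "}tid" from the question with the concatenation form, assuming taskid holds 27 as in this answer. The two rows are taken from file_in.txt; the variable names are illustrative.

```shell
taskid='27'
rows=$(printf '%s\n' \
  '3346:tyg|rtyuio|63|0|1|ty6|{89|gh|}45|0' \
  '3347:tyu|ray|24|0|0|ty6|{89|uh|}27|0')

# Inside double quotes, }tid is a literal four-character string; the awk
# variable tid is never expanded, so no row matches.
literal=$(printf '%s\n' "$rows" | awk -v tid="$taskid" -F'|' '$9 == "}tid"')

# Here "}" and tid are concatenated first, then compared with field 9.
concat=$(printf '%s\n' "$rows" | awk -v tid="$taskid" -F'|' '$9 == "}" tid')

printf 'literal: [%s]\nconcat:  [%s]\n' "$literal" "$concat"
```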
Your own command should have worked with this change:
$ ksh
$ taskid=}27
$ awk -v tid=$taskid -F"|" '{ if ($9 == tid) print $0}' file_in.txt
Output:
3347:tyu|ray|24|0|0|ty6|{89|uh|}27|0
3349:tyo|rtert|28|0|0|ty6|{89|gh|}27|0

Multiple if statements in awk

I have a file that looks like
01/11/2015;998978000000;4890********3290;5735;ITUNES.COM/BILL;LU;Cross_border_rub;4065;17;915;INSUFF FUNDS;51;0;
There are 13 semicolon separated columns.
I'm trying to calculate 9 columns for all lines:
awk -F ';' -vOFS=';' '{ gsub(",", ".", $9); print }' file |
awk -F ';' '$0 = NR-1";"$0' |
awk -F ';' -vOFS=';' '{bar[$1]=$1;a[$1]=$2;b[$1]=$3;c[$1]=$4;d[$1]=$5;e[$1]=$6;f[$1]=$7;g[$1]=$8;h[$1]=$9;k[$1]=$10;l[$1]=$11;l[$1]=$12;m[$1]=$13;p[$1]=$14;};
if($7="International") {income=0.0162*h[i]+0.0425*h[i]};
else if($7="Domestic") {income=0.0188*h[i]};
else if($7="Cross_border_rub") {income=0.0162*h[i]+0.025*h[i]}
END{for(i in bar) print income";"a[i],b[i],c[i],d[i],e[i],f[i],g[i],h[i],k[i],l[i],m[i],p[i]}'
How exactly do multiple if statements correctly work in awk?
awk to the rescue!
You don't need the multiple awk invocations; you can consolidate them into one:
$ awk -F';' -v OFS=';' '{gsub(",", ".", $9)}
$7=="International" {income=(0.0162+0.0425)*$9}
$7=="Domestic" {income=0.0188*$9}
$7=="Cross_border_rub" {income=(0.0162+0.025)*$9}
# what happens for other values since previous income will be copied over
{print income, NR-1, $0}' file
Test this with your own file, since you didn't provide enough sample data to test with.
Perhaps better if you just assign the rate
$ awk -F';' -v OFS=';' '{gsub(",", ".", $9); rate=0}
$7=="International" {rate=0.0162+0.0425}
$7=="Domestic" {rate=0.0188}
$7=="Cross_border_rub" {rate=0.0162+0.025}
{print rate*$9, NR-1, $0}' file
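If you do want an explicit if / else if chain, it has to sit inside an action block and compare with == rather than =. A minimal sketch, using made-up rows with only the seventh and ninth fields filled in meaningfully (the rates are the ones from the question):

```shell
rows=$(printf '%s\n' \
  'a;b;c;d;e;f;International;x;100' \
  'a;b;c;d;e;f;Domestic;x;100' \
  'a;b;c;d;e;f;Other;x;100')

rates=$(printf '%s\n' "$rows" | awk -F';' '{
    # == compares; a single = would assign to $7 and always be true here.
    if ($7 == "International")         rate = 0.0162 + 0.0425
    else if ($7 == "Domestic")         rate = 0.0188
    else if ($7 == "Cross_border_rub") rate = 0.0162 + 0.025
    else                               rate = 0   # unknown type: no income
    printf "%s %.2f\n", $7, rate * $9
}')

printf '%s\n' "$rates"
```

The final else resets the rate so that a value from a previous line is not carried over.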

awk OFS not producing expected value

I have a file
[root@nmk~]# cat file
abc>
sssd>
were>
I run both these variations of the awk commands
[root@nmk~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root@nmk~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root@nmk~]#
But my expected output is
abc,sssd,were
What's missing in my commands ?
You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
to print newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were
Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
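A quick way to tell the two apart is to set both at once: the comma in a print statement becomes OFS, and ORS is appended after each print. The input and the variable name out below are made up for illustration.

```shell
# "print $1, $2" joins its arguments with OFS and terminates the record with ORS.
out=$(printf 'abc>def\nsssd>dsss\n' |
  awk -F'>' -v OFS=',' -v ORS=';' '{ print $1, $2 }')

printf '%s\n' "$out"
```

The fields within each record should be joined by commas, and each record should end with a semicolon instead of a newline.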
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set the ORS. The default record separator is newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input ends with a newline, which is usually the case, that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
It also won't get a newline at all, because that newline was interpreted as an input record separator and turned into the output record separator - it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were

Resources