awk OFS not producing expected value - bash

I have a file
[root#nmk~]# cat file
abc>
sssd>
were>
I ran both of these variations of the awk command:
[root#nmk~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root#nmk~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root#nmk~]#
But my expected output is
abc,sssd,were
What's missing in my commands?

You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were

awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
To print a newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were

Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set ORS. The default record separator is newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input ends with a newline (which is usually the case), that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
The output also won't end with a newline at all, because that final newline was interpreted as an input record separator and turned into the output record separator: it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
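Because OFS is placed between fields and ORS after every record, setting both at once makes the difference easy to see. A minimal bash sketch with a made-up two-line input (a>b and c>d); the $1 = $1 is there only to force awk to rebuild each record so that OFS is applied:

```shell
# Made-up two-record input; each record has two '>'-separated fields.
# $1 = $1 makes awk rebuild the record with OFS between the fields,
# and print appends ORS after every record, including the last one.
printf 'a>b\nc>d\n' |
  awk -F'>' -v OFS=',' -v ORS=';' '{ $1 = $1; print }'
echo    # the awk output ends with ';' and no newline, so add one here
```

This prints a,b;c,d; and the trailing ; after the last record is the same effect that produced the trailing comma above.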

Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were

Related

Shell script to add values to a specific column

I have semicolon-separated columns, and I would like to add some characters to a specific column.
aaa;111;bbb
ccc;222;ddd
eee;333;fff
To the second column I want to add '#', so the output should be:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
I tried
awk -F';' -OFS=';' '{ $2 = "#" $2}1' file
It adds the character, but it replaces all the semicolons with spaces.
You could use sed to do your job:
# replaces just the first occurrence of ';', note the absence of `g` that
# would have made it a global replacement
sed 's/;/;#/' file > file.out
or, to do it in place:
sed -i 's/;/;#/' file
Or, use awk:
awk -F';' '{$2 = "#"$2}1' OFS=';' file
All the above commands result in the same output for your example file:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
@atb: Try:
1st:
awk -F";" '{print $1 FS "#" $2 FS $3}' Input_file
The above works only when Input_file has exactly 3 fields.
2nd:
awk -F";" -vfield=2 '{$field="#"$field} 1' OFS=";" Input_file
In the above code you can set field to any field number you need.
Here the input and output field separators are set to ";", the variable field holds the field number, and "#" is prepended to that field's value. The trailing 1 is a condition that is always true with no action attached, so awk performs its default action and prints the current line.
You just misunderstood how to set variables. Change -OFS to -v OFS:
awk -F';' -v OFS=';' '{ $2 = "#" $2 }1' file
but really you should set both to the same value at once:
awk 'BEGIN{FS=OFS=";"} { $2 = "#" $2 }1' file
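The symptom in the question (semicolons turned into spaces) is what happens whenever OFS is left at its default: assigning to $2 makes awk rebuild the whole record, joining the fields with OFS, which is a single space unless you set it. A small bash sketch on two of the question's sample lines, comparing the default with FS=OFS:

```shell
# Two lines taken from the question's sample data.
data=$'aaa;111;bbb\nccc;222;ddd'

# OFS left at its default (" "): changing $2 rebuilds $0 with spaces.
printf '%s\n' "$data" | awk -F';' '{ $2 = "#" $2 } 1'

# FS and OFS both ";": the separators survive the rebuild.
printf '%s\n' "$data" | awk 'BEGIN { FS = OFS = ";" } { $2 = "#" $2 } 1'
```

The first command prints aaa #111 bbb and ccc #222 ddd; the second prints the expected aaa;#111;bbb and ccc;#222;ddd.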

Gawk Line removal, Splitter is :

Is it possible to move certain columns from one .txt file into another .txt file?
I have a .txt that contains:
USERID:ORDER#:IP:PHONE:ADDRESS:POSTCODE
USERID:ORDER#:IP:PHONE:ADDRESS:POSTCODE
With gawk I want to extract ADDRESS & POSTCODE columns into another .txt, so for this given file the output should be:
ADDRESS1:POSTCODE1
ADDRESS2:POSTCODE2
etc.
This is a classic AWK transform. You want to use "-F :" to specify that the input is delimited by ":" and print a new ":" on output:
awk -F: '{ print $5 ":" $6 }' <input.txt >output.txt
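A comma between the arguments of print is output as OFS, so an equivalent version sets OFS once instead of concatenating a literal ":". A sketch with made-up rows in the question's USERID:ORDER#:IP:PHONE:ADDRESS:POSTCODE layout:

```shell
# Made-up rows; only fields 5 and 6 (ADDRESS, POSTCODE) are kept.
printf '%s\n' \
  'u1:o1:10.0.0.1:555-0101:ADDRESS1:POSTCODE1' \
  'u2:o2:10.0.0.2:555-0102:ADDRESS2:POSTCODE2' |
  awk -F: -v OFS=: '{ print $5, $6 }'   # the comma is printed as OFS (":")
```

Redirect with >output.txt as above to write the result to a file.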
Try this:
awk -F: '{printf "%s:%s ",$5,$6}' ex.txt
input is
USERID:ORDER#:IP:PHONE:ADDRESS1:POSTCODE1
USERID:ORDER#:IP:PHONE:ADDRESS2:POSTCODE2
output is (on one line if I understand correctly)
ADDRESS1:POSTCODE1 ADDRESS2:POSTCODE2
The only flaw is that the output ends with a trailing space and has no final newline.
Which can be fixed with the slightly more complex (but still readable):
awk -F: 'BEGIN {z=0;} {if (z==1) { printf " "; } ; z=1; printf "%s:%s",$5,$6} END{printf"\n"}' ex.txt
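The z flag can also be replaced by a test on NR: print the space before every pair except the first, then a single newline in END. A sketch using the same kind of made-up rows:

```shell
# Made-up rows in the USERID:ORDER#:IP:PHONE:ADDRESS:POSTCODE layout.
printf '%s\n' \
  'u1:o1:10.0.0.1:555-0101:ADDRESS1:POSTCODE1' \
  'u2:o2:10.0.0.2:555-0102:ADDRESS2:POSTCODE2' |
  awk -F: '
    { printf "%s%s:%s", (NR > 1 ? " " : ""), $5, $6 }   # space only between pairs
    END { print "" }                                     # exactly one final newline
  '
```

This prints ADDRESS1:POSTCODE1 ADDRESS2:POSTCODE2 on one line, with no trailing space.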
awk -F: 'NR==1 {print $5"1:"$6"1"};NR==2 {print $5"2:"$6"2"}' file
ADDRESS1:POSTCODE1
ADDRESS2:POSTCODE2

enclose a string where missing double quotes

I have an input file like the one below. The file is pipe-delimited, and fields are optionally enclosed in double quotes. The closing quote is missing at the end of the string in the third field, and it seems to happen whenever the field's length exceeds, say, 2.
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2|10301 # 3rd field -> closing " missed out
The output should look like
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
I was trying with some awk commands but could not achieve it.
awk -F'|' -v q=\" '{$3=$3 q;}1' OFS=| temp
awk -F'|' -v q=\" '{if (length($3) > 2) ($3=$3;}1)}' OFS='|' temp
Using awk you can write,
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}'
Example
awk -F'"?\\|' -vOFS='"|' '{print $1, $2, $3, $4}' input
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301
What it does?
-F'"?\\|' Sets the input field separator to either "| or |
-vOFS='"|' Sets the output field separator to "|. It is always used, even when the input separator that matched was | rather than "|
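Or you can also write
awk -F'"?\\|' -vOFS='"|' '{$1=$1}1' input
Here 1 always evaluates to true, so awk performs its default action and prints the entire line. The $1=$1 assignment matters: it makes awk rebuild the line using OFS. With a bare '1' (e.g. awk -F'"?\\|' -vOFS='"|' '1' input), the line is printed exactly as read and the missing quote is not added.
See @Kent's comment.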
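To see why $1=$1 matters, compare both forms on the broken sample line. In this bash sketch the separator is spelled "?[|] (a bracket expression matching a literal pipe) instead of "?\\| to sidestep backslash handling in -F; that spelling is a substitution made here, not the one in the answer:

```shell
line='"SER1930"|"QWE"|"Asdf2|10301'

# Bare pattern 1: the record is printed as read, OFS is never used.
printf '%s\n' "$line" | awk -F'"?[|]' -v OFS='"|' '1'

# $1 = $1 rebuilds the record, joining the fields with OFS ("|).
printf '%s\n' "$line" | awk -F'"?[|]' -v OFS='"|' '{ $1 = $1 } 1'
```

The first command echoes the input unchanged; the second prints "SER1930"|"QWE"|"Asdf2"|10301.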
EDIT
If you want to add the quote only to the third field, based on its length, you can write something like
awk -F'|' -vOFS='|' '{print $1, $2, $3(length($3)>4 ? "\"" : ""), $4}'
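A variation that keys on the symptom rather than on the length: add the closing quote only when the third field starts with " but does not end with one. This is a sketch, assuming no field contains an embedded |; a single-character FS other than a space is taken literally, so "|" needs no escaping:

```shell
# Two lines from the question: one well-formed, one missing a closing quote.
printf '%s\n' '"SER1930"|"QWE"|"A2"|10301' '"SER1930"|"QWE"|"Asdf2|10301' |
  awk 'BEGIN { FS = OFS = "|" }                 # split and rejoin on a literal pipe
       $3 ~ /^"/ && $3 !~ /"$/ { $3 = $3 "\"" }   # quote opened but never closed
       1'
```

Well-formed lines are printed untouched; only the second line gets its closing quote, giving "SER1930"|"QWE"|"Asdf2"|10301.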
This sed one-liner works for the given example:
sed 's/\([^"]\)|"/\1"|"/' file # this only works for the original example
This works for the original and current example:
sed 's/\([^"]\)|/\1"|/' file
awk '{sub(/Asdf2/,"Asdf2\"")}1' file
"SER1828"|"ZXC"|"A1"|10002
"SER1878"|"IOP"|"B1"|98989
"SER1930"|"QWE"|"A2"|10301
"SER1930"|"QWE"|"Asdf2"|10301

Format date in a column using awk

I just want to fix this problem. I am running the code below
awk -F, 'NR>1{gsub(/\:/,"",$4);gsub(/\-/,"",$4);gsub(/\.0/,"",$4);gsub(/\ /,",",$4);NF--}{$1=$1}1' OFS=, sample
$ cat sample
1,0,null,2014-11-24 08:15:18.0,1
1,0,null,2014-11-24 08:15:16.0,1
The output is
1,0,null,2014-11-24 08:15:18.0,1
1,0,null,20141124,081516
My expected output:
1,0,null,20141124,081518,1
1,0,null,20141124,081516,1
Could anyone help me fix the code above?
You probably just need
awk -F, '{gsub(/[-:]/,"",$4);sub(/ /,OFS,$4);sub(/\.0$/,"",$4)}1' OFS=, sample
Instead of using gsub, you are better off using split.
awk '
BEGIN { FS = OFS = "," }
{
split ($4, flds, /[- :.]/);
$4 = flds[1] flds[2] flds[3] FS flds[4] flds[5] flds[6]
}1' sample
1,0,null,20141124,081518,1
1,0,null,20141124,081516,1
We set the input and output field separator in the BEGIN block to ,.
Using split, we break the fourth field on -, :, . and space into an array.
We then reconstruct the fourth field by concatenating the array elements.
The 1 at the end triggers awk's default action, which is to print the line.
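To check what split produces, you can print the array. A sketch on the sample timestamp; split returns the number of pieces, which is 7 here because the trailing .0 becomes its own element (the parts array name is arbitrary):

```shell
echo '2014-11-24 08:15:18.0' |
  awk '{
    n = split($0, parts, /[- :.]/)   # split on -, space, : and .
    for (i = 1; i <= n; i++)         # n is the number of pieces
      printf "parts[%d]=%s\n", i, parts[i]
  }'
```

The answer above only uses parts 1 to 6, so the trailing 0 is simply dropped.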
#!/usr/bin/awk -f
$1 {
gsub(/(\.0|[-:])/, "")
gsub(/ /, ",")
print
}
$ awk 'BEGIN{FS=OFS=","} {gsub(/[-:]|\.0/,"",$4); sub(/ /,OFS,$4)} 1' file
1,0,null,20141124,081518,1
1,0,null,20141124,081516,1
or:
$ awk 'BEGIN{FS="[ ,]";OFS=","} {gsub(/-/,"",$4); gsub(/:|\.0/,"",$5)} 1' file
1,0,null,20141124,081518,1
1,0,null,20141124,081516,1

AWK: How to use OFS ignoring blank and commented out lines

I'm trying to rewrite a file on the fly, like this:
10.213.20.173, mem_chld, p3b-aggr-103, c3.xlarge, db, mysql
#10.213.20.191, mem_leaf, p3b-leaf-101, r3.xlarge, db, mysql
10.213.20.192, mem_leaf, p3b-leaf-102, r3.xlarge, db, mysql
10.213.20.190, mem_leaf, p3b-leaf-103, r3.xlarge, db, mysql
.....
from the original comma-separated fields to colon-separated ones. So I used this:
awk -F', ' 'BEGIN{OFS=":";} { $1=$1; print }'
which mostly works, but the file also has some blank and commented-out lines, which I want to exclude as well. My attempt with:
awk -F', ' '!/^(#|$)/ {OFS=":";} { $1=$1; print }'
did not work as I expected. How can I do that? Best!
Using awk:
$ awk -F', ' 'BEGIN{OFS=":"} !/^#/ && NF{$1=$1; print}' file
10.213.20.173:mem_chld:p3b-aggr-103:c3.xlarge:db:mysql
10.213.20.192:mem_leaf:p3b-leaf-102:r3.xlarge:db:mysql
10.213.20.190:mem_leaf:p3b-leaf-103:r3.xlarge:db:mysql
Alternatively, you can set OFS like this:
awk -F', ' -v OFS=':' '!/^#/ && NF{$1=$1; print}' file
or even
awk -F', ' '!/^#/ && NF{$1=$1; print}' OFS=':' file
As Ed Morton suggested in the comments, for an edge case where you might have space before the # it is best to use the following:
awk -F', ' 'BEGIN{OFS=":"} !/^[[:space:]]*#/ && NF{$1=$1; print}' file
Explanation:
$1=$1 rebuilds the $0 variable. It takes all the fields and concatenates them, separated by OFS which we have set to : instead of space which is the default.
What about:
awk -F', ' -v OFS=':' '/^[^#]/ {$1=$1; print}' datafile
This will ignore both empty lines and lines starting with a # sign.
If comments might be preceded by some spaces, you would prefer:
awk -F', ' -v OFS=':' '!/^[ \t]*(#.*)?$/ {$1=$1; print}' datafile
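The difference between the NF test and the anchored regex shows up on whitespace-only lines: with a multi-character FS such as ', ', leading blanks are not stripped, so a line of spaces has NF == 1 and passes the NF test. A bash sketch with made-up sample lines (one data line, one line of three spaces, one indented comment); the sample function name is hypothetical:

```shell
# Hypothetical helper emitting made-up lines: data, three spaces, indented comment.
sample() { printf '%s\n' '10.0.0.1, a, b' '   ' '  #10.0.0.2, c, d'; }

# !/^#/ && NF: all three lines get through (the blank-looking one has NF == 1,
# and the indented comment does not start with #).
sample | awk -F', ' -v OFS=':' '!/^#/ && NF { $1 = $1; print }'

# Anchored regex: blank, whitespace-only and indented-comment lines are skipped.
sample | awk -F', ' -v OFS=':' '!/^[ \t]*(#.*)?$/ { $1 = $1; print }'
```

Only the second command reduces the output to the single line 10.0.0.1:a:b.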
awk -F', ' -v OFS=: '/^[ \t]*(#|$)/{next}{$1=$1}1' file
Output:
10.213.20.173:mem_chld:p3b-aggr-103:c3.xlarge:db:mysql
10.213.20.192:mem_leaf:p3b-leaf-102:r3.xlarge:db:mysql
10.213.20.190:mem_leaf:p3b-leaf-103:r3.xlarge:db:mysql
