How to pad a CSV first column with zeroes in awk? - bash

I have a CSV like this:
1,"Paris","3.57"
10,"Singapore","3.57"
211,"Sydney","3.28"
324,"Toronto Center","3.33"
I'd like to pad the first column with zeroes to get:
001,"Paris","3.57"
010,"Singapore","3.57"
211,"Sydney","3.28"
324,"Toronto Center","3.33"
I tried to assign the first column to the output of printf with awk:
awk '{ $1 = printf("%03d", $1); print }' my.csv
But it gives me a syntax error :
awk: cmd. line:1: { $1 = printf("%03d", $1); print }
awk: cmd. line:1: ^ syntax error
It doesn't work either if I quote the printf function.
How could I do that?

If you want just to format the text of one field then you can use sprintf of awk.
awk '{ $1=sprintf("%03d", $1)}1' csvfile
Or standard way:
awk '{printf "%03d %s\n", $1,$2}' csvfile
As per update by OP in question:
awk 'BEGIN{FS=OFS=","}{ $1=sprintf("%03d", $1)}1' csvfile

printf is not a function, it is a keyword, and its result cannot be assigned.
To return a formatted string, use sprintf (which is a function):
awk -F, -v OFS=, '{ $1 = sprintf("%03d", $1) } 1' file
It is necessary to set FS (via -F) and OFS so that when awk reformats the line, the field separators remain intact.
As pointed out in the comments, using %d can potentially lead to problems when the input starts with a 0, as numbers with a leading 0 are interpreted as octal. This can break on input like 08 because 8 is outside of the octal range (0-7).
One way to get around this is to use %03.0f, which interprets the input as a floating point value, with the output precision set to 0:
awk -F, -v OFS=, '{ $1 = sprintf("%03f.0", $1) } 1' file
(the second 0 in the format specifier can in fact be omitted)

awk '{printf("%03d", $1) ; print " "$2}' my.csv

Related

awk print getting error - not enough arguments to satisfy format string

I am getting below error for awk print
awk: cmd. line:1: (FILENAME=- FNR=1) fatal: not enough arguments to satisfy format string
`MiNirzRjlOth%ER48R6K0SGDY!'
^ ran out for this one
code using
ABC=`echo ${ABC1}| awk '{printf $NF}'`
where the value of ABC1 is random string
Never do printf input_data for any input data ($1, $NF, $0, or any variable that holds previously read input data) as it'll fail when your input data contains printf formatting chars like %s, as you're now seeing with %E... appearing in your input. Do printf "%s", input_data instead.
printf in awk is to print something using a format template. Documentation here. If you just want to print the variable as is, you don't need to use this.
You have specified $NF - which is "variable number equal to count of variables", effectively "last variable".
In your case you can one of the following, which will give the same result:
`awk '{print $NF}'`
MiNirzRjlOth%ER48R6K0SGDY!
or using formatting. For strings code %s is used.
`awk '{printf "%s", $NF}'`
MiNirzRjlOth%ER48R6K0SGDY!
Using printf has the advantage that you can pad with data, left/right adjust and much more.

Shell script to add values to a specific column

I have semicolon-separated columns, and I would like to add some characters to a specific column.
aaa;111;bbb
ccc;222;ddd
eee;333;fff
to the second column I want to add '#', so the output should be;
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
I tried
awk -F';' -OFS=';' '{ $2 = "#" $2}1' file
It adds the character but removes all semicolons with space.
You could use sed to do your job:
# replaces just the first occurrence of ';', note the absence of `g` that
# would have made it a global replacement
sed 's/;/;#/' file > file.out
or, to do it in place:
sed -i 's/;/;#/' file
Or, use awk:
awk -F';' '{$2 = "#"$2}1' OFS=';' file
All the above commands result in the same output for your example file:
aaa;#111;bbb
ccc;#222;ddd
eee;#333;fff
#atb: Try:
1st:
awk -F";" '{print $1 FS "#" $2 FS $3}' Input_file
Above will work only when your Input_file has 3 fields only.
2nd:
awk -F";" -vfield=2 '{$field="#"$field} 1' OFS=";" Input_file
Above code you could put any field number and could make it as per your request.
Here I am making field separator as ";" and then taking a variable named field which will have the field number in it and then that concatenating "#" in it's value and 1 is for making condition TRUE and not making and action so by default print action will happen of current line.
You just misunderstood how to set variables. Change -OFS to -v OFS:
awk -F';' -v OFS=';' '{ $2 = "#" $2 }1' file
but in reality you should set them both to the same value at one time:
awk 'BEGIN{FS=OFS=";"} { $2 = "#" $2 }1' file

awk - how to replace semicolon in string in csv file?

I need to manage smtp logfile handling in my company.
These logfiles need to be imported to MSSQL, so it is my job to provide this data.
I got strange undelivery message with a ";" in the string, I need to replace this with a comma.
So what I got:
Sender;Recipient;Operation;Answer;Error;Servername
bla#bla.com;rockit#sohard.com;RCPT TO;450;+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions;+try+later;M0641
Mention the ";" in the Answer field after "restrictions", dunno why the mail server sends semicolons, maybe to annoy me :P
I tried following with awk after I did a lot of research:
awk 'BEGIN{FS=OFS=";"} {for (i=5;i<=NF;i++) gsub (";",",",$i)} 1' myfile.csv
This command actually works but it seems it does nothing with my file, the ";" in the error field remains. What I am missing here ?
Replacing the fifth and later ; with ,
$ awk -F\; '{for (i=1;i<=NF;i++) printf "%s%s",$i,(i==NF?ORS:(i<=4?";":","))}' myfile.csv
Sender;Recipient;Operation;Answer;Error,Servername
bla#bla.com;rockit#sohard.com;RCPT TO;450;+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
How it works:
-F\;
This sets the field separator for input to ;.
for (i=1;i<=NF;i++) printf "%s%s",$i,(i==NF?ORS:(i<=4?";":","))
This loops over every field and prints the field followed by (a) ORS if we are on the last field, or (b) , if were are on field 5 or later, or (c) ; if we are on one of the first four fields.
Replacing all ; with ,
Try:
$ awk -F\; '{$1=$1} 1' OFS=, myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
bla#bla.com,rockit#sohard.com,RCPT TO,450,+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
How it works:
-F\;
This sets the field separator on input to a semicolon.
$1=$1
This causes awk to think the the line has been changed so that awk will update the output line to use the new field separator.
1
This tells awk to print the line.
OFS=,
This sets the field separator on output to a comma.
Alternative #1
$ awk '{gsub(/;/, ",")} 1' myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
bla#bla.com,rockit#sohard.com,RCPT TO,450,+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
Alternative #2
$ sed 's/;/,/g' myfile.csv
Sender,Recipient,Operation,Answer,Error,Servername
bla#bla.com,rockit#sohard.com,RCPT TO,450,+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later,M0641
I think your problem is replacing the unquotes delimiters in your logical 4th field in a five field wide input. Although this script is repetitious should be easier to understand
$ awk '{n=split($0,a,";");
for(i=1; i<4; i++) printf "%s;", a[i];
for(i=4; i<n-1; i++) printf "%s,", a[i];
printf "%s;%s\n", a[n-1], a[n]}' file
A better way to write the same based on #Ed Morton's comments
$ awk -F';' '{for(i=1; i<NF-1; i++) printf "%s"(i<4?FS:","), $i;
print $(NF-1) FS $NF}' file
For the input
1;2;3;4a;4b;4c;5
1;2;3;4;5
it generates
1;2;3;4a,4b,4c;5
1;2;3;4;5
If the offending semi-colons only appear in your 5th field then you can do this using GNU awk for the 3rd arg to match():
$ awk 'match($0,/(([^;]+;){4})(.*)(;[^;]+$)/,a){gsub(/;/,",",a[3]); print a[1] a[3] a[4]}' file
bla#bla.com;rockit#sohard.com;RCPT TO;450;+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later;M0641
If your fifth ; should be removed, append $6 to $5 and advance accordingly. This could be done with for loop (there are examples in SO) but since the fault is so near the end, we'll just do this in a simpler way:
$ awk 'BEGIN {FS=OFS=";"} NR==1 {nf=NF} NF==(nf+1) {$5=$5 "," $6; $6=$7; NF=nf} 1' file
Explained:
BEGIN {FS=OFS=";"} # set separator
NR==1 {nf=NF} # get field count from the first record (6)
NF==(nf+1) { # if record is one field longer:
$5=$5 "," $6 # append $6 to $5, comma-separated
$6=$7 # set $7 (NF) to $6 (nf)
NF=nf # reset NF
} 1 # output
Testing: Running the program and sending the output to cut -d\; -f 5 outputs:
Error
+4.2.0+<rockit#sohard.com>:+Recipient+address+rejected:+Policy+restrictions,+try+later

awk: sort file based on user input

I have this simple awk code:
awk -F, 'BEGIN{OFS=FS} {print $2,$1,$3}' $1
Works great, except I've hardcoded how I want to sort the comma-delimited fields of my plaintext file. I want to be able to specify at run time in which order I'd like to sort my fields.
One hacky way I thought about doing this was this:
read first
read second
read third
TOTAL=$first","$second","$third
awk -F, 'BEGIN{OFS=FS} {print $TOTAL}' $1
But this doesn't actually work:
awk: illegal field $(), name "TOTAL"
Also, I know a bit about awk's ability to accept user input:
BEGIN {
getline first < "-"
}
$1 == first {
}
But I wonder whether the variables created can in turn be used as variables in the original print command? Is there a better way?
You have to let bash expand $TOTAL before awk is called, so that awk sees the value of $TOTAL, not the literal string $TOTAL. This means using double, not single, quotes.
read first
read second
read third
# Dynamically construct the awk script to run
TOTAL="\$$first,\$$second,\$$third"
SCRIPT="BEGIN{OFS=FS} {print $TOTAL}"
awk -F, "$SCRIPT" "$1"
A safer method is to pass the field numbers as awk variables.
awk -F, -v c1="$first" -v c2="$second" -v c3="$third" 'BEGIN{OFS=FS} {print $c1, $c2, $c3}' "$1"
All you need is:
awk -v order='3 1 2' 'BEGIN{split(order,o)} {for (i=1;i<=NF;i++) printf "%s%s", $(o[i]), (i<NF?OFS:ORS)}'
e.g.:
$ echo 'a b c' | awk -v order='3 1 2' 'BEGIN{split(order,o)} {for (i=1;i<=NF;i++) printf "%s%s", $(o[i]), (i<NF?OFS:ORS)}'
c a b
$ echo 'a b c' | awk -v order='2 3 1' 'BEGIN{split(order,o)} {for (i=1;i<=NF;i++) printf "%s%s", $(o[i]), (i<NF?OFS:ORS)}'
b c a

awk OFS not producing expected value

I have a file
[root#nmk~]# cat file
abc>
sssd>
were>
I run both these variations of the awk commands
[root#nmk~]# cat file | awk -F\> ' { print $1}' OFS=','
abc
sssd
were
[root#nmk~]# cat file | awk -F\> ' BEGIN { OFS=","} { print $1}'
abc
sssd
were
[root#nmk~]#
But my expected output is
abc,sssd,were
What's missing in my commands ?
You're just a bit confused about the meaning/use of FS, OFS, RS and ORS. Take another look at the man page. I think this is what you were trying to do:
$ awk -F'>' -v ORS=',' '{print $1}' file
abc,sssd,were,$
but this is probably closer to the output you really want:
$ awk -F'>' '{rec = rec (NR>1?",":"") $1} END{print rec}' file
abc,sssd,were
or if you don't want to buffer the whole output as a string:
$ awk -F'>' '{printf "%s%s", (NR>1?",":""), $1} END{print ""}' file
abc,sssd,were
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1}' file
to print newline at the end:
awk -F\> -v ORS="" 'NR>1{print ","$1;next}{print $1} END{print "\n"}' file
output:
abc,sssd,were
Each line of input in awk is a record, so what you want to set is the Output Record Separator, ORS. The OFS variable holds the Output Field Separator, which is used to separate different parts of each line.
Since you are setting the input field separator, FS, to >, and OFS to ,, an easy way to see how these work is to add something on each line of your file after the >:
awk 'BEGIN { FS=">"; OFS=","} {$1=$1} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc,def
sssd,dsss
were,wolf
So you want to set the ORS. The default record separator is newline, so whatever you set ORS to effectively replaces the newlines in the input. But that means that if the last line of input has a newline - which is usually the a case - that last line will also get a copy of your new ORS:
awk 'BEGIN { FS=">"; ORS=","} 1' <<<$'abc>def\nsssd>dsss\nwere>wolf'
abc>def,sssd>dsss,were>wolf,
It also won't get a newline at all, because that newline was interpreted as an input record separator and turned into the output record separator - it became the final comma.
So you have to be a little more explicit about what you're trying to do:
awk 'BEGIN { FS=">" } # split input on >
(NR>1) { printf "," } # if not the first line, print a ,
{ printf "%s", $1 } # print the first field (everything up to the first >)
END { printf "\n" } # add a newline at the end
' <<<$'abc>\nsssd>\nwere>'
Which outputs this:
abc,sssd,were
Through sed,
$ sed ':a;N;$!ba;s/>\n/,/g;s/>$//' file
abc,sssd,were
Through Perl,
$ perl -00pe 's/>\n(?=.)/,/g;s/>$//' file
abc,sssd,were

Resources