issue with OFS in awk - bash

I have a string containing this (field separator is the percentage sign), stored in a variable called data
201%jkhjfhn%kfhngjm%mkdfhgjdfg%mkdfhgjdfhg%mkdhfgjdhfg%kdfhgjgh%kdfgjhgfh%mkfgnhmkgfnh%k,gnhjkgfn%jkdfhngjdfng
I'm trying to print out that string replacing the percentage sign with a pipe but it seems harden than i thought:
echo ${data} | awk -F"%" 'BEGIN {OFS="|"} {print $0}'
I know I'm very close to it just not close enough.
I see that code as:
1 echo the variable value into a awk session
2 set field separator as "%"
3 set as output field separator "|"
4 print the line

Try this :
echo "$data" | awk -F"%" 'BEGIN {OFS="|"} {$1=$1; print $0}'
From awk manual
Finally, there are times when it is convenient to force awk to rebuild the entire
record, using the current value of the fields and OFS. To do this, use the seemingly
innocuous assignment:
$1 = $1 # force record to be reconstituted
print $0 # or whatever else with $0
Another lightweight way using only tr if you search an alternative for awk :
tr '%' '|' <<< "$data"

Sputnick gave you the awk solution, but you don't actually need awk at all, just use your shell:
echo ${data//%/|}

Related

How can we use '~|~' delimiter to split the records using scripting command?

Please suggest how can I split the columns separated with ~|~ delimiter.(file: abc.dat)
a~|~1~|~x
b~|~1~|~y
c~|~2~|~z
I am trying below awk command but getting output 0 count.
awk -F'~|~' '$2 == 1' ${file} | wc -l
With your shown samples, please try following. We need not to use wc command along with awk, it could be done within awk itself.
awk -F'~\\|~' '$2 == 1{count++} END{print count}' "$file"
Explanation: Setting field separator as ~|~(escaped | here). Then checking if 2nd field is 1, increment variable count with 1 then. In END block of this program print its value.
For saving values into shell variable use like:
var=$(awk -F'~\\|~' '$2 == 1{count++} END{print count}' "$file")
You can also use ~[|]~ as FS value, as the pipe char used inside a bracket expression always matches itself, a pipe char:
counter=$(awk 'BEGIN{FS="~[|]~"} $2==1{cnt++} END{print cnt}' file)
See the online awk demo:
s='a~|~1~|~x
b~|~1~|~y
c~|~2~|~z'
counter=$(awk 'BEGIN{FS="~[|]~"} $2==1{cnt++} END{print cnt}' <<< "$s")
echo $counter
# => 2

Awk add a variable to part of a column string

Objective
add "67" to column 1 of the output file with 67 being the variable ($iv) classified on the difference between 2 dates.
File1.csv
display,dc,client,20572431,5383594
display,dc,client,20589101,4932821
display,dc,client,23030494,4795549
display,dc,client,22973424,5844194
display,dc,client,21489000,4251031
display,dc,client,23150347,3123945
display,dc,client,23194965,2503875
display,dc,client,20578983,1522448
display,dc,client,22243554,920166
display,dc,client,20572149,118865
display,dc,client,23077785,28077
display,dc,client,21811100,5439
Current Output 3_file1.csv
BOB-UK-,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Desired Output 3_file1.csv
BOB-UK-67,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-67,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-67,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-67,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-67,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-67,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-67,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-67,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-67,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-67,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-67,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-67,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Current Code
#! bin/sh
set -eu
de=$(date +"%d-%m-%Y" -d "1 month ago")
ds="15-04-2014"
iv=$(awk -vdate1=$de -vdate2=$ds 'BEGIN{split(date1, A,"-");split(date2, B,"-");year_diff=A[3]-B[3];if(year_diff){months_diff=A[2] + 12 * year_diff - B[2] + 1;} else {months_diff=A[2]>B[2]?A[2]-B[2]+1:B[2]-A[2]+1};print months_diff}')
for f in $(find *.csv); do
awk -F"," -v OFS=',' '{print "BOB-UK-"$iv,$0,0.05}' $f > "1_$f.csv" ##PROBLEM LINE##
awk -F"," -v OFS=',' '{print $0,$6*$7/1000}' "1_$f.csv" > "2_$f.csv" ##calculate price
awk -F"," -v OFS=',' '{print $0}; {sum+=$6}{sum2+=$8} END {print "TOTAL,,,,," (sum)",,"(sum2)}' "2_$f.csv" > "3_$f.csv" ##calculate total
done
Issue
When I run the first awk line (Marked as "## PROBLEM LINE##") the loop doesn't change column $1 to include the "67" after "BOB-UK-". This should be done with the print "BOB-UK-"$iv but instead it doesn't do anything. I suspect this is due to the way print works in awk but I haven't been able to work out a way to treat it within this row. Does anyone know if this is possible or do I need to create a new row to achieve this?
You have to pass the variable value to awk. awk does not inherit variables from the shell and does not expand $variable variables like shell. It is another tool with it's internal language.
awk -v iv="$iv" -F"," -v OFS=',' '{print "BOB-UK-"iv,$0,0.05}' "$f"
Tested in repl with the input provided.
for f in $(find *.csv)
Is useless use of find, makes no sense, just
for f in *.csv
Also note that you are creating 1_$f.csv, 2_$f.csv and 3_$f.csv files in the current directory in your loop, so the next time you run your script there will be 4 times more .csv files to iterate through. Dunno if that's relevant.
How $iv works in awk?
The $<number> is the field number <number> from the line in awk. So for example the $1 is the first field of the line in awk. The $2 is the second field. The $0 is special and it is the whole line.
The $iv expands to $ + the value of iv. So for example:
echo a b c | awk '{iv=2; print $iv}'
will output b, as the $iv expands to $2 then $2 expands to the second field from the input - ie. b.
Uninitialized variables in awk are initialized with 0. So $iv is substituted for $0 in your awk line, so it expands for the whole line.

Can I have multiple awk actions without inserting newlines?

I'm a newbie with very small and specific needs. I'm using awk to parse something and I need to generate uninterrupted lines of text assembled from several pieces in the original text. But awk inserts a newline in the output whenever I use a semicolon.
Simplest example of what I mean:
Original text:
1 2
awk command:
{ print $1; print $2 }
The output will be:
1
2
The thing is that I need the output to be a single line, and I also need to use the semicolons, because I have to do multiple actions on the original text, not all of them print.
Also, using ORS=" " causes a whole lot of different problems, so it's not an option.
Is there any other way that I can have multiple actions in the same line without newline insertion?
Thanks!
The newlines in the output are nothing to do with you using semicolons to separate statements in your script, they are because print outputs the arguments you give it followed by the contents of ORS and the default value of ORS is newline.
You may want some version of either of these:
$ echo '1 2' | awk '{printf "%s ", $1; printf "%s ", $2; print ""}'
1 2
$
$ echo '1 2' | awk -v ORS=' ' '{print $1; print $2; print "\n"}'
1 2
$
$ echo '1 2' | awk -v ORS= '{print $1; print " "; print $2; print "\n"}'
1 2
$
but it's hard to say without knowing more about what you're trying to do.
At least scan through the book Effective Awk Programming, 4th Edition, by Arnold Robbins to get some understanding of the basics before trying to program in awk or you're going to waste a lot of your time and learn a lot of bad habits first.
You have better control of the output if you use printf, e.g.
awk '{ printf "%s %s\n",$1,$2 }'
awk '{print $1 $2}'
Is the solution in this case
TL;DR
You're getting newlines because print sends OFS to standard output after each print statement. You can format the output in a variety of other ways, but the key is generally to invoke only a single print or printf statement regardless of how many fields or values you want to print.
Use Commas
One way to do this is to use a single call to print using commas to separate arguments. This will insert OFS between the printed arguments. For example:
$ echo '1 2' | awk '{print $1, $2}'
1 2
Don't Separate Arguments
If you don't want any separation in your output, just pass all the arguments to a single print statement. For example:
$ echo '1 2' | awk '{print $1 $2}'
12
Formatted Strings
If you want more control than that, use formatted strings using printf. For example:
$ echo '1 2' | awk '{printf "%s...%s\n", $1, $2}'
1...2
$ echo "1 2" | awk '{print $1 " " $2}'
1 2

passing for loop index into awk

I am trying to pass a for loop index i into awk but keep getting unexpected token awk errors.
First I tried using the -v option within awk:
for i in "${myarray}"
awk -v var=$i '/var/{print}' myfile.dat
done
I also tried calling the variable directly using single quotes:
for i in "${myarray}"
awk '/'"$i"'/{print}' myfile.dat
done
My end goal is to learn how to pass a for loop index variable through awk as the search pattern. I'd like the above code to search through myfile.dat and print lines which contain the strings in myarray.
There are 2 problems:
Array traversing should be like this for i in "${myarray[#]}"
awk treats text between /.../ as regex literal, to use a variable use $0 ~ var.
Your code should be:
for i in "${myarray[#]}"; do
awk -v var="$i" '$0 ~ var' myfile.dat
done
{print} is default action in awk that you can omit as shown above.
you can do the same loop free as well, e.g.,
echo "${myarray[#]}" | tr ' ' '|' | awk 'NR==FNR{pat=$0; next} $0 ~ pat' - file

awk - split only by first occurrence

I have a line like:
one:two:three:four:five:six seven:eight
and I want to use awk to get $1 to be one and $2 to be two:three:four:five:six seven:eight
I know I can get it by doing sed before. That is to change the first occurrence of : with sed then awk it using the new delimiter.
However replacing the delimiter with a new one would not help me since I can not guarantee that the new delimiter will not already be somewhere in the text.
I want to know if there is an option to get awk to behave this way
So something like:
awk -F: '{print $1,$2}'
will print:
one two:three:four:five:six seven:eight
I will also want to do some manipulations on $1 and $2 so I don't want just to substitute the first occurrence of :.
Without any substitutions
echo "one:two:three:four:five" | awk -F: '{ st = index($0,":");print $1 " " substr($0,st+1)}'
The index command finds the first occurance of the ":" in the whole string, so in this case the variable st would be set to 4. I then use substr function to grab all the rest of the string from starting from position st+1, if no end number supplied it'll go to the end of the string. The output being
one two:three:four:five
If you want to do further processing you could always set the string to a variable for further processing.
rem = substr($0,st+1)
Note this was tested on Solaris AWK but I can't see any reason why this shouldn't work on other flavours.
Some like this?
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1'
one two:three:four:five:six
This replaces the first : to space.
You can then later get it into $1, $2
echo "one:two:three:four:five:six" | awk '{sub(/:/," ")}1' | awk '{print $1,$2}'
one two:three:four:five:six
Or in same awk, so even with substitution, you get $1 and $2 the way you like
echo "one:two:three:four:five:six" | awk '{sub(/:/," ");$1=$1;print $1,$2}'
one two:three:four:five:six
EDIT:
Using a different separator you can get first one as filed $1 and rest in $2 like this:
echo "one:two:three:four:five:six seven:eight" | awk -F\| '{sub(/:/,"|");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
Unique separator
echo "one:two:three:four:five:six seven:eight" | awk -F"#;#." '{sub(/:/,"#;#.");$1=$1;print "$1="$1 "\n$2="$2}'
$1=one
$2=two:three:four:five:six seven:eight
The closest you can get with is with GNU awk's FPAT:
$ awk '{print $1}' FPAT='(^[^:]+)|(:.*)' file
one
$ awk '{print $2}' FPAT='(^[^:]+)|(:.*)' file
:two:three:four:five:six seven:eight
But $2 will include the leading delimiter but you could use substr to fix that:
$ awk '{print substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
two:three:four:five:six seven:eight
So putting it all together:
$ awk '{print $1, substr($2,2)}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
Storing the results of the substr back in $2 will allow further processing on $2 without the leading delimiter:
$ awk '{$2=substr($2,2); print $1,$2}' FPAT='(^[^:]+)|(:.*)' file
one two:three:four:five:six seven:eight
A solution that should work with mawk 1.3.3:
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1}' FS='\0'
one
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $2}' FS='\0'
two:three:four five:six:seven
awk '{n=index($0,":");s=$0;$1=substr(s,1,n-1);$2=substr(s,n+1);print $1,$2}' FS='\0'
one two:three:four five:six:seven
Just throwing this on here as a solution I came up with where I wanted to split the first two columns on : but keep the rest of the line intact.
Comments inline.
echo "a:b:c:d::e" | \
awk '{
split($0,f,":"); # split $0 into array of fields `f`
sub(/^([^:]+:){2}/,"",$0); # remove first two "fields" from `$0`
print f[1],f[2],$0 # print first two elements of `f` and edited `$0`
}'
Returns:
a b c:d::e
In my input I didn't have to worry about the first two fields containing escaped :, if that was a requirement, this solution wouldn't work as expected.
Amended to match the original requirements:
echo "a:b:c:d::e" | \
awk '{
split($0,f,":");
sub(/^([^:]+:)/,"",$0);
print f[1],$0
}'
Returns:
a b:c:d::e

Resources