Objective
Add "67" to column 1 of the output file, where 67 is the value of the variable $iv, which is calculated from the difference between two dates.
File1.csv
display,dc,client,20572431,5383594
display,dc,client,20589101,4932821
display,dc,client,23030494,4795549
display,dc,client,22973424,5844194
display,dc,client,21489000,4251031
display,dc,client,23150347,3123945
display,dc,client,23194965,2503875
display,dc,client,20578983,1522448
display,dc,client,22243554,920166
display,dc,client,20572149,118865
display,dc,client,23077785,28077
display,dc,client,21811100,5439
Current Output 3_file1.csv
BOB-UK-,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Desired Output 3_file1.csv
BOB-UK-67,display,dc,client,20572431,5383594,0.05,269.18
BOB-UK-67,display,dc,client,20589101,4932821,0.05,246.641
BOB-UK-67,display,dc,client,23030494,4795549,0.05,239.777
BOB-UK-67,display,dc,client,22973424,5844194,0.05,292.21
BOB-UK-67,display,dc,client,21489000,4251031,0.05,212.552
BOB-UK-67,display,dc,client,23150347,3123945,0.05,156.197
BOB-UK-67,display,dc,client,23194965,2503875,0.05,125.194
BOB-UK-67,display,dc,client,20578983,1522448,0.05,76.1224
BOB-UK-67,display,dc,client,22243554,920166,0.05,46.0083
BOB-UK-67,display,dc,client,20572149,118865,0.05,5.94325
BOB-UK-67,display,dc,client,23077785,28077,0.05,1.40385
BOB-UK-67,display,dc,client,21811100,5439,0.05,0.27195
TOTAL,,,,,33430004,,1671.5
Current Code
#! bin/sh
set -eu
de=$(date +"%d-%m-%Y" -d "1 month ago")
ds="15-04-2014"
iv=$(awk -vdate1=$de -vdate2=$ds 'BEGIN{split(date1, A,"-");split(date2, B,"-");year_diff=A[3]-B[3];if(year_diff){months_diff=A[2] + 12 * year_diff - B[2] + 1;} else {months_diff=A[2]>B[2]?A[2]-B[2]+1:B[2]-A[2]+1};print months_diff}')
for f in $(find *.csv); do
awk -F"," -v OFS=',' '{print "BOB-UK-"$iv,$0,0.05}' $f > "1_$f.csv" ##PROBLEM LINE##
awk -F"," -v OFS=',' '{print $0,$6*$7/1000}' "1_$f.csv" > "2_$f.csv" ##calculate price
awk -F"," -v OFS=',' '{print $0}; {sum+=$6}{sum2+=$8} END {print "TOTAL,,,,," (sum)",,"(sum2)}' "2_$f.csv" > "3_$f.csv" ##calculate total
done
Issue
When I run the first awk line (Marked as "## PROBLEM LINE##") the loop doesn't change column $1 to include the "67" after "BOB-UK-". This should be done with the print "BOB-UK-"$iv but instead it doesn't do anything. I suspect this is due to the way print works in awk but I haven't been able to work out a way to treat it within this row. Does anyone know if this is possible or do I need to create a new row to achieve this?
You have to pass the variable's value to awk. awk does not see shell variables, and it does not expand $variable the way the shell does. It is a separate tool with its own language.
awk -v iv="$iv" -F"," -v OFS=',' '{print "BOB-UK-"iv,$0,0.05}' "$f"
Tested in repl with the input provided.
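As a quick check of the -v approach, the sketch below runs the corrected command on one row of the sample input. The hard-coded iv=67 is an assumption taken from the desired output; in the real script the value comes from the date arithmetic.

```shell
# iv is hard-coded here; in the real script it comes from the date arithmetic.
iv=67

# -v copies the shell value into an awk variable named iv.
# Inside the awk program it is referenced as iv, without a dollar sign.
row=$(printf 'display,dc,client,20572431,5383594\n' |
  awk -v iv="$iv" -F',' -v OFS=',' '{ print "BOB-UK-" iv, $0, 0.05 }')

echo "$row"
```

The row comes back with BOB-UK-67 in column 1 and 0.05 appended as the last column.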
for f in $(find *.csv)
is a useless use of find: the shell has already expanded *.csv before find runs. Just use
for f in *.csv
Also note that you are creating 1_$f.csv, 2_$f.csv and 3_$f.csv files in the current directory in your loop, so the next time you run your script there will be 4 times more .csv files to iterate through. Dunno if that's relevant.
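One way to sidestep the growing pile of intermediate files is to do all three steps in a single awk pass and write the result somewhere the *.csv glob will not see. The following is only a sketch under that assumption: the function name process_csvs, the awk variable rate and the output directory argument are made up for illustration, and the price is computed from the original fifth field rather than the shifted sixth.

```shell
# Hypothetical helper: process every .csv in the current directory in one awk
# pass and write results to a separate directory so reruns do not pick them up.
process_csvs() {
  iv=$1          # suffix for column 1, e.g. 67
  outdir=$2      # hypothetical output directory name
  mkdir -p "$outdir"
  for f in *.csv; do
    [ -e "$f" ] || continue   # no match: the glob stays literal, skip it
    awk -F',' -v OFS=',' -v iv="$iv" -v rate=0.05 '
      {
        price = $5 * rate / 1000
        print "BOB-UK-" iv, $0, rate, price
        sum += $5; sum2 += price
      }
      END { print "TOTAL,,,,," sum ",," sum2 }
    ' "$f" > "$outdir/3_$f"
  done
}
```

Calling process_csvs 67 out in the directory holding File1.csv would write out/3_File1.csv, and running it twice processes the same inputs both times.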
How $iv works in awk?
In awk, $<number> is field number <number> of the current line. For example, $1 is the first field and $2 is the second. $0 is special: it is the whole line.
$iv expands to $ followed by the value of iv. So for example:
echo a b c | awk '{iv=2; print $iv}'
will output b: $iv becomes $2, and $2 is the second field of the input, i.e. b.
Uninitialized variables in awk evaluate to 0 in a numeric context (and to the empty string otherwise). Your awk program never sets iv, so $iv becomes $0, which is the whole line.
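The three cases can be compared side by side. This is a small illustrative sketch; the variable names are made up, and the last case shows that a shell variable called iv makes no difference inside the single-quoted awk program.

```shell
# iv is assigned inside awk: $iv is field number 2.
with_value=$(echo 'a b c' | awk '{ iv = 2; print $iv }')

# iv is never assigned: it evaluates to 0, so $iv is $0, the whole line.
without_value=$(echo 'a b c' | awk '{ print $iv }')

# The shell variable of the same name is invisible inside the single quotes.
iv=3
shell_only=$(echo 'a b c' | awk '{ print $iv }')

printf '%s\n' "$with_value" "$without_value" "$shell_only"
```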
Given below is the file content and the awk command used:
Input file:in_t.txt
1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0
2,DEF,AAA,20-JUL-16,4,1,0,5,0,0,0,0
Expected outfile:
SSS|2016-10-20,5
AAA|2016-07-20,5
I tried the below command:
awk -F , '{print $3"|"$(date -d 4)","$8}' in_t.txt
Got the outfile as:
SSS|20-OCT-16,5
AAA|20-JUL-16,5
Only thing I want to know is on how to format the date with the same awk command. Tried with
awk -F , '{print $3"|"$(date -d 4)","$8 +%Y-%m-%d}' in_t.txt
Getting syntax error. Can I please get some help on this?
Better to do this in shell itself and use date -d to convert the date format:
#!/bin/bash
while IFS=',' read -ra arr; do
  printf "%s|%s,%s\n" "${arr[2]}" "$(date -d "${arr[3]}" '+%Y-%m-%d')" "${arr[7]}"
done < file
SSS|2016-10-20,5
AAA|2016-07-20,5
What's your definition of a single command? A call to awk is a single shell command. This may be what you want:
$ awk -F'[,-]' '{ printf "%s|20%02d-%02d-%02d,%s\n", $3, $6, (match("JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC",$5)+2)/3, $4, $10 }' file
SSS|2016-10-20,5
AAA|2016-07-20,5
BTW it's important to remember that awk is not shell. You can't call shell tools (e.g. date) directly from awk any more than you could from C. When you wrote $(date -d 4), awk saw an unset variable named date (numeric value 0), from which it subtracted the value of an unset variable named d (also 0) to get the numeric result 0. It then concatenated that with the number 4 to get 04, and applied the $ operator to that to get the contents of field $04 (=$4). The output has nothing to do with the shell command date.
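To see that parse in action, the sketch below runs the expression from the question next to a spelled-out equivalent and a plain $4. The variable names are made up for illustration; all three should print the unformatted fourth field.

```shell
line='1,ABC,SSS,20-OCT-16,4,1,0,5,0,0,0,0'

# What the question ran: awk parses (date -d 4) as (date - d) followed by 4.
as_written=$(echo "$line" | awk -F, '{ print $(date -d 4) }')

# The same expression spelled out, and the plain field reference it reduces to.
spelled_out=$(echo "$line" | awk -F, '{ print $((date - d) "" 4) }')
plain_field=$(echo "$line" | awk -F, '{ print $4 }')

printf '%s\n' "$as_written" "$spelled_out" "$plain_field"
```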
From Unix.com
Just tweaked it a little to suit your needs
awk -v var="20-OCT-16" '
BEGIN{
split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
for (i=1; i<=12; i++) mdigit[month[i]]=i
m=toupper(substr(var,4,3))
dat="20"substr(var,8,2)"-"sprintf("%02d",mdigit[m])"-"substr(var,1,2)
print dat
}'
2016-10-20
Explanation:
Prefix 20 {20}
Two characters starting at position 8 {16}
Print - {-}
Look up the month name (converted to uppercase) in mdigit to get its number {10}
Print - {-}
Two characters starting at position 1 {20}
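The snippet above converts a single hard-coded value. A possible way to apply the same lookup to every line of in_t.txt is sketched below; convert_dates and to_iso are made-up names, and the code assumes DD-MON-YY dates in field 4 and years in the 2000s, as in the sample input.

```shell
# Hypothetical helper: reads lines like in_t.txt on stdin and
# prints field3|ISO-date,field8 for each of them.
convert_dates() {
  awk -F',' '
    BEGIN {
      split("JAN FEB MAR APR MAY JUN JUL AUG SEP OCT NOV DEC", month, " ")
      for (i = 1; i <= 12; i++) mdigit[month[i]] = i
    }
    # to_iso is a made-up name; it assumes DD-MON-YY input and a 20xx year.
    function to_iso(d,    m) {
      m = toupper(substr(d, 4, 3))
      return "20" substr(d, 8, 2) "-" sprintf("%02d", mdigit[m]) "-" substr(d, 1, 2)
    }
    { print $3 "|" to_iso($4) "," $8 }
  '
}
```

Usage would be convert_dates < in_t.txt.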
This may work for you also.
awk -F , 'BEGIN {months = "JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}
{ num = sprintf("%02d", (index(months, substr($4,4,3)) + 2) / 3)
date = "20" substr($4,8,2) "-" num "-" substr($4,1,2)
print $3"|" date "," $8}' in_t.txt
You were close with your call to date. You can indeed use it with getline to parse and output the date value:
awk -F',' '{
parsedate="date --date="$4" +%Y-%m-%d"
parsedate | getline mydate
close(parsedate)
print $3"|"mydate","$8
}' in_t.txt
Explanation:
-F',' sets the field separator (delimiter) to comma
parsedate="date --date="$4" +%Y-%m-%d" builds a date command that converts the 4th field (the date) to the given output format and assigns that command string to the variable "parsedate"
parsedate | getline mydate runs the "parsedate" command and assigns the first line of its output to the mydate variable
close(parsedate) closes the pipe so the command is run afresh on the next line; without it, a repeated command string is not re-run, and many distinct commands can exhaust the available open file descriptors (see Running a system command in AWK for discussion of getline and close())
print $3"|"mydate","$8 outputs the 3rd field, a pipe, the reformatted date, a comma, and the 8th field.
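Since this approach starts one date process per input line, a variation worth considering is to cache results and check whether getline actually read anything. The sketch below is one way to do that, not the answer's own code: format_with_date, cache and the INVALID marker are made up, and it assumes GNU date (for -d) and date fields free of shell metacharacters.

```shell
# Hypothetical helper: date-via-getline with a cache and an error check.
# Assumes GNU date (for -d) and dates with no shell metacharacters.
format_with_date() {
  awk -F',' '
    {
      if (!($4 in cache)) {
        cmd = "date -d \"" $4 "\" +%Y-%m-%d"
        if ((cmd | getline out) > 0) cache[$4] = out
        else cache[$4] = "INVALID"       # date failed or printed nothing
        close(cmd)
      }
      print $3 "|" cache[$4] "," $8
    }
  ' "$@"
}
```

It can be called as format_with_date in_t.txt or fed from a pipe. Each distinct date string is converted once, which matters on large files.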
Content of the file is
Feb-01-2014 one two
Mar-02-2001 three four
I'd like to format the first field (the date) to %Y%m%d format
I'm trying to use a combination of awk and the date command, but it is failing even though I have the feeling I'm almost there:
cat infile | awk -F"\t" '{$1=system("date -d " $1 " +%Y%m%d");print $1"\t"$2"\t"$3}' > test
This prints date's usage message, which makes me think the date command is being run but something is wrong with the argument. Do you see the issue somewhere?
I'm not that familiar with awk.
You don't need date for this; it's simply a matter of rearranging the date string:
$ awk 'BEGIN{FS=OFS="\t"} {
split($1,t,/-/)
$1 = sprintf("%s%02d%s", t[3], (match("JanFebMarAprMayJunJulAugSepOctNovDec",t[1])+2)/3, t[2])
}1' file
20140201 one two
20010302 three four
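The month arithmetic is the only non-obvious part: each name occupies three characters, so match() returns 1, 4, 7, ... and (position + 2) / 3 gives 1, 2, 3, ... The sketch below isolates that step; month_number is a made-up name, and the extra check rejects anything that is not an exact, correctly cased three-letter month.

```shell
# Hypothetical helper: prints the two-digit month number for a name like Feb.
month_number() {
  awk -v name="$1" 'BEGIN {
    months = "JanFebMarAprMayJunJulAugSepOctNovDec"
    pos = match(months, name)      # 1 for Jan, 4 for Feb, 7 for Mar, ...
    # Reject misses (pos is 0) and matches that straddle two month names.
    if (pos % 3 != 1 || length(name) != 3) exit 1
    printf "%02d\n", (pos + 2) / 3
  }'
}

month_number Feb
month_number Dec
```

Note that the lookup is case-sensitive, so feb or FEB would be rejected with this month string.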
You can use:
while read -r a _; do
date -d "$a" '+%Y%m%d'
done < file
20140201
20010302
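system() returns the exit code of the command, not its output; the command's output goes straight to stdout.
Instead, read the output with getline:
awk -F"\t" '{cmd = "date -d " $1 " +%Y%m%d"; cmd | getline d; close(cmd); print d"\t"$2"\t"$3}' infile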
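The difference between the two calls is easy to see with a harmless command. In this sketch echo stands in for date so that it runs anywhere; the variable names are illustrative only.

```shell
demo=$(awk 'BEGIN {
  # system() lets the command write to stdout itself and returns its exit status.
  status = system("echo 20140201")
  print "system returned: " status

  # cmd | getline captures the first line of the command output in a variable.
  cmd = "echo 20140201"
  cmd | getline captured
  close(cmd)
  print "getline captured: " captured
}')
echo "$demo"
```

The first line of output comes from echo itself, the second shows that system() handed back 0, and the third shows the text that getline stored.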
$ awk '{system("date -d "$1" +%Y%m%d | tr -d \"\\n\""); printf "\t%s\t%s\n", $2, $3}' file
20140201 one two
20010302 three four
I have a file that looks like
1234-00AA12 .02
5678-11BB34 .03
In a bash script I have an expression like
day=$(...)
that greps a date in the format YYYY/MM/DD (if this matters), let's say 2014/01/21 for specificity.
I want to produce the following:
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
(The first column is the day, the second and third columns are fixed as "1").
After a bit of googling I tried:
cat file|awk -F "-" '{split($2,array," "); printf "%s,%s,%s,%s,%s,%s\n",$day,"1","1",$1,array[1],array[2]}'> output.csv
but $day isn't working with awk.
Any help would be appreciated.
Try this awk:
awk -v d=$(date '+%Y/%m/%d') '{print d,1,1,$1,$2}' OFS=, file
2014/02/07,1,1,1234-00AA12,.02
2014/02/07,1,1,5678-11BB34,.03
$ awk -v day="$day" 'BEGIN{FS="[ -]";OFS=","} {print day,1,1,$1,$2,$3}' file
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
awk doesn't see shell variables. You need to pass them in:
awk -vdd="$day" -F "-" '{split($2,array," "); printf "%s,%s,%s,%s,%s,%s\n",dd,"1","1",$1,array[1],array[2]}'
Moreover, rather than saying:
cat file | awk ...
avoid the useless use of cat and give awk the file name directly:
awk ... file
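Besides -v, a shell value can also reach awk through the environment and be read from the ENVIRON array. The sketch below shows both on one line of the sample file; the variable names with_v and with_env are made up. The practical difference is that awk processes backslash escapes in a -v value, while an ENVIRON value is taken literally, which does not matter for a date like this one.

```shell
day='2014/01/21'

# Option 1: -v assignment (awk processes backslash escapes in the value).
with_v=$(printf '1234-00AA12 .02\n' |
  awk -v day="$day" 'BEGIN { FS = "[ -]"; OFS = "," } { print day, 1, 1, $1, $2, $3 }')

# Option 2: pass it through the environment and read ENVIRON (value taken literally).
with_env=$(printf '1234-00AA12 .02\n' |
  day="$day" awk 'BEGIN { FS = "[ -]"; OFS = "," } { print ENVIRON["day"], 1, 1, $1, $2, $3 }')

printf '%s\n' "$with_v" "$with_env"
```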
With bash
day="2014/01/21"
(
IFS=,
while IFS=" -" read -ra fields; do
new=( "$day" 1 1 "${fields[@]}" )
echo "${new[*]}"
done < file
)
2014/01/21,1,1,1234,00AA12,.02
2014/01/21,1,1,5678,11BB34,.03
I run the while loop in a subshell just to keep changes to IFS localized.