File 1: myfilename_WEEK.csv
w27_2018,257,1,26.20,0.00,24.26
w28_2018,257,1,7.97,0.00,24.26
w29_2018,257,1,34.86,0.00,24.26
w30_2018,257,1,3.29,0.00,24.26
File 2: myfilename_MONTH.csv
m07_2018,257,1,94.78,0.00,121.31
m08_2018,257,1,719.60,0.00,262.47
m09_2018,257,1,14925.60,0.00,13903.24
m10_2018,257,1,51099.66,0.00,81600.69
File 3: myfilename_HALF.csv
h02_2018,257,1,155345.19,480029.21,235802.91
h01_2019,257,1,273961.84,552545.36,140706.27
h02_2018,258,1,3250552.06,1299785.91,3697749.57
h01_2019,258,1,3582585.66,2670427.72,4009391.28
calendar_file:
20180805,08/05/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,01,SUNDAY
20180806,08/06/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,02,MONDAY
...
20180811,08/11/2018,w27_2018,WK27 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,27,WEEK 27,07,SATURDAY
20180812,08/12/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,01,SUNDAY
..
20180816,08/16/2018,w28_2018,WK28 2018,m07_2018,AUG 2018,q03_2018,Q03 2018,h02_2018,H02 2018,a2018,FY2018,28,WEEK 28,05,THURSDAY
Expected output (newlines added for readability):
2018,w27_2018,WK27 2018,257,1,26.20,0.00,24.26
2018,w27_2018,WK27 2018,258,1,97192.07,9028.38,52130.32
2018,w27_2018,WK27 2018,300,1,181.44,0.00,-69.72
2018,m07_2018,AUG 2018,257,1,94.78,0.00,121.31
2018,m07_2018,AUG 2018,258,1,509253.46,45141.91,399648.71
2018,m07_2018,AUG 2018,300,1,409.10,0.00,-348.60
2018,h02_2018,H02 2018,257,1,155345.19,480029.21,235802.91
2018,h02_2018,H02 2018,258,1,3250552.06,1299785.91,3697749.57
2018,h02_2018,H02 2018,300,1,1112.93,0.00,-1164.35
I would like to join every myfilename_* file against calendar_file to add a label and the fiscal year:
Individual commands are:
awk -F, 'NR==FNR {a[$3]=substr($12,3,4) FS $3 FS $4; next} {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6}' calendar_file myfilename_WEEK.csv >> my_report.csv
awk -F, 'NR==FNR {a[$5]=substr($12,3,4) FS $5 FS $6; next} {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6}' calendar_file myfilename_MONTH.csv >> my_report.csv
awk -F, 'NR==FNR {a[$9]=substr($12,3,4) FS $9 FS $10; next} {print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6}' calendar_file myfilename_HALF.csv >> my_report.csv
I am trying to combine all of these into a single loop. I have tried the following, but it doesn't work:
for exp_file in `ls myfilename_*.csv`
do
awk -F, '\
{ \
if(NR==FNR && FILENAME ~ /WEEK/) {a[$3]=substr($12,3,4) FS $3 FS $4; next} ;\
if(NR==FNR && FILENAME ~ /MONTH/) {a[$5]=substr($12,3,4) FS $5 FS $6; next} ;\
if(NR==FNR && FILENAME ~ /HALF/) {a[$9]=substr($12,3,4) FS $9 FS $10; next} ;\
{print a[$1] FS $2 FS $3 FS $4 FS $5 FS $6} \
}' calendar_file $exp_file >> my_report.csv
done
How can I achieve this? Thanks for your help in advance!
First way (GNU awk; if you don't have GNU awk, please leave a comment):
awk -F, 'NR==FNR{y=substr($12,3,4); a[$3]=y FS $3 FS $4; b[$5]=y FS $5 FS $6; c[$9]=y FS $9 FS $10; next} FNR==1{printf nl;nl="\n"} match(FILENAME, /myfilename_([A-Z]*)/, f){NF=6;switch(f[1]){case "WEEK": $1=a[$1];break; case "MONTH": $1=b[$1];break; case "HALF": $1=c[$1];}}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
Multiple lines for readability:
awk -F, '
NR==FNR{
y=substr($12,3,4);
a[$3]=y FS $3 FS $4;
b[$5]=y FS $5 FS $6;
c[$9]=y FS $9 FS $10;
next
}
FNR==1{printf nl;nl=ORS} ## Prints a blank line between sections; remove this line if you do not need those newlines.
match(FILENAME, /myfilename_([A-Z]*)/, f){
NF=6; ## Limit the output to 6 columns only; you can remove this.
switch(f[1]){
case "WEEK":
$1=a[$1];
break;
case "MONTH":
$1=b[$1];
break;
case "HALF":
$1=c[$1];
}
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
An updated version that replaces the switch with a nested ternary:
awk -F, '
NR==FNR{
y=substr($12,3,4);
a[$3]=y FS $3 FS $4;
b[$5]=y FS $5 FS $6;
c[$9]=y FS $9 FS $10;
next
}
FNR==1{printf nl;nl=ORS} ## Prints a blank line between sections; remove this line if you do not need those newlines.
match(FILENAME, /myfilename_([A-Z]*)/, f){
NF=6; ## Limit the output to 6 columns only; you can remove this in your case.
$1 = f[1]=="WEEK" ? a[$1] : ( f[1]=="MONTH" ? b[$1] : (f[1]=="HALF" ? c[$1] : $1) )
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
Second way, more concise and without using switch (also GNU awk):
awk -F, '
NR==FNR{
y=substr($12,3,4);
a[$3 "WEEK"]=y FS $3 FS $4;
a[$5 "MONTH"]=y FS $5 FS $6;
a[$9 "HALF"]=y FS $9 FS $10;
next
}
FNR==1{printf nl;nl=ORS} ## Prints a blank line between sections; remove this line if you do not need those newlines.
match(FILENAME, /myfilename_([A-Z]*)/, f){
$1=a[$1 f[1]];
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
Third way: if all your data correspond to their filenames, as shown in your samples, there's a third way that removes the need for match, so it can work in other awks:
awk -F, '
NR==FNR{
y=substr($12,3,4);
a[$3 "w"]=y FS $3 FS $4;
a[$5 "m"]=y FS $5 FS $6;
a[$9 "h"]=y FS $9 FS $10;
next
}
FNR==1{printf nl;nl=ORS} ## Prints a blank line between sections; remove this line if you do not need those newlines.
$1~/^([wmh])[0-9]{2}_[0-9]{4}/{ ## Check that the first field matches the expected pattern; the check is optional if your data is all like you showed.
$1=a[$1 substr($1,1,1)]
print
}' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
On second thought, given your data-to-filename relations, there's actually no need to check the first letter (or the filename):
awk -F, '
NR==FNR{
y=substr($12,3,4);
a[$3]=y FS $3 FS $4;
a[$5]=y FS $5 FS $6;
a[$9]=y FS $9 FS $10;
next
}
FNR==1{printf nl;nl=ORS} ## Prints a blank line between sections; remove this line if you do not need those newlines.
{ ## Prepend $1~/^([wmh])[0-9]{2}_[0-9]{4}/ to this line if you want to validate the first column.
$1=a[$1]
}1' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
Here is another awk solution which is portable and efficient, and which doesn't rely on the input filenames but on the order in which they are given on the command line.
awk -F ',' -v OFS=',' '
NR==FNR {
y=substr($12,3,4)
a[ARGV[2],$3]=y OFS $3 OFS $4 # week
a[ARGV[3],$5]=y OFS $5 OFS $6 # month
a[ARGV[4],$9]=y OFS $9 OFS $10 # half
next
}
{
$1=a[FILENAME,$1]
} 1' calendar.csv week.csv month.csv half.csv
Note that if your calendar file is sorted, there is no need to parse the fiscal year field again and again for each line. Something like this would be much more efficient in that case:
if(p!=$12) y=substr(p=$12,3,4)
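Plugged into the combined-array variant from earlier, the cached lookup could look like this (a sketch; calendar_file and the myfilename_* names are the ones from the question):

```shell
# Sketch: recompute the fiscal year only when $12 changes, instead of on every
# calendar line. Assumes the calendar file groups equal $12 values together.
awk -F, '
NR==FNR{
  if (p != $12) y = substr(p = $12, 3, 4)  # cache: substr runs once per fiscal year
  a[$3] = y FS $3 FS $4
  a[$5] = y FS $5 FS $6
  a[$9] = y FS $9 FS $10
  next
}
{ $1 = a[$1] } 1
' OFS=, calendar_file myfilename_{WEEK,MONTH,HALF}.csv
```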
I have the following data in one line.
2014-12-30 00:00:02,317 pool-14076-thread-3 DEBUG [com.fundamo.connector.airtime.service.AirtimeService] ERS Response XML - <soap:Envelope><soap:Body><TopUpPhoneAccountResult><MessageID>1913351092</MessageID><MessageRefID>BD9123000000003</MessageRefID><TopUpPhoneAccountStatus><StatusID>200</StatusID><Comment>Transaction Successful</Comment></TopUpPhoneAccountStatus><TopUpPhoneAccountAmountSent><Amount>2000</Amount><AmountExcludingTax>2000</AmountExcludingTax><TaxName/><TaxAmount>0</TaxAmount><PhoneNumber>1766910910</PhoneNumber><ResponseDateTime>20141230000002320</ResponseDateTime><ServiceType>PRETOP</ServiceType><CurrencyCode>TK</CurrencyCode></TopUpPhoneAccountAmountSent></TopUpPhoneAccountResult></soap:Body></soap:Envelope>
Now I want to extract a few values from it. I used this command:
cat ERS_RESPONSE_30Dec_atp11.txt |awk -F'<' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}'
Output:
2014-12-30 00:00:02,317 pool-14076-thread-3 DEBUG [com.fundamo.connector.airtime.service.AirtimeService] ERS Response XML - ,MessageID>1913351092,MessageRefID>BD9123000000003,StatusID>200,Comment>Transaction Successful,Amount>2000,PhoneNumber>1766910910
However, I only want the fields shown below.
2014-12-30 00:00:02,317 ,1913351092,BD9123000000003,200,Transaction Successful,2000,1766910910
What should I do?
Here is how to do it with awk:
awk -F"[ <>]" '{print $1" "$2,$18,$22,$28,$32" "$33,$41,$55}' OFS=, ERS_RESPONSE_30Dec_atp11.txt
2014-12-30 00:00:02,317,1913351092,BD9123000000003,200,Transaction Successful,2000,1766910910
And here are some tips.
First, figure out what separates the fields; here it would be space, < and >.
Then list all the fields by running: awk -F"[ <>]" '{for (i=1;i<=NF;i++) print i"="$i}' file
Then it's just a matter of putting it all together.
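To see why the field numbers look scattered, here is the same enumeration run on a toy line (a minimal sketch; the real log line produces the higher field numbers used above):

```shell
# Splitting on any of space, < or > means adjacent separators (like "><" in
# real XML) produce empty fields, which is why useful values land at odd indexes.
# Here field 4 is 42, and field 2 is empty (a space immediately followed by <).
echo 'A <Tag>42</Tag>' | awk -F'[ <>]' '{for (i = 1; i <= NF; i++) print i "=" $i}'
```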
You can try sed as follows (this is a little long), followed by your file:
sed 's#\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},\S*\).*<MessageID>\([[:digit:]]\{1,\}\)<.*<MessageRefID>\([[:alpha:]]\{1,\}[[:digit:]]\{1,\}\).*<StatusID>\([[:digit:]]\{1,\}\).*\(Transaction Successful\).*<Amount>\([[:digit:]]\{1,\}\).*<PhoneNumber>\([[:digit:]]\{1,\}\).*#\1 ,\2,\3,\4,\5,\6,\7#g'
Once this works, replace sed with sed -i.bak to back up the original file and apply the changes in place (command-line tested on my side).
You need to use nawk on Solaris instead of awk. In Solaris' version of awk, the -F parameter can only take a single character, while in nawk, it can take a regular expression.
You need to specify the whole pattern of <.....> as a separator instead of just <:
This works on the Mac:
$ awk -F'<[^<]+>' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}' ERS_RESPONSE_30Dec_atp11.txt
Try the following on Solaris
$ nawk -F'<[^<]+>' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}' ERS_RESPONSE_30Dec_atp11.txt
If that doesn't work...
$ nawk -F'<[^<][^<]*>' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}' ERS_RESPONSE_30Dec_atp11.txt
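A minimal sketch of the difference between the two separators (the Amount tag here is just a stand-in for any tag in the response):

```shell
# With a single-character separator, the tag name leaks into each field:
echo '<Amount>2000</Amount>' | awk -F'<' '{print $2}'        # prints: Amount>2000
# With the whole <...> tag as the separator, only the text between tags remains:
echo '<Amount>2000</Amount>' | awk -F'<[^<]+>' '{print $2}'  # prints: 2000
```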