Need to separate row using sed or awk command in Solaris - bash

I have the following data in one line.
2014-12-30 00:00:02,317 pool-14076-thread-3 DEBUG [com.fundamo.connector.airtime.service.AirtimeService] ERS Response XML - <soap:Envelope><soap:Body><TopUpPhoneAccountResult><MessageID>1913351092</MessageID><MessageRefID>BD9123000000003</MessageRefID><TopUpPhoneAccountStatus><StatusID>200</StatusID><Comment>Transaction Successful</Comment></TopUpPhoneAccountStatus><TopUpPhoneAccountAmountSent><Amount>2000</Amount><AmountExcludingTax>2000</AmountExcludingTax><TaxName/><TaxAmount>0</TaxAmount><PhoneNumber>1766910910</PhoneNumber><ResponseDateTime>20141230000002320</ResponseDateTime><ServiceType>PRETOP</ServiceType><CurrencyCode>TK</CurrencyCode></TopUpPhoneAccountAmountSent></TopUpPhoneAccountResult></soap:Body></soap:Envelope>
Now I want to take a few values from them. I used this command:
cat ERS_RESPONSE_30Dec_atp11.txt |awk -F'<' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}'
Output:
2014-12-30 00:00:02,317 pool-14076-thread-3 DEBUG [com.fundamo.connector.airtime.service.AirtimeService] ERS Response XML - ,MessageID>1913351092,MessageRefID>BD9123000000003,StatusID>200,Comment>Transaction Successful,Amount>2000,PhoneNumber>1766910910
However, I only want the fields shown below.
2014-12-30 00:00:02,317 ,1913351092,BD9123000000003,200,Transaction Successful,2000,1766910910
What should I do?

Here is how to do it with awk:
awk -F"[ <>]" '{print $1" "$2,$18,$22,$28,$32" "$33,$41,$55}' OFS=, ERS_RESPONSE_30Dec_atp11.txt
2014-12-30 00:00:02,317,1913351092,BD9123000000003,200,Transaction Successful,2000,1766910910
And here are some tips.
First, figure out what separates the fields; here it is space, < and >.
Then enumerate all the fields by running: awk -F"[ <>]" '{for (i=1;i<=NF;i++) print i"="$i}' file
Then it's just a matter of putting it all together.
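For example, running the enumeration on the sample line shows where the field numbers used above come from (output abridged):
awk -F"[ <>]" '{for (i=1;i<=NF;i++) print i"="$i}' ERS_RESPONSE_30Dec_atp11.txt
1=2014-12-30
2=00:00:02,317
...
18=1913351092
22=BD9123000000003
28=200
32=Transaction
33=Successful
41=2000
55=1766910910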

You can try sed as follows (this is a little long), followed by your file name:
sed 's#\([0-9]\{4\}-[0-9]\{2\}-[0-9]\{2\} [0-9]\{2\}:[0-9]\{2\}:[0-9]\{2\},\S*\).*<MessageID>\([[:digit:]]\{1,\}\)<.*<MessageRefID>\([[:alpha:]]\{1,\}[[:digit:]]\{1,\}\).*<StatusID>\([[:digit:]]\{1,\}\).*\(Transaction Successful\).*<Amount>\([[:digit:]]\{1,\}\).*<PhoneNumber>\([[:digit:]]\{1,\}\).*#\1 ,\2,\3,\4,\5,\6,\7#g'
Once this works, replace sed with sed -i.bak to make a backup of the original file and apply the changes in place (the command above was tested on my side).
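Note that -i is a GNU sed extension; the stock Solaris /usr/bin/sed does not support it. A portable sketch that emulates -i.bak (assuming you have saved the long s#...# command above into a script file, here called fix.sed) would be:
cp ERS_RESPONSE_30Dec_atp11.txt ERS_RESPONSE_30Dec_atp11.txt.bak
sed -f fix.sed ERS_RESPONSE_30Dec_atp11.txt.bak > ERS_RESPONSE_30Dec_atp11.txt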

You need to use nawk on Solaris instead of awk. In Solaris' version of awk, the -F parameter can only take a single character, while in nawk, it can take a regular expression.
You need to specify the whole pattern of <.....> as a separator instead of just <:
This works on the Mac:
$ awk -F'<[^<]+>' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}' ERS_RESPONSE_30Dec_atp11.txt
Try the following on Solaris:
$ nawk -F'<[^<]+>' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}' ERS_RESPONSE_30Dec_atp11.txt
If that doesn't work...
$ nawk -F'<[^<][^<]*>' '{print $1 "," $5 "," $7 "," $10 ","$12"," $16 "," $23}' ERS_RESPONSE_30Dec_atp11.txt

Related

Gawk Line removal, Splitter is :

Is it possible to move certain columns from one .txt file into another .txt file?
I have a .txt that contains:
USERID:ORDER#:IP:PHONE:ADDRESS:POSTCODE
USERID:ORDER#:IP:PHONE:ADDRESS:POSTCODE
With gawk I want to extract ADDRESS & POSTCODE columns into another .txt, so for this given file the output should be:
ADDRESS1:POSTCODE1
ADDRESS2:POSTCODE2
etc.
This is a classic AWK transform. You want to use "-F :" to specify that the input is delimited by ":" and print a new ":" on output:
awk -F: '{ print $5 ":" $6 }' <input.txt >output.txt
Try this:
awk -F: '{printf "%s:%s ",$5,$6}' ex.txt
input is
USERID:ORDER#:IP:PHONE:ADDRESS1:POSTCODE1
USERID:ORDER#:IP:PHONE:ADDRESS2:POSTCODE2
output is (on one line if I understand correctly)
ADDRESS1:POSTCODE1 ADDRESS2:POSTCODE2
The only flaw is that it ends with a trailing space and does not end with a newline.
Which can be fixed with the slightly more complex (but still readable):
awk -F: 'BEGIN {z=0;} {if (z==1) { printf " "; } ; z=1; printf "%s:%s",$5,$6} END{printf"\n"}' ex.txt
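The same idea can be written a bit more compactly by printing the separator only before every record after the first (a sketch, same assumptions about the input):
awk -F: '{printf "%s%s:%s", (NR>1 ? " " : ""), $5, $6} END {printf "\n"}' ex.txt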
awk -F: 'NR==1 {print $5"1:"$6"1"};NR==2 {print $5"2:"$6"2"}' file
ADDRESS1:POSTCODE1
ADDRESS2:POSTCODE2

Printing date within awk

I'm trying to print the date inside of an awk command. I cannot find a way around the fact that the arguments for gawk are put inside single quotes, which prevents the command substitution I need for date:
gawk '/.*(ge|ga).*/ { print $1 "," $2 "," date } ' >> file.csv
gawk '/.*(ge|ga).*/ { print $1 "," $2 "," echo date } ' >> file.csv
gawk '/.*(ge|ga).*/ { print $1 "," $2 "," `date` } ' >> file.csv
What is a way around this inside the gawk command ? Thanks.
It's not 100% clear what you're trying to do here (some input and desired output would be useful) but I think this is what you want:
gawk -v date="$(date)" -v OFS=, '/g[ea]/ { print $1, $2, date }'
This sets an awk variable date based on the output of the date command and prints it after the first and second field. I've set the output field separator OFS to make your print command neater.
Alternatively (and probably preferred) is to use the strftime function available in GNU awk:
gawk -v OFS=, '/g[ea]/ { print $1, $2, strftime() }'
The format of the output is slightly different but can be adjusted by passing a format string to the function. See the GNU awk documentation for more details on that.
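For example, to print an ISO-like timestamp instead (a sketch using the usual strftime format specifiers):
gawk -v OFS=, '/g[ea]/ { print $1, $2, strftime("%Y-%m-%d %H:%M:%S") }'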
I have also simplified your regular expression, based on the suggestions made in the comments (thanks).

Awk multiple pipes to different parts

I have some data where each line looks more or less like this:
11/11/2013 12:10:10,5000,3000,2000
So with normal awk I would get:
$1 = 11/11/2013
$2 = 12:10:10,5000,3000,2000
Now I want to pipe these two elements to different awk invocations, because $1 needs to be split on a forward slash and $2 needs to be split on a comma.
However, with
awk '{print $1}' $INPUT | awk -F/ '{print $3 "-" $2 "-" $1}'> $OUTPUT
I just get access to the date and then I am already "through the file". How can I pipe multiple times?
You can use awk with multiple field separators as well. Consider the code below:
> s='11/11/2013 12:10:10,5000,3000,2000'
> awk -F '[,/: ]+' '{for (i=1; i<=NF; i++) print i":"$i}' <<< "$s"
1:11
2:11
3:2013
4:12
5:10
6:10
7:5000
8:3000
9:2000
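Applying the same multi-separator idea to the date reshuffle you were attempting, for example:
> awk -F '[,/: ]+' '{print $3"-"$2"-"$1}' <<< "$s"
2013-11-11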
You can just call the split function. No need for a whole new awk program:
s='11/11/2013 12:10:10,5000,3000,2000'
awk '{split($1,d,"/"); split($2,t,","); print d[3]"-"d[2]"-"d[1]}' <<<"$s"
You can do whatever you need with the components of t the same way: t[1] for the first, and so on.
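For instance, on the same sample line, t[1] is the time and t[2] the first number:
awk '{split($1,d,"/"); split($2,t,","); print d[3]"-"d[2]"-"d[1], t[1], t[2]}' <<<"$s"
2013-11-11 12:10:10 5000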

awk match and merge two files on the basis of key values

I have two files in which file1's $3,$4 matches file2's $3,$2.
file1:
1211,A2,ittp,1,IPSG,a2,PA,3000,3000
1311,A4,iztv,1,IPSG,a4,PA,240,250
1411,B4,iztq,0,IPSG,b4,PA,230,250
file2:
TP,0,nttp,0.865556,0.866667
TP,1,ittp,50.7956,50.65
TP,1,iztv,5.42444,13.8467
TP,0,iztq,645.194,490.609
I want to merge these files: whenever file1's $3,$4 equals file2's $3,$2, print the merged record like this:
TP,1211,A2,ittp,1,IPSG,a2,PA,3000,3000,0.865556,0.866667
TP,1311,A4,iztv,1,IPSG,a4,PA,240,250,50.7956,50.65
TP,1411,B4,iztq,0,IPSG,b4,PA,230,250,5.42444,13.8467
Both files are CSV files.
I tried using awk but I'm not getting the desired output. It's printing only file1.
$ awk -F, 'NR==FNR{a[$3,$4]=$3$2;next}{print $1, $2, $3, $4, $5, $6, $7, $8, $9, $10 a[$1] }' OFS=, 1.csv 2.csv
awk -F, 'BEGIN {OFS=",";}
NR == FNR {a[$3,$4] = $0;}
NR != FNR && a[$3,$2] {print $1, a[$3,$2], $4, $5;}' 1.csv 2.csv
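With the sample files, that should print:
TP,1211,A2,ittp,1,IPSG,a2,PA,3000,3000,50.7956,50.65
TP,1311,A4,iztv,1,IPSG,a4,PA,240,250,5.42444,13.8467
TP,1411,B4,iztq,0,IPSG,b4,PA,230,250,645.194,490.609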
One way with awk:
awk 'NR==FNR{a[$4,$3]=$0;next}($2,$3) in a{print $1,a[$2,$3],$4,$5}' FS=, OFS=, f1 f2
TP,1211,A2,ittp,1,IPSG,a2,PA,3000,3000,50.7956,50.65
TP,1311,A4,iztv,1,IPSG,a4,PA,240,250,5.42444,13.8467
TP,1411,B4,iztq,0,IPSG,b4,PA,230,250,645.194,490.609
Using join
If i1.txt and i2.txt are the input files:
cat i1.txt | awk -F',' '{print $3 "-" $4 "," $1 "," $2 "," $5 "," $6 "," $7 "," $8 "," $9}' | sort > s1.txt
cat i2.txt | awk -F',' '{print $3 "-" $2 "," $1 "," $4 "," $5 }' | sort > s2.txt
join -t',' s1.txt s2.txt | tr '-' ',' > t12.txt
cat t12.txt | awk -F ',' '{print $10 "," $3 "," $4 "," $1 "," $2 "," $5 "," $6 "," $7 "," $8 "," $9 "," $11 "," $12 }'
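With the sample files, this pipeline should yield (in key-sorted order):
TP,1211,A2,ittp,1,IPSG,a2,PA,3000,3000,50.7956,50.65
TP,1411,B4,iztq,0,IPSG,b4,PA,230,250,645.194,490.609
TP,1311,A4,iztv,1,IPSG,a4,PA,240,250,5.42444,13.8467
One caveat: tr '-' ',' translates every hyphen in the stream, so this only works if the data fields themselves contain no hyphens.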

awk changes the text unexpectedly

I am using the following awk statement in my shell script.
#!/bin/sh
# read file line by line
file="/pdump/country.000000.txt"
while read line
do
mycol=`echo $line | awk -F"," '{print $2}'`
mycol_new=`echo $mycol | tr "[:lower:]" [:upper:]`
echo $line | awk -v var="$mycol_new" -F"," '{print $1 "," var "," $3 "," $4 "," $5 "," $6 "," $7 "," $8}'
done < $file
It is working as expected.
The only problem is that if the original text is \N (backslash N) in any other column, e.g. $4 or $7, it changes to N (without the backslash).
How do I preserve the original values while replacing only the second column?
You need to use the -r option for read in your while loop:
while read -r line
That preserves backslashes in the input. That option should almost always be used. Make it a habit.
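A minimal sketch that shows the difference:
printf '%s\n' '\N' | { read line; printf '%s\n' "$line"; }      # prints: N
printf '%s\n' '\N' | { read -r line; printf '%s\n' "$line"; }   # prints: \N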
awk strips out the backslash when it is not part of a recognized escape sequence: \n would be recognized as a newline, but \N is simply interpreted as N.
If I read your code correctly, you are trying:
Read input from a comma-separated-values (CSV) file
Change the second field to uppercase
Print the result.
If that is the case, use AWK directly. Save the following to toupper_second_field.awk:
BEGIN { FS = ","; OFS="," }
{ $2 = toupper($2); print }
The first line sets the field separator for both input (FS) and output (OFS) to a comma. The second converts field #2 to upper case and prints the line. To invoke it:
awk -f toupper_second_field.awk /pdump/country.000000.txt
The logic is much simpler and you don't have to worry about backslashes.
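Equivalently, as a one-liner:
awk 'BEGIN { FS = OFS = "," } { $2 = toupper($2); print }' /pdump/country.000000.txt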
