How would I calculate time-based metrics (hourly average) based on log file data?
Let me make this clearer. Consider a log file that contains entries as follows: each UID appears exactly twice in the log, the entries contain embedded XML, and the request/response pair will likely appear OUT of sequence. The log file holds data for a single day only, so only one day's records will be there.
There are about 2 million UIDs in the log file.
I have to find the average hourly response time for these requests. A sample request and response from the log file are shown below; the UID is the key that relates a request to its response.
2013-04-03 08:54:19,451 INFO [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.448-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body><log-message-body><headers><FedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"><logFilter>7</logFilter><logSeverity>255</logSeverity><schemaType>PCMC.MRP.DocumentMetaData</schemaType><UID>073104c-4e-4ce-bda-694344ee62</UID><consumerSystemId>JTR</consumerSystemId><consumerLogin>jbserviceid</consumerLogin><logLocation>Beginning of Service</logLocation></fedDKPLoggingContext></headers><payload>
<ratedDocument>
<objectType>OLB_BBrecords</objectType>
<provider>JET</provider>
<metadata>&lt;BooleanQuery&gt;&lt;Clause occurs=&quot;must&quot;&gt;&lt;TermQuery fieldName=&quot;RegistrationNumber&quot;&gt;44565153050735751&lt;/TermQuery&gt;&lt;/Clause&gt;&lt;/BooleanQuery&gt;</metadata>
</ratedDocument>
</payload></log-message-body></body></log-event>
2013-04-03 08:54:19,989 INFO [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.987-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body><log-message-body><headers><fedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"><logFilter>7</logFilter><logSeverity>255</logSeverity><schemaType>PCMC.MRP.DocumentMetaData</schemaType><UID>073104c-4e-4ce-bda-694344ee62</UID><consumerSystemId>JTR</consumerSystemId><consumerLogin>jbserviceid</consumerLogin><logLocation>Successful Completion of Service</logLocation></fedDKPLoggingContext></headers><payload>0</payload></log-message-body></body></log-event>
Here is the bash script I wrote:
counter=0
uids=$(grep "Service" "$i" | awk 'BEGIN {FS="lt;"; RS="gt;"} {print $2}' | sort -u)
for uid in ${uids}; do
    count=$(grep -c "$uid" test.log)
    if [ "${count}" -ne "0" ]; then
        unique_uids[counter]="$uid"
        let counter=counter+1
    fi
done
echo "${unique_uids[@]}"
echo "$counter"
echo " Unique No:" "${#unique_uids[@]}"
echo "uid StartTime EndTime" > "$log"
for uid in "${unique_uids[@]}"; do
    responseTime=$(grep "${uid}" "$i" | awk '{split($2,Arr,":|,"); print Arr[1]*3600000+Arr[2]*60000+Arr[3]*1000+Arr[4]}' | sort -n)
    echo "$uid" $responseTime >> "$log"
done
The output should look like this.
Operation comes from the id attribute, Consumer comes from the schemaType (PCMC.MRP.DocumentMetaData gives MRP), and HOUR is the hour of the timestamp (08 from 08:54:XX).
If there is more than one request/response pair in an hour, I need the average of the response times for the requests that came in that hour.
Operation Consumer HOUR Avg-response-time(ms)
DKP_DumpDocumentProperties MRP 08 538
Given your posted input file:
$ cat file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>904c-be-4e-bbda-3e62</UId><</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>904c-be-4e-bbda-3e62</UId><</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>edfc-fr-5e-bced-3443</UId><</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>edfc-fr-5e-bced-3443</UId><</body></event>
This GNU awk script (you must already be using GNU awk, since you set RS to a multi-character string in the script you posted in your question):
$ cat tst.awk
{
    date = $1
    time = $2
    guid = gensub(/.*;gt;([^&]+).*/,"\\1","")   # extract the UID between the encoded tags
    print guid, date, time
}
will pull out what I THINK is the information you care about:
$ gawk -f tst.awk file
904c-be-4e-bbda-3e62 2013-04-03 08:54:19,989
904c-be-4e-bbda-3e62 2013-04-03 08:54:39,389
edfc-fr-5e-bced-3443 2013-04-03 08:54:34,979
edfc-fr-5e-bced-3443 2013-04-03 08:55:19,569
The rest is simple math, right? Do it in this awk script - don't go piping the awk output to some goofy shell loop!
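To make that concrete, here is a minimal sketch (my own extension, not part of the answer above) of how the hourly averaging could live in the same script. It reuses the gensub() extraction above, assumes each UID occurs exactly twice as stated in the question, and buckets each pair under the hour of whichever entry is seen first:
$ cat avg.awk
{
    guid = gensub(/.*;gt;([^&]+).*/,"\\1","")
    split($2, t, /[:,]/)                  # "08:54:19,989" -> HH MM SS mmm
    ms = t[1]*3600000 + t[2]*60000 + t[3]*1000 + t[4]
    if (guid in first) {                  # second sighting: close the pair
        d = ms - first[guid]
        sum[hr[guid]] += (d < 0 ? -d : d) # the pair may be out of sequence
        cnt[hr[guid]]++
    } else {                              # first sighting: remember time and hour
        first[guid] = ms
        hr[guid] = t[1]
    }
}
END {
    for (h in cnt)
        printf "%s %.0f\n", h, sum[h]/cnt[h]
}
On the sample file above this should print 08 31995 (the mean of the pairs' 19400 ms and 44590 ms).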
Extending Ed Morton's solution:
Content of script.awk
function parse_time (date, time, newtime) {
gsub(/-/, " ", date)
gsub(/:/, " ", time)
gsub(/,.*/, "", time)
newtime = date" "time
return newtime
}
(gensub(/.*;gt;([^&]+).*/,"\\1","") in starttime) {
etime = parse_time($1, $2)
endtime[gensub(/.*;gt;([^&]+).*/,"\\1","")] = etime
next
}
{
stime = parse_time($1, $2)
starttime[gensub(/.*;gt;([^&]+).*/,"\\1","")] = stime
}
END {
    for (x in starttime) {
        if (x in endtime) {
            diff = mktime(endtime[x]) - mktime(starttime[x])
            diff = sprintf("%dh:%dm:%ds", diff/(60*60), diff%(60*60)/60, diff%60)
            print x, diff
        }
    }
}
Test: the order of the entries was shuffled for testing.
$ cat log.file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>904c-be-4e-bbda-3e62</UId><</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>edfc-fr-5e-bced-3443</UId><</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>904c-be-4e-bbda-3e62</UId><</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body><UId>edfc-fr-5e-bced-3443</UId><</body></event>
$ awk -f script.awk log.file
904c-be-4e-bbda-3e62 0h:0m:20s
edfc-fr-5e-bced-3443 0h:0m:45s
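One caveat with the above (a note of mine, not from the original answer): parse_time() throws away the milliseconds via gsub(/,.*/, ...), and mktime() works in whole seconds, while the question asks for averages in milliseconds. A hypothetical millisecond-precision variant could look like this:
function parse_time_ms (date, time,    ms, stamp) {
    ms = time
    sub(/.*,/, "", ms)                # "08:54:19,989" -> "989"
    stamp = date " " time
    gsub(/[-:]/, " ", stamp)          # "2013-04-03 08:54:19,989" -> "2013 04 03 08 54 19,989"
    sub(/,.*/, "", stamp)             # drop the millisecond field for mktime()
    return mktime(stamp) * 1000 + ms  # epoch milliseconds
}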
Related
Consider the following 'sample.xml'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
<level>
<name>testD</name>
<level>
<name>testE</name>
</level>
</level>
</level>
</level>
</root>
Using xmlstarlet I can do:
xml sel -t -m //level -v name -o " " -v "count(ancestor::*)-1" -o "." -v "count(preceding-sibling::*)" -n sample.xml
This produces:
testA 0.0
testB 1.1
testC 1.2
testD 2.1
testE 3.1
What should I do to get:
testA 0.0
testB 1.1
testC 1.2
testD 1.2.1
testE 1.2.1.1
In this example I only have 4 levels, but there can be more than 4.
I am thinking of some kind of recursion; are there any links available which explain how to do that?
You should be able to do this using XSLT with the "tr" command in xmlstarlet...
However your desired output is a little confusing. If "testA" is the first level and you start at zero, why don't all the other entries start at zero? Or maybe "root" is supposed to be zero?
Anyway, here's an example that starts at 1 instead of zero and should get you started...
XML Input (input.xml)
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
<level>
<name>testD</name>
<level>
<name>testE</name>
</level>
</level>
</level>
</level>
</root>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="level">
<xsl:value-of select="concat(name, ' ')"/>
<xsl:for-each select="ancestor-or-self::level">
<xsl:if test="not(position()=1)">.</xsl:if>
<xsl:number/>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="level"/>
</xsl:template>
</xsl:stylesheet>
Command Line
xmlstarlet tr test.xsl input.xml
Output
testA 1
testB 1.1
testC 1.2
testD 1.2.1
testE 1.2.1.1
This problem can be solved without recursion, by iterating over
elements on the ancestor-or-self axis.
The following xmlstarlet command processes all level elements using
the inner -m (xsl:for-each) to handle each path from root to target
(as suggested in the comments, the shell variable base defaults to 0 but can be set to 1).
xmlstarlet select -T -t \
-m '//level' \
-v 'concat(name," ")' \
-m 'ancestor-or-self::level' \
--if 'position() = 1' \
-v "'${base:-0}'" \
--else \
-o '.' \
-v 'count(preceding-sibling::level) + 1' \
-b \
-b \
-n \
file.xml
Output:
testA 0
testB 0.1
testC 0.2
testD 0.2.1
testE 0.2.1.1
For a more compact inner -m, producing the same output, use instead
-m 'ancestor-or-self::level' \
--if 'position() != 1' -o '.' -b \
-v 'count(preceding-sibling::level) + number(position() != 1)' \
-b \
where the count is incremented by 1 for all levels except the root level, where position() is 1.
As a variation on the same theme: select elements with the shell
variable target and print their paths as XPath expressions using the
XSLT current() function to reference the
element being processed by the inner -m:
target='//level[name="testB" or name="testE"]'
xmlstarlet select -T -t \
-m "${target}" \
-m 'ancestor-or-self::*' \
--var pos='1 + count(preceding-sibling::*[name() = name(current())])' \
-v 'concat("/",name(),"[",$pos,"]")' \
-b \
-n \
file.xml
Output:
/root[1]/level[1]/level[1]
/root[1]/level[1]/level[2]/level[1]/level[1]
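To sanity-check those paths you can feed them straight back to xmlstarlet; this should print the names of the two targeted elements (testB and testE):
xmlstarlet select -T -t \
  -v '/root[1]/level[1]/level[1]/name' -n \
  -v '/root[1]/level[1]/level[2]/level[1]/level[1]/name' -n \
  file.xml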
I have this file:
<?xml version="1.0" encoding="utf-8"?>
<response>
<Count>1</Count>
<Messages>
<Message>
<Smstat>0</Smstat>
<Index>40001</Index>
<Phone>234</Phone>
<Content>Poin Bonstri kamu: 358
Sisa Kuota kamu :
Kuota WA.Line 18 MB s.d 06/08/2019 19:33:46
Kuota Reguler 1478 MB s.d 02/08/2019 05:36:44
Temukan beragam paket lain di bima+ https://goo.gl/RQ1DBA</Content>
<Date>2019-08-01 13:28:04</Date>
<Sca></Sca>
<SaveType>4</SaveType>
<Priority>0</Priority>
<SmsType>2</SmsType>
</Message>
</Messages>
</response>
I want to match the text between <Content> and </Content>. I've tried:
tr '\n' ' ' < input_file | grep -E "^<Content>.*</Content>$"
But it doesn't work. Please note that I use the ash shell instead of bash. How do I do this?
If you have a PCRE-capable grep you could use positive lookbehind and lookahead:
$ tr '\n' ' ' < file | grep -Po "(?<=<Content>).*(?=</Content>)"
Output:
Poin Bonstri kamu: 358 Sisa Kuota kamu : Kuota WA.Line 18 MB s.d 06/08/2019 19:33:46 Kuota Reguler 1478 MB s.d 02/08/2019 05:36:44 Temukan beragam paket lain di bima+ https://goo.gl/RQ1DBA
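Note that ash environments (e.g. BusyBox) often ship a grep built without PCRE support. In that case a plain sed substitution over the same tr output is a possible fallback, assuming a single Content element as in the sample:
$ tr '\n' ' ' < file | sed 's|.*<Content>\(.*\)</Content>.*|\1|'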
Hello Sed/Bash/Awk experts,
I have a file full of dates in the following format:
Feb 5 2015
Nov 25 2014
Apr 16 2015
What I would like is to convert them to this format:
YYYY-MM-DD
So they should look like this:
2015-02-05
2014-11-25
2015-04-16
Thanks for your help.
With GNU date, you can simply use:
date -f dates.txt +%Y-%m-%d
In the -f option you can provide your input file with one date per line.
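For example, with the dates above saved in dates.txt, this should produce:
$ date -f dates.txt +%Y-%m-%d
2015-02-05
2014-11-25
2015-04-16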
Using awk
awk 'BEGIN{x="JANFEBMARAPRMAYJUNJULAUGSEPOCTNOVDEC"}
{printf "%04d-%02d-%02d\n",$3,(index(x,toupper($1))+2)/3,$2}' file
Here index() finds the month's offset in the packed string (JAN at 1, FEB at 4, MAR at 7, and so on), and (offset + 2) / 3 maps that offset to the month number 1..12.
The date command is your friend here:
date --date="Feb 5 2015" +"%Y-%m-%d"
2015-02-05
so, you can say:
$ cat my_file | while read -r dt
> do
> date --date="${dt}" +"%Y-%m-%d"
> done
2015-02-05
2014-11-25
2015-04-16
Paste the following:
{
month="00";
mon=toupper($1);
if(mon=="JAN") month="01";
else if(mon=="FEB") month="02";
else if(mon=="MAR") month="03";
else if(mon=="APR") month="04";
else if(mon=="MAY") month="05";
else if(mon=="JUN") month="06";
else if(mon=="JUL") month="07";
else if(mon=="AUG") month="08";
else if(mon=="SEP") month="09";
else if(mon=="OCT") month="10";
else if(mon=="NOV") month="11";
else if(mon=="DEC") month="12";
printf("%s-%s-%02d\n", $3, month, $2)
}
into a file (we'll refer to the filename as [script_filename]), then
execute the following:
awk -F' ' -f [script_filename] [date_filename]
where [date_filename] refers to the file containing the dates you wish to convert.
I am using process substitution to create a shorthand inline XSL function that I have written...
function _quickxsl() {
if [[ $1 == "head" ]] ; then
cat <<'HEAD'
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE xsl:stylesheet [
<!ENTITY apos "'">
]>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
xmlns:func="http://exslt.org/functions"
xmlns:kcc="http://www.adp.com/kcc"
extension-element-prefixes="func kcc">
HEAD
else
cat <<'FOOT'
</xsl:stylesheet>
FOOT
fi
}
function quickxsl() {
{
_quickxsl head && cat && _quickxsl foot
} | xsltproc - "$@"
}
It seems to work fine when I provide real files as arguments to xsltproc. When I call it with a process substitution, on the other hand:
$ quickxsl <(cat xml/kconf.xml) <<QUICKXSL
QUICKXSL
warning: failed to load external entity "/dev/fd/63"
unable to parse /dev/fd/63
Now, I understand that the pipe path is being passed to a subprocess that is itself connected via another pipe (xsltproc). So I rewrote it slightly:
function quickxsl() {
xsltproc - "$#" < <( _quickxsl head && cat && _quickxsl foot )
}
It seemed to resolve things a little:
/dev/fd/63:1: parser error : Document is empty
^
/dev/fd/63:1: parser error : Start tag expected, '<' not found
^
unable to parse /dev/fd/63
Any idea why the pipe cannot be inherited?
Update:
If I simplify the quickxsl function again:
function quickxsl() {
xsltproc <( _quickxsl head && cat && _quickxsl foot ) "$@"
}
I get the same issue, but it's easy to identify which FIFO is causing the problem with a bit of xtrace...
$ quickxsl <(cat xml/kconf.xml) <<QUICKXSL
QUICKXSL
+ quickxsl /dev/fd/63
++ cat xml/kconf.xml
+ xsltproc /dev/fd/62 /dev/fd/63
++ _quickxsl head
++ [[ head == \h\e\a\d ]]
++ cat
++ cat -
++ _quickxsl foot
++ [[ foot == \h\e\a\d ]]
++ cat
/dev/fd/62:1: parser error : Document is empty
^
/dev/fd/62:1: parser error : Start tag expected, '<' not found
^
cannot parse /dev/fd/62
The purpose of this exercise is to have the process-substitution pipe connected to a function that returns XML on its standard output, which it does correctly. If I write the contents to a file and pass that to the function, all is well. If I use process substitution, the child process can't read from the pipe; the pipe appears closed or inaccessible. Example:
quickxsl <(my_soap_service "query") <<XSL
<xsl:template match="/">
<xsl:value-of select="/some/path/text()"/>
</xsl:template>
XSL
As you can see, it provides some shortcuts.
Update:
A good point was raised: a pipe can't be repeatedly opened and closed. However, the strace output for xsltproc reveals it only opens the file once.
$ grep /dev/fd !$
grep /dev/fd /tmp/xsltproc.strace
execve("/usr/bin/xsltproc", ["xsltproc", "/dev/fd/62"], [/* 31 vars */]) = 0
stat("/dev/fd/62", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat("/dev/fd/62", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
stat("/dev/fd/62", {st_mode=S_IFIFO|0600, st_size=0, ...}) = 0
open("/dev/fd/62", O_RDONLY) = 3
write(2, "/dev/fd/62:1: ", 14) = 14
write(2, "/dev/fd/62:1: ", 14) = 14
write(2, "cannot parse /dev/fd/62\n", 24) = 24
Blimey, I overlooked seeking:
read(3, "<?xml version=\"1.0\" encoding=\"UT"..., 16384) = 390
read(3, "", 12288) = 0
lseek(3, 0, SEEK_CUR) = -1 ESPIPE (Illegal seek)
lseek(3, 18446744073709547520, SEEK_SET) = -1 ESPIPE (Illegal seek)
read(3, "", 4096) = 0
Seems that I found a bug in xsltproc: it doesn't recognise FIFO file types and tries to seek on the FIFO file descriptor after reading in the document. I have raised a bug. A workaround is to detect FIFO arguments to xsltproc and copy them into temporary files that xsltproc can seek on.
https://bugzilla.gnome.org/show_bug.cgi?id=730545
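A rough sketch of that workaround (the function name is mine, and temp-file cleanup is left out for brevity):
function xsltproc_fifo_safe() {
    local args=() a tmp
    for a in "$@"; do
        if [[ -p "$a" ]]; then       # argument is a FIFO, e.g. /dev/fd/63
            tmp=$(mktemp) || return 1
            cat "$a" > "$tmp"        # drain the pipe into a seekable file
            args+=("$tmp")
        else
            args+=("$a")
        fi
    done
    xsltproc "${args[@]}"
}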