XSLT sort output csv data - sorting

My task is to transform XML data to CSV (comma-separated data).
I have a problem with the sort order of the output data.
Please look at my examples below.
Any suggestions on how to resolve this issue would be appreciated.
Thanks in advance!
INPUT XML DATA
<?xml version="1.0" encoding="UTF-8"?>
<Root>
<ItemInfo>
<ItemNmb>Item1</ItemNmb>
<ItemText>Item 111</ItemText>
<ItemDetails>
<ItemDetailInfo>
<id>111</id>
<Text>Text 111</Text>
</ItemDetailInfo>
<ItemDetailInfo>
<id>555</id>
<Text>Text 555</Text>
</ItemDetailInfo>
</ItemDetails>
</ItemInfo>
<ItemInfo>
<ItemNmb>Item2</ItemNmb>
<ItemText>Item 222</ItemText>
<ItemDetails>
<ItemDetailInfo>
<id>555</id>
<Text>Text 555</Text>
</ItemDetailInfo>
<ItemDetailInfo>
<id>333</id>
<Text>Text 333</Text>
</ItemDetailInfo>
<ItemDetailInfo>
<id>222</id>
<Text>Text 222</Text>
</ItemDetailInfo>
</ItemDetails>
</ItemInfo>
<ItemInfo>
<ItemNmb>Item3</ItemNmb>
<ItemText>Item 333</ItemText>
<ItemDetails>
<ItemDetailInfo>
<id>999</id>
<Text>Text 999</Text>
</ItemDetailInfo>
</ItemDetails>
</ItemInfo>
</Root>
XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:output method="text" encoding="UTF-8" indent="yes"/>
<xsl:param name="delim" select="';'"/>
<xsl:param name="break" select="'&#xA;'"/>
<xsl:template match="/">
<xsl:for-each select="/Root/ItemInfo">
<xsl:call-template name="itemtemp">
<xsl:with-param name="item" select="ItemNmb"/>
<xsl:with-param name="text" select="ItemText"/>
</xsl:call-template>
</xsl:for-each>
</xsl:template>
<xsl:template name="itemtemp">
<xsl:param name="item"/>
<xsl:param name="text"/>
<xsl:for-each select="ItemDetails/ItemDetailInfo">
<xsl:sort select="id" data-type="text" order="ascending"/>
<xsl:call-template name="copmitemtemp">
<xsl:with-param name="item" select="$item"/>
<xsl:with-param name="text" select="$text"/>
<xsl:with-param name="idsub" select="id"/>
<xsl:with-param name="textsub" select="Text"/>
</xsl:call-template>
</xsl:for-each>
</xsl:template>
<xsl:template name="copmitemtemp">
<xsl:param name="item"/>
<xsl:param name="text"/>
<xsl:param name="idsub"/>
<xsl:param name="textsub"/>
<xsl:value-of select="$idsub" disable-output-escaping="yes"/><xsl:value-of select="$delim"/>
<xsl:value-of select="$textsub" disable-output-escaping="yes"/><xsl:value-of select="$delim"/>
<xsl:value-of select="$item" disable-output-escaping="yes"/><xsl:value-of select="$delim"/>
<xsl:value-of select="$text" disable-output-escaping="yes"/><xsl:value-of select="$break"/>
</xsl:template>
</xsl:stylesheet>
OUTPUT DATA
111;Text 111;Item1;Item 111
555;Text 555;Item1;Item 111
222;Text 222;Item2;Item 222
333;Text 333;Item2;Item 222
555;Text 555;Item2;Item 222
999;Text 999;Item3;Item 333
EXPECTED RESULT (Is sorted by (id))
111;Text 111;Item1;Item 111
222;Text 222;Item2;Item 222
333;Text 333;Item2;Item 222
555;Text 555;Item1;Item 111
555;Text 555;Item2;Item 222
999;Text 999;Item3;Item 333

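It looks like the sorting should be done on ItemInfo/ItemDetails/ItemDetailInfo/id across the whole document; therefore you need to iterate over all ItemDetailInfo elements in a single for-each instead of sorting within each ItemInfo.
Try this slightly changed version of your XSLT.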
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance">
<xsl:output method="text" encoding="UTF-8" indent="yes"/>
<xsl:param name="delim" select="';'"/>
<xsl:param name="break" select="'&#xA;'"/>
<xsl:template match="/">
<xsl:for-each select="/Root/ItemInfo/ItemDetails/ItemDetailInfo">
<xsl:sort select="id" data-type="text" order="ascending"/>
<xsl:call-template name="copmitemtemp">
<xsl:with-param name="item" select="../../ItemNmb"/>
<xsl:with-param name="text" select="../../ItemText"/>
<xsl:with-param name="idsub" select="id"/>
<xsl:with-param name="textsub" select="Text"/>
</xsl:call-template>
</xsl:for-each>
</xsl:template>
<xsl:template name="copmitemtemp">
<xsl:param name="item"/>
<xsl:param name="text"/>
<xsl:param name="idsub"/>
<xsl:param name="textsub"/>
<xsl:value-of select="$idsub" disable-output-escaping="yes"/>
<xsl:value-of select="$delim"/>
<xsl:value-of select="$textsub" disable-output-escaping="yes"/>
<xsl:value-of select="$delim"/>
<xsl:value-of select="$item" disable-output-escaping="yes"/>
<xsl:value-of select="$delim"/>
<xsl:value-of select="$text" disable-output-escaping="yes"/>
<xsl:value-of select="$break"/>
</xsl:template>
</xsl:stylesheet>
This generates the following output:
111;Text 111;Item1;Item 111
222;Text 222;Item2;Item 222
333;Text 333;Item2;Item 222
555;Text 555;Item1;Item 111
555;Text 555;Item2;Item 222
999;Text 999;Item3;Item 333
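As a cross-check outside an XSLT processor, the same idea (collect every ItemDetailInfo first, then sort once across the whole document) can be sketched in Python with the standard library. This is only an illustration under assumptions: the input is an abbreviated copy of the question's XML, and `sorted_rows` is a hypothetical helper name, not part of any answer above.

```python
import xml.etree.ElementTree as ET

# Abbreviated copy of the question's input (assumption: same element names).
XML = """<Root>
  <ItemInfo>
    <ItemNmb>Item1</ItemNmb><ItemText>Item 111</ItemText>
    <ItemDetails>
      <ItemDetailInfo><id>111</id><Text>Text 111</Text></ItemDetailInfo>
      <ItemDetailInfo><id>555</id><Text>Text 555</Text></ItemDetailInfo>
    </ItemDetails>
  </ItemInfo>
  <ItemInfo>
    <ItemNmb>Item2</ItemNmb><ItemText>Item 222</ItemText>
    <ItemDetails>
      <ItemDetailInfo><id>555</id><Text>Text 555</Text></ItemDetailInfo>
      <ItemDetailInfo><id>222</id><Text>Text 222</Text></ItemDetailInfo>
    </ItemDetails>
  </ItemInfo>
</Root>"""


def sorted_rows(xml_text, delim=";"):
    """Flatten all ItemDetailInfo entries, then sort them globally by id."""
    root = ET.fromstring(xml_text)
    rows = []
    for item in root.findall("ItemInfo"):
        nmb = item.findtext("ItemNmb")
        text = item.findtext("ItemText")
        for detail in item.findall("ItemDetails/ItemDetailInfo"):
            rows.append((detail.findtext("id"), detail.findtext("Text"), nmb, text))
    # String comparison mirrors data-type="text"; the sort is stable, so rows
    # with equal ids keep document order.
    rows.sort(key=lambda row: row[0])
    return [delim.join(row) for row in rows]


csv_lines = sorted_rows(XML)
print("\n".join(csv_lines))
```

If the ids can have different lengths (for example 99 and 111), a numeric key such as `int(row[0])` would likely be the better choice, just as `data-type="number"` would be in the stylesheet.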

Related

Building a 'table of contents'

Consider the following 'sample.xml'
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
<level>
<name>testD</name>
<level>
<name>testE</name>
</level>
</level>
</level>
</level>
</root>
Using xmlstarlet I can do:
xml sel -t -m //level -v name -o " " -v "count(ancestor::*)-1" -o "." -v "count(preceding-sibling::*)" -n sample.xml
This produces:
testA 0.0
testB 1.1
testC 1.2
testD 2.1
testE 3.1
What should I do to get:
testA 0.0
testB 1.1
testC 1.2
testD 1.2.1
testE 1.2.1.1
In this example I only have 4 levels, but there can be more than 4.
I am thinking of some kind of recursion; are there any links available that explain how to do that?
You should be able to do this using XSLT with the "tr" command in xmlstarlet...
However your desired output is a little confusing. If "testA" is the first level and you start at zero, why don't all the other entries start at zero? Or maybe "root" is supposed to be zero?
Anyway, here's an example that starts at 1 instead of zero that should get you started...
XML Input (input.xml)
<root>
<level>
<name>testA</name>
<level>
<name>testB</name>
</level>
<level>
<name>testC</name>
<level>
<name>testD</name>
<level>
<name>testE</name>
</level>
</level>
</level>
</level>
</root>
XSLT 1.0 (test.xsl)
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="level">
<xsl:value-of select="concat(name, ' ')"/>
<xsl:for-each select="ancestor-or-self::level">
<xsl:if test="not(position()=1)">.</xsl:if>
<xsl:number/>
</xsl:for-each>
<xsl:text>
</xsl:text>
<xsl:apply-templates select="level"/>
</xsl:template>
</xsl:stylesheet>
Command Line
xmlstarlet tr test.xsl input.xml
Output
testA 1
testB 1.1
testC 1.2
testD 1.2.1
testE 1.2.1.1
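For comparison, the recursive approach the question asks about can be sketched in Python with the standard library. This is an illustration only, assuming the same input structure; `number_levels` is a hypothetical helper name. Each call carries the numbering of the parent as a prefix and appends the position of the current level among its level siblings, which is what the ancestor-or-self loop with xsl:number computes.

```python
import xml.etree.ElementTree as ET

# Same structure as input.xml above.
XML = """<root><level><name>testA</name>
<level><name>testB</name></level>
<level><name>testC</name><level><name>testD</name>
<level><name>testE</name></level></level></level>
</level></root>"""


def number_levels(parent, prefix=()):
    """Yield (name, 'n.n.n') for every nested <level>, in document order."""
    # findall("level") returns direct children only, so the index is the
    # position among sibling <level> elements.
    for index, level in enumerate(parent.findall("level"), start=1):
        path = prefix + (index,)
        yield level.findtext("name"), ".".join(map(str, path))
        # Recurse with the current path as the prefix for the children.
        yield from number_levels(level, path)


toc = list(number_levels(ET.fromstring(XML)))
for name, number in toc:
    print(name, number)
```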
This problem can be solved without recursion, by iterating over
elements on the ancestor-or-self axis.
The following xmlstarlet command processes all level elements using
the inner -m (xsl:for-each) to handle each path from root to target
(as suggested in comments the shell variable base defaults to 0 but
can be set to 1).
xmlstarlet select -T -t \
-m '//level' \
-v 'concat(name," ")' \
-m 'ancestor-or-self::level' \
--if 'position() = 1' \
-v "'${base:-0}'" \
--else \
-o '.' \
-v 'count(preceding-sibling::level) + 1' \
-b \
-b \
-n \
file.xml
Output:
testA 0
testB 0.1
testC 0.2
testD 0.2.1
testE 0.2.1.1
For a more compact inner -m that produces the same output for base 0, use this instead:
-m 'ancestor-or-self::level' \
--if 'position() != 1' -o '.' -b \
-v 'count(preceding-sibling::level) + number(position() != 1)' \
-b \
where the count is incremented by 1 for all except the root level
where position() is 1.
As a variation on the same theme: select elements with the shell
variable target and print their paths as XPath expressions using the
XSLT current() function to reference the
element being processed by the inner -m:
target='//level[name="testB" or name="testE"]'
xmlstarlet select -T -t \
-m "${target}" \
-m 'ancestor-or-self::*' \
--var pos='1 + count(preceding-sibling::*[name() = name(current())])' \
-v 'concat("/",name(),"[",$pos,"]")' \
-b \
-n \
file.xml
Output:
/root[1]/level[1]/level[1]
/root[1]/level[1]/level[2]/level[1]/level[1]

XSLT 1.0: max value of a date node

Given following xml:
<Ergebnisse>
<Spiel>
<Datum>2013-10-02</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-03</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-03</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-03</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-06</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-06</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-06</Datum>
</Spiel>
<Spiel>
<Datum>2013-10-06</Datum>
</Spiel>
<Spiel>
<Datum>2014-05-01</Datum>
</Spiel>
<Spiel>
<Datum>2014-05-01</Datum>
</Spiel>
<Spiel>
<Datum>2014-04-27</Datum>
</Spiel>
</Ergebnisse>
Now I need to find the highest date value in "Datum". I'm using Python and lxml, so I can only work with XPath 1.0.
I tried:
//Spiel[not (Datum < preceding::Spiel/Datum) and not (Datum < following::Spiel/Datum)]/Datum
but it returns all values. What can I do?
Thanks!
In XSLT 1.0 you can sort the values and take the last.
<xsl:for-each select="Spiel/Datum">
<xsl:sort select="."/>
<xsl:if test="position()=last()"><xsl:value-of select="."/></xsl:if>
</xsl:for-each>
I wouldn't attempt it in XPath alone - if XPath 1.0 is all you have, return all the values and do the extraction in the Python host language.
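The attempted expression returns every node because XPath 1.0 converts both operands of `<` to numbers; a string such as 2013-10-02 becomes NaN, every comparison with NaN is false, and so both not(...) tests are true for each Spiel. A minimal sketch of the host-language extraction follows, using the standard library's ElementTree so that it runs without extra packages (with lxml, the asker would presumably collect the values with an XPath call instead). The sample data is abbreviated from the question.

```python
import xml.etree.ElementTree as ET

# Abbreviated sample from the question.
XML = """<Ergebnisse>
<Spiel><Datum>2013-10-02</Datum></Spiel>
<Spiel><Datum>2013-10-06</Datum></Spiel>
<Spiel><Datum>2014-05-01</Datum></Spiel>
<Spiel><Datum>2014-04-27</Datum></Spiel>
</Ergebnisse>"""

root = ET.fromstring(XML)
dates = [datum.text for datum in root.findall("Spiel/Datum")]

# ISO 8601 dates (YYYY-MM-DD) sort lexicographically in chronological order,
# so a plain string max() is enough here.
latest = max(dates)
print(latest)
```

This relies on every Datum using the zero-padded YYYY-MM-DD form; other date formats would need to be parsed first.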

Calculate time based metrics(hourly)

How would I calculate time-based metrics (hourly average) based on log file data?
To make this clearer, consider a log file with entries like the ones below. Each UID appears exactly twice in the log, once for the request and once for the response, embedded in XML. The two entries will likely appear OUT of sequence. The log file contains data for a single day only.
The log file contains about 2 million UIDs.
I have to find the average hourly response time for these requests. The request and response entries from the log file are shown below. The UID is the key that relates a request to its response.
2013-04-03 08:54:19,451 INFO [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.448-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body><log-message-body><headers>&lt;FedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"&gt;&lt;logFilter&gt;7&lt;/logFilter&gt;&lt;logSeverity&gt;255&lt;/logSeverity&gt;&lt;schemaType&gt;PCMC.MRP.DocumentMetaData&lt;/schemaType&gt;&lt;UID&gt;073104c-4e-4ce-bda-694344ee62&lt;/UID&gt;&lt;consumerSystemId&gt;JTR&lt;/consumerSystemId&gt;&lt;consumerLogin&gt;jbserviceid&lt;/consumerLogin&gt;&lt;logLocation&gt;Beginning of Service&lt;/logLocation&gt;&lt;/fedDKPLoggingContext&gt;</headers><payload>
&lt;ratedDocument&gt;
&lt;objectType&gt;OLB_BBrecords&lt;/objectType&gt;
&lt;provider&gt;JET&lt;/provider&gt;
&lt;metadata&gt;&amp;lt;BooleanQuery&amp;gt;&amp;lt;Clause occurs=&amp;quot;must&amp;quot;&amp;gt;&amp;lt;TermQuery fieldName=&amp;quot;RegistrationNumber&amp;quot;&amp;gt;44565153050735751&amp;lt;/TermQuery&amp;gt;&amp;lt;/Clause&amp;gt;&amp;lt;/BooleanQuery&amp;gt;&lt;/metadata&gt;
&lt;/ratedDocument&gt;
</payload></log-message-body></body></log-event>
2013-04-03 08:54:19,989 INFO [Logger] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><log-event><message-time>2013-04-03T08:54:19.987-04:00</message-time><caller>PCMC.common.manage.springUtil</caller><body><log-message-body><headers>&lt;fedDKPLoggingContext id="DKP_DumpDocumentProperties" type="context.generated.FedDKPLoggingContext"&gt;&lt;logFilter&gt;7&lt;/logFilter&gt;&lt;logSeverity&gt;255&lt;/logSeverity&gt;&lt;schemaType&gt;PCMC.MRP.DocumentMetaData&lt;/schemaType&gt;&lt;UID&gt;073104c-4e-4ce-bda-694344ee62&lt;/UID&gt;&lt;consumerSystemId&gt;JTR&lt;/consumerSystemId&gt;&lt;consumerLogin&gt;jbserviceid&lt;/consumerLogin&gt;&lt;logLocation&gt;Successful Completion of Service&lt;/logLocation&gt;&lt;/fedDKPLoggingContext&gt;</headers><payload>0</payload></log-message-body></body></log-event>
Here is the bash script I wrote.
uids=$(cat $i|grep "Service" |awk 'BEGIN {FS="lt;";RS ="gt;"} {print $2;}'| sort -u)
for uid in ${uids}; do
count=`grep "$uid" test.log|wc -l`
if [ "${count}" -ne "0" ]; then
unique_uids[counter]="$uid"
let counter=counter+1
fi
done
echo ${unique_uids[@]}
echo $counter
echo " Unique No:" ${#unique_uids[@]}
echo "uid StartTime EndTime" > $log
for unique_uids in ${unique_uids[@]} ; do
responseTime=`cat $i|grep "${unique_uids}" |awk '{split($2,Arr,":|,"); print Arr[1]*3600000+Arr[2]*60000+Arr[3]*1000+Arr[4]}'|sort -n`
echo $unique_uids $responseTime >> $log
done
The output should look like this.
Operation comes from the id attribute, Consumer comes from the DocumentMetaData schema type (PCMC.MRP.DocumentMetaData), and HOUR is the hour of the timestamp (08 for 08:54:XX).
If there is more than one request/response pair, the response times of the requests that arrived in the same hour need to be averaged.
Operation Consumer HOUR Avg-response-time(ms)
DKP_DumpDocumentProperties MRP 08 538
Given your posted input file:
$ cat file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
This GNU awk script (you are using GNU awk since you set RS to a multi-character string in the script you posted in your question)
$ cat tst.awk
{
date = $1
time = $2
guid = gensub(/.*;gt;([^&]+).*/,"\\1","")
print guid, date, time
}
will pull out what I THINK is the information you care about:
$ gawk -f tst.awk file
904c-be-4e-bbda-3e62 2013-04-03 08:54:19,989
904c-be-4e-bbda-3e62 2013-04-03 08:54:39,389
edfc-fr-5e-bced-3443 2013-04-03 08:54:34,979
edfc-fr-5e-bced-3443 2013-04-03 08:55:19,569
The rest is simple math, right? And do it in this awk script - don't go piping the awk output to some goofy shell loop!
Extending Ed Morton's solution:
Content of script.awk
function parse_time (date, time, newtime) {
gsub(/-/, " ", date)
gsub(/:/, " ", time)
gsub(/,.*/, "", time)
newtime = date" "time
return newtime
}
(gensub(/.*;gt;([^&]+).*/,"\\1","") in starttime) {
etime = parse_time($1, $2)
endtime[gensub(/.*;gt;([^&]+).*/,"\\1","")] = etime
next
}
{
stime = parse_time($1, $2)
starttime[gensub(/.*;gt;([^&]+).*/,"\\1","")] = stime
}
END {
for (x in starttime) {
for (y in endtime) {
if (x==y) {
diff = mktime(endtime[x]) - mktime(starttime[y])
diff = sprintf("%dh:%dm:%ds",diff/(60*60),diff%(60*60)/60,diff%60)
print x, diff
delete starttime[x]
delete endtime[y]
}
}
}
}
Test: Modified the order of guid for testing
$ cat log.file
2013-04-03 08:54:19,989 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:34,979 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:54:39,389 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;904c-be-4e-bbda-3e62&lt;/UId&gt;&lt;</body></event>
2013-04-03 08:55:19,569 INFO [LOGGER] <?xml version="1.0" encoding="UTF-8" standalone="yes"?><event><body>&lt;UId&gt;edfc-fr-5e-bced-3443&lt;/UId&gt;&lt;</body></event>
$ awk -f script.awk log.file
904c-be-4e-bbda-3e62 0h:0m:20s
edfc-fr-5e-bced-3443 0h:0m:45s
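Neither answer carries the calculation through to the hourly average in milliseconds that the question asks for. The following Python sketch shows one way to do that step, under stated assumptions: each entry starts with a `YYYY-MM-DD HH:MM:SS,mmm` timestamp, the UID appears on that same line as `&lt;UID&gt;...&lt;/UID&gt;` (matched case-insensitively), and a pair is attributed to the hour of its earlier timestamp. `hourly_average_ms` and `LINE_RE` are hypothetical names, and the sample lines are shortened versions of the question's log entries.

```python
import re
from collections import defaultdict
from datetime import datetime, timedelta

# Assumed line shape: timestamp first, escaped UID later on the same line.
LINE_RE = re.compile(
    r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2},\d{3}).*?&lt;UID&gt;(.+?)&lt;/UID&gt;",
    re.IGNORECASE,
)


def hourly_average_ms(lines):
    """Return {hour: average response time in ms} for UIDs seen exactly twice."""
    seen = defaultdict(list)  # uid -> list of timestamps
    for line in lines:
        match = LINE_RE.match(line)
        if match:  # continuation lines without a timestamp are skipped
            stamp = datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S,%f")
            seen[match.group(2)].append(stamp)

    buckets = defaultdict(list)  # hour of the request -> durations in ms
    for stamps in seen.values():
        if len(stamps) == 2:
            start, end = sorted(stamps)  # order in the file does not matter
            buckets[start.hour].append((end - start) / timedelta(milliseconds=1))
    return {hour: sum(v) / len(v) for hour, v in sorted(buckets.items())}


SAMPLE = [
    "2013-04-03 08:54:19,451 INFO [Logger] ...&lt;UID&gt;073104c-4e-4ce-bda-694344ee62&lt;/UID&gt;...",
    "2013-04-03 08:54:19,989 INFO [Logger] ...&lt;UID&gt;073104c-4e-4ce-bda-694344ee62&lt;/UID&gt;...",
]
averages = hourly_average_ms(SAMPLE)
print(averages)
```

For the real log, the lines would come from iterating over the open file instead of a list. Grouping additionally by operation and consumer would mean extracting those two fields with further patterns and using a tuple as the bucket key.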

Renaming list elements after moving some of its items in XSLT 1.0.

I have this xml:
<Process>
<elem0>
<pcode>xx</pcode>
</elem0>
<elem1>
<pcode>xy</pcode>
</elem1>
<elem2>
<pcode>ab</pcode>
</elem2>
<elem3>
<pcode>AD</pcode>
</elem3>
</Process>
I have to MOVE elements whose pcode value is 'xy' to EdProcess, which I am doing successfully with XSLT. I also got the Process elements to show up in order with help from fellow members here. The issue now is that EdProcess needs to start at elem0, and any elements that get moved into it should be numbered in order, i.e. elem0, elem1, elem2, etc. Currently I get:
<Process>
<elem0>
<pcode>xx</pcode>
</elem0>
<elem1>
<pcode>ab</pcode>
</elem1>
<elem2>
<pcode>AD</pcode>
</elem2>
</Process>
<EdProcess>
<elem1>
<pcode>xy</pcode>
</elem1>
</EdProcess>
I would like for it to be
<Process>
<elem0>
<pcode>xx</pcode>
</elem0>
<elem1>
<pcode>ab</pcode>
</elem1>
<elem2>
<pcode>AD</pcode>
</elem2>
</Process>
<EdProcess>
<elem0>
<pcode>xy</pcode>
</elem0>
</EdProcess>
so that it shows up properly in the front end, but I am stuck. I tried sorting, but it didn't work. The elem names keep changing, so it is hard for me to use a template. Since I am creating EdProcess myself, template matching is not working. Thanks in advance!
This can be done with a small modification to my answer to your previous question:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:template match="@* | node()">
<xsl:copy>
<xsl:apply-templates select="@* | node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*">
<root>
<Process>
<xsl:apply-templates select="*[pcode != 'xy']" mode="elems" />
</Process>
<EdProcess>
<xsl:apply-templates select="*[pcode = 'xy']" mode="elems" />
</EdProcess>
</root>
</xsl:template>
<xsl:template match="*" mode="elems">
<xsl:element name="elem{position() - 1}">
<xsl:apply-templates select="@* | node()" />
</xsl:element>
</xsl:template>
</xsl:stylesheet>
When this is run on the following input:
<Process>
<elem0>
<pcode>xx</pcode>
</elem0>
<elem1>
<pcode>xy</pcode>
</elem1>
<elem2>
<pcode>ab</pcode>
</elem2>
<elem3>
<pcode>xy</pcode>
</elem3>
<elem4>
<pcode>AD</pcode>
</elem4>
</Process>
The result is:
<root>
<Process>
<elem0>
<pcode>xx</pcode>
</elem0>
<elem1>
<pcode>ab</pcode>
</elem1>
<elem2>
<pcode>AD</pcode>
</elem2>
</Process>
<EdProcess>
<elem0>
<pcode>xy</pcode>
</elem0>
<elem1>
<pcode>xy</pcode>
</elem1>
</EdProcess>
</root>
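The renumbering logic (each output container names its children by their position within that container, not by their original name) can also be sketched in Python with the standard library, for readers who want to check the expected result without an XSLT processor. This is an illustration only; `split_and_renumber` is a hypothetical helper name and the input is a shortened version of the sample above.

```python
import xml.etree.ElementTree as ET

# Shortened version of the sample input.
XML = (
    "<Process>"
    "<elem0><pcode>xx</pcode></elem0>"
    "<elem1><pcode>xy</pcode></elem1>"
    "<elem2><pcode>ab</pcode></elem2>"
    "<elem3><pcode>xy</pcode></elem3>"
    "</Process>"
)


def split_and_renumber(xml_text, moved_code="xy"):
    """Split children into Process/EdProcess and rename them elem0, elem1, ..."""
    source = ET.fromstring(xml_text)
    root = ET.Element("root")
    process = ET.SubElement(root, "Process")
    ed_process = ET.SubElement(root, "EdProcess")
    for child in source:
        # Note: children without a pcode end up in Process in this sketch.
        target = ed_process if child.findtext("pcode") == moved_code else process
        # len(target) is the number of children already placed, i.e. the next index.
        child.tag = f"elem{len(target)}"
        target.append(child)
    return root


result = split_and_renumber(XML)
print(ET.tostring(result, encoding="unicode"))
```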

Resources