xsl:sort not working as expected

I have the XML below and am using XSLT 2.0.
<A>
<BID>Pt.IV</BID>
<BID>Pt.III</BID>
<BID>Pt.IIIA</BID>
<BID>Pt.IIIB</BID>
<BID>Pt.IIIC</BID>
<BID>Pt.IIID</BID>
<BID>Pt.IIIE</BID>
<BID>Pt.IIIF</BID>
<BID>Pt.IIIAA</BID>
<BID>s.2(1)</BID>
<BID>s.3</BID>
<BID>s.3(1)</BID>
<BID>s.3(2)</BID>
<BID>s.3A</BID>
<BID>s.3B</BID>
<BID>s.4</BID>
<BID>s.4(2)</BID>
<BID>s.4(5)</BID>
<BID>s.4(2A)</BID>
<BID>s.4(4A)</BID>
<BID>s.6(3)</BID>
<BID>s.7</BID>
<BID>s.7A</BID>
<BID>s.8</BID>
<BID>s.9</BID>
<BID>s.12</BID>
<BID>s.13</BID>
<BID>s.20A</BID>
<BID>s.20F</BID>
<BID>s.20O</BID>
<BID>s.20S</BID>
<BID>s.20T</BID>
<BID>s.20W</BID>
<BID>s.21</BID>
<BID>s.21(2)</BID>
<BID>s.21(3)</BID>
<BID>s.21(2A)</BID>
<BID>s.21(4B)</BID>
<BID>s.21(4C)</BID>
<BID>s.21(4D)</BID>
<BID>s.21B</BID>
<BID>s.22(1)</BID>
<BID>s.22(1)(b)</BID>
<BID>s.22(4)</BID>
<BID>s.23</BID>
<BID>s.25(1A)</BID>
<BID>s.27</BID>
<BID>s.28</BID>
<BID>s.31</BID>
<BID>s.20O(2)</BID>
<BID>s.20W(2)</BID>
<BID>s.21B(1)</BID>
<BID>s.21B(2)</BID>
<BID>s.21B(3)</BID>
</A>
Here I'm trying to sort the values of BID using the XSLT below.
<xsl:template match="A">
<xsl:for-each select="BID">
<xsl:sort select="substring-after(.,'.')"/>
<table class="toa-entry">
<tbody>
<tr class="secondary-entry">
<td class="entry-name">
<xsl:value-of select="."/></td>
</tr>
</tbody>
</table>
</xsl:for-each>
</xsl:template>
Here the output that I get is as below.
But the expected output is as below.
s2(1)
s3
s3(1)
s3(2)
s3A
s3B
s4
s4(2)
s4(5)
s4(2A)
s4(4A)
s6(3)
s7
s7A
s8
s9
s12
s13
s20A
s20F
s20O
s20O(2)
s20S
s20T
s20W
s20W(2)
s21
s21(2)
s21(3)
s21(2A)
What's happening is that the sort puts all the numbers starting with 1 first, then those starting with 2, and so on,
whereas I want regular ascending order: 1, 2, 2a, 3, 3a and so on.
Please let me know how I can get this output.
Thanks

You should try something like:
XSLT 2.0
<xsl:stylesheet version="2.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:template match="/A">
<table >
<xsl:for-each select="BID">
<xsl:sort select="substring-before(., '.')" data-type="text" order="ascending"/>
<xsl:sort select="replace(substring-before(substring-after(concat(., '('), '.'), '('),'[A-Z]', '')" data-type="number" order="ascending"/>
<xsl:sort select="replace(substring-before(substring-after(concat(., '('), '.'), '('),'[0-9]', '')" data-type="text" order="ascending"/>
<xsl:sort select="substring-after(., '(')" data-type="text" order="ascending"/>
<tr>
<td><xsl:value-of select="."/></td>
</tr>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>
The (rendered) result, when applied to your example:
Pt.III
Pt.IIIA
Pt.IIIAA
Pt.IIIB
Pt.IIIC
Pt.IIID
Pt.IIIE
Pt.IIIF
Pt.IV
s.2(1)
s.3
s.3(1)
s.3(2)
s.3A
s.3B
s.4
s.4(2)
s.4(2A)
s.4(4A)
s.4(5)
s.6(3)
s.7
s.7A
s.8
s.9
s.12
s.13
s.20A
s.20F
s.20O
s.20O(2)
s.20S
s.20T
s.20W
s.20W(2)
s.21
s.21(2)
s.21(2A)
s.21(3)
s.21(4B)
s.21(4C)
s.21(4D)
s.21B
s.21B(1)
s.21B(2)
s.21B(3)
s.22(1)
s.22(1)(b)
s.22(4)
s.23
s.25(1A)
s.27
s.28
s.31

You can't utilize a text sorting algorithm on numeric data.
Even though you have stripped out the characters, your data values are still text values.
If you require numeric sorting you need to tell the parser the data type of the data, which you can do using the data-type attribute.
data-type: text | number | qname
Optional. Specifies the data type of the data to be sorted. Default is "text".
EDIT: Replace your regex with this: [^a-zA-Z0-9 -]
There is a limitation here because the regex strips all non-numeric characters out of the values. Therefore if the initial list is not already sorted correctly within the numeric factor, for example
s.21(4B)
s.21(4C)
s.21(4D)
then the sorting will ignore the alphabetic component of the values.
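As a minimal sketch of the data-type attribute described above (assuming XSLT 2.0 for replace(), and looking only at the s.* entries of the question's input), the sort key below keeps just the leading digits after the dot and compares them as numbers, so s.9 comes before s.12. The regex and the text output method are illustrative choices, not taken from the original demo.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/A">
    <xsl:for-each select="BID[starts-with(., 's.')]">
      <!-- Strip everything from the first non-digit onwards, e.g. "21(4B)" becomes "21",
           then compare the remaining digits as numbers rather than as text -->
      <xsl:sort select="replace(substring-after(., '.'), '[^0-9].*$', '')" data-type="number"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Entries with the same leading number, such as s.21 and s.21(2), compare equal under this single key and stay in document order, which is why the earlier answer adds further sort keys.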

If you're using Saxon, there is a collation you can request that treats any sequence of digits in the sort key as a number, so s12 sorts after s9.
collation="http://saxon.sf.net/collation?alphanumeric=yes"
It won't handle roman numerals though: sorting "App IX" after "App VIII" remains a challenge!
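A sketch of how that collation URI might be attached to xsl:sort, assuming a Saxon processor that recognises it (other processors are unlikely to accept this URI). The surrounding template is illustrative only.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <xsl:template match="/A">
    <xsl:for-each select="BID">
      <!-- Saxon-specific collation: runs of digits in the sort key are compared as numbers -->
      <xsl:sort select="." collation="http://saxon.sf.net/collation?alphanumeric=yes"/>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

How the collation orders the bracketed and lettered suffixes is best checked against the real data before relying on it.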


How to get sum of an attribute value which is referenced by id multiple times with xpath in xslt 1.0?

I really do hope that my title is at least a bit clear.
Important: I can only use XSLT 1.0 because the project needs to work with the MSXML XSLT processor.
What I'm trying to do:
I generate documents containing information about rooms. Rooms have walls, and I need the sum of the wall areas per room.
The input xml file I get is dynamically created by another program.
Changing the structure of the input xml file is not the solution, trust me, it's needed like that and is much more complex than I show you here.
My XML (the innerArea attribute in the wall element has to get summed up):
<root>
<floor id="30" name="EG">
<flat name="Wohnung" nr="1">
<Room id="49" area="93.08565">
<WallSegments>
<WallSegment id="45"/>
<WallSegment id="42"/>
<WallSegment id="39"/>
</WallSegments>
</Room>
</flat>
</floor>
<components>
<Wall id="20" innerArea="20.7654"/>
<wallSegment id="45" wall="20">[...]</wallSegment>
<Wall id="21" innerArea="12.45678"/>
<wallSegment id="42" wall="21">[...]</wallSegment>
<Wall id="22" innerArea="17.8643"/>
<wallSegment id="39" wall="22">[...]</wallSegment>
</components>
</root>
With my XSLT I was able to reach the values of the walls which belong to a room.
But I have really no idea how I could get the sum of the value out of that.
My XSLT:
<xsl:for-each select="flat/Room">
<xsl:for-each select="WallSegments/WallSegment">
<xsl:variable name="curWallSegId" select="@id"/>
<xsl:for-each select="/root/components/wallSegment[@id = $curWallSegId]">
<xsl:variable name="curWallId" select="@wall"/>
<xsl:for-each select="/root/components/Wall[@id = $curWallId]">
<!--I didn't expect that this was working, but at least I tried :D-->
<xsl:value-of select="sum(@AreaInner)"/>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
Desired Output should be something like...
[...]
<paragraph>
Room 1:
Wall area: 51.09 m²
[...]
</paragraph>
[...]
So I hope I described my problem properly. If not: I am sorry, you may beat me right into the face x)
It's best to use keys to get "related" data. Place this at the top of your stylesheet, outside of any template:
<xsl:key name="wall" match="components/Wall" use="@id" />
<xsl:key name="wallSegment" match="components/wallSegment" use="@id" />
Then:
<xsl:for-each select="flat/Room">
<paragraph>
<xsl:text>Room </xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>:
Wall area: </xsl:text>
<xsl:value-of select="format-number(sum(key('wall', key('wallSegment', WallSegments/WallSegment/@id)/@wall)/@innerArea), '0.00m²')"/>
<xsl:text>
</xsl:text>
</paragraph>
</xsl:for-each>
will return:
<paragraph>Room 1:
Wall area: 51.09m²</paragraph>
If what you need is the wall area of every room, here is one way of getting it:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/root/floor">
<xsl:for-each select="flat/Room">
<xsl:variable name="currentRoomSegmentsIds" select="WallSegments/WallSegment/@id"/>
<xsl:variable name="currentRoomWallsIds" select="/root/components/wallSegment[@id = $currentRoomSegmentsIds]/@wall"/>
<xsl:variable name="currentRoomWallsInnerAreas" select="/root/components/Wall[@id = $currentRoomWallsIds]/@innerArea"/>
Id of the room = <xsl:value-of select="@id"/>.
Area of the room = <xsl:value-of select="sum($currentRoomWallsInnerAreas)"/>
</xsl:for-each> <!-- End of for each room -->
</xsl:template>
</xsl:stylesheet>
This produces the following result:
Id of the room = 49.
Area of the room = 51.08648

XSLT concatenate input from several nodes in a single output

I'm trying to work out a transformation that will process an input with several Flights with Departure and Arrival into a single output with the complete route for the flights.
Input is as follows:
<FlightTrip>
<flights>
<departureAirport>
<airportCode>LocB</airportCode>
</departureAirport>
<departureTime>2013-03-28T10:00:00.000</departureTime>
<arrivalAirport>
<airportCode>LocC</airportCode>
</arrivalAirport>
</flights>
<flights>
<departureAirport>
<airportCode>LocA</airportCode>
</departureAirport>
<departureTime>2013-03-27T15:00:00.000</departureTime>
<arrivalAirport>
<airportCode>LocB</airportCode>
</arrivalAirport>
</flights>
<flights>
<departureAirport>
<airportCode>LocC</airportCode>
</departureAirport>
<departureTime>2013-03-30T14:00:00.000</departureTime>
<arrivalAirport>
<airportCode>LocD</airportCode>
</arrivalAirport>
</flights>
</FlightTrip>
The desired output would be this:
<FullTrip>LocA LocB LocC LocD</FullTrip>
I've tried to use foreach inside the output variable but I can't get it right. I also need to sort the input based on the departure date as the Flights can be in a different order (as per the sample input).
Any ideas of how to achieve this?
Thanks a lot!
Bruno
<?xml version="1.0" encoding="UTF-8" ?>
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output indent="yes"/>
<xsl:template match="FlightTrip">
<FullTrip>
<xsl:apply-templates select="flights">
<xsl:sort select="departureTime"/>
</xsl:apply-templates>
</FullTrip>
</xsl:template>
<xsl:template match="flights">
<xsl:value-of select="departureAirport/airportCode"/><xsl:text> </xsl:text>
<xsl:if test="position()=last()">
<xsl:value-of select="arrivalAirport/airportCode"/>
</xsl:if>
</xsl:template>
</xsl:transform>
Will produce:
<FullTrip>LocA LocB LocC LocD</FullTrip>
Thanks to Joepie for the enlightenment. I had to modify it a bit to get it to work in my environment, and ended up using xsl:for-each as below:
<xsl:template match="/">
<xsl:variable name="locations">
<xsl:for-each select="/FlightTrip/flights">
<xsl:sort select="departureTime" order="ascending" data-type="text"/>
<xsl:value-of select="concat(departureAirport/airportCode,' - ')"/>
<xsl:if test="position() = last()">
<xsl:value-of select="arrivalAirport/airportCode"/>
</xsl:if>
</xsl:for-each>
</xsl:variable>
<FullTrip>
<xsl:value-of select="$locations"/>
</FullTrip>
</xsl:template>
When applied to the example produces the output below:
<FullTrip>LocA - LocB - LocC - LocD</FullTrip>
Thanks again!

Longer node in XPath

I'd like to use XPath to retrieve the longer of two nodes.
E.g., if my XML is
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
And I do document.SelectSingleNode(your XPath here)
I would expect to get back the url1 node. If url2 is longer, or there is no url1 node, I'd expect to get back the url2 node.
Seems simple but I'm having trouble figuring it out. Any ideas?
This works for me, but it is ugly. Can't you do the comparison outside XPath?
record/*[starts-with(name(),'url')
and string-length(.) > string-length(preceding-sibling::*[1])
and string-length(.) > string-length(following-sibling::*[1])]/text()
<xsl:for-each select="*">
<xsl:sort select="string-length(.)" data-type="number"/>
<xsl:if test="position() = last()">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
Even works in XSLT 1.0!
Use this single XPath expression:
/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]
XSLT-based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
the XPath expression is evaluated and the result of this evaluation (the selected element) is copied to the output:
<url1>http://www.google.com</url1>

XSLT: poor performance due to complex XPath expression?

At some point in an XSLT program, I have the following:
<xsl:for-each select="tags/tag">
<xsl:apply-templates select="//shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id]">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
It seems that the XPath expression //shows/show[film=//films/film[tag=current()/@id]/@id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/@id]/@id]/@id], which is rather complex, considerably slows down the execution of the program (compared to the execution time before adding the quoted piece of code -- processing the same data, of course).
Do you think this is normal due to the relatively complex nature of the expression, and do you see how I could improve it so it performs better?
NB: in the XPath expression, film and //films/film, group and //groups/group refer to distinct elements.
See below a stripped-down sample of the XML input.
<program>
<tags>
<tag id="1">Tag1</tag>
<tag id="2">Tag2</tag>
<tag id="3">Tag3</tag>
</tags>
<films>
<film id="1">
Film1
<tag>2</tag><!-- References: /program/tags/tag/@id=2 -->
</film>
<film id="2">
Film2
<tag>1</tag><!-- References: /program/tags/tag/@id=1 -->
</film>
<film id="3">
Film3
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
<film id="4">
Film4
<tag>3</tag><!-- References: /program/tags/tag/@id=3 -->
</film>
</films>
<groups>
<group id="1">
<film>3</film><!-- References: /program/films/film/@id=3 -->
<film>4</film><!-- References: /program/films/film/@id=4 -->
</group>
</groups>
<shows>
<show id="1"><!-- Show with film (=simple) -->
<film>1</film><!-- References: /program/films/film/@id=1 -->
<date>2011-12-12</date>
<time>12:00</time>
</show>
<show id="2"><!-- Show with group (=combined) -->
<group>1</group><!-- References: /program/groups/group/@id=1 -->
<date>2011-12-12</date>
<time>14:00</time>
</show>
</shows>
</program>
Explanations:
A tag is a property attached to a film (in fact, it's rather a category).
A group is an enumeration of films.
A show references either a film or a group.
What I want: for each tag, I'm looking for the shows referencing a film having the current tag and the shows referencing a group where at least one of the films has the current tag.
Double slashes in XPath are performance and CPU hogs when working with large documents (since every node in the document must be evaluated). If you can replace them with either an absolute or a relative path, you should see a noticeable improvement. If you can post the input schema and required output, we could be more specific.
e.g. with an absolute path
//shows/show[film=//films/film[tag=current()/@id]/@id]
becomes
/myroot/somepath/shows/show[film=/myroot/somepath/films/film[tag=current()/@id]/@id]
or if the shows and films are relative to the current node
./relativexpath/shows/show[film=./relativexpath/somepath/films/film[tag=current()/@id]/@id]
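Applied to the sample input from the question, whose root element is program, the absolute-path rewrite might look like the sketch below. The result wrapper element and the copying show template are assumptions added only to make the stylesheet complete.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="xml" indent="yes"/>
  <xsl:template match="/program">
    <!-- "result" is a hypothetical wrapper so the output has a single root -->
    <result>
      <xsl:for-each select="tags/tag">
        <!-- Same logic as the original expression, with every // replaced by an explicit path -->
        <xsl:apply-templates select="
            /program/shows/show[film = /program/films/film[tag = current()/@id]/@id]
          | /program/shows/show[group = /program/groups/group[film = /program/films/film[tag = current()/@id]/@id]/@id]">
          <xsl:sort select="date" data-type="text" order="ascending"/>
          <xsl:sort select="time" data-type="text" order="ascending"/>
        </xsl:apply-templates>
      </xsl:for-each>
    </result>
  </xsl:template>
  <xsl:template match="show">
    <xsl:copy-of select="."/>
  </xsl:template>
</xsl:stylesheet>
```

This removes the whole-document scans but still repeats the nested comparisons for every tag, so any gain is likely to be modest on its own.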
The answer by nonnb very likely points to the problem, but not really to an efficient solution ("cheaper" axes are better, but that alone doesn't give the speed-up that indexing the data does).
Note that the big problem is that the XPath expression predicate does another full traversal of the tree for each evaluation. You should use keys for stuff like this; this will (in most or even all XSLT implementations) make an indexed lookup possible, thereby reducing the runtime a lot.
Define keys that index films by tag, groups by film, and shows by film and by group:
<xsl:key name="filmByTag" match="film" use="tag" />
<xsl:key name="groupsByFilm" match="group" use="film" />
<xsl:key name="showsByFilm" match="show" use="film" />
<xsl:key name="showsByGroup" match="show" use="group" />
And then use them like this (not tested, but you should get the idea):
<xsl:variable name="films" select="key('filmByTag', @id)/@id" />
<xsl:apply-templates select="key('showsByFilm', $films)|key('showsByGroup', key('groupsByFilm', $films)/@id)">
Your XPath expression seems to be doing a three-way join so unless it's optimized the performance is likely to be O(n^3) in the size of the source document. Optimization involves replacing the serial searches of the document by indexed lookups. There are two ways of achieving this: you can hand-optimize it by replacing the filter expressions with calls on the key() function (as indicated by Dimitre), or you can use an optimizing XSLT processor such as Saxon-EE, which should do the same optimizations automatically.
Define a key with xsl:key and then use the key function for the cross reference instead of that comparison you currently have. Show us a sample of the XML so that we can understand its structure, then we can help with concrete code.
Here are two complete solutions that should exhibit better performance:
Do note: Better performance will be registered on sufficiently large input samples only. On small input samples it isn't worth it to optimize.
I. Not using // (but not using keys)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vFilms" select="/*/films/film"/>
<xsl:variable name="vShows" select="/*/shows/show"/>
<xsl:variable name="vGroups" select="/*/groups/group"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"$vShows
[film
=
$vFilms
[tag=current()/@id]
/@id
or
group
=
$vGroups
[film
=
$vFilms
[tag=current()/@id]
/@id
]
/@id
]
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
II. Using keys
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kShowByFilmId" match="show"
use="film"/>
<xsl:key name="kShowByGroupId" match="show"
use="group"/>
<xsl:key name="kGroupByFilmId" match="group"
use="film"/>
<xsl:key name="kFilmByTag" match="film"
use="tag"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"key('kShowByFilmId',
key('kFilmByTag', current()/@id)/@id
)
|
key('kShowByGroupId',
key('kGroupByFilmId',
key('kFilmByTag', current()/@id)/@id
)
/@id
)
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Navigate HTML table columns with XPath 1.0

Using only an XPath expression (and not in XSLT or DOM - just pure XPath), I'm trying to create a relative path from the current node (in a td) to an associated td in the same column of the same HTML table.
For example, suppose I have this type of data:
<table>
<tr> <td><a>Blue Jeans</a></td> <td><a>Shirt</a></td> </tr>
<tr> <td><span>$21.50</span></td> <td><span>$18.99</span></td> </tr>
</table>
and I'm on the a with "Blue Jeans" and want to find the price ($21.50). In XSLT, I could use the current() function to get the answer like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:apply-templates select="//a" />
</xsl:template>
<xsl:template match="a">
Name: <xsl:value-of select="."/>
Price: <xsl:value-of select="../../following-sibling::tr[1]/td[position() = count(current()/../preceding-sibling::td) + 1]" />
</xsl:template>
</xsl:stylesheet>
But the problem I'm running into is that there is no current() defined in XPath 1.0. I tried using the self:: axis, but like the "." shorthand, that only points to the "context" node, not the "current" node. The language that I'm seeing in the XPath standard suggests that XPath doesn't have a concept of "current node."
Is there perhaps another way to form this path or is this a limitation of XPath?
In XPath 1.0 you could do:
/table/tr/td/a[.='Blue Jeans']/following::td[count(../td)]/span
Of course, this assumes there is no colspan.
EDIT: The proof. This stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:param name="pProduct" select="'Blue Jeans'"/>
<xsl:template match="/">
<xsl:value-of select="/table/tr/td/a[.=$pProduct]
/following::td[count(../td)]/span"/>
</xsl:template>
</xsl:stylesheet>
Output:
$21.50
With param pProduct set to 'Shirt', output:
$18.99
Note: Of course, you need the a element in context in order to select the span element. So, with your stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="text()"/>
<xsl:template match="a">
Name: <xsl:value-of select="."/>
Price: <xsl:value-of select="following::td[count(../td)]/span" />
</xsl:template>
</xsl:stylesheet>
Output:
Name: Blue Jeans
Price: $21.50
Name: Shirt
Price: $18.99
This cannot be achieved with a single XPath 1.0 expression.
In XPath 2.0 one could write:
for $vPreceeding in count(../preceding-sibling::td)
return ../../following-sibling::tr[1]/td[$vPreceeding + 1]
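For illustration, the XPath 2.0 form could be evaluated with the a element as context like this. It is only a sketch: it assumes an XSLT 2.0 processor as the host for the expression, and the variable name $n is arbitrary. Note the + 1: the count of preceding cells is zero for the first column, while positions start at 1.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>
  <!-- Suppress text that is not explicitly written by the a template -->
  <xsl:template match="text()"/>
  <xsl:template match="a">
    <xsl:text>Name: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;Price: </xsl:text>
    <!-- $n = number of cells before the cell holding this a element;
         the for expression binds $n without changing the context node -->
    <xsl:value-of select="for $n in count(../preceding-sibling::td)
                          return ../../following-sibling::tr[1]/td[$n + 1]"/>
    <xsl:text>&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

As with the XPath 1.0 answer, this assumes no colspan in either row.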
