XSLT: poor performance due to complex XPath expression? - performance

At some point in an XSLT program, I have the following:
<xsl:for-each select="tags/tag">
<xsl:apply-templates select="//shows/show[film=//films/film[tag=current()/#id]/#id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/#id]/#id]/#id]">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
It seems that the XPath expression //shows/show[film=//films/film[tag=current()/#id]/#id]|//shows/show[group=//groups/group[film=//films/film[tag=current()/#id]/#id]/#id], which is rather complex, considerably slows down the execution of the program (compared to the execution time before adding the quoted piece of code -- processing the same data, of course).
Do you think this is normal due to the relatively complex nature of the expression, and do you see how I could improve it so it performs better?
NB: in the XPath expression, film and //films/film, group and //groups/group refer to distinct elements.
See below a stripped-down sample of the XML input.
<program>
<tags>
<tag id="1">Tag1</tag>
<tag id="2">Tag2</tag>
<tag id="3">Tag3</tag>
</tags>
<films>
<film id="1">
Film1
<tag>2</tag><!-- References: /program/tags/tag/#id=2 -->
</film>
<film id="2">
Film2
<tag>1</tag><!-- References: /program/tags/tag/#id=1 -->
</film>
<film id="3">
Film3
<tag>3</tag><!-- References: /program/tags/tag/#id=3 -->
</film>
<film id="4">
Film4
<tag>3</tag><!-- References: /program/tags/tag/#id=3 -->
</film>
</film>
<groups>
<group id="1">
<film>3</film><!-- References: /program/films/film/#id=3 -->
<film>4</film><!-- References: /program/films/film/#id=4 -->
</group>
</groups>
<shows>
<show id="1"><!-- Show with film (=simple) -->
<film>1</film><!-- References: /program/films/film/#id=1 -->
<date>2011-12-12</date>
<time>12:00</time>
</show>
<show id="2"><!-- Show with group (=combined) -->
<group>1</group><!-- References: /program/groups/group/#id=1 -->
<date>2011-12-12</date>
<time>14:00</time>
</show>
</shows>
</program>
Explanations:
A tag is a property attached to a film (in fact, it's rather a category).
A group is an enumeration of films.
A show references either a film or a group.
What I want: for each tag, I'm looking for the shows referencing a film having the current tag and the shows referencing a group where at least one of the films has the current tag.

Double slashes in XPath are performance and CPU hogs when working with large documents (since every node in the document must be evaluated). If you can replace it with either an absolute or relative path you should have a noticeable improvement. If you can post the input schema and required output, we could be more specific?
e.g. With an absolute path
//shows/show[film=//films/film[tag=current()/#id]/#id]
becomes
/myroot/somepath/shows/show[film=/myroot/somepath/films/film[tag=current()/#id]/#id]
or if the shows and films are relative to the current node
./relativexpath/shows/show[film=./relativexpath/somepath/films/film[tag=current()/#id]/#id]

The answer by nonnb very likely points to the problem, however not really to an efficient solution ("cheaper" axis are better, but that alone doesn't make the speed such as when indexing data).
Note that the big problem is that the XPath expression predicate does another full traversal of the tree for each evaluation. You should use keys for stuff like this; this will (in most or even all XSLT implementations) make an indexed lookup possible, thereby reducing the runtime a lot.
Define keys for the films, groups and shows by id:
<xsl:key name="filmByTag" match="film" use="tag" />
<xsl:key name="groupsByFilm" match="group" use="tag" />
<xsl:key name="showsByFilm" match="show" use="film" />
<xsl:key name="showsByGroup" match="show" use="group" />
And then use it like this (not tested, but you should get the idea):
<xsl:variable name="films" select="key('filmByTag', #id)/#id" />
<xsl:apply-templates select="key('showsByFilm', $films)/#id|key('showsByGroups', key('groupsByFilm', $films)/#id)/#id">

Your XPath expression seems to be doing a three-way join so unless it's optimized the performance is likely to be O(n^3) in the size of the source document. Optimization involves replacing the serial searches of the document by indexed lookups. There are two ways of achieving this: you can hand-optimize it by replacing the filter expressions with calls on the key() function (as indicated by Dimitre), or you can use an optimizing XSLT processor such as Saxon-EE, which should do the same optimizations automatically.

Define a key with xsl:key and then use the key function for the cross reference instead of that comparison you currently have. Show us a sample of the XML so that we can understand its structure, then we can help with concrete code.

Here are two complete solutions that should exhibit better performance:
Do note: Better performance will be registered on sufficiently large input samples only. On small input samples it isn't worth it to optimize.
I. Not using // (but not using keys)
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:variable name="vFilms" select="/*/films/film"/>
<xsl:variable name="vShows" select="/*/shows/show"/>
<xsl:variable name="vGroups" select="/*/groups/group"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"$vShows
[film
=
$vFilms
[tag=current()/#id]
/#id
or
group
=
$vGroups
[film
=
$vFilms
[tag=current()/#id]
/#id
]
/#id
]
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>
II. Using keys
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:key name="kShowByFilmId" match="show"
use="film"/>
<xsl:key name="kShowByGroupId" match="show"
use="group"/>
<xsl:key name="kGroupByFilmId" match="group"
use="film"/>
<xsl:key name="kFilmByTag" match="film"
use="tag"/>
<xsl:variable name="vTags" select="/*/tags/tag"/>
<xsl:template match="/*">
<xsl:for-each select="$vTags">
<xsl:apply-templates select=
"key('kShowByFilmId',
key('kFilmByTag', current()/#id)/#id
)
|
key('kShowByGroupId',
key('kGroupByFilmId',
key('kFilmByTag', current()/#id)/#id
)
/#id
)
">
<xsl:sort select="date" data-type="text" order="ascending"/>
<xsl:sort select="time" data-type="text" order="ascending"/>
</xsl:apply-templates>
</xsl:for-each>
</xsl:template>
<xsl:template match="show">
<xsl:copy-of select="."/>
</xsl:template>
</xsl:stylesheet>

Related

XSLT 2.0 dynamic XPATH expression

I have one XML file that I need to transform based on a mapping file with XSLT 2.0. I'm using the Saxon HE processor.
My mapping file:
<element root="TEST">
<childName condition="/TEST/MyElement/CHILD[text()='B']>/TEST/MyElement/CHILD</childName>
<childBez condition="/TEST/MyElement/CHILD[text()='B']>/TEST/MyElement/CHILDBEZ</childBez>
</element>
I have to copy the elements CHILD and CHILDBEZ plus the parent and the root elements when the text of CHILD equals B.
So with this Input:
<?xml version="1.0" encoding="UTF-8"?>
<TEST>
<MyElement>
<CHILD>A</CHILD>
<CHILDBEZ>ABEZ</CHILDBEZ>
<NotInteresting></NotInteresting>
</MyElement>
<MyElement>
<CHILD>B</CHILD>
<CHILDBEZ>BBEZ</CHILDBEZ>
<NotInteresting2></NotInteresting2>
</MyElement>
</TEST>
the desired output:
<TEST>
<MyElement>
<childName>B</childName>
<childBez>BBEZ</childBez>
</MyElement>
</TEST>
what I have so far (based on this solution XSLT 2.0 XPATH expression with variable):
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:strip-space elements="*"/>
<xsl:param name="mapping" select="document('mapping.xml')"/>
<xsl:key name="map" match="*" use="."/>
<xsl:template match="/">
<xsl:variable name="first-pass">
<xsl:apply-templates mode="first-pass"/>
</xsl:variable>
<xsl:apply-templates select="$first-pass/*"/>
</xsl:template>
<xsl:template match="*" mode="first-pass">
<xsl:param name="parent-path" tunnel="yes"/>
<xsl:variable name="path" select="concat($parent-path, '/', name())"/>
<xsl:variable name="replacement" select="key('map', $path, $mapping)"/>
<xsl:variable name="condition" select="key('map', $path, $mapping)/#condition"/>
<xsl:choose>
<xsl:when test="$condition!= ''">
<!-- if there is a condition defined in the mapping file, check for it -->
</xsl:when>
<xsl:otherwise>
<xsl:element name="{if ($replacement) then name($replacement) else name()}">
<xsl:attribute name="original" select="not($replacement)"/>
<xsl:apply-templates mode="first-pass">
<xsl:with-param name="parent-path" select="$path" tunnel="yes"/>
</xsl:apply-templates>
</xsl:element>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
<xsl:template match="*">
<xsl:copy>
<xsl:apply-templates/>
</xsl:copy>
</xsl:template>
<xsl:template match="*[#original='true' and not(descendant::*/#original='false')]"/>
</xsl:stylesheet>
but the problem is that it's impossible to evaluate dynamic XPATH expressions with XSLT 2.0. Does anyone knows a workaround for that? Plus I have a problem with the mapping file. When there is only one element in it, it's not working at all.
If dynamic XPath evaluation isn't an option in your chosen processor, then generating an XSLT stylesheet is often a good alternative. In fact, it's often a good alternative anyway.
One way of thinking about this is that your mapping file is actually a program written in a very simple transformation language. There are two ways of executing this program: you can write an interpreter (dynamic XPath evaluation), or you can write a compiler (XSLT stylesheet generation). Both work well.

How to get sum of an attribute value which is referenced by id multiple times with xpath in xslt 1.0?

I really do hope that my title is at least a bit clear.
important: i can only use xslt 1.0 because the project needs to work with the MSXML XSLT processor.
What I try to do:
I generate documents containing information about rooms. Rooms have walls, I need the sum of wall area of these per room.
The input xml file I get is dynamically created by another program.
Changing the structure of the input xml file is not the solution, trust me, it's needed like that and is much more complex than I show you here.
My XML (the innerArea attribute in the wall element has to get summed up):
<root>
<floor id="30" name="EG">
<flat name="Wohnung" nr="1">
<Room id="49" area="93.08565">
<WallSegments>
<WallSegment id="45"/>
<WallSegment id="42"/>
<WallSegment id="39"/>
</WallSegments>
</Room>
</flat>
</floor>
<components>
<Wall id="20" innerArea="20.7654"/>
<wallSegment id="45" wall="20">[...]</wallSegment>
<Wall id="21" innerArea="12.45678"/>
<wallSegment id="42" wall="21">[...]</wallSegment>
<Wall id="22" innerArea="17.8643"/>
<wallSegment id="39" wall="22">[...]</wallSegment>
</components>
</root>
With my XSLT I was able to reach the values of the walls which belong to a room.
But I have really no idea how I could get the sum of the value out of that.
My XSLT:
<xsl:for-each select="flat/Room">
<xsl:for-each select="WallSegments/WallSegment">
<xsl:variable name="curWallSegId" select="#id"/>
<xsl:for-each select="/root/components/wallSegment[#id = $curWallSegId]">
<xsl:variable name="curWallId" select="#wall"/>
<xsl:for-each select="/root/components/Wall[#id = $curWallId]">
<!--I didn't expect that this was working, but at least I tried :D-->
<xsl:value-of select="sum(#AreaInner)"/>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
</xsl:for-each>
Desired Output should be something like...
[...]
<paragraph>
Room 1:
Wall area: 51.09 m²
[...]
</paragraph>
[...]
So I hope I described my problem properly. If not: I am sorry, you may beat me right into the face x)
It's best to use keys to get "related" data. Place this at the top of your stylesheet, outside of any template:
<xsl:key name="wall" match="components/Wall" use="#id" />
<xsl:key name="wallSegment" match="components/wallSegment" use="#id" />
Then:
<xsl:for-each select="flat/Room">
<paragraph>
<xsl:text>Room </xsl:text>
<xsl:value-of select="position()"/>
<xsl:text>:
Wall area: </xsl:text>
<xsl:value-of select="format-number(sum(key('wall', key('wallSegment', WallSegments/WallSegment/#id)/#wall)/#innerArea), '0.00m²')"/>
<xsl:text>
</xsl:text>
</paragraph>
</xsl:for-each>
will return:
<paragraph>Room 1:
Wall area: 51.09m²</paragraph>
If what you need it's the area of every room, this is a way of getting it:
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
version="1.0">
<xsl:template match="/root/floor">
<xsl:for-each select="flat/Room">
<xsl:variable name="currentRoomSegmentsIds" select="WallSegments/WallSegment/#id"/>
<xsl:variable name="currentRoomWallsIds" select="/root/components/wallSegment[#id = $currentRoomSegmentsIds]/#wall"/>
<xsl:variable name="currentRoomWallsInnerAreas" select="/root/components/Wall[#id = $currentRoomWallsIds]/#innerArea"/>
Id of the room = <xsl:value-of select="#id"/>.
Area of the room = <xsl:value-of select="sum($currentRoomWallsInnerAreas)"/>
</xsl:for-each> <!-- Enf of for each room -->
</xsl:template>
</xsl:stylesheet>
This produces the following result:
Id of the room = 49.
Area of the room = 51.08648

How to order self-referencing xml

I have a list of order lines with each one product on them. The products in may form a self-referencing hierarchy. I need to order the lines in such a way that all products that have no parent or whose parent is missing from the order are at the top, followed by their children. No child may be above its parent in the end result.
So how can i order the following xml:
<order>
<line><product code="3" parent="1"/></line>
<line><product code="2" parent="1"/></line>
<line><product code="6" parent="X"/></line>
<line><product code="1" /></line>
<line><product code="4" parent="2"/></line>
</order>
Into this:
<order>
<line><product code="6" parent="X"/></line>
<line><product code="1" /></line>
<line><product code="2" parent="1"/></line>
<line><product code="3" parent="1"/></line>
<line><product code="4" parent="2"/></line>
</order>
Note that the order within a specific level is not important, as long as the child node follows at some point after it's parent.
I have a solution which works for hierarchies that do not exceed a predefined depth:
<order>
<xsl:variable name="level-0"
select="/order/line[ not(product/#parent=../line/product/#code) ]"/>
<xsl:for-each select="$level-0">
<xsl:copy-of select="."/>
</xsl:for-each>
<xsl:variable name="level-1"
select="/order/line[ product/#parent=$level-0/product/#code ]"/>
<xsl:for-each select="$level-1">
<xsl:copy-of select="."/>
</xsl:for-each>
<xsl:variable name="level-2"
select="/order/line[ product/#parent=$level-1/product/#code ]"/>
<xsl:for-each select="$level-2">
<xsl:copy-of select="."/>
</xsl:for-each>
</order>
The above sample xslt will work for hierarchies with a maximum depth of 3 levels and is easily extended to more, but how can i generalize this and have the xslt sort arbitrary levels of depth correctly?
To start with, you could define a couple of keys to help you look up the line elements by either their code or parent attribute
<xsl:key name="products-by-parent" match="line" use="product/#parent" />
<xsl:key name="products-by-code" match="line" use="product/#code" />
You would start off by selecting the line elements with no parent, using a key to do this check:
<xsl:apply-templates select="line[not(key('products-by-code', product/#parent))]"/>
Then, within the template that matches the line element, you would just copy the element, and then select its "children" like so, using the other key
<xsl:apply-templates select="key('products-by-parent', product/#code)"/>
This would be a recursive call, so it would recursively look for its children until no more are found.
Try this XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="products-by-parent" match="line" use="product/#parent"/>
<xsl:key name="products-by-code" match="line" use="product/#code"/>
<xsl:template match="order">
<xsl:copy>
<xsl:apply-templates select="line[not(key('products-by-code', product/#parent))]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="line">
<xsl:call-template name="identity"/>
<xsl:apply-templates select="key('products-by-parent', product/#code)"/>
</xsl:template>
<xsl:template match="#*|node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Do note the use of the XSLT identity transform to copy the existing nodes in the XML.
Very interesting problem. I would do this in two passes: first, nest the elements according to their hierarchy. Then output the elements, sorted by the count of their ancestors.
XSLT 1.0 (+ EXSLT node-set() function):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="product-by-code" match="product" use="#code" />
<!-- first pass -->
<xsl:variable name="nested">
<xsl:apply-templates select="/order/line/product[not(key('product-by-code', #parent))]" mode="nest"/>
</xsl:variable>
<xsl:template match="product" mode="nest">
<xsl:copy>
<xsl:copy-of select="#*"/>
<xsl:apply-templates select="../../line/product[#parent=current()/#code]" mode="nest"/>
</xsl:copy>
</xsl:template>
<!-- output -->
<xsl:template match="/order">
<xsl:copy>
<xsl:for-each select="exsl:node-set($nested)//product">
<xsl:sort select="count(ancestor::*)" data-type="number" order="ascending"/>
<line><product><xsl:copy-of select="#*"/></product></line>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to your input, the result is:
<?xml version="1.0" encoding="UTF-8"?>
<order>
<line>
<product code="6" parent="X"/>
</line>
<line>
<product code="1"/>
</line>
<line>
<product code="3" parent="1"/>
</line>
<line>
<product code="2" parent="1"/>
</line>
<line>
<product code="4" parent="2"/>
</line>
</order>
This still leaves the issue of the existing/missing parent X - I will try to address that later.

Longer node in XPath

I'd like to use XPath to retrieve the longer of two nodes.
E.g., if my XML is
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
And I do document.SelectSingleNode(your XPath here)
I would expect to get back the url1 node. If url2 is longer, or there is no url1 node, I'd expect to get back the url2 node.
Seems simple but I'm having trouble figuring it out. Any ideas?
This works for me, but it is ugly. Cannot you do the comparison outside XPath?
record/*[starts-with(name(),'url')
and string-length(.) > string-length(preceding-sibling::*[1])
and string-length(.) > string-length(following-sibling::*[1])]/text()
<xsl:for-each select="*">
<xsl:sort select="string-length(.)" data-type="number"/>
<xsl:if test="position() = last()">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
Even works in XSLT 1.0!
Use this single XPath expression:
/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
the Xpath expression is evaluated and the result of this evaluation (the selected element) is copied to the output:
<url1>http://www.google.com</url1>

Modify an XML node using [MS] XSLT Script

I would like to select a node and modify its attributes and child-nodes using an
xsl:script function. In addition, templates matching child-nodes of that node should
STILL perform their job (after script is done processing the node).
Can it be done using XSLT?
Can you please provide an example / skeleton for such a transformation?
Yes, it can be done. I don't seem to see what the problem is because the XML (or whatever output) of an XSL script is buffered independently from its input.
This is illustrated in the following example whereby a simple XSL script copies an input XML document mostly as-is, changing a few things:
the root element name and attribute
flattening by removing the element from the hierarchy
dropping the results/date element
rename the item's 'source' attribute 'origin'
change the item's 'level' attribute value
rename the FirstName and LastName elements of the item elements
Sample input
<?xml version="1.0" encoding="ISO-8859-1"?>
<MyRoot version="1.2">
<results>
<info>Alpha Bravo</info>
<author>Employee No 321</author>
<date/>
<item source="www" level="6" cost="33">
<FirstName>Jack</FirstName>
<LastName>Frost</LastName>
<Date>1998-10-30</Date>
<Organization>Lemon growers association</Organization>
</item>
<item source="db-11" level="1" cost="65" qry="routine 21">
<FirstName>Mike</FirstName>
<LastName>Black</LastName>
<Date>2006-10-30</Date>
<Organization>Ford Motor Company</Organization>
</item>
</results>
</MyRoot>
Output produced
<?xml version="1.0" encoding="utf-16"?>
<MyNewRoot version="0.1">
<author>Employee No 321</author>
<info>Alpha Bravo</info>
<item cost="33" origin="www" level="77">
<GivenName>Jack</GivenName>
<FamilyName>Frost</FamilyName>
<Date>1998-10-30</Date>
<Organization>Lemon growers association</Organization>
</item>
<item cost="65" qry="routine 21" origin="db-11" level="77">
<GivenName>Mike</GivenName>
<FamilyName>Black</FamilyName>
<Date>2006-10-30</Date>
<Organization>Ford Motor Company</Organization>
</item>
</MyNewRoot>
XSL script
<?xml version='1.0'?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
exclude-result-prefixes="#default">
<xsl:template match="MyRoot">
<xsl:call-template name="MainTemplate">
</xsl:call-template>
</xsl:template>
<xsl:template name="MainTemplate">
<MyNewRoot version="0.1">
<xsl:copy-of select="results/author" />
<xsl:copy-of select="results/info" />
<xsl:for-each select="results/item">
<xsl:call-template name="FixItemElement"/>
</xsl:for-each>
</MyNewRoot>
</xsl:template>
<xsl:template name="FixItemElement">
<xsl:copy>
<xsl:copy-of select="#*[not(name()='source' or name()='level')]" />
<xsl:attribute name="origin">
<xsl:value-of select="#source"/>
</xsl:attribute>
<xsl:attribute name="level">
<xsl:value-of select="77"/>
</xsl:attribute>
<xsl:for-each select="descendant::*">
<xsl:choose>
<xsl:when test="local-name(.) = 'FirstName'">
<GivenName>
<xsl:value-of select="."/>
</GivenName>
</xsl:when>
<xsl:when test="local-name(.) = 'LastName'">
<FamilyName>
<xsl:value-of select="."/>
</FamilyName>
</xsl:when>
<xsl:otherwise>
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:otherwise>
</xsl:choose>
</xsl:for-each>
</xsl:copy>
</xsl:template>

Resources