Saxon HE 9.4 XSLT performance

I have the following two templates in XSLT:
<xsl:template name="calculateAbsoluteEntryNodeIndex">
<!-- current 'entry' node -->
<xsl:param name="entryNode"/>
<!-- current 'entry' node index (position) in xml tree-->
<xsl:param name="entryNodePosition"/>
<xsl:choose>
<!--if the current 'entry' node contains 'namest' attribute then its ('namest') value is treated as
the absolute index (of the current 'entry' node)-->
<xsl:when test="$entryNode/@namest">
<!--writing result-->
<xsl:value-of select="number($entryNode/@namest)"/>
<xsl:text>;</xsl:text>
<xsl:value-of select="number($entryNode/@nameend)"/>
</xsl:when>
<xsl:otherwise>
<!--getting last 'Nameend' attribute value-->
<xsl:variable name="lastNameEndValue">
<xsl:choose>
<!--check if exists any 'entry' node before the current 'entry' node (on the current 'row' level) having 'nameend' attribute defined
('entry' has to have index number less than $entryNodePosition) -->
<xsl:when test="$entryNode/preceding-sibling::entry[@nameend]">
<!--get 'nameend' attribute value of the last matched "entry" node and convert it to number -->
<xsl:value-of select="number(($entryNode/preceding-sibling::entry/@nameend)[last()])"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="0"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<!--getting 'entry' node index of the matched 'Nameend' attribute -->
<xsl:variable name="lastNameendNodePosition">
<xsl:choose>
<!-- if lastNameEndValue != 0 -->
<xsl:when test="$lastNameEndValue != '0'">
<!-- calculate index of the 'entry' node matched in $lastNameEndValue selection =>it is done by counting all preceding siblings of the node matched in
$lastNameEndValue increased by 1-->
<xsl:value-of select="count(($entryNode/preceding-sibling::entry[@nameend])[last()]/preceding-sibling::entry) + 1"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="0"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<!--writing result-->
<xsl:value-of select="$entryNodePosition - $lastNameendNodePosition + $lastNameEndValue"/>
<xsl:text>;</xsl:text>
<xsl:value-of select="$entryNodePosition - $lastNameendNodePosition + $lastNameEndValue"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
And
I ran the profiler that ships with Saxon HE 9.4 (-TP:profile.html):
template calculateAbsoluteEntryNodeIndex 25268028
Total time spent in this template: 174966.587 ms.
Total time for the whole stylesheet: 337196.696 ms.
The transformation seems to have problems with big tables of around 14 thousand lines of XML. Any idea what the issue could be?
The structure of the table is:
<?xml version="1.0" encoding="UTF-8"?>
<table tabledef="excel">
<tgroup cols="1">
<colspec colname="1" colnum="1" colwidth="100%"/>
<thead>
<row>
<entry morerows="1">
<p>
Text
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left">
<p>1</p>
</entry>
</row>
</tbody>
</tgroup>
</table>

There's not really enough information here: for example, what's the typical number of sibling entry elements within a row? And how often is this template executed? I guess you are probably executing it once per entry, and it is obviously quadratic in the number of entries.
The repetition of the expression $entryNode/preceding-sibling::entry[@nameend] is obviously wasteful.
It's very hard to offer advice on whether there are other ways of writing this that would go faster, without knowing anything about what the code is actually doing. Perhaps using xsl:number for some of the counting would work better; it's very difficult to tell. Alternatively, instead of doing a "for-each" that processes each entry independently, consider doing sibling recursion that works forwards through the nodes, passing parameter information about nodes already processed so you don't have to search backwards to preceding-sibling nodes.
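To make the sibling-recursion idea concrete, here is a minimal sketch rather than a drop-in replacement. It assumes XSLT 2.0, that namest and nameend always occur together on an entry, and that an entry without them occupies exactly one column. The mode name walk and the parameter lastColumn are made up for illustration. Each entry is visited once and receives the last column used so far, so nothing searches backwards along preceding-sibling.

```xml
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <!-- Suppress stray text on the way down to the rows -->
  <xsl:template match="text()"/>

  <!-- Start the forward walk at the first entry of each row -->
  <xsl:template match="row">
    <xsl:apply-templates select="entry[1]" mode="walk"/>
  </xsl:template>

  <!-- Visit one entry, then hand the running column number to the next sibling -->
  <xsl:template match="entry" mode="walk">
    <!-- Hypothetical parameter: last column occupied by the preceding entries -->
    <xsl:param name="lastColumn" select="0"/>
    <xsl:variable name="start"
        select="if (@namest) then number(@namest) else $lastColumn + 1"/>
    <!-- Assumption: an entry without nameend ends in the column where it starts -->
    <xsl:variable name="end"
        select="if (@nameend) then number(@nameend) else $start"/>
    <xsl:value-of select="concat($start, ';', $end, '&#10;')"/>
    <xsl:apply-templates select="following-sibling::entry[1]" mode="walk">
      <xsl:with-param name="lastColumn" select="$end"/>
    </xsl:apply-templates>
  </xsl:template>
</xsl:stylesheet>
```

The work per row is then linear in the number of entries instead of quadratic. Whether it matches your intended numbering depends on the assumptions above, so check it against a few rows you know the answer for.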

Related

xsl <choose>: How to select a following-sibling with a certain child?

I have the following fragment of a dictionary entry:
<entry>
<form>
<orth></orth>
<orth></orth>
</form>
<form>
<note></note>
<orth></orth>
</form>
</entry>
With xsl <choose> I want to select <form> only when the following <form> has <note> as a child. I tried
<xsl:template match="tei:form">
<xsl:choose>
<xsl:when test="following-sibling::*[1][name()='form' and child='note']">
<xsl:apply-templates/><text>\ </text>
</xsl:when>
</xsl:choose>
</xsl:template>
But this does not work. How should I correctly address <form> with <note> as child?
Your question is confusing. xsl:choose does not select anything.
If you want to use xsl:choose in the context of form (in other words, you want to process all form elements and choose which code to execute based on whether the immediately following sibling form has a note child), try something like:
<xsl:template match="form">
<xsl:choose>
<xsl:when test="following-sibling::form[1]/note">
<!-- DO SOMETHING -->
</xsl:when>
<xsl:otherwise>
<!-- DO SOMETHING ELSE -->
</xsl:otherwise>
</xsl:choose>
</xsl:template>
In order to process only the form elements that satisfy the condition, try instead:
<xsl:template match="entry">
<!-- ... -->
<xsl:for-each select="form[following-sibling::form[1]/note]">
<!-- DO SOMETHING -->
</xsl:for-each>
<!-- ... -->
</xsl:template>
If you are going to apply templates to all of your form elements, then you can avoid conditional instructions altogether and just use patterns like:
<xsl:template match="form">
<!-- General case -->
</xsl:template>
<xsl:template match="form[following-sibling::form[1]/note]">
<!-- Particular case -->
</xsl:template>
Do note: these patterns have different default priorities, so the template to apply is fully determined.
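As a sketch of how the two patterns interact, here is a complete stylesheet built from them. It assumes the elements are in no namespace, as in the fragment shown; for real TEI input you would bind a prefix to the TEI namespace and write tei:form and tei:note instead. The output text is only a placeholder. A pattern that is a bare element name has default priority 0, while a pattern with a predicate has default priority 0.5, so the second template wins whenever both match.

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="entry">
    <xsl:apply-templates select="form"/>
  </xsl:template>

  <!-- General case: default priority 0 -->
  <xsl:template match="form">
    <xsl:text>plain form&#10;</xsl:text>
  </xsl:template>

  <!-- Particular case: default priority 0.5 because of the predicate -->
  <xsl:template match="form[following-sibling::form[1]/note]">
    <xsl:text>form followed by a form with a note&#10;</xsl:text>
  </xsl:template>
</xsl:stylesheet>
```

With the entry from the question, the first form hits the particular case and the second form falls back to the general one.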

Find position of a node with specific text among identical nodes using xpath

There is a set of trademark nodes <tm> in a single document. Each node <tm> contains the text node inside - trademark name. There may be identical nodes among tm's that means they have the same trademark name. I need to write the template that will add the trademark character ™ (™) only to the first occurrence of each trademark.
Example:
<doc>
<a><tm>A</tm></a>
<tm>A</tm>
<tm>B</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>
Only the first occurrences of <tm>A</tm> and <tm>B</tm> should be processed.
The expected result is:
<doc>
<a><tm>A™</tm></a>
<tm>A</tm>
<tm>B™</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>
The difficulty here is that there are identical nodes. Besides, I cannot write a separate template for each trademark, one template should match all.
Here is a draft of the solution:
<xsl:template match="tm">
<xsl:variable name="text" select="text()"/>
<xsl:variable name="same_tms" select="//tm[text()=$text]"/>
<xsl:if test=" --- current tm is the first among $same_tms --- ">
<xsl:value-of select="concat(text(), '™')"/>
</xsl:if>
</xsl:template>
I don't know how to write a generic test condition that would check if the current <tm> is the first among $same_tms. Is it possible?
Use a key, as in Muenchian grouping (http://www.jenitennison.com/xslt/grouping/muenchian.xml), only that with XSLT 2.0 you can use is instead of the generate-id() test you would need in XSLT 1.0:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:key name="tm" match="tm" use="."/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="tm[. is key('tm', .)[1]]">
<xsl:copy>
<xsl:value-of select="concat(., '™')"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
Online as http://xsltransform.net/ncdD7mC.
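If you are limited to XSLT 1.0, the same idea can be sketched with the generate-id() comparison mentioned above. This is the usual Muenchian form of the test, reusing the key name tm from the stylesheet above:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
  <!-- Index every tm element by its string value -->
  <xsl:key name="tm" match="tm" use="."/>

  <!-- Identity template: copy everything else unchanged -->
  <xsl:template match="@*|node()">
    <xsl:copy>
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>

  <!-- Matches a tm only if it is the first node the key returns for its value -->
  <xsl:template match="tm[generate-id() = generate-id(key('tm', .)[1])]">
    <xsl:copy>
      <xsl:value-of select="concat(., '&#8482;')"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The key returns nodes in document order, so the first node in each key group is the first occurrence of that trademark in the document.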

XPath selection excluding element from a child node

I need to select all <next> nodes, but excluding <element4> from each and add a new element in its place (it would be a replace). I'm working with php.
<root>
<next>
<node>
<element1>text</element1>
<element2>text</element2>
<element3>text</element3>
<element4>text</element4>
</node>
<node>
<element1>text</element1>
<element2>text</element2>
<element3>text</element3>
<element4>text</element4>
</node>
</next>
</root>
so it should look like this:
<next>
<node>
<element1>text</element1>
<element2>text</element2>
<element3>text</element3>
<new>text</new>
</node>
<node>
<element1>text</element1>
<element2>text</element2>
<element3>text</element3>
<new><int xmlns="foo.bar">0</int></new>
</node>
</next>
Any tips? Thank you!
XPath is a selection language: it selects nodes or atomic items from an input sequence, it is the language of choice to make a selection over XML or hierarchical data, as SQL is (usually) the language of choice with relational databases.
As such, you can exclude elements from the selection, but you cannot update or change the original sequence. It is possible to do a limited transformation (e.g., turn a string into an integer), but this changes what is selected; it does not change the source. While XPath (namely version 2.0 and up) can "create" atomic values on the fly, it cannot create new elements.
This is possible and will return numeric values in XPath 2.0:
/root/next/node/number(.)
But this is not possible:
/root/next/node/(if (element4) then create-element(.) else .)
However, in XSLT 2.0 and up you can create a function that creates elements. As said above, XPath selects, and if you want to change the document, you can create a new document using XSLT (the T standing for Transformation).
Something like the following (partial XSLT 2.0; you need to add the stylesheet header, including a namespace declaration for the f prefix):
<xsl:function name="f:create">
<xsl:param name="node" />
<xsl:param name="name" />
<xsl:choose>
<xsl:when test="name($node) = $name">
<xsl:element name="{if(number($node)) then 'int' else 'new'}">
<xsl:value-of select="$node" />
</xsl:element>
</xsl:when>
<xsl:otherwise><xsl:copy-of select="$node" /></xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:template match="node">
<!-- now XPath, with help of XSLT function, can conditionally create nodes -->
<xsl:copy-of select="child::*/f:create(., 'element4')" />
</xsl:template>
<!-- boilerplate code, typically used to recursively copy non-matched nodes -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>
Note that, while this shows how you can create a different element using XPath and an XSLT function, it does not change the source, it changes the output. Also, it is not a recommended practice, as in XSLT the same pattern is more easily done by simply doing:
<!-- the more specific match -->
<xsl:template match="element4[number(.)]">
<new>
<int xmlns="foo.bar">
<xsl:value-of select="number(.)" />
</int>
</new>
</xsl:template>
<!-- XSLT will automatically fallback to this one if the former fails -->
<xsl:template match="element4">
<new><xsl:copy-of select="node()" /></new>
</xsl:template>
<!-- or this one, if both the former fail -->
<xsl:template match="node() | @*">
<xsl:copy>
<xsl:apply-templates select="@* | node()" />
</xsl:copy>
</xsl:template>

XPath joining multiple elements

I'm looking for a way to join every two elements together using XPath 2.0:
<item>
<element class='1'>el1</element>
<element class='2'>el2</element>
<break>break</break>
<element class='1'>el3</element>
<element class='2'>el4</element>
<break>break</break>
<element class='1'>el5</element>
<element class='2'>el6</element>
<break>break</break>
<element class='1'>el7</element>
<element class='2'>el8</element>
</item>
I am hoping the result will be like:
el1el2
el3el4
el5el6
el7el8
There are "breaks" between each pair of meaningful elements, and there are also classes to help out, but I still cannot get it done.
Since I'm not familiar with XPath, this is what I have come up with so far, and it turned out to be wrong, since concat needs at least two arguments...
//item/concat(element[preceding-sibling::break | following-sibling::break])
//item/element[@class='1']/concat(., following-sibling::element[1])
You want your result sequence to contain one item for each of the class='1' elements, the value of that item being the concatenation of that element and its next sibling (the matching class='2').
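If you want to try the expression from XSLT 2.0, a minimal wrapper could look like the sketch below. The text output method and the newline separator are my choices for the demo, not part of the question:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- One string per class='1' element, joined with newlines -->
    <xsl:value-of
        select="//item/element[@class='1']/concat(., following-sibling::element[1])"
        separator="&#10;"/>
  </xsl:template>
</xsl:stylesheet>
```

For the sample input this produces the four lines asked for, because each class='1' element is immediately followed by its class='2' partner. If the elements could appear unpaired, the expression would need a stricter test on the following sibling.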
I'm not sure if you're also open for an XSLT 1.0 solution but this works for me with your input xml:
<xsl:template match="/item">
<xsl:apply-templates select="element[1]|break"/>
</xsl:template>
<xsl:template match="element[1]">
<xsl:text>
</xsl:text>
<xsl:value-of select="."/>
<xsl:value-of select="following-sibling::*[1]"/>
</xsl:template>
<xsl:template match="break">
<xsl:text>
</xsl:text>
<xsl:value-of select="following-sibling::*[1]"/>
<xsl:value-of select="following-sibling::*[2]"/>
</xsl:template>
I have two templates that match either the start element or the break element. I use the following-sibling axis to get to the next two elements. The <xsl:text> elements are there to force a linebreak.
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output omit-xml-declaration="yes"/>
<xsl:template match="item">
<xsl:for-each-group select="*" group-starting-with="break">
<xsl:if test="current-group()[1][self::break]">
<xsl:text>
</xsl:text>
</xsl:if>
<xsl:value-of select="current-group()[self::element]" separator=""/>
</xsl:for-each-group>
</xsl:template>
</xsl:stylesheet>

XSLT 1.0: restrict entries in a nodeset

Being relatively new to XSLT I have what I hope is a simple question. I have some flat XML files, which can be pretty big (eg. 7MB) that I need to make 'more hierarchical'. For example, the flat XML might look like this:
<D0011>
<b/>
<c/>
<d/>
<e/>
<b/>
....
....
</D0011>
and it should end up looking like this:
<D0011>
<b>
<c/>
<d/>
<e/>
</b>
<b>
....
....
</D0011>
I have a working XSLT for this, and it essentially gets a nodeset of all the b elements and then uses the 'following-sibling' axis to get a nodeset of the nodes following the current b node (ie. following-sibling::*[position()=$nodePos]). Then recursion is used to add the siblings into the result tree until another b element is found (I have parameterised it of course, to make it more generic).
I also have a solution that just sends the position in the XML of the next b node and selects the nodes after that one after the other (using recursion) via a *[position() = $nodePos] selection.
The problem is that the time to execute the transformation increases unacceptably with the size of the XML file. Looking into it with XML Spy it seems that it is the 'following-sibling' and 'position()=' that take the time in the two respective methods.
What I really need is a way of restricting the number of nodes in the above selections, so fewer comparisons are performed: every time the position is tested, every node in the nodeset is tested to see if its position is the right one. Is there a way to do that? Any other suggestions?
Thanks,
Mike
Yes there is a way to do it much more efficiently: See Muenchian grouping. If having looked at this you need more help with the details, let us know. The key you'll need is something like:
<xsl:key name="elements-by-group" match="*[not(self::b)]"
use="generate-id(preceding-sibling::b[1])" />
Then you can iterate over the <b> elements, and for each one, use key('elements-by-group', generate-id()) to get the elements that immediately follow that <b>.
The task of "making the XML more hierarchical" is sometimes called up-conversion, and your scenario is a classic case for it. As you may know, XSLT 2.0 has very useful grouping features that are easier to use than the Muenchian method.
In your case it sounds like you would use <xsl:for-each-group group-starting-with="b" /> or, to parameterize the element name, <xsl:for-each-group group-starting-with="*[local-name() = 'b']" />. But maybe you already considered that and can't use XSLT 2.0 in your environment.
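In case XSLT 2.0 is an option, a minimal sketch of the grouping approach might look like this. It assumes the first child of D0011 is a b, so that every group starts with one:

```xml
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
  <xsl:template match="D0011">
    <xsl:copy>
      <!-- Each b starts a new group that runs up to the next b -->
      <xsl:for-each-group select="*" group-starting-with="b">
        <!-- The context item is the first item of the group, i.e. the b itself -->
        <xsl:copy>
          <xsl:copy-of select="@*"/>
          <!-- Everything after the b in this group becomes its content -->
          <xsl:copy-of select="current-group()[position() gt 1]"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The processor makes a single pass over the children here, so there is no per-node search along the following-sibling axis.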
Update:
In response to the request for parameterization, here's a way to do it without a key.
Note though that it may be much slower, depending on your XSLT processor.
<xsl:template match="D0011">
<xsl:for-each select="*[local-name() = $sep]">
<xsl:copy>
<xsl:copy-of select="following-sibling::*[not(local-name() = $sep)
and generate-id(preceding-sibling::*[local-name() = $sep][1]) =
generate-id(current())]" />
</xsl:copy>
</xsl:for-each>
</xsl:template>
As noted in the comment, you can keep the performance benefit of keys by defining several different keys, one for each possible value of the parameter. You then select which key to use by using an <xsl:choose>.
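A sketch of that several-keys idea, assuming the parameter is called sep as in the template above and that only b and c are possible separators (the key names by-b and by-c are invented for the example):

```xml
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <!-- Assumed parameter: local name of the group-starting element -->
  <xsl:param name="sep" select="'b'"/>

  <!-- One key per supported separator -->
  <xsl:key name="by-b" match="*[not(self::b)]"
           use="generate-id(preceding-sibling::b[1])"/>
  <xsl:key name="by-c" match="*[not(self::c)]"
           use="generate-id(preceding-sibling::c[1])"/>

  <xsl:template match="D0011">
    <xsl:copy>
      <xsl:for-each select="*[local-name() = $sep]">
        <xsl:copy>
          <!-- Pick the key that corresponds to the current separator -->
          <xsl:choose>
            <xsl:when test="$sep = 'b'">
              <xsl:copy-of select="key('by-b', generate-id())"/>
            </xsl:when>
            <xsl:when test="$sep = 'c'">
              <xsl:copy-of select="key('by-c', generate-id())"/>
            </xsl:when>
          </xsl:choose>
        </xsl:copy>
      </xsl:for-each>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

This only scales to a small, known set of separator names, since each one needs its own xsl:key declaration and its own xsl:when branch.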
Update 2:
To make the group-starting element be defined based on /*/*[2], instead of based on a parameter, use
<xsl:key name="elements-by-group"
match="*[not(local-name(.) = local-name(/*/*[2]))]"
use="generate-id(preceding-sibling::*
[local-name(.) = local-name(/*/*[2])][1])" />
<xsl:template match="D0011">
<xsl:for-each select="*[local-name(.) = local-name(../*[2])]">
<xsl:copy>
<xsl:copy-of select="key('elements-by-group', generate-id())"/>
</xsl:copy>
</xsl:for-each>
</xsl:template>
<xsl:key name="k1" match="D0011/*[not(self::b)]" use="generate-id(preceding-sibling::b[1])"/>
<xsl:template match="D0011">
<xsl:copy>
<xsl:apply-templates select="b"/>
</xsl:copy>
</xsl:template>
<xsl:template match="D0011/b">
<xsl:copy>
<xsl:copy-of select="key('k1', generate-id())"/>
</xsl:copy>
</xsl:template>
This is the fine-grained traversal pattern:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()[1]|@*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="b[1]" name="group">
<xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::b[1]" mode="group"/>
</xsl:template>
<xsl:template match="b[position()!=1]"/>
<xsl:template match="b" mode="group">
<xsl:call-template name="group"/>
</xsl:template>
</xsl:stylesheet>
Output:
<D0011>
<b>
<c></c>
<d></d>
<e></e>
</b>
<b>
....
....
</b>
</D0011>
