XSLT 1.0: restrict entries in a nodeset - performance

Being relatively new to XSLT I have what I hope is a simple question. I have some flat XML files, which can be pretty big (eg. 7MB) that I need to make 'more hierarchical'. For example, the flat XML might look like this:
<D0011>
<b/>
<c/>
<d/>
<e/>
<b/>
....
....
</D0011>
and it should end up looking like this:
<D0011>
<b>
<c/>
<d/>
<e/>
</b>
<b>
....
....
</D0011>
I have a working XSLT for this, and it essentially gets a nodeset of all the b elements and then uses the 'following-sibling' axis to get a nodeset of the nodes following the current b node (ie. following-sibling::*[position()=$nodePos]). Then recursion is used to add the siblings into the result tree until another b element is found (I have parameterised it of course, to make it more generic).
I also have a solution that just sends the position in the XML of the next b node and selects the nodes after that one after the other (using recursion) via a *[position() = $nodePos] selection.
The problem is that the time to execute the transformation increases unacceptably with the size of the XML file. Looking into it with XML Spy it seems that it is the 'following-sibling' and 'position()=' that take the time in the two respective methods.
What I really need is a way of restricting the number of nodes in the above selections, so fewer comparisons are performed: every time the position is tested, every node in the nodeset is tested to see if its position is the right one. Is there a way to do that ? Any other suggestions ?
Thanks,
Mike

Yes there is a way to do it much more efficiently: See Muenchian grouping. If having looked at this you need more help with the details, let us know. The key you'll need is something like:
<xsl:key name="elements-by-group" match="*[not(self::b)]"
use="generate-id(preceding-sibling::b[1])" />
Then you can iterate over the <b> elements, and for each one, use key('elements-by-group', generate-id()) to get the elements that immediately follow that <b>.
The task of "making the XML more hierarchical" is sometimes called up-conversion, and your scenario is a classic case for it. As you may know, XSLT 2.0 has very useful grouping features that are easier to use than the Muenchian method.
In your case it sounds like you would use <xsl:for-each-group group-starting-with="b" /> or, to parameterize the element name, <xsl:for-each-group group-starting-with="*[local-name() = 'b']" />. But maybe you already considered that and can't use XSLT 2.0 in your environment.
Update:
In response to the request for parameterization, here's a way to do it without a key.
Note though that it may be much slower, depending on your XSLT processor.
<xsl:template match="D0011">
<xsl:for-each select="*[local-name() = $sep]">
<xsl:copy>
<xsl:copy-of select="following-sibling::*[not(local-name() = $sep)
and generate-id(preceding-sibling::*[local-name() = $sep][1]) =
generate-id(current())]" />
</xsl:copy>
</xsl:for-each>
</xsl:template>
As noted in the comment, you can keep the performance benefit of keys by defining several different keys, one for each possible value of the parameter. You then select which key to use by using an <xsl:choose>.
Update 2:
To make the group-starting element be defined based on /*/*[2], instead of based on a parameter, use
<xsl:key name="elements-by-group"
match="*[not(local-name(.) = local-name(/*/*[2]))]"
use="generate-id(preceding-sibling::*
[local-name(.) = local-name(/*/*[2])][1])" />
<xsl:template match="D0011">
<xsl:for-each select="*[local-name(.) = local-name(../*[2])]">
<xsl:copy>
<xsl:copy-of select="key('elements-by-group', generate-id())"/>
</xsl:copy>
</xsl:for-each>
</xsl:template>

<xsl:key name="k1" match="D0011/*[not(self::b)]" use="generate-id(preceding-sibling::b[1])"/>
<xsl:template match="D0011">
<xsl:copy>
<xsl:apply-templates select="b"/>
</xsl:copy>
</xsl:template>
<xsl:template match="D0011/b">
<xsl:copy>
<xsl:copy-of select="key('k1', generate-id())"/>
</xsl:copy>
</xsl:template>

This is the fine grained trasversal pattern:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|#*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()[1]|#*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="b[1]" name="group">
<xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::b[1]" mode="group"/>
</xsl:template>
<xsl:template match="b[position()!=1]"/>
<xsl:template match="b" mode="group">
<xsl:call-template name="group"/>
</xsl:template>
</xsl:stylesheet>
Output:
<D0011>
<b>
<c></c>
<d></d>
<e></e>
</b>
<b>
....
....
</b>
</D0011>

Related

Unable to use apply-templates with node-set - XSLT

I am trying to get distinct values of shelf and add them to xml tree using node-set. I am trying to print the values of node-set using apply-templates. But I am doing something wrong. Kindly correct me. Thanks
<xsl:template name="form_list">
<xsl:for-each select="//library/section/#shelf[generate-id()= generate-id(key('distinct_shelfs',.)[1])]">
<xsl:variable name="shelfs">
<ml>
<m><xsl:value-of select="."/></m>
</ml>
</xsl:variable>
</xsl:for-each>
<xsl:variable name="Shelf_Set"><xsl:value-of select="exsl:node-set($shelfs)/ml"/></xsl:variable>
</xsl:template>
<xsl:call-template name="PrintShelfs"/>
<xsl:template name="PrintShelfs">
<xsl:apply-templates select="$Shelf_Set" mode="m1"/>
</xsl:template>
<xsl:template match="ml" mode="m1" >
<a><xsl:value-of select='.'/></a>
</xsl:template>
The xsl:value-of instruction flattens a node-set to a single text node. I suspect you want simply
<xsl:variable name="Shelf_Set" select="exsl:node-set($shelfs)/ml"/>

Find position of a node with specific text among identical nodes using xpath

There is a set of trademark nodes <tm> in a single document. Each node <tm> contains the text node inside - trademark name. There may be identical nodes among tm's that means they have the same trademark name. I need to write the template that will add the trademark character ™ (™) only to the first occurrence of each trademark.
Example:
<doc>
<a><tm>A</tm></a>
<tm>A</tm>
<tm>B</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>
Only the first occurrences of <tm>A</tm> and <tm>B</tm> should be processed.
The expected result is:
<doc>
<a><tm>A™</tm></a>
<tm>A</tm>
<tm>B™</tm>
<b><tm>B</tm></b>
<a><b><c><tm>A</tm></c></b></a>
</doc>
The difficulty here is that there are identical nodes. Besides, I cannot write a separate template for each trademark, one template should match all.
Here is a draft of the solution:
<xsl:template match="tm">
<xsl:variable name="text" select="text()"/>
<xsl:variable name="same_tms" select="//tm[text()=$text]"/>
<xsl:if test=" --- current tm is the first among $same_tms --- ">
<xsl:value-of select="concat(text(), '™')"/>
</xsl:if>
</xsl:template>
I don't know how to write a generic test condition that would check if the current <tm> is the first among $same_tms. Is it possible?
Use a key, as in Muenchian grouping (http://www.jenitennison.com/xslt/grouping/muenchian.xml), only that with XSLT 2.0 you can use is instead of the generate-id() test you would need in XSLT 1.0:
<xsl:transform xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:key name="tm" match="tm" use="."/>
<xsl:template match="#*|node()">
<xsl:copy>
<xsl:apply-templates select="#*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="tm[. is key('tm', .)[1]]">
<xsl:copy>
<xsl:value-of select="concat(., '™')"/>
</xsl:copy>
</xsl:template>
</xsl:transform>
Online as http://xsltransform.net/ncdD7mC.

XPath selection excluding element from a child node

I need to select all <next> nodes, but excluding <element4> from each and add a new element in its place (it would be a replace). I'm working with php.
<root>
<next>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<element4>text</element1>
</node>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<element4>text</element1>
</node>
</next>
</root>
so it should look like this:
<next>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<new>text</new>
</node>
<node>
<element1>text</element1>
<element2>text</element1>
<element3>text</element1>
<new><int xmlns="foo.bar">0</int></new>
</node>
</next>
Any tips? Thank you!
XPath is a selection language: it selects nodes or atomic items from an input sequence, it is the language of choice to make a selection over XML or hierarchical data, as SQL is (usually) the language of choice with relational databases.
As such, you can exclude elements from the selection, but you cannot update or change the original sequence. It is possible to do a limited transformation (i.e., turn a string in an integer), but this will change what is selected, it will not change the source. While XPath (namely version 2.0 and up) can "create" atomic values on the fly, it cannot create new elements.
This is possible and will return numeric values in XPath 2.0:
/next/node/number(.)
But this is not possible:
/next/node/(if (element4) then create-element(.) else .)
However, in XSLT 2.0 and up you can create a function that creates elements. As said above, XPath selects, and if you want to change the document, you can create a new document using XSLT (the T standing for Transformation).
Something like the following (partial XSLT 2.0, you need add headers):
<xsl:function name="f:create">
<xsl:param name="node" />
<xsl:param name="name" />
<xsl:choose>
<xsl:when test="name($node) = $name">
<xsl:element name="{if(number($node)) then 'int' else 'new'}">
<xsl:value-of select="$node" />
</xsl:element>
</xsl:when>
<xsl:otherwise><xsl:copy-of select="$node" /></xsl:otherwise>
</xsl:choose>
</xsl:function>
<xsl:template match="node">
<!-- now XPath, with help of XSLT function, can conditionally create nodes -->
<xsl:copy-of select="child::*/create(., 'element4')" />
</xsl:template>
<!-- boilerplate code, typically used to recursively copy non-matched nodes -->
<xsl:template match="node() | #*">
<xsl:copy>
<xsl:apply-templates select="#* | node()" />
</xsl:copy>
</xsl:template>
Note that, while this shows how you can create a different element using XPath and an XSLT function, it does not change the source, it changes the output. Also, it is not a recommended practice, as in XSLT the same pattern is more easily done by simply doing:
<!-- the more specific match -->
<xsl:template match="element4[number(.)]">
<new>
<int xmlns="foo.bar">
<xsl:value-of select="number(.)" />
</int>
</new>
<xsl:template>
<!-- XSLT will automatically fallback to this one if the former fails -->
<xsl:template match="element4">
<new><xsl:copy-of select="node()" /></new>
</xsl:template>
<!-- or this one, if both the former fail -->
<xsl:template match="node() | #*">
<xsl:copy>
<xsl:apply-templates select="#* | node()" />
</xsl:copy>
</xsl:template>

XPath 1.0 Order of returned attributes in a UNION

<merge>
<text>
<div begin="A" end="B" />
<div begin="C" end="D" />
<div begin="E" end="F" />
<div begin="G" end="H" />
</text>
</merge>
I need a UNIONed set of attribute nodes, in the order A,B,C,D,E,F,G,H, and this will work:
/merge/text/div/#begin | /merge/text/div/#end
but only if each #begin comes before each #end, since the UNION operator is spec'd to return nodes in document order. (Yes?)
I need the nodeset to be in the same order, even if the attributes appear in a different order in the document, as here:
<merge>
<text>
<div end="B" begin="A" />
<div begin="C" end="D" />
<div end="F" begin="E" />
<div begin="G" end="H" />
</text>
</merge>
That is, I need elements to follow document order, but the attributes in each element to follow a determined order (either specified or alphabetical by attribute name).
This simply isn't possible in pure XPath. First of all, attributes in XML are unordered. From the XML 1.0 Recommendation:
Note that the order of attribute specifications in a start-tag or
empty-element tag is not significant.
An XPath engine might be reading and storing them in the order they appear in the document, but in terms of the spec, this is just a happy coincidence that cannot be relied upon.
Second, XPath has no sorting functionality. So, your best option is to sort the elements in your host language (e.g. XSLT or a general-purpose PL) after they've been selected.
Here's how to sort those attributes by value in XSLT:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:apply-templates
select="/merge/text/div/#*[name()='begin' or name()='end']">
<xsl:sort select="."/>
</xsl:apply-templates>
</xsl:template>
</xsl:stylesheet>
Note that I also merged your two expressions into one.
Edit: Use the following to output begin/end pairs in document order (as described in the comments):
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:strip-space elements="*"/>
<xsl:template match="div">
<xsl:value-of select="concat(#begin, #end)"/>
</xsl:template>
</xsl:stylesheet>

xsl sorting on an an average value of 3 child elements

I have the following xml. What I want to do with my XSL is sort the output on the total value of the elements productDesignRating, productPriceRating and productPerfromanceRating. So far no luck in trying to get this done. Any help will be appreciated i need to be able to do this in xsl 1 so no xsl2 functions.
<DocumentElement xmlns="DotNetNuke/UserDefinedTable">
<QueryResults>
<productCategory>cat1</productCategory>
<productTitle>product1</productTitle>
<productImage><img alt="productImage" title="productImage" src="/skinconversion/Portals/12/babynokiko.jpg" /></productImage>
<productDesignRating>3</productDesignRating>
<productPriceRating>4</productPriceRating>
<productPerformanceRating>4</productPerformanceRating>
<productPrice>10</productPrice>
<productSummary>description</productSummary>
<productUrl>http://www.2dnn.com</productUrl>
</QueryResults>
<QueryResults>
<productCategory>cat2</productCategory>
<productTitle>product2</productTitle>
<productImage><img alt="productImage" title="productImage" src="/skinconversion/Portals/12/babynokiko.jpg" /></productImage>
<productDesignRating>3</productDesignRating>
<productPriceRating>3</productPriceRating>
<productPerformanceRating>3</productPerformanceRating>
<productPrice>10</productPrice>
<productSummary>description</productSummary>
<productUrl>http://www.2dnn.com</productUrl>
</QueryResults>
<QueryResults>
<productCategory>cat3</productCategory>
<productTitle>product3</productTitle>
<productImage><img alt="productImage" title="productImage" src="/skinconversion/Portals/12/babynokiko.jpg" /></productImage>
<productDesignRating>1</productDesignRating>
<productPriceRating>2</productPriceRating>
<productPerformanceRating>3</productPerformanceRating>
<productPrice>56</productPrice>
<productSummary>description</productSummary>
<productUrl>http://www.2dnn.com</productUrl>
</QueryResults>
</DocumentElement>
Try this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="DotNetNuke/UserDefinedTable">
<xsl:template match="node()">
<xsl:copy>
<xsl:apply-templates select="node()">
<xsl:sort select="my:productDesignRating +
my:productPriceRating +
my:productPerformanceRating"
data-type="number"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
This is just a simplified identity template (not processing any attributes), but applying a numerical sort on the sum of the values of the 3 specified child elements – if they are present.
Specifically note the use of the data-type attribute, allowing to specify that the sorting base should be numbers, so the order here is 6,9,11 (for strings, which is default, it would be 11,6,9)…
[To reverse the order, simply add the order="descending" attribute]

Resources