XSLT split sorted data into different tables - sorting

I'm trying to format my XML data into two or more HTML tables. I can successfully sort some dummy data with xsl:sort, but I can't split the sorted data into different tables.
My xml:
<a>
<b id="N">text1</b>
<b id="N">text2</b>
<b id="N+1">text3</b>
<b id="N">text4</b>
<b id="N+2">text5</b>
<b id="N+3">text6</b>
<b id="N">text7</b>
<b id="N+2">text8</b>
</a>
N is in this case a number, but I don't know which number. It could be 2 and 55, 3 and 4, 44 and 52 and 78 and 98.
Each number I wish to send to their own table, so the result would be:
<table>
<tr><td>text1</td></tr>
<tr><td>text2</td></tr>
<tr><td>text4</td></tr>
<tr><td>text7</td></tr>
</table>
<table>
<tr><td>text3</td></tr>
</table>
<table>
<tr><td>text5</td></tr>
<tr><td>text8</td></tr>
</table>
<table>
<tr><td>text6</td></tr>
</table>
How can I divide the sorted data into different tables depending on their attribute?
Any pointers would be appreciated.

The standard approach to this kind of problem in XSLT 1.0 is called Muenchian grouping. You define a key that groups your target elements in the way you want
<xsl:key name="bsById" match="b" use="@id" />
then use a trick with generate-id to extract just the first node in each group as a proxy for the group as a whole
<xsl:apply-templates select="b[generate-id()
= generate-id(key('bsById', @id)[1])]"
mode="group">
<xsl:sort select="@id" />
</xsl:apply-templates>
So now the following template would fire once per group, and you can use the key function within it to get all the nodes in the group
<xsl:template match="b" mode="group">
<table>
<!-- extract all the nodes that are grouped with this one -->
<xsl:apply-templates select="key('bsById', @id)">
<!-- you could <xsl:sort> here if you want to sort within groups -->
</xsl:apply-templates>
</table>
</xsl:template>
<xsl:template match="b">
<tr><td>...</td></tr>
</xsl:template>
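Put together, the pieces above might look like the following complete stylesheet. This is a minimal sketch rather than a definitive implementation: the html/body wrapper, the match on /a as the document element, and data-type="number" on the sort (on the assumption that the ids really are numbers) are additions for illustration and not part of the answer.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="html" indent="yes"/>

  <!-- index every b element by the value of its id attribute -->
  <xsl:key name="bsById" match="b" use="@id"/>

  <!-- assumption: a is the document element, as in the question's sample -->
  <xsl:template match="/a">
    <html>
      <body>
        <!-- select only the first b of each id group, one per table -->
        <xsl:apply-templates
            select="b[generate-id() = generate-id(key('bsById', @id)[1])]"
            mode="group">
          <!-- assumption: ids are numeric, so sort them as numbers -->
          <xsl:sort select="@id" data-type="number"/>
        </xsl:apply-templates>
      </body>
    </html>
  </xsl:template>

  <!-- fires once per group; emits a table holding every member of the group -->
  <xsl:template match="b" mode="group">
    <table>
      <xsl:apply-templates select="key('bsById', @id)"/>
    </table>
  </xsl:template>

  <!-- one row per b element -->
  <xsl:template match="b">
    <tr><td><xsl:value-of select="."/></td></tr>
  </xsl:template>
</xsl:stylesheet>
```

Members of a group come back from key() in document order, so rows inside each table keep their original order unless an xsl:sort is added there as well.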
All the above is fine if that example is your entire XML document, but if there's more than one a element within the document each with its own set of b elements that need grouping independently, then the key needs to be more complex. The usual trick here is to use the generate-id of the parent a node as part of the grouping key value for its b children:
<xsl:key name="bsByParentAndId" match="a/b" use="concat(generate-id(..), '|', @id)" />
and for the Muenchian grouping expression
<xsl:template match="a">
<xsl:apply-templates select="b[generate-id()
= generate-id(key('bsByParentAndId', concat(
generate-id(current()), '|', @id))[1])]"
mode="group"/>
</xsl:template>
For the record, if you could use XSLT 2.0 then it becomes significantly easier. No need to define a complex key, you simply use for-each-group
<xsl:template match="a">
<xsl:for-each-group select="b" group-by="@id">
<xsl:sort select="current-grouping-key()" />
<table>
<xsl:apply-templates select="current-group()" />
</table>
</xsl:for-each-group>
</xsl:template>
<xsl:template match="b">
<tr><td>...</td></tr>
</xsl:template>

Related

XSLT Function Return Type

Originally: **How to apply XPath query to a XML variable typed as element()* **
I wish to apply XPath queries to a variable passed to a function in XSLT 2.0.
Saxon returns this error:
Type error at char 6 in xsl:value-of/@select on line 13 column 50 of stackoverflow_test.xslt:
XTTE0780: Required item type of result of call to f:test is element(); supplied value has item type text()
This skeleton of a program is simplified but, by the end of its development, it is meant to pass an element tree to multiple XSLT functions. Each function will extract certain statistics and create reports from the tree.
When I say apply XPath queries, I mean I wish to have the query consider the base element in the variable... if you please... as if I could write {count(doc("My XSLT tree/element variable")/a[1])}.
Using Saxon HE 9.7.0.5.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="f:f">
<xsl:template match="/root">
<xsl:variable name="first" as="element()*">
<xsl:copy-of select="(./a[1])" />
</xsl:variable>
<html>
<xsl:copy-of select="f:test($first)" />
</html>
</xsl:template>
<xsl:function name="f:test" as="element()*">
<xsl:param name="frstElem" as="element()*" />
<xsl:value-of select="count($frstElem/a)" />
<!-- or any XPath expression -->
</xsl:function>
</xsl:stylesheet>
Some example data
<root>
<a>
<b>
<c>hi</c>
</b>
</a>
<a>
<b>
<c>hi</c>
</b>
</a>
</root>
Possibly related question: How to apply xpath in xsl:param on xml passed as input to xml
What you are doing is perfectly correct, except that you have passed an a element to the function, and the function is looking for an a child of this element, and with your sample data this will return an empty sequence.
If you want f:test() to return the number of a elements in the sequence that is the value of $frstElem, you can use something like
<xsl:value-of select="count($frstElem/self::a)" />
instead of using the (implicit) child:: axis.
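As for the XTTE0780 message itself: xsl:value-of constructs a text node, which does not match the declared return type as="element()*". One way to avoid the mismatch, sketched here under the assumption that the function only needs to return a count, is to declare an atomic return type and use xsl:sequence. The function name f:count-a, the parameter name elems, and the xs namespace declaration are illustrative and do not appear in the question.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="f:f"
    exclude-result-prefixes="xs f">

  <xsl:template match="/root">
    <!-- select the original node instead of copying it -->
    <xsl:variable name="first" as="element()*" select="a[1]"/>
    <html>
      <xsl:value-of select="f:count-a($first)"/>
    </html>
  </xsl:template>

  <!-- hypothetical helper: returns a number, so the declared type is xs:integer -->
  <xsl:function name="f:count-a" as="xs:integer">
    <xsl:param name="elems" as="element()*"/>
    <!-- xsl:sequence returns the integer itself rather than a text node -->
    <xsl:sequence select="count($elems/self::a)"/>
  </xsl:function>
</xsl:stylesheet>
```

With the sample data from the question, $first holds a single a element, so the function should return 1.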

Saxon he 9.4 performance -xslt

I have the following two templates in XSLT:
<xsl:template name="calculateAbsoluteEntryNodeIndex">
<!-- current 'entry' node -->
<xsl:param name="entryNode"/>
<!-- current 'entry' node index (position) in xml tree-->
<xsl:param name="entryNodePosition"/>
<xsl:choose>
<!--if the current 'entry' node contains 'namest' attribute then its ('namest') value is treated as
the absolute index (of the current 'entry' node)-->
<xsl:when test="$entryNode/@namest">
<!--writing result-->
<xsl:value-of select="number($entryNode/@namest)"/>
<xsl:text>;</xsl:text>
<xsl:value-of select="number($entryNode/@nameend)"/>
</xsl:when>
<xsl:otherwise>
<!--getting last 'Nameend' attribute value-->
<xsl:variable name="lastNameEndValue">
<xsl:choose>
<!--check if exists any 'entry' node before the current 'entry' node (on the current 'row' level) having 'nameend' attribute defined
('entry' has to have index number less than $entryNodePosition) -->
<xsl:when test="$entryNode/preceding-sibling::entry[@nameend]">
<!--get 'nameend' attribute value of the last matched "entry" node and convert it to number -->
<xsl:value-of select="number(($entryNode/preceding-sibling::entry/@nameend)[last()])"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="0"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<!--getting 'entry' node index of the matched 'Nameend' attribute -->
<xsl:variable name="lastNameendNodePosition">
<xsl:choose>
<!-- if lastNameEndValue != 0 -->
<xsl:when test="$lastNameEndValue != '0'">
<!-- calculate index of the 'entry' node matched in $lastNameEndValue selection =>it is done by counting all preceding siblings of the node matched in
$lastNameEndValue increased by 1-->
<xsl:value-of select="count(($entryNode/preceding-sibling::entry[@nameend])[last()]/preceding-sibling::entry) + 1"/>
</xsl:when>
<xsl:otherwise>
<xsl:value-of select="0"/>
</xsl:otherwise>
</xsl:choose>
</xsl:variable>
<!--writing result-->
<xsl:value-of select="$entryNodePosition - $lastNameendNodePosition + $lastNameEndValue"/>
<xsl:text>;</xsl:text>
<xsl:value-of select="$entryNodePosition - $lastNameendNodePosition + $lastNameEndValue"/>
</xsl:otherwise>
</xsl:choose>
</xsl:template>
And
I ran the profiler that comes with Saxon HE 9.4 (-TP:profile.html):
template calculateAbsoluteEntryNodeIndex 25268028
Total time spent in this template is 174966.587 ms.
The whole XSLT executes in a total time of 337196.696 milliseconds.
It seems to have problems transforming big tables of around 14 thousand lines of XML. Any idea what the issue could be?
Structure of table is.
<?xml version="1.0" encoding="UTF-8"?>
<table tabledef="excel">
<tgroup cols="1">
<colspec colname="1" colnum="1" colwidth="100%"/>
<thead>
<row>
<entry morerows="1">
<p>
Text
</p>
</entry>
</row>
</thead>
<tbody>
<row>
<entry align="left">
<p>1</p>
</entry>
</row>
</tbody>
</tgroup>
</table>
There's not really enough information here: for example, what's the typical number of sibling entry elements within a row? And how often is this template executed? I guess you are probably executing it once per entry, and it is obviously quadratic in the number of entries.
The repetition of the expression $entryNode/preceding-sibling::entry[@nameend] is obviously wasteful.
It's very hard to offer advice on whether there are other ways of writing this that would go faster, without knowing anything about what the code is actually doing. Perhaps using xsl:number for some of the counting would work better; it's very difficult to tell. Alternatively, instead of doing a "for-each" that processes each entry independently, consider doing sibling recursion that works forwards through the nodes, passing parameter information about nodes already processed so you don't have to search backwards to preceding-sibling nodes.
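To illustrate just the point about the repeated expression, here is a sketch of the same named template with the nearest preceding entry that has a nameend attribute selected once into a variable. The variable names prev and index are hypothetical. The sketch assumes that nameend, when present, is a non-zero number (the original treats a value of 0 as "no match", which this version does not reproduce), and it is not claimed to fix the quadratic behaviour, since it still counts preceding siblings.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template name="calculateAbsoluteEntryNodeIndex">
    <xsl:param name="entryNode"/>
    <xsl:param name="entryNodePosition"/>
    <xsl:choose>
      <!-- an explicit namest/nameend pair is used as the absolute index -->
      <xsl:when test="$entryNode/@namest">
        <xsl:value-of select="number($entryNode/@namest)"/>
        <xsl:text>;</xsl:text>
        <xsl:value-of select="number($entryNode/@nameend)"/>
      </xsl:when>
      <xsl:otherwise>
        <!-- nearest preceding sibling entry carrying @nameend, selected once -->
        <xsl:variable name="prev"
            select="$entryNode/preceding-sibling::entry[@nameend][1]"/>
        <xsl:variable name="index">
          <xsl:choose>
            <xsl:when test="$prev">
              <!-- position - (index of $prev among entries) + its nameend -->
              <xsl:value-of select="$entryNodePosition
                  - (count($prev/preceding-sibling::entry) + 1)
                  + number($prev/@nameend)"/>
            </xsl:when>
            <xsl:otherwise>
              <!-- no earlier spanned entry: the position is already absolute -->
              <xsl:value-of select="$entryNodePosition"/>
            </xsl:otherwise>
          </xsl:choose>
        </xsl:variable>
        <!-- start and end index are the same for an unspanned entry -->
        <xsl:value-of select="concat($index, ';', $index)"/>
      </xsl:otherwise>
    </xsl:choose>
  </xsl:template>
</xsl:stylesheet>
```

On a reverse axis such as preceding-sibling, the predicate [1] picks the nearest node, which is the same node the original reaches with the parenthesised [last()] form.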

xpath: how to select items between item A and item B

I have an HTML page with this structure:
<big><b>Staff in:</b></big>
<br>
<a href='...'>Movie 1</a>
<br>
<a href='...'>Movie 2</a>
<br>
<a href='...'>Movie 3</a>
<br>
<br>
<big><b>Cast in:</b></big>
<br>
<a href='...'>Movie 4</a>
How do I select Movies 1, 2, and 3 using Xpath?
I wrote this query
'//big/b[text()="Staff in:"]/following::a'
but it returns Movies 1, 2, 3, and 4. I guess I need to find a way to get items after <big><b>Staff in: but before the next <big>.
Thanks,
Assuming that <big><b>Staff in:</b></big> is a unique element that we can use as an 'anchor', you can try this:
//big[b='Staff in:']/following-sibling::a[preceding-sibling::big[1][b='Staff in:']]
Basically, the XPath finds all <a> elements that are following siblings of the 'anchor' <big> element mentioned above, and restricts the result to those whose nearest preceding sibling <big> is the anchor element.
Output in an XPath tester, given the markup in the question as input (with minimal adjustment to make it well-formed XML):
Element='Movie 1'
Element='Movie 2'
Element='Movie 3'
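For anyone without an XPath tester at hand, the same expression can be tried from a small XSLT 1.0 stylesheet. This is only a sketch: it assumes the page has been made well-formed (closed br elements and a single wrapping element around the fragment), which the question's raw HTML is not.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- a elements after the 'Staff in:' heading whose nearest preceding
         big sibling is still that heading -->
    <xsl:for-each select="//big[b='Staff in:']/following-sibling::a
        [preceding-sibling::big[1][b='Staff in:']]">
      <xsl:value-of select="."/>
      <!-- one title per line -->
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```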
//a[preceding::b[text()="Staff in:"] and following::b[text()="Cast in:"]]
Returns all a after the element b with text Staff in: but before the element b with the text Cast in:.
You may need to add some more conditions to make it more specific depending on whether or not these b elements are unique on the page.
To add to the answers above, and following the Stack Overflow question "XPath axis, get all following nodes until", here is the complete solution I worked out in an XSLT editor. First, /*/ is used instead of // because it is faster. Second, the expression returns every a element that is a following sibling of the chosen big element, provided that its nearest preceding big sibling is that same big element. This assumes the big elements are distinct.
The x-path looks like
/*/big[b="Cast in:"]/following-sibling::a [1 = count(preceding-sibling::big[1]| ../big[b="Cast in:"])]
The xslt solution looks like
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My Movie Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
</tr>
<xsl:variable name="placeholder" select="/*/big" />
<xsl:for-each select="$placeholder">
<xsl:variable name="i" select="position()" />
<b>
<xsl:value-of select="$i" />
<xsl:value-of select="$placeholder[$i]" />
</b>
<xsl:for-each
select="following-sibling::a [1 = count(preceding-sibling::big[1]| ../big[b=$placeholder[$i]])]">
<tr>
<td>
<xsl:value-of select="." />
</td>
</tr>
</xsl:for-each>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

xslt 1.0 preceding-sibling for sorted group

I need to run a conditional action based on the preceding sibling in a sorted group. I know that the preceding-sibling axis acts on the original document, not the sorted results. Is there a way to operate on the sorted results list? I do not think the Muenchian grouping method is what I need because I do not want to group based on the preceding-sibling.
Given the XML below, I want to sort by the value of the container, and then test whether the type attribute of the preceding sibling (within the sorted results) is different. If it is, I need to output the value of the new @type, but I do not want the results sorted by @type.
XML
<c>
<did>
<container id="cid1059023" type="Box">C 3</container>
<container id="cid1059004" type="Map-case">C 1</container>
<container id="cid1059002" type="Binder">OSxxx-3</container>
<container id="cid1059042" type="Box">OSxxx-1</container>
</did>
</c>
<c>
<did>
<container id="cid1059025" type="Box">C 4</container>
<container id="cid1059006" type="Map-case">C 2</container>
</did>
</c>
XSL
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:xd="http://www.oxygenxml.com/ns/doc/xsl" version="1.0">
<xsl:template match="/">
<table>
<xsl:for-each select="child::*/container[@id]">
<xsl:sort select="."/>
<tr>
<td class="container">
<xsl:if test="@type != preceding-sibling::*/@type">
<xsl:value-of select="@type"/>
</xsl:if>
<xsl:value-of select="."/>
</td>
</tr>
</xsl:for-each>
</table>
</xsl:template>
</xsl:stylesheet>
Thanks.
I don't see how you can do this with XSLT 1.0 without using an extension. So I would either use XSLT 2.0 or, if you have someone with a gun pointed at you yelling that you shall use XSLT 1.0, create a pipeline with two XSLT steps: do the sorting in the first step and the filtering in the second.
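As a rough idea of what the XSLT 2.0 route could look like, the sketch below sorts the containers into a variable with xsl:perform-sort and then compares each item with its predecessor in the sorted sequence. The variable names sorted and pos, the //container[@id] selection, and the choice to print the type for the very first row are assumptions made for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="/">
    <!-- the containers themselves (not copies), in sorted order -->
    <xsl:variable name="sorted" as="element(container)*">
      <xsl:perform-sort select="//container[@id]">
        <xsl:sort select="."/>
      </xsl:perform-sort>
    </xsl:variable>
    <table>
      <xsl:for-each select="$sorted">
        <xsl:variable name="pos" select="position()"/>
        <tr>
          <td class="container">
            <!-- compare with the previous item of the sorted sequence,
                 not with the preceding sibling in the source document -->
            <xsl:if test="$pos = 1 or @type != $sorted[$pos - 1]/@type">
              <xsl:value-of select="@type"/>
            </xsl:if>
            <xsl:value-of select="."/>
          </td>
        </tr>
      </xsl:for-each>
    </table>
  </xsl:template>
</xsl:stylesheet>
```

The same idea carries over to the two-step pipeline: the first stylesheet writes the sorted containers as siblings, after which preceding-sibling in the second stylesheet sees the sorted order.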

XSLT 1.0: restrict entries in a nodeset

Being relatively new to XSLT I have what I hope is a simple question. I have some flat XML files, which can be pretty big (eg. 7MB) that I need to make 'more hierarchical'. For example, the flat XML might look like this:
<D0011>
<b/>
<c/>
<d/>
<e/>
<b/>
....
....
</D0011>
and it should end up looking like this:
<D0011>
<b>
<c/>
<d/>
<e/>
</b>
<b>
....
....
</D0011>
I have a working XSLT for this, and it essentially gets a nodeset of all the b elements and then uses the 'following-sibling' axis to get a nodeset of the nodes following the current b node (ie. following-sibling::*[position()=$nodePos]). Then recursion is used to add the siblings into the result tree until another b element is found (I have parameterised it of course, to make it more generic).
I also have a solution that just sends the position in the XML of the next b node and selects the nodes after that one after the other (using recursion) via a *[position() = $nodePos] selection.
The problem is that the time to execute the transformation increases unacceptably with the size of the XML file. Looking into it with XML Spy it seems that it is the 'following-sibling' and 'position()=' that take the time in the two respective methods.
What I really need is a way of restricting the number of nodes in the above selections, so fewer comparisons are performed: every time the position is tested, every node in the nodeset is tested to see if its position is the right one. Is there a way to do that ? Any other suggestions ?
Thanks,
Mike
Yes there is a way to do it much more efficiently: See Muenchian grouping. If having looked at this you need more help with the details, let us know. The key you'll need is something like:
<xsl:key name="elements-by-group" match="*[not(self::b)]"
use="generate-id(preceding-sibling::b[1])" />
Then you can iterate over the <b> elements, and for each one, use key('elements-by-group', generate-id()) to get the elements that immediately follow that <b>.
The task of "making the XML more hierarchical" is sometimes called up-conversion, and your scenario is a classic case for it. As you may know, XSLT 2.0 has very useful grouping features that are easier to use than the Muenchian method.
In your case it sounds like you would use <xsl:for-each-group group-starting-with="b" /> or, to parameterize the element name, <xsl:for-each-group group-starting-with="*[local-name() = 'b']" />. But maybe you already considered that and can't use XSLT 2.0 in your environment.
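A minimal sketch of that XSLT 2.0 approach follows. It assumes that the first child of D0011 is a b element; if other elements came first, the first group would be wrapped in a copy of whichever element leads it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:template match="D0011">
    <xsl:copy>
      <!-- start a new group at every b child -->
      <xsl:for-each-group select="*" group-starting-with="b">
        <!-- the context item is the first item of the group, i.e. the b -->
        <xsl:copy>
          <!-- keep b's attributes, then nest the rest of the group inside it -->
          <xsl:copy-of select="@*, current-group()[position() gt 1]"/>
        </xsl:copy>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```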
Update:
In response to the request for parameterization, here's a way to do it without a key.
Note though that it may be much slower, depending on your XSLT processor.
<xsl:template match="D0011">
<xsl:for-each select="*[local-name() = $sep]">
<xsl:copy>
<xsl:copy-of select="following-sibling::*[not(local-name() = $sep)
and generate-id(preceding-sibling::*[local-name() = $sep][1]) =
generate-id(current())]" />
</xsl:copy>
</xsl:for-each>
</xsl:template>
As noted in the comment, you can keep the performance benefit of keys by defining several different keys, one for each possible value of the parameter. You then select which key to use by using an <xsl:choose>.
Update 2:
To make the group-starting element be defined based on /*/*[2], instead of based on a parameter, use
<xsl:key name="elements-by-group"
match="*[not(local-name(.) = local-name(/*/*[2]))]"
use="generate-id(preceding-sibling::*
[local-name(.) = local-name(/*/*[2])][1])" />
<xsl:template match="D0011">
<xsl:for-each select="*[local-name(.) = local-name(../*[2])]">
<xsl:copy>
<xsl:copy-of select="key('elements-by-group', generate-id())"/>
</xsl:copy>
</xsl:for-each>
</xsl:template>
<xsl:key name="k1" match="D0011/*[not(self::b)]" use="generate-id(preceding-sibling::b[1])"/>
<xsl:template match="D0011">
<xsl:copy>
<xsl:apply-templates select="b"/>
</xsl:copy>
</xsl:template>
<xsl:template match="D0011/b">
<xsl:copy>
<xsl:copy-of select="key('k1', generate-id())"/>
</xsl:copy>
</xsl:template>
This is the fine-grained traversal pattern:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()[1]|@*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="b[1]" name="group">
<xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::b[1]" mode="group"/>
</xsl:template>
<xsl:template match="b[position()!=1]"/>
<xsl:template match="b" mode="group">
<xsl:call-template name="group"/>
</xsl:template>
</xsl:stylesheet>
Output:
<D0011>
<b>
<c></c>
<d></d>
<e></e>
</b>
<b>
....
....
</b>
</D0011>
