XPath with node(): how to express a `node()[.//x]` condition?

I have an XPath expression that must match text nodes and elements, except the element <aa>. For that,
./node()[name()!='aa']
is the correct XPath.
But it is insufficient when an aa element is nested inside the node. I need something like
./node()[name()!='aa' and not(.//aa)]
but this XPath does not work (!).
NOTE
I used
./*[not(self::aa or .//aa)] | ./text()
but it loses the original order of the nodes. The problem is more evident when working with XSLT. For example,
<xsl:for-each select="./*[not(self::aa or .//aa)] | ./text()">
<xsl:copy-of select="."/>
</xsl:for-each>
does not work as expected (the order of the nodes is not guaranteed). With ./node() the order is always correct.
PS: in XSLT there is a workaround that combines the XPaths explained above,
<xsl:for-each select="./node()[name()!='aa']">
<xsl:if test="not(.//aa)"><xsl:copy-of select="."/></xsl:if>
</xsl:for-each>
but the ideal, simplest form does not give the same result (when processing big and complex inputs),
<xsl:copy-of select="*[not(self::aa or .//aa)] | ./text()"/>

I'm imagining your file looks like:
<?xml version="1.0" encoding="UTF-8"?>
<root>
<aa/>
<b>
<aa/>
</b>
<c>
<b>
<aa/>
</b>
</c>
<d/>
<e>
<b/>
</e>
</root>
Then the expression
//node()[not(descendant-or-self::aa)]
returns all nodes (including the whitespace-only text nodes) that are neither an <aa> element themselves nor have an <aa> descendant. Children of <aa> are matched as well.
You'll probably want to do something like
<xsl:copy-of select="node()[not(descendant-or-self::aa)]"/>

Related

XSLT Function Return Type

Originally: **How to apply XPath query to a XML variable typed as element()* **
I wish to apply XPath queries to a variable passed to a function in XSLT 2.0.
Saxon returns this error:
Type error at char 6 in xsl:value-of/#select on line 13 column 50 of stackoverflow_test.xslt:
XTTE0780: Required item type of result of call to f:test is element(); supplied value has item type text()
This skeleton of a program is simplified but, by the end of its development, it is meant to pass an element tree to multiple XSLT functions. Each function will extract certain statistics and create reports from the tree.
When I say apply XPath queries, I mean I wish to have the query consider the base element in the variable... if you please... as if I could write {count(doc("My XSLT tree/element variable")/a[1])}.
Using Saxon HE 9.7.0.5.
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:f="f:f">
<xsl:template match="/root">
<xsl:variable name="first" as="element()*">
<xsl:copy-of select="(./a[1])" />
</xsl:variable>
<html>
<xsl:copy-of select="f:test($first)" />
</html>
</xsl:template>
<xsl:function name="f:test" as="element()*">
<xsl:param name="frstElem" as="element()*" />
<xsl:value-of select="count($frstElem/a)" />
<!-- or any XPath expression -->
</xsl:function>
</xsl:stylesheet>
Some example data
<root>
<a>
<b>
<c>hi</c>
</b>
</a>
<a>
<b>
<c>hi</c>
</b>
</a>
</root>
Possibly related question: How to apply xpath in xsl:param on xml passed as input to xml
What you are doing is perfectly correct, except that you have passed an a element to the function, and the function is looking for an a child of this element, and with your sample data this will return an empty sequence.
If you want f:test() to return the number of a elements in the sequence that is the value of $frstElem, you can use something like
<xsl:value-of select="count($frstElem/self::a)" />
instead of using the (implicit) child:: axis.
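The XTTE0780 message in the question points at a second issue: xsl:value-of constructs a text node, while f:test is declared as="element()*", so the result cannot match the declared type. A minimal sketch of one way around it, assuming the function is only meant to return the count: declare an atomic return type and return the value with xsl:sequence. The xs prefix binding and the xs:integer return type are assumptions added for this illustration, not part of the original stylesheet.

```xml
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
    xmlns:xs="http://www.w3.org/2001/XMLSchema"
    xmlns:f="f:f"
    exclude-result-prefixes="xs f">

  <xsl:template match="/root">
    <!-- select binds the original first <a> element; no copy is needed -->
    <xsl:variable name="first" as="element()*" select="a[1]"/>
    <html>
      <xsl:value-of select="f:test($first)"/>
    </html>
  </xsl:template>

  <!-- Assumption: the function returns a number, so it is typed as xs:integer -->
  <xsl:function name="f:test" as="xs:integer">
    <xsl:param name="frstElem" as="element()*"/>
    <!-- xsl:sequence returns the integer itself instead of wrapping it in a text node -->
    <xsl:sequence select="count($frstElem/self::a)"/>
  </xsl:function>
</xsl:stylesheet>
```

With the sample data above, $first holds a single a element, so the count is 1.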

Is there any method to get any type of sibling of a particular node in Xpath 2.0

Is there a way to select siblings of any type for a particular node in XPath 2.0?
The following-sibling axis seems to support only siblings of the same type.
Ex:
<node>
<b name="bold">abc</b>
<div>gef</div>
</node>
I want to select all the siblings of <b name="bold">.
Use:
following-sibling::node()
This selects all following sibling nodes of any type: elements, text nodes, processing-instruction nodes and comment nodes.
Here is a complete XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:for-each select="/*/b[@name='bold']/following-sibling::node()">
"<xsl:copy-of select="."/>"
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<node>
<b name="bold">abc</b>
<div>gef</div>
</node>
the XPath expression is evaluated (starting from the wanted element) and all three selected nodes are copied to the output:
"
"
"<div>gef</div>"
"
"
As we can see, all sibling nodes are selected -- a whitespace-only text node, a div element and another whitespace-only text node.
Do note: This is an XPath 1.0 expression, and I don't believe XPath 2.0 adds any feature for selecting siblings beyond what is already in XPath 1.0.
In case by "sibling" you mean something different than the meaning of "sibling" in XPath, then you must define precisely what you mean.
Not sure I understand the question, but how about:
//*[preceding-sibling::b]
That selects every element that has a preceding <b> sibling, i.e. the elements that follow the <b name="bold">abc</b> element. The * selects any type of element.
If you want all siblings:
//*[preceding-sibling::b or following-sibling::b]
And if you want to be more specific in how you select the b element:
//*[preceding-sibling::b[@name="bold"]]

XSLT 1.0: restrict entries in a nodeset

Being relatively new to XSLT I have what I hope is a simple question. I have some flat XML files, which can be pretty big (eg. 7MB) that I need to make 'more hierarchical'. For example, the flat XML might look like this:
<D0011>
<b/>
<c/>
<d/>
<e/>
<b/>
....
....
</D0011>
and it should end up looking like this:
<D0011>
<b>
<c/>
<d/>
<e/>
</b>
<b>
....
....
</D0011>
I have a working XSLT for this, and it essentially gets a nodeset of all the b elements and then uses the 'following-sibling' axis to get a nodeset of the nodes following the current b node (ie. following-sibling::*[position()=$nodePos]). Then recursion is used to add the siblings into the result tree until another b element is found (I have parameterised it of course, to make it more generic).
I also have a solution that just sends the position in the XML of the next b node and selects the nodes after that one after the other (using recursion) via a *[position() = $nodePos] selection.
The problem is that the time to execute the transformation increases unacceptably with the size of the XML file. Looking into it with XML Spy it seems that it is the 'following-sibling' and 'position()=' that take the time in the two respective methods.
What I really need is a way of restricting the number of nodes in the above selections, so that fewer comparisons are performed: every time the position is tested, every node in the nodeset is checked to see whether its position is the right one. Is there a way to do that? Any other suggestions?
Thanks,
Mike
Yes there is a way to do it much more efficiently: See Muenchian grouping. If having looked at this you need more help with the details, let us know. The key you'll need is something like:
<xsl:key name="elements-by-group" match="*[not(self::b)]"
use="generate-id(preceding-sibling::b[1])" />
Then you can iterate over the <b> elements, and for each one, use key('elements-by-group', generate-id()) to get the elements that immediately follow that <b>.
The task of "making the XML more hierarchical" is sometimes called up-conversion, and your scenario is a classic case for it. As you may know, XSLT 2.0 has very useful grouping features that are easier to use than the Muenchian method.
In your case it sounds like you would use <xsl:for-each-group group-starting-with="b" /> or, to parameterize the element name, <xsl:for-each-group group-starting-with="*[local-name() = 'b']" />. But maybe you already considered that and can't use XSLT 2.0 in your environment.
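A minimal XSLT 2.0 sketch of that grouping, assuming the flat input shown in the question (a D0011 element whose children are b, c, d, e, ...). Note that xsl:for-each-group also needs a select attribute naming the population to group; the select="*" here and the handling of elements that come before the first b are assumptions for illustration.

```xml
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output indent="yes"/>

  <xsl:template match="D0011">
    <xsl:copy>
      <!-- Each group starts at a <b> and runs up to, but not including, the next <b> -->
      <xsl:for-each-group select="*" group-starting-with="b">
        <xsl:choose>
          <!-- The context item is the first item of the current group -->
          <xsl:when test="self::b">
            <xsl:copy>
              <xsl:copy-of select="@*"/>
              <!-- current-group() includes the leading <b> itself; skip it -->
              <xsl:copy-of select="current-group()[position() gt 1]"/>
            </xsl:copy>
          </xsl:when>
          <xsl:otherwise>
            <!-- Elements before the first <b> form a group of their own; copy them unchanged -->
            <xsl:copy-of select="current-group()"/>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each-group>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

Because the grouping is done in a single pass over the children, it avoids the repeated following-sibling and position() scans described in the question.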
Update:
In response to the request for parameterization, here's a way to do it without a key.
Note though that it may be much slower, depending on your XSLT processor.
<xsl:template match="D0011">
<xsl:for-each select="*[local-name() = $sep]">
<xsl:copy>
<xsl:copy-of select="following-sibling::*[not(local-name() = $sep)
and generate-id(preceding-sibling::*[local-name() = $sep][1]) =
generate-id(current())]" />
</xsl:copy>
</xsl:for-each>
</xsl:template>
As noted in the comment, you can keep the performance benefit of keys by defining several different keys, one for each possible value of the parameter. You then select which key to use by using an <xsl:choose>.
Update 2:
To make the group-starting element be defined based on /*/*[2], instead of based on a parameter, use
<xsl:key name="elements-by-group"
match="*[not(local-name(.) = local-name(/*/*[2]))]"
use="generate-id(preceding-sibling::*
[local-name(.) = local-name(/*/*[2])][1])" />
<xsl:template match="D0011">
<xsl:for-each select="*[local-name(.) = local-name(../*[2])]">
<xsl:copy>
<xsl:copy-of select="key('elements-by-group', generate-id())"/>
</xsl:copy>
</xsl:for-each>
</xsl:template>
<xsl:key name="k1" match="D0011/*[not(self::b)]" use="generate-id(preceding-sibling::b[1])"/>
<xsl:template match="D0011">
<xsl:copy>
<xsl:apply-templates select="b"/>
</xsl:copy>
</xsl:template>
<xsl:template match="D0011/b">
<xsl:copy>
<xsl:copy-of select="key('k1', generate-id())"/>
</xsl:copy>
</xsl:template>
This is the fine-grained traversal pattern:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="node()|@*" name="identity">
<xsl:copy>
<xsl:apply-templates select="node()[1]|@*"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:template>
<xsl:template match="b[1]" name="group">
<xsl:copy>
<xsl:apply-templates select="following-sibling::node()[1]"/>
</xsl:copy>
<xsl:apply-templates select="following-sibling::b[1]" mode="group"/>
</xsl:template>
<xsl:template match="b[position()!=1]"/>
<xsl:template match="b" mode="group">
<xsl:call-template name="group"/>
</xsl:template>
</xsl:stylesheet>
Output:
<D0011>
<b>
<c></c>
<d></d>
<e></e>
</b>
<b>
....
....
</b>
</D0011>

XPath to return default value if node not present

Say I have a pair of XML documents
<Foo>
<Bar/>
<Baz>mystring</Baz>
</Foo>
and
<Foo>
<Bar/>
</Foo>
I want an XPath (Version 1.0 only) that returns "mystring" for the first document and "not-found" for the second. I tried
(string('not-found') | //Baz)[last()]
but the left hand side of the union isn't a node-set
In XPath 1.0, use:
concat(/Foo/Baz,
substring('not-found', 1 div not(/Foo/Baz)))
If you also want to handle a possibly empty Baz element, use:
concat(/Foo/Baz,
substring('not-found', 1 div not(/Foo/Baz[node()])))
With this input:
<Foo>
<Baz/>
</Foo>
Result: the string not-found.
Special case:
If you want to get 0 when a numeric node is missing, use sum(/Foo/Baz). To also get 0 for an empty element, filter it out first: sum(/Foo/Baz[node()]).
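To see why the concat()/substring() expression works, it may help to run it from a small XSLT 1.0 stylesheet. This is only a sketch wrapped around the first expression above; the stylesheet itself and its text output method are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <!-- Baz missing: not(/Foo/Baz) is true, converted to 1, so 1 div 1 = 1
         and substring('not-found', 1) returns the whole default string.
         Baz present: not(/Foo/Baz) is false, converted to 0, so 1 div 0 = Infinity
         and substring('not-found', Infinity) returns the empty string. -->
    <xsl:value-of select="concat(/Foo/Baz,
                                 substring('not-found', 1 div not(/Foo/Baz)))"/>
  </xsl:template>
</xsl:stylesheet>
```

Applied to the first document in the question this outputs mystring, and applied to the second it outputs not-found, because exactly one of the two concat() arguments is non-empty in each case.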
@Alejandro provided the best XPath 1.0 answer, which has been known for years, since first used by Jeni Tennison almost ten years ago.
The only problem with this expression is its shiny elegance, which makes it difficult to understand by not only novice programmers.
In a hosted XPath 1.0 (and every XPath is hosted!) one can use more understandable expressions:
string((/Foo/Baz | $vDefaults[not(/Foo/Baz/text())]/Foo/Baz)[last()])
Here the variable $vDefaults is a separate document that has the same structure as the primary XML document, and whose text nodes contain default values.
Or, if XSLT is the hosting language, one can use the document() function:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my">
<xsl:output method="text"/>
<my:defaults>
<Foo>
<Bar/>
<Baz>not-found</Baz>
</Foo>
</my:defaults>
<xsl:template match="/">
<xsl:value-of select=
"concat(/Foo/Baz,
document('')[not(current()/Foo/Baz/text())]
/*/my:defaults/Foo/Baz
)"/>
</xsl:template>
</xsl:stylesheet>
Or, not using concat():
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:my="my:my">
<xsl:output method="text"/>
<my:defaults>
<Foo>
<Bar/>
<Baz>not-found</Baz>
</Foo>
</my:defaults>
<xsl:variable name="vDefaults" select="document('')/*/my:defaults"/>
<xsl:template match="/">
<xsl:value-of select=
"(/Foo/Baz
| $vDefaults/Foo/Baz[not(current()/Foo/Baz/text())]
)
[last()]"/>
</xsl:template>
</xsl:stylesheet>
In XPath 2.0:
/Foo/(Baz/string(), 'not-found')[1]
If you are fine with getting an empty string instead of the 'not-found' message, use:
/Foo/concat(Baz/text(), '')
You can then replace the empty strings with 'not-found' later.

XPath "following siblings before"

I'm trying to select elements (a) with XPath 1.0 (or possibly with a regex) that are following siblings of a particular element (b) but precede the next b element.
<img><b>First</b><br>
<img> First Href - 19:30<br>
<img><b>Second</b><br>
<img> Second Href - 19:30<br>
<img> Third Href - 19:30<br>
I tried to make the sample as close to real world as possible. So in this scenario when I'm at element
<b>First</b>
I need to select
First Href
and when I'm at
<b>Second</b>
I need to select
Second Href
Third Href
Any idea how to achieve that? Thank you!
Dynamically create this XPath:
following-sibling::a[preceding-sibling::b[1][.='xxxx']]
where 'xxxx' is replaced with the text of the current <b>.
This is assuming that all the elements actually are siblings. If they are not, you can try to work with the preceding and following axes, or you write a more specific XPath that better resembles document structure.
In XSLT you could also use:
following-sibling::a[
generate-id(preceding-sibling::b[1]) = generate-id(current())
]
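A minimal sketch of how the generate-id() variant might be used in a stylesheet, assuming (as stated above) that the a and b elements are siblings under one parent element. The text output method and the line layout are assumptions made for illustration.

```xml
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:output method="text"/>

  <xsl:template match="/">
    <xsl:apply-templates select="*/b"/>
  </xsl:template>

  <xsl:template match="b">
    <xsl:text>At: </xsl:text>
    <xsl:value-of select="."/>
    <xsl:text>&#10;</xsl:text>
    <!-- current() is the <b> being processed by this template.
         Keep only the <a> siblings whose nearest preceding <b> is that same node. -->
    <xsl:for-each select="following-sibling::a[
        generate-id(preceding-sibling::b[1]) = generate-id(current())]">
      <xsl:text>  </xsl:text>
      <xsl:value-of select="."/>
      <xsl:text>&#10;</xsl:text>
    </xsl:for-each>
  </xsl:template>
</xsl:stylesheet>
```

Comparing generate-id() values tests node identity, so unlike the text comparison [.='xxxx'] it still works when two b elements contain the same text.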
Here is a solution which is just a single XPath expression.
Using the Kaysian formula for intersection of two nodesets $ns1 and $ns2:
$ns1[count(. | $ns2) = count($ns2)]
We simply substitute $ns1 with the nodeset of <a> siblings that follow the current <b> node, and we substitute $ns2 with the nodeset of <a> siblings that precede the next <b> node.
Here is a complete transformation that uses this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:apply-templates select="*/b"/>
</xsl:template>
<xsl:template match="b">
At: <xsl:value-of select="."/>
<xsl:variable name="vNextB" select="following-sibling::b[1]"/>
<xsl:variable name="vA-sAfterCurrentB" select="following-sibling::a"/>
<xsl:variable name="vA-sBeforeNextB" select=
"$vNextB/preceding-sibling::a
|
$vA-sAfterCurrentB[not($vNextB)]
"/>
<xsl:copy-of select=
"$vA-sAfterCurrentB
[count(.| $vA-sBeforeNextB)
=
count($vA-sBeforeNextB)
]
"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the following XML document:
<t>
<img/>
<b>First</b>
<br />  
<img/>  
First Href - 19:30
<br />
<img/>
<b>Second</b>
<br />
<img/>  
Second Href - 19:30
<br />
<img/> 
Third Href - 19:30
<br />
</t>
the correct result is produced:
At: First First Href
At: Second Second Href
Third Href

Resources