Xpath query to select all elements ignoring descendants based on criteria - xpath

I am trying to select all the <set>erase</set> elements, except that when two or more elements in the same hierarchy contain <set>erase</set> (e.g. both <b> and <d>), only the one in the outermost element should be selected (i.e. the one in <b> in this case).
Sample xml below:
<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<a>
<b>
<set>erase</set>
<d>
<set>erase</set>
</d>
</b>
<c>
<x></x>
</c>
<e>
<y>
<set>erase</set>
<q></q>
</y>
<z>
<p>
<set>erase</set>
</p>
</z>
</e>
</a>
When I use the query //set[contains(.,'erase')] I get the <set>erase</set> elements of all four nodes b, d, y and p in the result set.
I need help framing a query that selects only the <set>erase</set> of b, y and p.

Here is a solution:
One XPath expression that selects exactly the wanted elements is:
//*[set[. = 'erase' and not(node()[2])]
and
not(ancestor::*
[set
[. = 'erase' and not(node()[2])]
]
)
]
XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:for-each select=
"//*[set[. = 'erase' and not(node()[2])]
and
not(ancestor::*
[set
[. = 'erase' and not(node()[2])]
]
)
]">
<xsl:value-of select="name()"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<a>
<b>
<set>erase</set>
<d>
<set>erase</set>
</d>
</b>
<c>
<x></x>
</c>
<e>
<y>
<set>erase</set>
<q></q>
</y>
<z>
<p>
<set>erase</set>
</p>
</z>
</e>
</a>
The contained XPath expression is evaluated and the names of the selected elements are output -- correctly and as expected:
b
y
p
If you need to select the set children of the elements selected above, just append /set to the above XPath expression:
//*[set[. = 'erase' and not(node()[2])]
and
not(ancestor::*
[set
[. = 'erase' and not(node()[2])]
]
)
]
/set
Again, XSLT - based verification:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"//*[set[. = 'erase' and not(node()[2])]
and
not(ancestor::*
[set
[. = 'erase' and not(node()[2])]
]
)
]
/set
"/>
</xsl:template>
</xsl:stylesheet>
This transformation just evaluates the above XPath expression and copies to the output the correctly selected three set elements:
<set>erase</set>
<set>erase</set>
<set>erase</set>
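For readers who want to check the selection logic outside XSLT, here is a minimal Python sketch using only the standard library. It is not the XPath expression above but an assumed equivalent: it walks the tree and stops descending as soon as an element has a set child whose text is exactly erase. The helper names has_erase and topmost_erase are made up for illustration, and the len(child) == 0 test only approximates not(node()[2]).

```python
import xml.etree.ElementTree as ET

XML = """<a>
  <b><set>erase</set><d><set>erase</set></d></b>
  <c><x/></c>
  <e>
    <y><set>erase</set><q/></y>
    <z><p><set>erase</set></p></z>
  </e>
</a>"""


def has_erase(elem):
    """True if elem has a child <set> whose only content is the text 'erase'."""
    return any(
        child.tag == "set" and child.text == "erase" and len(child) == 0
        for child in elem
    )


def topmost_erase(elem):
    """Collect matching elements, never looking below an element that matched."""
    if has_erase(elem):
        return [elem]
    found = []
    for child in elem:
        found.extend(topmost_erase(child))
    return found


root = ET.fromstring(XML)
names = [e.tag for e in topmost_erase(root)]
print(names)  # ['b', 'y', 'p']
```

Stopping the recursion at the first match plays the role of the not(ancestor::*[...]) test: a descendant such as d is never visited once b has matched.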

Related

How select nodes combining preceding-sibling and following sibling?

I want to select all nodes preceding-sibling A and following-sibling A, excluding following-sibling C and D
XML :
<XMLCODE>
<ex>
<z>bla</z>
<z>bla</z>
<A/>
<k>want</k>
<b>want</b>
<A/>
<b>bla</b>
<h>bla</h>
<C/>
<z>bla</z>
<D/>
<e>bla</e>
<A/>
<j>want</j>
<A/>
<i>bla</i>
<C/>
<y>bla</y>
<C/>
<y>bla</y>
</ex>
</XMLCODE>
output:
<k>want</k>
<b>want</b>
<j>want</j>
I tried
//*[
preceding-sibling::*[self::A ]
and
following-sibling::*[self::A ]
]
[not(self::A)]
Thanks
This is how I would approach this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="trail" match="*[not(self::A)]" use="generate-id(preceding-sibling::A[1])" />
<xsl:template match="/XMLCODE">
<result>
<xsl:for-each select="ex/A[position() mod 2 = 1]">
<xsl:copy-of select="key('trail', generate-id())"/>
</xsl:for-each>
</result>
</xsl:template>
</xsl:stylesheet>
Applied to your input example, this will return:
Result
<?xml version="1.0" encoding="UTF-8"?>
<result>
<k>want</k>
<b>want</b>
<j>want</j>
</result>
This is actually an XSLT 1.0 method. In XSLT 2.0 you could ostensibly do something with:
<xsl:for-each-group select="ex/*" group-starting-with="A">
but I don't see an elegant method to distinguish between the "on" and "off" groups, since the first group could start with an A or not.
Use this XPath 1.0 expression:
/*/ex/*[not(self::A or self::B or self::C)
and
(
preceding-sibling::A[1] | preceding-sibling::B[1] | preceding-sibling::C[1]
)[last()][self::A]
and
(
following-sibling::A[1] | following-sibling::B[1] | following-sibling::C[1]
)[1][self::A]
]
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/ex/*[
not(self::A or self::B or self::C)
and
(
preceding-sibling::A[1] | preceding-sibling::B[1] | preceding-sibling::C[1]
)[last()][self::A]
and
(
following-sibling::A[1] | following-sibling::B[1] | following-sibling::C[1]
)[1][self::A]
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<XMLCODE>
<ex>
<z>bla</z>
<z>bla</z>
<A/>
<k>want</k>
<b>want</b>
<A/>
<b>bla</b>
<h>bla</h>
<C/>
<z>bla</z>
<D/>
<e>bla</e>
<A/>
<j>want</j>
<A/>
<i>bla</i>
<C/>
<y>bla</y>
<C/>
<y>bla</y>
</ex>
</XMLCODE>
The XPath expression is evaluated and its result is copied to the output. The correct, wanted result is produced:
<k>want</k>
<b>want</b>
<j>want</j>
I think you can do
let $A := //A
return for-each-pair(
$A[position() mod 2 = 1],
$A[position() mod 2 = 0],
function($A1, $A2) {
$A1/following-sibling::* intersect $A2/preceding-sibling::*
}
)
That needs XPath 3.1 with higher-order function support, but Saxon 10 and later in all editions, SaxonJS 2 and Saxon 9.8/9.9 PE/EE provide that.
Using xslt-1.0 and exslt to extract element nodes between pairs of A elements:
xmlstarlet select -t \
-m '*/*/A[following-sibling::A][count(preceding-sibling::A) mod 2 = 0]' \
-c 'set:leading(following-sibling::*,following-sibling::A[1])' \
file.xml
-m iterates over A element pairs, changing the context to the first A
-c copies following sibling elements up to, and excluding, the second A
set:leading documentation on github.io,
implementation on github.com.
Output:
<k>want</k><b>want</b><j>want</j>
To have xmlstarlet select list the generated XSLT add a -C
option before -t:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" xmlns:set="http://exslt.org/sets" version="1.0" extension-element-prefixes="exslt set">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="*/*/A[following-sibling::A][count(preceding-sibling::A) mod 2 = 0]">
<xsl:copy-of select="set:leading(following-sibling::*,following-sibling::A[1])"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
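The pairing idea shared by the answers above (take the elements that sit between the 1st and 2nd A, the 3rd and 4th A, and so on) can also be sketched in plain Python with the standard library. This is an illustration under assumptions, not a translation of any one answer: the function name between_a_pairs is invented here, and elements after an unmatched trailing A are dropped.

```python
import xml.etree.ElementTree as ET

XML = """<XMLCODE><ex>
<z>bla</z><z>bla</z><A/><k>want</k><b>want</b><A/>
<b>bla</b><h>bla</h><C/><z>bla</z><D/><e>bla</e>
<A/><j>want</j><A/><i>bla</i><C/><y>bla</y><C/><y>bla</y>
</ex></XMLCODE>"""


def between_a_pairs(parent):
    """Return children lying between an odd-numbered A and the next A."""
    selected, buffer, inside = [], [], False
    for child in parent:
        if child.tag == "A":
            if inside:
                # Closing A of a pair: keep what was collected since the opener.
                selected.extend(buffer)
            buffer = []
            inside = not inside
        elif inside:
            buffer.append(child)
    return selected


ex = ET.fromstring(XML).find("ex")
result = [(e.tag, e.text) for e in between_a_pairs(ex)]
print(result)  # [('k', 'want'), ('b', 'want'), ('j', 'want')]
```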

Sorting in xslt does not work

I am trying to sort the xml based on the field value person_id_external.
The code which I am using is:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()" />
</xsl:copy>
</xsl:template>
<xsl:template match="A/B">
<xsl:copy>
<xsl:apply-templates>
<xsl:sort select="C/person_id_external" order="ascending" />
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
The payload is:
<A>
<B>
<C>
<logon_user_name>10027</logon_user_name>
<person_id>1100111</person_id>
<person_id_external>10027</person_id_external>
</C>
</B>
<B>
<C>
<logon_user_name>428122</logon_user_name>
<person_id>11141</person_id>
<person_id_external>111358</person_id_external>
</C>
</B>
<B>
<C>
<logon_user_name>428122</logon_user_name>
<person_id>100441</person_id>
<person_id_external>10636</person_id_external>
</C>
</B>
</A>
The result provides a copy of the input but does not sort.
Expected result is :
<A>
<B>
<C>
<logon_user_name>10027</logon_user_name>
<person_id>1100111</person_id>
<person_id_external>10027</person_id_external>
</C>
</B>
<B>
<C>
<logon_user_name>428122</logon_user_name>
<person_id>11141</person_id>
<person_id_external>10636</person_id_external>
</C>
</B>
<B>
<C>
<logon_user_name>428122</logon_user_name>
<person_id>100441</person_id>
<person_id_external>111358</person_id_external>
</C>
</B>
</A>
Cheers,
Vikcy
-- edited in response to your edit --
In your example, each B node has only one C node. Therefore, you must sort the B nodes in order to get the expected result - and you must do so from the context of their parent A:
<xsl:template match="A">
<xsl:copy>
<xsl:apply-templates>
<xsl:sort select="C/person_id_external" order="ascending"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
Note that the default sort data-type is text (i.e. alphabetical).
The below code works:
<xsl:template match="A">
<xsl:copy>
<xsl:apply-templates>
<xsl:sort select="C/person_id_external" data-type="number" order="ascending"/>
</xsl:apply-templates>
</xsl:copy>
</xsl:template>
The values in the incoming payload are numbers, so the sort needs data-type="number".
Cheers,
Vikas Singh
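To see why the sort has to happen on the B children of A, and how text and numeric ordering can differ, here is a small Python sketch with the standard library. It trims the payload down to the sort key, and the list named hypothetical contains made-up values (not from the payload) chosen so that the two orderings disagree.

```python
import xml.etree.ElementTree as ET

XML = """<A>
<B><C><person_id_external>10027</person_id_external></C></B>
<B><C><person_id_external>111358</person_id_external></C></B>
<B><C><person_id_external>10636</person_id_external></C></B>
</A>"""

root = ET.fromstring(XML)


def sort_key(b):
    """Numeric key, comparable to data-type="number" in xsl:sort."""
    return int(b.findtext("C/person_id_external"))


# Reorder the B children of A in place, as the template matching A does.
root[:] = sorted(root, key=sort_key)
ids = [b.findtext("C/person_id_external") for b in root]
print(ids)  # ['10027', '10636', '111358']

# Made-up values showing where text and number ordering diverge.
hypothetical = ["9", "10", "100"]
as_text = sorted(hypothetical)             # ['10', '100', '9']
as_number = sorted(hypothetical, key=int)  # ['9', '10', '100']
print(as_text, as_number)
```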

how can I get a list of indexes of nodes that have a value using xpath

using the following;
<a>
<b>false</b>
<b>true</b>
<b>false</b>
<b>false</b>
<b>true</b>
</a>
I want to get the following result using something like
/a/b[.='true'].position()
for a result like
2,5 (as in a collection of the 2 positions)
I. XPath 1.0 solution:
Use:
count(/*/*[.='true'][1]/preceding-sibling::*)+1
This produces the position of the first b element whose string value is "true":
2
Repeat the evaluation of a similar expression, where [1] is replaced by [2] ,..., etc, up to count(/*/*[.='true'])
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:for-each select="/*/*[.='true']">
<xsl:variable name="vPos" select="position()"/>
<xsl:value-of select=
"count(/*/*[.='true'][$vPos]
/preceding-sibling::*) +1"/>
<xsl:text>
</xsl:text>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<a>
<b>false</b>
<b>true</b>
<b>false</b>
<b>false</b>
<b>true</b>
</a>
The XPath expression is constructed and evaluated for every b whose string value is "true". The results of these evaluations are copied to the output:
2
5
II. XPath 2.0 solution:
Use:
index-of(/*/*, 'true')
XSLT 2.0 - based verification:
<xsl:stylesheet version="2.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text"/>
<xsl:template match="/">
<xsl:sequence select="index-of(/*/*, 'true')"/>
</xsl:template>
</xsl:stylesheet>
When this XSLT 2.0 transformation is applied on the same XML document (above), the XPath 2.0 expression is evaluated and the result of this evaluation is copied to the output:
2 5
A basic (& working) approach in python language :
from lxml import etree
root = etree.XML("""
<a>
<b>false</b>
<b>true</b>
<b>false</b>
<b>false</b>
<b>true</b>
</a>
""")
c = 0      # 1-based position of the current <b>
lst = []   # positions whose text is 'true'
for i in root.xpath('/a/b/text()'):
    c += 1
    if i == 'true':
        lst.append(str(c))
print(",".join(lst))

Longer node in XPath

I'd like to use XPath to retrieve the longer of two nodes.
E.g., if my XML is
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
And I do document.SelectSingleNode(your XPath here)
I would expect to get back the url1 node. If url2 is longer, or there is no url1 node, I'd expect to get back the url2 node.
Seems simple but I'm having trouble figuring it out. Any ideas?
This works for me, but it is ugly. Can't you do the comparison outside XPath?
record/*[starts-with(name(),'url')
and string-length(.) > string-length(preceding-sibling::*[1])
and string-length(.) > string-length(following-sibling::*[1])]/text()
<xsl:for-each select="*">
<xsl:sort select="string-length(.)" data-type="number"/>
<xsl:if test="position() = last()">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
Even works in XSLT 1.0!
Use this single XPath expression:
/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
the XPath expression is evaluated and the result of this evaluation (the selected element) is copied to the output:
<url1>http://www.google.com</url1>
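If the comparison is done outside XPath, as one of the answers suggests, it can be as short as the following Python sketch (standard library only). It assumes the record has at least one child element; on a tie, max returns the first of the equally long children.

```python
import xml.etree.ElementTree as ET

XML = """<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>"""

record = ET.fromstring(XML)

# Pick the child whose text is longest; missing text counts as length 0.
longest = max(record, key=lambda e: len(e.text or ""))
print(longest.tag, longest.text)  # url1 http://www.google.com
```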

Navigate HTML table columns with XPath 1.0

Using only an XPath expression (and not in XSLT or DOM - just pure XPath), I'm trying to create a relative path from the current node (in a td) to an associated td in the same column of the same HTML table.
For example, suppose I have this type of data:
<table>
<tr> <td><a>Blue Jeans</a></td> <td><a>Shirt</a></td> </tr>
<tr> <td><span>$21.50</span></td> <td><span>$18.99</span></td> </tr>
</table>
and I'm on the a with "Blue Jeans" and want to find the price ($21.50). In XSLT, I could use the current() function to get the answer like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:apply-templates select="//a" />
</xsl:template>
<xsl:template match="a">
Name: <xsl:value-of select="."/>
Price: <xsl:value-of select="../../following-sibling::tr[1]/td[position() = count(current()/../preceding-sibling::td) + 1]" />
</xsl:template>
</xsl:stylesheet>
But the problem I'm running into is that there is no current() defined in XPath 1.0. I tried using the self:: axis, but like the "." shorthand, that only points to the "context" node, not the "current" node. The language that I'm seeing in the XPath standard suggests that XPath doesn't have a concept of "current node."
Is there perhaps another way to form this path or is this a limitation of XPath?
In XPath 1.0 you could do:
/table/tr/td/a[.='Blue Jeans']/following::td[count(../td)]/span
Of course, this assumes there is no colspan.
EDIT: The proof. This stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:param name="pProduct" select="'Blue Jeans'"/>
<xsl:template match="/">
<xsl:value-of select="/table/tr/td/a[.=$pProduct]
/following::td[count(../td)]/span"/>
</xsl:template>
</xsl:stylesheet>
Output:
$21.50
With param pProduct set to 'Shirt', output:
$18.99
Note: Of course, you need the a element in context in order to select the span element. So, with your stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="text()"/>
<xsl:template match="a">
Name: <xsl:value-of select="."/>
Price: <xsl:value-of select="following::td[count(../td)]/span" />
</xsl:template>
</xsl:stylesheet>
Output:
Name: Blue Jeans
Price: $21.50
Name: Shirt
Price: $18.99
This cannot be achieved with a single XPath 1.0 expression.
In XPath 2.0 one could write:
for $vPreceeding in count(../preceding-sibling::td)
return ../../following-sibling::tr[1]/td[$vPreceeding + 1]
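The column-matching idea (find the position of the cell within its row, then take the cell at the same position in the next row) can be sketched in Python with the standard library. This is an illustration, not pure XPath: the function name price_for is invented here, and like the answers above it assumes no colspan, so every row has the same number of cells.

```python
import xml.etree.ElementTree as ET

HTML = """<table>
<tr> <td><a>Blue Jeans</a></td> <td><a>Shirt</a></td> </tr>
<tr> <td><span>$21.50</span></td> <td><span>$18.99</span></td> </tr>
</table>"""


def price_for(table, product):
    """Return the span text below the cell whose <a> text equals product."""
    rows = table.findall("tr")
    for r, row in enumerate(rows[:-1]):
        for col, td in enumerate(row.findall("td")):
            if td.findtext("a") == product:
                # Same column index, next row.
                return rows[r + 1].findall("td")[col].findtext("span")
    return None


table = ET.fromstring(HTML)
print(price_for(table, "Blue Jeans"))  # $21.50
print(price_for(table, "Shirt"))       # $18.99
```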

Resources