How to select nodes combining preceding-sibling and following-sibling? - xpath

I want to select all nodes that have a preceding-sibling A and a following-sibling A, excluding the nodes that follow C and D.
XML :
<XMLCODE>
<ex>
<z>bla</z>
<z>bla</z>
<A/>
<k>want</k>
<b>want</b>
<A/>
<b>bla</b>
<h>bla</h>
<C/>
<z>bla</z>
<D/>
<e>bla</e>
<A/>
<j>want</j>
<A/>
<i>bla</i>
<C/>
<y>bla</y>
<C/>
<y>bla</y>
</ex>
</XMLCODE>
output:
<k>want</k>
<b>want</b>
<j>want</j>
I tried
//*[
preceding-sibling::*[self::A ]
and
following-sibling::*[self::A ]
]
[not(self::A)]
Thanks

This is how I would approach this:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="trail" match="*[not(self::A)]" use="generate-id(preceding-sibling::A[1])" />
<xsl:template match="/XMLCODE">
<result>
<xsl:for-each select="ex/A[position() mod 2 = 1]">
<xsl:copy-of select="key('trail', generate-id())"/>
</xsl:for-each>
</result>
</xsl:template>
</xsl:stylesheet>
Applied to your input example, this will return:
Result
<?xml version="1.0" encoding="UTF-8"?>
<result>
<k>want</k>
<b>want</b>
<j>want</j>
</result>
This is actually an XSLT 1.0 method. In XSLT 2.0 you could ostensibly do something with:
<xsl:for-each-group select="ex/*" group-starting-with="A">
but I don't see an elegant method to distinguish between the "on" and "off" groups, since the first group could start with an A or not.
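The "on"/"off" toggling can also be checked outside an XSLT processor. Here is a minimal Python sketch, assuming the sample input from the question and using only the standard library. The function name between_a_pairs is made up for illustration, and it pairs the A elements strictly as (1st, 2nd), (3rd, 4th), dropping anything after an unpaired trailing A, which is a little stricter than the key-based stylesheet.

```python
import xml.etree.ElementTree as ET

SAMPLE = """<XMLCODE><ex>
<z>bla</z><z>bla</z><A/><k>want</k><b>want</b><A/>
<b>bla</b><h>bla</h><C/><z>bla</z><D/><e>bla</e>
<A/><j>want</j><A/><i>bla</i><C/><y>bla</y><C/><y>bla</y>
</ex></XMLCODE>"""

def between_a_pairs(parent):
    """Return the elements lying between the 1st and 2nd A, the 3rd and 4th A, and so on."""
    selected, pending, inside = [], [], False
    for child in parent:
        if child.tag == "A":
            if inside:
                # This A closes a pair: keep what was collected since the opening A.
                selected.extend(pending)
            # Every A flips the state and starts a fresh collection.
            pending, inside = [], not inside
        elif inside:
            pending.append(child)
    return selected

ex = ET.fromstring(SAMPLE).find("ex")
names = [el.tag for el in between_a_pairs(ex)]
print(names)
```

The state flag plays the role of the position() mod 2 test on the A elements in the stylesheet.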

Use this XPath 1.0 expression:
/*/ex/*[not(self::A or self::B or self::C)
and
(
preceding-sibling::A[1] | preceding-sibling::B[1] | preceding-sibling::C[1]
)[last()][self::A]
and
(
following-sibling::A[1] | following-sibling::B[1] | following-sibling::C[1]
)[1][self::A]
]
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/ex/*[
not(self::A or self::B or self::C)
and
(
preceding-sibling::A[1] | preceding-sibling::B[1] | preceding-sibling::C[1]
)[last()][self::A]
and
(
following-sibling::A[1] | following-sibling::B[1] | following-sibling::C[1]
)[1][self::A]
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<XMLCODE>
<ex>
<z>bla</z>
<z>bla</z>
<A/>
<k>want</k>
<b>want</b>
<A/>
<b>bla</b>
<h>bla</h>
<C/>
<z>bla</z>
<D/>
<e>bla</e>
<A/>
<j>want</j>
<A/>
<i>bla</i>
<C/>
<y>bla</y>
<C/>
<y>bla</y>
</ex>
</XMLCODE>
The XPath expression is evaluated and its results are copied to the output. The correct, wanted result is produced:
<k>want</k>
<b>want</b>
<j>want</j>

I think you can do
let $A := //A
return for-each-pair(
$A[position() mod 2 = 1],
$A[position() mod 2 = 0],
function($A1, $A2) {
$A1/following-sibling::* intersect $A2/preceding-sibling::*
}
)
This needs XPath 3.1 with higher-order function support; Saxon 10 and later in all editions, SaxonJS 2, and Saxon 9.8/9.9 PE/EE provide that.

Using xslt-1.0 and exslt to extract element nodes between pairs of A elements:
xmlstarlet select -t \
-m '*/*/A[following-sibling::A][count(preceding-sibling::A) mod 2 = 0]' \
-c 'set:leading(following-sibling::*,following-sibling::A[1])' \
file.xml
-m iterates over A element pairs, changing the context to the first A
-c copies following sibling elements up to, and excluding, the second A
set:leading documentation on github.io,
implementation on github.com.
Output:
<k>want</k><b>want</b><j>want</j>
To have xmlstarlet select list the generated XSLT add a -C
option before -t:
<?xml version="1.0"?>
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" xmlns:exslt="http://exslt.org/common" xmlns:set="http://exslt.org/sets" version="1.0" extension-element-prefixes="exslt set">
<xsl:output omit-xml-declaration="yes" indent="no"/>
<xsl:template match="/">
<xsl:for-each select="*/*/A[following-sibling::A][count(preceding-sibling::A) mod 2 = 0]">
<xsl:copy-of select="set:leading(following-sibling::*,following-sibling::A[1])"/>
</xsl:for-each>
</xsl:template>
</xsl:stylesheet>

Related

Passing parameters from script to XSL

Using XSLT2 with the latest Saxon HE.
I'm trying to pass multiple coordinate parameters from a script to XSL in order to filter results based on a location boundary box
Script:
java -jar saxon9he.jar -s:litter_bins.xml -o:"bins.xml" -xsl:"Split xml coords.xsl" Coord_2=51.3725 Coord_4=51.3751 Coord_1=-2.3615 Coord_3=-2.3572
XSL:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="2.0">
<xsl:output indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:param name="Coord_2" select="Coord_2"/>
<xsl:param name="Coord_4" select="Coord_4"/>
<xsl:param name="Coord_1" select="Coord_1"/>
<xsl:param name="Coord_3" select="Coord_3"/>
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
<xsl:template match="node[@lat[ . &lt; $Coord_2 or . &gt; $Coord_4 ] or @lon[ . &lt; $Coord_1 or . &gt; $Coord_3]]"/>
</xsl:stylesheet>
The above returns:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="JOSM"/>
However if I hard code the coordinates into the match xpath, it returns the expected results.
Xpath:
<xsl:template match="node[@lat[ . &lt; 51.3725 or . &gt; 51.3751 ] or @lon[ . &lt; -2.3615 or . &gt; -2.3572]]"/>
Results:
<?xml version="1.0" encoding="UTF-8"?>
<osm version="0.6" generator="JOSM">
<node id="-102973" visible="true" lat="51.37283499216" lon="-2.359890029">
<tag k="date_creat" v="17/07/2014 07:59:04 AM UTC"/>
<tag k="form_recor" v="888"/>
</node>
<snip...>
</osm>
What am I misunderstanding?
Try declaring a numeric type for the parameters, e.g. <xsl:param name="Coord_2" as="xs:double"/> or <xsl:param name="Coord_2" as="xs:decimal"/>. For that, your stylesheet needs to declare the namespace xmlns:xs="http://www.w3.org/2001/XMLSchema" on the root element.
Without a numeric type I think the comparison will be of two xs:untypedAtomic values, and for that case https://www.w3.org/TR/xpath-31/#id-general-comparisons specifies:
If both atomic values are instances of xs:untypedAtomic, then the
values are cast to the type xs:string
so the values are compared as strings, and a string comparison of negative numbers fails to give you the wanted result.
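To see why string comparison goes wrong for these values, here is a small Python sketch using one lon value and the box limits from the question. It is only an analogy: Python compares strings by code point, which I assume roughly mirrors the default codepoint collation an XSLT 2.0 processor would use here.

```python
# Values taken from the question: one node's lon attribute and the box limits.
lon = "-2.359890029"
coord_1, coord_3 = "-2.3615", "-2.3572"

# Compared as strings, "-2.35..." sorts before "-2.36...", so the node looks out of range.
outside_as_strings = lon < coord_1 or lon > coord_3

# Compared as numbers, the same value lies inside the box.
outside_as_numbers = float(lon) < float(coord_1) or float(lon) > float(coord_3)

print(outside_as_strings, outside_as_numbers)
```

With string semantics the filter template matches the node and removes it, which is consistent with the empty osm element in the question.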

How to order self-referencing xml

I have a list of order lines, each with one product on it. The products may form a self-referencing hierarchy. I need to order the lines in such a way that all products that have no parent or whose parent is missing from the order are at the top, followed by their children. No child may be above its parent in the end result.
So how can I order the following xml:
<order>
<line><product code="3" parent="1"/></line>
<line><product code="2" parent="1"/></line>
<line><product code="6" parent="X"/></line>
<line><product code="1" /></line>
<line><product code="4" parent="2"/></line>
</order>
Into this:
<order>
<line><product code="6" parent="X"/></line>
<line><product code="1" /></line>
<line><product code="2" parent="1"/></line>
<line><product code="3" parent="1"/></line>
<line><product code="4" parent="2"/></line>
</order>
Note that the order within a specific level is not important, as long as the child node follows at some point after its parent.
I have a solution which works for hierarchies that do not exceed a predefined depth:
<order>
<xsl:variable name="level-0"
select="/order/line[ not(product/@parent=../line/product/@code) ]"/>
<xsl:for-each select="$level-0">
<xsl:copy-of select="."/>
</xsl:for-each>
<xsl:variable name="level-1"
select="/order/line[ product/@parent=$level-0/product/@code ]"/>
<xsl:for-each select="$level-1">
<xsl:copy-of select="."/>
</xsl:for-each>
<xsl:variable name="level-2"
select="/order/line[ product/@parent=$level-1/product/@code ]"/>
<xsl:for-each select="$level-2">
<xsl:copy-of select="."/>
</xsl:for-each>
</order>
The above sample XSLT will work for hierarchies with a maximum depth of 3 levels and is easily extended to more, but how can I generalize this and have the XSLT sort arbitrary levels of depth correctly?
To start with, you could define a couple of keys to help you look up the line elements by either their code or parent attribute:
<xsl:key name="products-by-parent" match="line" use="product/@parent" />
<xsl:key name="products-by-code" match="line" use="product/@code" />
You would start off by selecting the line elements with no parent, using a key to do this check:
<xsl:apply-templates select="line[not(key('products-by-code', product/@parent))]"/>
Then, within the template that matches the line element, you would just copy the element, and then select its "children" like so, using the other key
<xsl:apply-templates select="key('products-by-parent', product/@code)"/>
This would be a recursive call, so it would recursively look for its children until no more are found.
Try this XSLT
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="xml" indent="yes"/>
<xsl:key name="products-by-parent" match="line" use="product/@parent"/>
<xsl:key name="products-by-code" match="line" use="product/@code"/>
<xsl:template match="order">
<xsl:copy>
<xsl:apply-templates select="line[not(key('products-by-code', product/@parent))]"/>
</xsl:copy>
</xsl:template>
<xsl:template match="line">
<xsl:call-template name="identity"/>
<xsl:apply-templates select="key('products-by-parent', product/@code)"/>
</xsl:template>
<xsl:template match="@*|node()" name="identity">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Do note the use of the XSLT identity transform to copy the existing nodes in the XML.
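For anyone who wants to check the recursion outside XSLT, here is a rough Python sketch of the same idea, assuming the sample order from the question. The function name parents_first and the dictionary standing in for the products-by-parent key are my own illustration, not part of the stylesheet.

```python
import xml.etree.ElementTree as ET
from collections import defaultdict

ORDER = """<order>
<line><product code="3" parent="1"/></line>
<line><product code="2" parent="1"/></line>
<line><product code="6" parent="X"/></line>
<line><product code="1"/></line>
<line><product code="4" parent="2"/></line>
</order>"""

def parents_first(order):
    """Return product codes so that every parent precedes its children."""
    products = [line.find("product") for line in order.findall("line")]
    codes = {p.get("code") for p in products}
    children = defaultdict(list)  # stands in for the products-by-parent key
    for p in products:
        children[p.get("parent")].append(p.get("code"))

    result = []

    def emit(code):
        # Output the product, then recurse into the products that name it as parent.
        result.append(code)
        for child in children.get(code, []):
            emit(child)

    for p in products:
        # Start from products with no parent, or whose parent is missing from the order.
        if p.get("parent") not in codes:
            emit(p.get("code"))
    return result

codes_in_order = parents_first(ET.fromstring(ORDER))
print(codes_in_order)
```

Like the stylesheet, this walks depth-first, so a child directly follows its parent's earlier children instead of being grouped strictly by level.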
Very interesting problem. I would do this in two passes: first, nest the elements according to their hierarchy. Then output the elements, sorted by the count of their ancestors.
XSLT 1.0 (+ EXSLT node-set() function):
<?xml version="1.0" encoding="utf-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform"
xmlns:exsl="http://exslt.org/common"
extension-element-prefixes="exsl">
<xsl:output method="xml" version="1.0" encoding="UTF-8" indent="yes"/>
<xsl:key name="product-by-code" match="product" use="@code" />
<!-- first pass -->
<xsl:variable name="nested">
<xsl:apply-templates select="/order/line/product[not(key('product-by-code', @parent))]" mode="nest"/>
</xsl:variable>
<xsl:template match="product" mode="nest">
<xsl:copy>
<xsl:copy-of select="@*"/>
<xsl:apply-templates select="../../line/product[@parent=current()/@code]" mode="nest"/>
</xsl:copy>
</xsl:template>
<!-- output -->
<xsl:template match="/order">
<xsl:copy>
<xsl:for-each select="exsl:node-set($nested)//product">
<xsl:sort select="count(ancestor::*)" data-type="number" order="ascending"/>
<line><product><xsl:copy-of select="@*"/></product></line>
</xsl:for-each>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
When applied to your input, the result is:
<?xml version="1.0" encoding="UTF-8"?>
<order>
<line>
<product code="6" parent="X"/>
</line>
<line>
<product code="1"/>
</line>
<line>
<product code="3" parent="1"/>
</line>
<line>
<product code="2" parent="1"/>
</line>
<line>
<product code="4" parent="2"/>
</line>
</order>
This still leaves the issue of the existing/missing parent X - I will try to address that later.

Longer node in XPath

I'd like to use XPath to retrieve the longer of two nodes.
E.g., if my XML is
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
And I do document.SelectSingleNode(your XPath here)
I would expect to get back the url1 node. If url2 is longer, or there is no url1 node, I'd expect to get back the url2 node.
Seems simple but I'm having trouble figuring it out. Any ideas?
This works for me, but it is ugly. Can't you do the comparison outside XPath?
record/*[starts-with(name(),'url')
and string-length(.) > string-length(preceding-sibling::*[1])
and string-length(.) > string-length(following-sibling::*[1])]/text()
<xsl:for-each select="*">
<xsl:sort select="string-length(.)" data-type="number"/>
<xsl:if test="position() = last()">
<xsl:copy-of select="."/>
</xsl:if>
</xsl:for-each>
Even works in XSLT 1.0!
Use this single XPath expression:
/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]
XSLT - based verification:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="/">
<xsl:copy-of select=
"/*/*[not(string-length(preceding-sibling::*|following-sibling::*)
>
string-length()
)
]"/>
</xsl:template>
</xsl:stylesheet>
When this transformation is applied on the provided XML document:
<record>
<url1>http://www.google.com</url1>
<url2>http://www.bing.com</url2>
</record>
the Xpath expression is evaluated and the result of this evaluation (the selected element) is copied to the output:
<url1>http://www.google.com</url1>
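If the comparison can be done outside XPath, as an earlier answer suggests, it is only a few lines. Here is a minimal Python sketch, assuming the record from the question and using only the standard library. Like the expression above, it keeps every child that no sibling beats in length, so a tie would return both.

```python
import xml.etree.ElementTree as ET

record = ET.fromstring(
    "<record><url1>http://www.google.com</url1>"
    "<url2>http://www.bing.com</url2></record>"
)

# Length of the longest text among the children of record.
longest_len = max(len(child.text or "") for child in record)

# Keep every child whose text reaches that length (ties keep both).
longest = [child.tag for child in record if len(child.text or "") == longest_len]
print(longest)
```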

XPath: filter out children

I'm looking for an XPath expression that filters out certain children. A child must contain a CCC node with B in it.
Source:
<AAA>
<BBB1>
<CCC>A</CCC>
</BBB1>
<BBB2>
<CCC>A</CCC>
</BBB2>
<BBB3>
<CCC>B</CCC>
</BBB3>
<BBB4>
<CCC>B</CCC>
</BBB4>
</AAA>
This should be the result:
<AAA>
<BBB3>
<CCC>B</CCC>
</BBB3>
<BBB4>
<CCC>B</CCC>
</BBB4>
</AAA>
Hopefully someone can help me.
Jos
XPath is a query language for XML documents. As such it can only select nodes from existing XML document(s) -- it cannot modify an XML document or create a new XML document.
Use XSLT in order to transform an XML document and create a new XML document from it.
In this particular case:
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output omit-xml-declaration="yes" indent="yes"/>
<xsl:strip-space elements="*"/>
<xsl:template match="node()|@*">
<xsl:copy>
<xsl:apply-templates select="node()|@*"/>
</xsl:copy>
</xsl:template>
<xsl:template match="/*/*[not(CCC = 'B')]"/>
</xsl:stylesheet>
when this transformation is applied on the provided XML document:
<AAA>
<BBB1>
<CCC>A</CCC>
</BBB1>
<BBB2>
<CCC>A</CCC>
</BBB2>
<BBB3>
<CCC>B</CCC>
</BBB3>
<BBB4>
<CCC>B</CCC>
</BBB4>
</AAA>
the wanted, correct result is produced:
<AAA>
<BBB3>
<CCC>B</CCC>
</BBB3>
<BBB4>
<CCC>B</CCC>
</BBB4>
</AAA>
In order to select all of the desired element and text nodes, use this XPATH:
//node()[.//CCC[.='B']
or self::CCC[.='B']
or self::text()[parent::CCC[.='B']]]
This could be achieved more simply using XPath with a modified identity transform XSLT:
<?xml version="1.0"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output indent="yes" />
<!--Empty template for the content we want to redact -->
<xsl:template match="*[CCC[not(.='B')]]" />
<!--By default, copy all content forward -->
<xsl:template match="@*|node()">
<xsl:copy>
<xsl:apply-templates select="@*|node()"/>
</xsl:copy>
</xsl:template>
</xsl:stylesheet>
Try this:
"//CCC[text() = 'B']"
It returns all CCC nodes whose text is B.
If you want to get AAA, BBB3 and BBB4 you can use the following
//*[descendant::CCC[text()='B']]
If BBB3 and BBB4 only then
//*[CCC[text()='B']]
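Since XPath can only select and not remove, the pruning has to happen in the host language or in XSLT. Here is a minimal Python sketch of the same filter, assuming the source document from the question and using only the standard library. It mirrors the match pattern /*/*[not(CCC = 'B')] from the XSLT answer above.

```python
import xml.etree.ElementTree as ET

SOURCE = """<AAA>
<BBB1><CCC>A</CCC></BBB1>
<BBB2><CCC>A</CCC></BBB2>
<BBB3><CCC>B</CCC></BBB3>
<BBB4><CCC>B</CCC></BBB4>
</AAA>"""

root = ET.fromstring(SOURCE)

# Iterate over a copy of the child list because children are removed while looping.
for child in list(root):
    # Drop the child unless at least one of its CCC elements has the text B.
    if not any(ccc.text == "B" for ccc in child.findall("CCC")):
        root.remove(child)

remaining = [child.tag for child in root]
print(remaining)
```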

Navigate HTML table columns with XPath 1.0

Using only an XPath expression (and not in XSLT or DOM - just pure XPath), I'm trying to create a relative path from the current node (in a td) to an associated td in the same column of the same HTML table.
For example, suppose I have this type of data:
<table>
<tr> <td><a>Blue Jeans</a></td> <td><a>Shirt</a></td> </tr>
<tr> <td><span>$21.50</span></td> <td><span>$18.99</span></td> </tr>
</table>
and I'm on the a with "Blue Jeans" and want to find the price ($21.50). In XSLT, I could use the current() function to get the answer like this:
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:output method="text" />
<xsl:template match="/">
<xsl:apply-templates select="//a" />
</xsl:template>
<xsl:template match="a">
Name: <xsl:value-of select="."/>
Price: <xsl:value-of select="../../following-sibling::tr[1]/td[position() = count(current()/../preceding-sibling::td) + 1]" />
</xsl:template>
</xsl:stylesheet>
But the problem I'm running into is that there is no current() defined in XPath 1.0. I tried using the self:: axis, but like the "." shorthand, that only points to the "context" node, not the "current" node. The language that I'm seeing in the XPath standard suggests that XPath doesn't have a concept of "current node."
Is there perhaps another way to form this path or is this a limitation of XPath?
In XPath 1.0 you could do:
/table/tr/td/a[.='Blue Jeans']/following::td[count(../td)]/span
Of course, this assumes there is no colspan.
EDIT: The proof. This stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:param name="pProduct" select="'Blue Jeans'"/>
<xsl:template match="/">
<xsl:value-of select="/table/tr/td/a[.=$pProduct]
/following::td[count(../td)]/span"/>
</xsl:template>
</xsl:stylesheet>
Output:
$21.50
With param pProduct set to 'Shirt', output:
$18.99
Note: Of course, you need the a element in context in order to select the span element. So, with your stylesheet:
<xsl:stylesheet xmlns:xsl="http://www.w3.org/1999/XSL/Transform" version="1.0">
<xsl:output method="text"/>
<xsl:template match="text()"/>
<xsl:template match="a">
Name: <xsl:value-of select="."/>
Price: <xsl:value-of select="following::td[count(../td)]/span" />
</xsl:template>
</xsl:stylesheet>
Output:
Name: Blue Jeans
Price: $21.50
Name: Shirt
Price: $18.99
This cannot be achieved with a single XPath 1.0 expression.
In XPath 2.0 one could write:
for $vPreceding in count(../preceding-sibling::td)
return ../../following-sibling::tr[1]/td[$vPreceding + 1]
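The column lookup itself (count the preceding cells, then take the cell at that position in the next row) is easy to verify in a host language. Here is a rough Python sketch, assuming the table from the question, no colspan, and only the standard library. The function name price_below is made up for illustration.

```python
import xml.etree.ElementTree as ET

TABLE = """<table>
<tr><td><a>Blue Jeans</a></td><td><a>Shirt</a></td></tr>
<tr><td><span>$21.50</span></td><td><span>$18.99</span></td></tr>
</table>"""

def price_below(table, product):
    """Find the cell holding `product` and return the span text in the same column of the next row."""
    rows = table.findall("tr")
    # The last row has no row below it, so it is skipped.
    for row_index, row in enumerate(rows[:-1]):
        for col_index, cell in enumerate(row.findall("td")):
            if cell.findtext("a") == product:
                # Same column index, one row further down.
                below = rows[row_index + 1].findall("td")[col_index]
                return below.findtext("span")
    return None

table = ET.fromstring(TABLE)
jeans_price = price_below(table, "Blue Jeans")
shirt_price = price_below(table, "Shirt")
print(jeans_price, shirt_price)
```

Here col_index is zero-based, which corresponds to the count of preceding-sibling td elements, while XPath positions are one-based.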
