Using fn:path() to select an element in XML - xpath

I have an XML document with an HTML-like structure:
<h1 id="1">
<table>
<tr>
<td>
<p>text</p>
</td>
</tr>
</table>
</h1>
<h1 id="2">
<table>
<tr>
<td><p>translated text</p>
</td>
</tr>
</table>
</h1>
I want to copy the text from nodes in h1 id="2" to the node that's at the same position in h1 id="1".
Required result:
<p>text/translated text</p>
I can create an XPath expression that addresses a single node:
/h1[2]/table[1]/tr[1]/td[1]/p[1]
but I can't figure out how to create an XPath expression that finds the node in h1 id="2" that is at the same position as the node I'm working on in h1 id="1",
i.e. when I'm in
/h1[1]/table[1]/tr[1]/td[1]/p[1]
I want to address
/h1[2]/table[1]/tr[1]/td[1]/p[1]
and also
/h1[3]/table[1]/tr[1]/td[1]/p[1]
etc. if more h1 elements are present in the XML.
I tried using the path() function. This returns the path of the current node (shown here without the Q{} prefix that path() puts in front of each element name):
/h1[1]/table[1]/tr[1]/td[1]/p[1]
I then modify this string by replacing the first step:
<xsl:variable name="newpath" select="concat('/Q{}h1[2]', substring-after(path(),'/Q{}h1[1]'))"/>
and then read the contents of that xPath:
<xsl:apply-templates select="$newpath"/>
This fails because $newpath is treated as a string rather than as a path expression.
How can I get the output of path() to be treated as a node set instead of a string?

To evaluate an XPath expression that you hold as a string, use xsl:evaluate. It requires an XSLT 3 processor that supports it (Saxon PE/EE 9.8 and later, Saxon HE 10 and later, SaxonJS 2 and later, Altova XML 2017 R3 and later). For example,
<xsl:evaluate context-item="/" xpath="$newpath"/>
selects and outputs the element(s) selected by $newpath. You can also store the result of xsl:evaluate in a variable and push the nodes to apply-templates:
<xsl:variable name="nodes" as="item()*"><xsl:evaluate context-item="/" xpath="$newpath"/></xsl:variable>
<xsl:apply-templates select="$nodes"/>
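For comparison outside XSLT, the same idea (build the path as a string, then hand it to something that evaluates it as a path) can be sketched with Python's standard xml.etree.ElementTree module. This is only an illustration, not the xsl:evaluate approach itself: the <doc> wrapper element and the counterpart_path helper are assumptions added for the example, and ElementTree supports only a small XPath subset.

```python
import xml.etree.ElementTree as ET

# Hypothetical <doc> wrapper so the two h1 elements form well-formed XML.
XML = """<doc>
<h1 id="1"><table><tr><td><p>text</p></td></tr></table></h1>
<h1 id="2"><table><tr><td><p>translated text</p></td></tr></table></h1>
</doc>"""

def counterpart_path(path, source_index, target_index):
    """Swap the leading h1[n] step of a positional path string (assumed helper)."""
    prefix = f"./h1[{source_index}]"
    if not path.startswith(prefix):
        raise ValueError(f"path does not start with {prefix}")
    return f"./h1[{target_index}]" + path[len(prefix):]

root = ET.fromstring(XML)

path_in_first = "./h1[1]/table[1]/tr[1]/td[1]/p[1]"
path_in_second = counterpart_path(path_in_first, 1, 2)

# find() evaluates the string as a path at run time.
source = root.find(path_in_first)
target = root.find(path_in_second)

# Append the translated text to the node at the same position in h1[1].
source.text = f"{source.text}/{target.text}"
print(ET.tostring(source, encoding="unicode"))  # <p>text/translated text</p>
```

The string manipulation is the same as in the question; the difference is only that the resulting string is passed to a function that interprets it as a path.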

Related

How to select the specific sibling of an ancestor using XPath

I have the following HTML structure:
<p>
<!-- Span can be any level deep -->
<span>
Some text
</span>
</p>
<!-- Any number of different elements between span and table -->
<p></p>
<div></div>
<table>
<tr>
<td></td>
</tr>
</table>
Using Nokogiri and custom XPath functions I am able to select the <span> element whose content matches a regex. I am forced to do it this way because Nokogiri uses XPath 1.0, which has no matches() function:
@doc.xpath("//span[regex_match(text(), '/some text/i')]")
Having the span node selected, how do I select the table that is visually following the span?
You can use the contains() function to match the text (note that, unlike the /i regex, it is case-sensitive), then use following::table to find the tables that follow this span in document order.
@doc.xpath("//span[contains(text(), 'Some text')]/following::table")
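What following::table does can be spelled out procedurally. The sketch below is an illustration in Python's standard xml.etree.ElementTree (which has no following axis), not Nokogiri code. It walks the elements in document order, finds the span whose text contains the phrase (case-insensitively, like the /i regex in the question), and collects every table that comes later and is not a descendant of the span. The <body> wrapper, the extra <b> nesting, the id="wanted" attribute, and the tables_following helper are all assumptions made for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the markup from the question.
HTML = """<body>
<p><b><span>Some text</span></b></p>
<p></p>
<div></div>
<table id="wanted"><tr><td></td></tr></table>
</body>"""

def tables_following(root, needle):
    """Return the tables after the first matching span, in document order."""
    nodes = list(root.iter())  # iter() yields elements in document order
    for index, node in enumerate(nodes):
        if node.tag == "span" and needle.lower() in (node.text or "").lower():
            # The following axis excludes the context node's descendants.
            inside_span = set(node.iter())
            return [n for n in nodes[index + 1:]
                    if n.tag == "table" and n not in inside_span]
    return []

root = ET.fromstring(HTML)
tables = tables_following(root, "some text")
print([t.get("id") for t in tables])  # ['wanted']
```

Taking only the first element of the returned list corresponds to the table nearest to the span.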

xpath: how to select items between item A and item B

I have an HTML page with this structure:
<big><b>Staff in:</b></big>
<br>
<a href='...'>Movie 1</a>
<br>
<a href='...'>Movie 2</a>
<br>
<a href='...'>Movie 3</a>
<br>
<br>
<big><b>Cast in:</b></big>
<br>
<a href='...'>Movie 4</a>
How do I select Movies 1, 2, and 3 using Xpath?
I wrote this query
'//big/b[text()="Staff in:"]/following::a'
but it returns Movies 1, 2, 3, and 4. I guess I need to find a way to get items after <big><b>Staff in: but before the next <big>.
Thanks,
Assuming that <big><b>Staff in:</b></big> is a unique element that can serve as an 'anchor', you can try this:
//big[b='Staff in:']/following-sibling::a[preceding-sibling::big[1][b='Staff in:']]
Basically, the XPath finds every <a> that is a following sibling of the 'anchor' <big> element mentioned above, and restricts the result to those whose nearest preceding sibling <big> is the anchor element.
Output in an XPath tester, given the markup in the question as input (with minimal adjustments to make it well-formed XML):
Element='Movie 1'
Element='Movie 2'
Element='Movie 3'
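The "nearest preceding <big> sibling" condition amounts to grouping each link under the last heading seen before it. A minimal sketch of that grouping in Python's standard xml.etree.ElementTree follows. It assumes the markup has been made well-formed (self-closed <br/>, a hypothetical <div> wrapper) and is meant to illustrate the logic rather than replace the XPath.

```python
import xml.etree.ElementTree as ET

# Markup from the question, made well-formed; the <div> wrapper is an assumption.
PAGE = """<div>
<big><b>Staff in:</b></big>
<br/>
<a href="...">Movie 1</a>
<br/>
<a href="...">Movie 2</a>
<br/>
<a href="...">Movie 3</a>
<br/>
<br/>
<big><b>Cast in:</b></big>
<br/>
<a href="...">Movie 4</a>
</div>"""

root = ET.fromstring(PAGE)

links_by_heading = {}
current = None  # text of the nearest preceding <big> sibling
for child in root:
    if child.tag == "big":
        current = child.findtext("b")
        links_by_heading[current] = []
    elif child.tag == "a" and current is not None:
        links_by_heading[current].append(child.text)

print(links_by_heading["Staff in:"])  # ['Movie 1', 'Movie 2', 'Movie 3']
```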
//a[preceding::b[text()="Staff in:"] and following::b[text()="Cast in:"]]
Returns all a after the element b with text Staff in: but before the element b with the text Cast in:.
You may need to add some more conditions to make it more specific depending on whether or not these b elements are unique on the page.
To add to the above, and following the Stack Overflow question "XPath axis, get all following nodes until", here is the complete solution that I worked out in an XSLT editor. First, /*/ is used instead of // because it is usually faster: it avoids scanning the whole document. Second, the logic is: return all a elements that are following siblings of the anchor big element, provided that their nearest preceding sibling big is that same anchor (the union of the two then contains exactly one node). This presumes the big elements are distinct.
The XPath looks like
/*/big[b="Staff in:"]/following-sibling::a[1 = count(preceding-sibling::big[1] | ../big[b="Staff in:"])]
The XSLT solution looks like
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0"
xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
<xsl:template match="/">
<html>
<body>
<h2>My Movie Collection</h2>
<table border="1">
<tr bgcolor="#9acd32">
<th>Title</th>
</tr>
<xsl:variable name="placeholder" select="/*/big" />
<xsl:for-each select="$placeholder">
<xsl:variable name="i" select="position()" />
<b>
<xsl:value-of select="$i" />
<xsl:value-of select="$placeholder[$i]" />
</b>
<xsl:for-each
select="following-sibling::a[1 = count(preceding-sibling::big[1] | ../big[b=$placeholder[$i]])]">
<tr>
<td>
<xsl:value-of select="." />
</td>
</tr>
</xsl:for-each>
</xsl:for-each>
</table>
</body>
</html>
</xsl:template>
</xsl:stylesheet>

How can I find an element with XPath using its parent?

I need to get the "a" element inside a "td" element from a row in a table of several similar rows. The problem is I only have the name 'john'. How can I find john td -> get the parent "tr" -> and then get "a" in XPath?
Code example:
<?xml version="1.0" encoding="UTF-8"?>
<html>
<table>
...
<tr id='1'>
<td name='john'>
</td>
<td>
<a id='clickable'/>
</td>
<td>
</td>
</tr>
...
</table>
</html>
I would write this XPath expression like this:
//td[@name="john"]/following-sibling::td[1]/a
This does:
//
from any depth
td
find a td element
[@name="john"]
with a name attribute equal to 'john'
/following-sibling::
now look among its following sibling elements
td
and find another td
[1]
get the first one
/a
and get its children that are a elements
How about:
//a[ancestor::tr[td/@name = 'john']]
What I would do:
//*[@name="john"]/../td/a/@id
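The "find the td, step up to its tr, then down to the a" route can be tried out with Python's standard xml.etree.ElementTree, whose limited XPath subset does support the parent step (..). This is a sketch under assumptions: the extra "mary" row and its ids are hypothetical filler standing in for the rows elided with "..." in the question.

```python
import xml.etree.ElementTree as ET

# The row with name="mary" is hypothetical filler for the elided rows.
XML = """<html><table>
<tr id="0"><td name="mary"/><td><a id="other"/></td><td/></tr>
<tr id="1"><td name="john"/><td><a id="clickable"/></td><td/></tr>
</table></html>"""

root = ET.fromstring(XML)

# Locate the td by its name attribute, then step up to the parent tr.
row = root.find(".//td[@name='john']/..")

# From that row, step down to the a element inside any of its td children.
link = row.find("td/a")

print(row.get("id"), link.get("id"))  # 1 clickable
```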

Xpath: howto return empty values

I have an XPath like the following:
"//<path to some table>/*/td[1]/text()"
and it returns the text values of all non-empty tds, for example:
<text1>, <text2>, <text3>
The problem is that between the nodes that contain these values there can be some empty td elements.
What I want is a result that contains some marker for those empty values, for example:
<text1>,<>, <>, <text2>, <text3>, <>
or
<text1>,<null>, <null>, <text2>, <text3>, <null>
I tried the following:
"//<path to some table>/*/string(td[1]/text())"
but it returns undefined.
Of course, I could just get the whole node and then work with it in my code (cutting all unnecessary info), but maybe there is a better way?
HTML example for that case:
<html>
<body>
<table class="tablesorter">
<tbody>
<tr class="tr_class">
<td>text1</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text2</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text3</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
</tbody>
</table>
</body>
</html>
Simply select the td elements, not their text() child nodes. With the path changed to //<path to some table>/*/td[1] or maybe //<path to some table>/*/td you get a node-set of td elements, whether they are empty or not. You can then access the string content of each node, either with XPath (evaluate string(.) for each element node) or with a host environment method (e.g. textContent in the W3C DOM or text in the MSXML DOM). That way the empty strings are included.
If you use XPath 2.0 or XQuery, you can directly select //<path to some table>/*/td/string(.) to get a sequence of string values. A function call in the last step is not supported in XPath 1.0; there you select the td element nodes and then access the string value of each in a separate step.
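Selecting the td elements first and taking each one's string value in a separate step might look like this in Python's standard xml.etree.ElementTree. The table is a trimmed copy of the sample in the question; the names first_cells and values are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed copy of the table from the question.
TABLE = """<table class="tablesorter"><tbody>
<tr class="tr_class"><td>text1</td><td>{some text}</td></tr>
<tr class="tr_class"><td></td><td>{some text}</td></tr>
<tr class="tr_class"><td>text2</td><td>{some text}</td></tr>
<tr class="tr_class"><td>text3</td><td>{some text}</td></tr>
<tr class="tr_class"><td></td><td>{some text}</td></tr>
</tbody></table>"""

root = ET.fromstring(TABLE)

# Step 1: select the first td of every row, empty or not.
first_cells = root.findall("./tbody/tr/td[1]")

# Step 2: take the string value of each element (empty cells give "").
values = ["".join(td.itertext()) for td in first_cells]

print(values)  # ['text1', '', 'text2', 'text3', '']
```

Because the elements are selected rather than their text nodes, the empty cells keep their place in the result.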
Do you mean you want only the td[1] elements that have text, dropping the ones without? If so, you can use this XPath:
//td[1][string-length(text()) > 0]

XPath query to identify untagged text

Consider this HTML:
<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<h1>title</h1>
<h3>item 1</h3>
text details for item 1
<h3>item 2</h3>
text details for item 2
<h3>item 3</h3>
text details for item 3
</td>
</tr>
</table>
</body>
</html>
I'm not terribly familiar with XPath, but it seems to me that there is no notation which will match the "text details" sections individually. Can you confirm?
Use:
/html/body/table/tr/td/h3/following-sibling::text()[1]
This means: get the first following sibling text node of every h3 element that is a child of every td element that is a child of every tr element that is a child of every table element that is a child of every body element that is a child of the html top element.
Or, if you only know that the wanted text nodes are the immediate following siblings of all h3 elements in the document, then this XPath expression selects them:
//h3/following-sibling::text()[1]
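As a cross-check outside XPath: Python's standard xml.etree.ElementTree exposes the text that directly follows an element as that element's tail, which plays the role of following-sibling::text()[1] here. A minimal sketch, using only the td fragment from the question (the variable names are assumptions):

```python
import xml.etree.ElementTree as ET

# The td fragment from the question, parsed on its own.
CELL = """<td>
<h1>title</h1>
<h3>item 1</h3>
text details for item 1
<h3>item 2</h3>
text details for item 2
<h3>item 3</h3>
text details for item 3
</td>"""

td = ET.fromstring(CELL)

# The "untagged" text after each h3 is stored as that h3's tail.
details = [(h3.tail or "").strip() for h3 in td.iter("h3")]

print(details)
```

Each entry is the text between one h3 and the next element, with the surrounding line breaks stripped.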
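In the XML/XPath data model,
text is its own type of node, alongside element nodes.
So, considering your example,
the td has 7 child nodes (not counting whitespace-only text nodes).
The third of them is the "text details for item 1" text node (TD.getChild(3) in pseudocode with 1-based counting).
In XPath, this selects the first non-whitespace text child of the td:
$x//table/tr/td/text()[normalize-space()][1]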
