XPath expression: selecting text nodes between element nodes - xpath

Based in the following HTML I want to extract TextA, TextC and TextE.
<div id='content'>
TextA
<br/>
<br/>
<p>TextB</p>
TextC
<br/>
TextC
<p>TextD</p>
TextE
</div>
I tried to get TextC like so but I don't get the result I want:
Query:
//*[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
Expected result:
["TextC", <br/>, "TextC"]
Actual result:
[<br/>]
Is there a way to select the text nodes without using indexes like //div/text()[1]?

The reason why the two text nodes aren't in the result of your XPath is because * only match elements. To match both element and text node you can use node() instead :
//node()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]
Demo
Or if you want to get the text nodes only i.e excluding <br/>, you can use text() instead of node():
//text()[preceding::p[contains(.,"TextB")] and following::p[contains(.,"TextD")]]

Related

XPATH - grab content of div after named element

There are a number of labels, I want to specify them in xpath and then grab the text after them, example:
<div class="info-row">
<div class="info-label"><span>Variant:</span></div>
<div class="info-content">
<p>750 ml</p>
</div>
</div>
So in this case, I want to say "after the span named 'Variant' grab the p tag:
Result: 750ml
I tried:
//span[text()='Variant:']/following-sibling::p
and variations of this but to no avail.
'following-sibling' function selects all siblings after the current node,
there no siblings for span with text 'Variant:', and correct to search siblings for span parent.
Here is an example which will work
//span[text()='Variant:']/ancestor::div[#class="info-label"]/following-sibling::div/p

How to find direct children which contain nodes with specified text with xpath?

I need to extract all children which have nodes with some text. Html structure might be the following:
<div>
<div>
A
</div>
<p>
<b>A</b>
</p>
<span>
B
</span>
</div>
I need to extract child nodes which have "A" text. It should return div and p nodes
I tried the following xpaths:
./*/*[contains(text(), 'A')]
./*/*[./*[contains(text(), 'A')]]
but the first one returns only div with "A" text and the second one returns only p with "A" text
Is it possible to construct xpath which will return both children?
Node containing "A" text might be at any level in the child node
If you need XPath that returns both child nodes, try to use
./*/*[contains(., "A")]
I suspect contains() is wrong here, unless you really want to select a node whose value is "HAT" as well as one whose value is "A".
Try
*/*[normalize-space(.)='A']

How can I select nodes that don't contain links but which do contain specific text using xpath

Given the following HTML:
$content =
'<html>
<body>
<div>
<p>During the interim there shall be nourishment supplied</p>
</div>
<div>
<p>During the interim there shall be interim nourishment supplied</p>
</div>
<div>
<ul><li>During the interim there shall be nourishment supplied</li></ul>
</div>
</body>
</html>';
I want all the nodes containing the word "interim" but not if the word "interim" is part of a link element.
The nodes I would expect back are the first P node and the LI node only.
I've tried the following:
'//*/text()[not(a) and contains(.,"interim")]'
... but this still returns the A and also returns part of it's parent P node (the part after the A), neither of which are desired. You can see my attempt here: https://glot.io/snippets/ehp7hmmglm
If you use the XPath expression //*[not(self::a) and not(a) and text()[contains(.,"interim")]] then you get all elements that do not contain an a element, are not a elements and contain a text node child containing that word.

How to find xpath of a text element without node

<h1>
<span class="visually-hidden">BBC Radio</span>
Search results for 'archers'
</h1>
I want to locate the text element "Search results for 'archers'" . What will be the xpath that will locate to it and not to the element in span node ??
For your input sample
/h1/text()
Tested on http://videlibri.sourceforge.net/cgi-bin/xidelcgi
returns Search results for 'archers'

Xpath - matching based on node() contains() content

I have the following HTML structure (there are many blocks using the same architecture):
<span id="mySpan">
<i>
Price
<b>
3 900
<small>€</small>
</b>
</i>
</span>
Now, I want to get the content of <b> using Xpath which I tried like so:
//span[#id="mySpan"]/i/node()[1][contains(text(),"Price")]
which does match anything. How can I match this using the node()[1] text as anchor?
Regarding the Xpath you tried, instead of text() which return text node child, simply use . :
//span[#id="mySpan"]/i/node()[1][contains(.,"Price")]
For the ultimate goal, I'd suggest this XPath :
//span[#id="mySpan"]/i[contains(.,"Price")]/b
or if you want specifically to match against the first node within <i> :
//span[#id="mySpan"]/i[contains(node(),"Price")]/b

Resources