I want to select all <span> elements that have only <br> as children:
<html>
..
<span>
...
</span>
<span> <!-- I want those ones -->
<br/>
</span>
How would I select these elements?
Assuming you mean elements with no children except br elements, where br is mandatory:
/html/span
[br and not(
*[not(self::br)]
)]
Meaning: All span elements which have at least one br child and no other elements as children.
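To make the predicate concrete, here is a minimal Python sketch that applies the same rule by hand with the standard library's xml.etree.ElementTree (which does not support not() or self:: in its own path language). The sample document and the function name spans_with_only_br are made up for illustration; they are not part of the question.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document modelled on the question's snippet.
DOC = """<html>
  <span>plain text only</span>
  <span><br/></span>
  <span><br/><b>bold</b></span>
  <span><br/><br/></span>
</html>"""


def spans_with_only_br(html_root):
    """Mimic /html/span[br and not(*[not(self::br)])] without XPath support."""
    return [
        span
        for span in html_root.findall("span")  # /html/span: direct children only
        # len(span) > 0 stands in for "br and ..." because every child must be br
        if len(span) > 0 and all(child.tag == "br" for child in span)
    ]


root = ET.fromstring(DOC)
matches = spans_with_only_br(root)
print(len(matches))  # number of span elements whose element children are all br
```

With a full XPath 1.0 engine the expression from the answer can be used directly; the loop above only spells out what its two conditions check.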
I'm trying to access a certain element using XPath, but I just can't seem to get it, and I don't quite understand why.
<ul class="test1" id="content">
<li class="list">
<p>Insert random text here</p>
<div class="author">
</div>
</li>
<li class="list">
<p>I need this text here</p>
</li>
</ul>
Basically, the text I want is the second one, but I want/need to use something similar to p[not(div)] to retrieve it.
I have tried the methods from the following link but to no avail (xpath find node that does not contain child)
Here is how I tried accessing the text:
ul[contains(@id,"content")]//p[not(.//div)]/text()
If you have any possible answers, thank you !
The HTML snippet posted in the question shows that neither p element contains a div, so the expression //p[not(.//div)] would match both p elements. The first p element is a sibling of the div (both share the same parent element li), not its parent or ancestor. The following XPath expression would match text nodes from the second p and not those from the first one:
//ul[contains(@id,"content")]/li[not(div)]/p/text()
Brief explanation:
//ul[contains(@id,"content")]: find ul elements whose id attribute value contains the text "content"
/li[not(div)]: from such ul, find child elements li that don't have a child element div. This will match only the last li in the example HTML
/p/text(): from such li, find child elements p and then return the child text nodes from such p
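The three steps above can be sketched in Python with the standard library's xml.etree.ElementTree. Its path language has no not(), so the li[not(div)] filter is written as a plain Python check. The function name and the use of the question's snippet as a standalone document are assumptions made for this illustration.

```python
import xml.etree.ElementTree as ET

# The snippet from the question, treated as a standalone document.
HTML = """<ul class="test1" id="content">
<li class="list">
<p>Insert random text here</p>
<div class="author">
</div>
</li>
<li class="list">
<p>I need this text here</p>
</li>
</ul>"""


def p_texts_from_li_without_div(ul):
    """Mimic ul[contains(@id,"content")]/li[not(div)]/p/text()."""
    if "content" not in ul.get("id", ""):  # contains(@id, "content")
        return []
    texts = []
    for li in ul.findall("li"):            # child li elements only
        if li.find("div") is None:         # li[not(div)]
            texts.extend(p.text for p in li.findall("p") if p.text)
    return texts


ul = ET.fromstring(HTML)
result = p_texts_from_li_without_div(ul)
print(result)
```

The key point is that the filter sits on li, not on p: the div is a sibling of the first p, so testing p itself for a div descendant cannot tell the two paragraphs apart.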
How do I get the text of a child element, if the parent element contains text with a specific string?
For example:
<li>
"string1"
<span>
"Hello"
</span>
</li>
<li>
"string2"
<span>
"Ola"
</span>
</li>
From the above html code, how to get only string "Ola" using xpath?
Without knowing scrapy, I would try
//li[text()[contains(.,"string2")]]/span/text()
//li[text()[contains(.,"string2")]]: select a li element that has a text node containing string2
/span: select a span element that is a child of the selected li
/text(): return the text of the selected span element
Update: This is simpler and should also work:
//li[contains(text(),"string2")]/span/text()
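As a rough cross-check of the simpler expression, here is a Python sketch using the standard library's xml.etree.ElementTree. There, li.text holds only the text before the first child element, which corresponds to the first text node that contains(text(), ...) looks at in XPath 1.0. The wrapping ul root and the function name are assumptions added so that the snippet parses as one document.

```python
import xml.etree.ElementTree as ET

# The question's two li elements, wrapped in a hypothetical ul root.
HTML = """<ul>
<li>
"string1"
<span>
"Hello"
</span>
</li>
<li>
"string2"
<span>
"Ola"
</span>
</li>
</ul>"""


def span_texts_for(root, needle):
    """Mimic //li[contains(text(), needle)]/span/text() for this layout."""
    texts = []
    for li in root.iter("li"):
        if needle in (li.text or ""):          # first text node of the li
            for span in li.findall("span"):    # child span elements
                if span.text:
                    texts.append(span.text.strip())
    return texts


root = ET.fromstring(HTML)
result = span_texts_for(root, "string2")
print(result)
```

The quotation marks are part of the text in the question's markup, so they remain in the extracted value; only the surrounding whitespace is stripped.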
There are a number of labels, I want to specify them in xpath and then grab the text after them, example:
<div class="info-row">
<div class="info-label"><span>Variant:</span></div>
<div class="info-content">
<p>750 ml</p>
</div>
</div>
So in this case, I want to say "after the span named 'Variant', grab the p tag":
Result: 750ml
I tried:
//span[text()='Variant:']/following-sibling::p
and variations of this but to no avail.
following-sibling is an axis that selects all siblings after the current node.
The span with the text 'Variant:' has no siblings, so the correct approach is to search the siblings of the span's parent div instead.
Here is an example which will work
//span[text()='Variant:']/ancestor::div[@class="info-label"]/following-sibling::div/p
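The same navigation (find the label div, then move to the div siblings that follow it) can be sketched in Python with the standard library's xml.etree.ElementTree, which has no following-sibling axis, so the siblings are taken from the parent's child list. The function name content_after_label is hypothetical.

```python
import xml.etree.ElementTree as ET

# The snippet from the question.
HTML = """<div class="info-row">
<div class="info-label"><span>Variant:</span></div>
<div class="info-content">
<p>750 ml</p>
</div>
</div>"""


def content_after_label(row, label):
    """Return p texts from the div siblings that follow the matching label div."""
    children = list(row)
    for index, child in enumerate(children):
        span = child.find("span")
        is_label = (
            child.get("class") == "info-label"
            and span is not None
            and span.text == label             # span[text()='Variant:']
        )
        if is_label:
            # following-sibling::div/p, starting from the label div
            return [
                p.text
                for sibling in children[index + 1:]
                if sibling.tag == "div"
                for p in sibling.findall("p")
            ]
    return []


row = ET.fromstring(HTML)
result = content_after_label(row, "Variant:")
print(result)
```

Starting from the label div rather than from the span is what makes the sibling lookup work, since the span itself has no siblings.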
Using the xpath //ul//li[contains(text(),"outer")] to find a li in the outer ul does not work
<ul>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 1
<ul >
<li> inner ul li 1 </li>
<li> inner ul li 2 </li>
</ul>
</li>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 2
<ul >
<li> inner ul li 1 </li>
<li> inner ul li 2 </li>
</ul>
</li>
</ul>
Any idea how to find a li with a specific text in the outer ul?
Thank you
This will work for you //ul//li[contains(.,"outer")]
I would expect that you only want to consider the text nodes that are direct children of the li. Therefore you are right to use text() (contains(.,"outer") would also consider text from any descendants of the li).
Therefore try this:
//ul/li[text()[contains(.,'outer')]]
Running this with Saxon, the original XPath expression gives:
XPTY0004: A sequence of more than one item is not allowed as the first argument of
contains() ("", "", ...)
Now, I guess Selenium is probably using XPath 1.0 rather than XPath 2.0, and in 1.0 the contains() function has "first item semantics" - it converts its argument to a string, which if the argument is a node-set containing more than one node, involves considering only the first node. And the first text node is probably whitespace.
If you want to test whether some child text node contains "outer", use
//ul//li[text()[contains(.,"outer")]]
Another reason for switching to XPath 2.0...
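The "first item" behaviour can be illustrated outside XPath with Python's standard xml.etree.ElementTree, where the text nodes of an element are exposed as its own .text (before the first child) plus the .tail of each child. This is only an analogy for the XPath 1.0 rule described above, using the first outer li from the question as a hypothetical standalone document.

```python
import xml.etree.ElementTree as ET

# First outer li from the question, wrapped in its ul.
HTML = """<ul>
<li>
<span> not unique text, </span>
<span> not unique text, </span>
outer ul li 1
<ul>
<li> inner ul li 1 </li>
<li> inner ul li 2 </li>
</ul>
</li>
</ul>"""

root = ET.fromstring(HTML)
outer_li = root.find("li")

# All text nodes that are direct children of the outer li, in document order.
child_text_nodes = [outer_li.text or ""] + [child.tail or "" for child in outer_li]

# XPath 1.0 contains(text(), "outer") only looks at the first of these.
first_only = "outer" in child_text_nodes[0]

# text()[contains(., "outer")] tests every one of them.
any_node = any("outer" in node for node in child_text_nodes)

print(repr(child_text_nodes[0]), first_only, any_node)
```

The first text node is just the line break before the first span, which is why the original expression finds nothing while the predicate on each text node succeeds.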
For the above issue, this solution will work:
//ul//li[contains(.,"outer")]
"." selects the current node, so contains() tests the li's whole string value, including text inside its descendants.
Consider this HTML:
<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<h1>title</h1>
<h3>item 1</h3>
text details for item 1
<h3>item 2</h3>
text details for item 2
<h3>item 3</h3>
text details for item 3
</td>
</tr>
</table>
</body>
</html>
I'm not terribly familiar with XPath, but it seems to me that there is no notation which will match the "text details" sections individually. Can you confirm?
Use:
/html/body/table/tr/td/h3/following-sibling::text()[1]
This means: Get the first following sibling text node of every h3 element that is a child of every tr element that is a child of every table element that is a child of every body element that is a child of the html top element.
Or, if you only know that the wanted text nodes are the immediate following siblings of all h3 elements in the document, then this XPath expression selects them:
//h3/following-sibling::text()[1]
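For readers working in Python, a similar result can be sketched with the standard library's xml.etree.ElementTree, where the text that directly follows an element is stored in that element's .tail. This is an approximation of following-sibling::text()[1], not an equivalent: if an h3 were followed immediately by another element, .tail would be empty, whereas the XPath expression would pick up a later text node. The variable names are illustrative.

```python
import xml.etree.ElementTree as ET

# The document from the question.
HTML = """<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<h1>title</h1>
<h3>item 1</h3>
text details for item 1
<h3>item 2</h3>
text details for item 2
<h3>item 3</h3>
text details for item 3
</td>
</tr>
</table>
</body>
</html>"""

root = ET.fromstring(HTML)

# For every h3 in the document, take the text that directly follows it.
details = [
    h3.tail.strip()
    for h3 in root.iter("h3")
    if h3.tail and h3.tail.strip()
]
print(details)
```

Each "text details" section is therefore addressable individually, one per h3.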
In the world of XML/XPath,
text is a type of node (not an element),
so considering your example:
TD has 7 child nodes (ignoring whitespace-only text nodes)
TD.getChild(3) should return the "text details for item 1" Value.
in XPath
$x//table/tr/td/text()[1]