Having the following markup:
<p>
<span>text</span>
foo
</p>
<p>
<span>text</span>
bar
</p>
<p>
<span>text</span>
baz
</p>
How can I select the second <p> node based on its bar text using XPath?
Note that I do not want to select the text itself, as discussed here: XPath - how to select text
What I need is to select the parent node based on the text node it contains.
Have you tried this?
//p[text()[normalize-space(.)='bar']]
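To make the predicate concrete, here is a minimal Python sketch of what it tests. It assumes the three paragraphs are wrapped in a hypothetical <root> element, and it uses the standard-library ElementTree, whose XPath subset has no text() node test, so the check is written out by hand. The helper names are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The markup from the question, wrapped in a hypothetical <root> element.
MARKUP = """<root>
<p>
<span>text</span>
foo
</p>
<p>
<span>text</span>
bar
</p>
<p>
<span>text</span>
baz
</p>
</root>"""

def direct_text_nodes(elem):
    """Yield the text nodes that are direct children of elem.

    ElementTree stores them as elem.text (before the first child)
    and as the .tail of each child element.
    """
    if elem.text is not None:
        yield elem.text
    for child in elem:
        if child.tail is not None:
            yield child.tail

def select_by_direct_text(root, tag, wanted):
    """Mimic //tag[text()[normalize-space(.)='wanted']]."""
    return [
        elem for elem in root.iter(tag)
        # " ".join(t.split()) approximates normalize-space()
        if any(" ".join(t.split()) == wanted for t in direct_text_nodes(elem))
    ]

root = ET.fromstring(MARKUP)
matches = select_by_direct_text(root, "p", "bar")
print(len(matches))
```

The point of the nested predicate is that each text node is normalized and compared on its own, so the whitespace-only node before the <span> does not get in the way.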
This is a follow-up question of this, but unfortunately the answer from that question doesn't apply.
Say I have the following XML:
<body>
<div id="global-header">
header
</div>
<div id="a">
<h3>some title</h3>
<p>text 1
<b>bold</b>
</p>
<div>
<p>abc</p>
<p>text 2</p>
<p>def</p>
</div>
</div>
</body>
I want to
find the <p> node whose value is "text 2" (assume we only have exactly one such <p>), and then
find all the nodes that precede this particular <p> but are also descendants of the <div id='a'> node (you can use something like [@id='a'] to locate it), and finally
extract text() from step 2.
The desired output should look like:
some title
text 1
bold
abc
The caveat is that
the preceding nodes may contain arbitrary node type, not only <h3> and <p>.
the <p>text 2</p> node may be embedded arbitrarily deep in the tree, hence an XPath like .//p[text()="text 2"]/preceding-sibling::* would only extract <p>abc</p> and leave out the others.
You can try this XPath expression:
//p[.='text 2']/preceding::text()[ancestor::div[@id='a']]
The disadvantage of this approach is that the text() nodes may not be clearly separated, but rather merged for the sub-elements. To separate them, you'd need some kind of for-loop.
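As a rough illustration of what that expression collects, the following Python sketch walks the <div id='a'> subtree in document order and stops at the target <p>. It is an emulation with the standard-library ElementTree (which does not support the preceding axis), not the XPath itself. Unlike the XPath, it strips each text node and drops whitespace-only ones; that filtering is an assumption made to match the desired output. The function name text_before is hypothetical.

```python
import xml.etree.ElementTree as ET

XML = """<body>
<div id="global-header">
header
</div>
<div id="a">
<h3>some title</h3>
<p>text 1
<b>bold</b>
</p>
<div>
<p>abc</p>
<p>text 2</p>
<p>def</p>
</div>
</div>
</body>"""

def text_before(container, target):
    """Return stripped, non-blank text nodes inside container that come
    before target in document order."""
    found = []

    def walk(elem):
        # Returns True once target has been reached, which stops the walk.
        if elem.text and elem.text.strip():
            found.append(elem.text.strip())
        for child in elem:
            if child is target or walk(child):
                return True
            if child.tail and child.tail.strip():
                found.append(child.tail.strip())
        return False

    walk(container)
    return found

body = ET.fromstring(XML)
div_a = body.find(".//div[@id='a']")
target = next(p for p in div_a.iter("p") if p.text == "text 2")
print(text_before(div_a, target))  # ['some title', 'text 1', 'bold', 'abc']
```

Each text node comes back as a separate list item here, which is the kind of loop the answer alludes to.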
I have HTML like:
...
<div class="grid">
"abc"
<span class="searchMatch">def</span>
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
...
I want to get the div that contains no text of its own, but the XPath
//div[@class='grid' and text()='']
doesn't seem to work. And if I don't know the text that the other divs have, how can I find the node?
Let's suppose I have inferred the requirement correctly as:
Find all <div> elements with @class='grid' that have no directly-contained non-whitespace text content, i.e. no non-whitespace text content unless it's within a child element like a <span>.
Then the answer to this is
//div[@class='grid' and not(text()[normalize-space(.)])]
You need not() plus normalize-space():
//div[@class='grid' and not(normalize-space(text()))]
or
//div[@class='grid' and normalize-space(text())='']
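The predicates not(text()[normalize-space(.)]) and not(normalize-space(text())) are not quite equivalent in XPath 1.0: when a node-set is passed to a string function, only its first node in document order is used. The sketch below emulates both in Python with the standard-library ElementTree. The sample is a hypothetical variation of the question's markup in which the text follows the <span>, and the function names are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical variation: in the first div the text comes AFTER the span,
# so the first text node of that div is whitespace only.
HTML = """<root>
<div class="grid">
<span class="searchMatch">def</span>
abc
</div>
<div class="grid">
<span class="searchMatch">def</span>
</div>
</root>"""

def direct_text_nodes(elem):
    """Text nodes that are direct children of elem, in document order."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t is not None]

def no_direct_text_all(elem):
    # Mirrors not(text()[normalize-space(.)]): every text node must be blank.
    return not any(t.strip() for t in direct_text_nodes(elem))

def no_direct_text_first(elem):
    # Mirrors not(normalize-space(text())) in XPath 1.0: only the first
    # text node is converted to a string and tested.
    nodes = direct_text_nodes(elem)
    return not (nodes and nodes[0].strip())

root = ET.fromstring(HTML)
grids = [d for d in root.iter("div") if d.get("class") == "grid"]
print([no_direct_text_all(d) for d in grids])    # [False, True]
print([no_direct_text_first(d) for d in grids])  # [True, True]
```

With the markup exactly as posted in the question, both variants give the same answer; the difference only shows up when a blank text node precedes the real text.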
I have the following HTML structure:
<p>
<!-- Span can be any level deep -->
<span>
Some text
</span>
</p>
<!-- Any number of different elements between span and table -->
<p></p>
<div></div>
<table>
<tr>
<td></td>
</tr>
</table>
Using Nokogiri and custom XPath functions I am able to select the <span> element whose content matches the regex. I am forced to do it this way since Nokogiri uses XPath 1.0, which has no support for the matches function:
@doc.xpath("//span[regex_match(text(), '/some text/i')]")
Having the span node selected, how do I select the table that is visually following the span?
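I use the contains function to match the text, then following::table to find the table that follows this span tag. Note that following::table selects every table after the span; append [1] to keep only the nearest one.
@doc.xpath("//span[contains(text(), 'Some text')]/following::table")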
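For readers who want to see what the following axis covers, here is a small Python sketch using only the standard library. ElementTree has no following axis, so the helper below (a hypothetical name) emulates it: elements that come after the context node in document order, excluding its descendants. The markup is the question's, wrapped in an assumed <root> element, and the case-insensitive match is done with Python's re module instead of a custom XPath function.

```python
import re
import xml.etree.ElementTree as ET

HTML = """<root>
<p>
<span>
Some text
</span>
</p>
<p></p>
<div></div>
<table>
<tr>
<td></td>
</tr>
</table>
</root>"""

def following(root, node, tag):
    """Elements named tag that start after node ends, in document order."""
    ordered = list(root.iter())
    inside = set(node.iter())          # node itself plus its descendants
    start = ordered.index(node) + 1
    return [e for e in ordered[start:] if e.tag == tag and e not in inside]

root = ET.fromstring(HTML)
# Case-insensitive match on the span's leading text, like /some text/i.
span = next(s for s in root.iter("span")
            if re.search(r"some text", s.text or "", re.IGNORECASE))
tables = following(root, span, "table")
print(len(tables))  # 1
```

Taking tables[0] corresponds to adding [1] to the XPath step, since the list is in document order.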
Consider this HTML:
<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<h1>title</h1>
<h3>item 1</h3>
text details for item 1
<h3>item 2</h3>
text details for item 2
<h3>item 3</h3>
text details for item 3
</td>
</tr>
</table>
</body>
</html>
I'm not terribly familiar with XPath, but it seems to me that there is no notation which will match the "text details" sections individually. Can you confirm?
Use:
/html/body/table/tr/td/h3/following-sibling::text()[1]
This means: Get the first following sibling text node of every h3 element that is a child of every tr element that is a child of every table element that is a child of every body element that is a child of the html top element.
Or, if you only know that the wanted text nodes are the immediate following siblings of all h3 elements in the document, then this XPath expression selects them:
//h3/following-sibling::text()[1]
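A quick way to check the idea without an XPath 1.0 engine is the Python sketch below. It relies on the standard-library ElementTree, where the text that follows an element is stored in that element's .tail, so "first following-sibling text node" becomes "first non-empty tail among the h3 and the siblings after it". The helper name is hypothetical and the output is stripped of surrounding whitespace for readability.

```python
import xml.etree.ElementTree as ET

HTML = """<html>
<head>
</head>
<body>
<table>
<tr>
<td>
<h1>title</h1>
<h3>item 1</h3>
text details for item 1
<h3>item 2</h3>
text details for item 2
<h3>item 3</h3>
text details for item 3
</td>
</tr>
</table>
</body>
</html>"""

def first_following_text(parent, index):
    """First text node among the siblings after parent[index], or None."""
    for sibling in list(parent)[index:]:
        # The tail of an element is the text node right after it.
        if sibling.tail is not None:
            return sibling.tail
    return None

root = ET.fromstring(HTML)
details = []
for parent in root.iter():
    for i, child in enumerate(parent):
        if child.tag == "h3":
            text = first_following_text(parent, i)
            if text is not None:
                details.append(text.strip())
print(details)
```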
In the world of XML/XPath,
text is a type of node (a text node), not an element node.
so considering your example
TD has 7 child nodes
TD.getChild(3) should return the "text details for item 1" Value.
in XPath
$x//table/tr/td/text()[1]
Basically, I want to select a node (div) whose child nodes (h1, b, h3) contain the specified text.
<html>
<div id="contents">
<p>
<h1> Child text 1</h1>
<b> Child text 2 </b>
...
</p>
<h3> Child text 3 </h3>
</div>
I am expecting /html/div/, not /html/div/h1.
I have the expression below, but unfortunately it returns the children instead of the XPath to the div.
expression = "//div[contains(text(), 'Child text 1')]"
doc.xpath(expression)
So is there a way to do this simply with xpath syntax?
The following expression selects a node (div) in which any descendant element (not just h1, b, h3) contains the specified text (text directly inside the div itself is not considered):
doc.xpath('//div[.//*[contains(text(), "Child text 1")]]')
You can refine that and return only the div with the id contents, as in your example:
doc.xpath('//div[@id="contents" and .//*[contains(text(), "Child text 1")]]')
It does not match if the text is a text node of the div itself (directly inside the div), which is my interpretation of the question.
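The logic of that predicate can be sketched in Python with the standard-library ElementTree, as a hand-written emulation instead of a real XPath 1.0 evaluation. The markup is a simplified, well-formed version of the question's (the elided part is left out and the <html> element is closed), and the function names are hypothetical. As in XPath 1.0, contains(text(), ...) is modelled as looking only at the first text node of each descendant.

```python
import xml.etree.ElementTree as ET

# Simplified, well-formed version of the question's markup (an assumption).
HTML = """<html>
<div id="contents">
<p>
<h1> Child text 1</h1>
<b> Child text 2 </b>
</p>
<h3> Child text 3 </h3>
</div>
</html>"""

def first_text_node(elem):
    """First text node directly inside elem, or '' if there is none."""
    for text in [elem.text] + [child.tail for child in elem]:
        if text is not None:
            return text
    return ""

def divs_with_descendant_text(root, needle, div_id=None):
    """Mimic //div[.//*[contains(text(), needle)]], optionally with an id test."""
    result = []
    for div in root.iter("div"):
        if div_id is not None and div.get("id") != div_id:
            continue
        # .//* means descendant elements only, so the div itself is skipped.
        descendants = [e for e in div.iter() if e is not div]
        if any(needle in first_text_node(e) for e in descendants):
            result.append(div)
    return result

root = ET.fromstring(HTML)
matches = divs_with_descendant_text(root, "Child text 1", div_id="contents")
print([d.get("id") for d in matches])  # ['contents']
```

The result is the div itself, not the h1, which is what the question asks for.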
You could append "/.." to anchor back to the parent. Not sure if there's a more robust method.
expression = "//div[contains(text(), 'Child text 1')]/.."