XPath to get all the children's text

Is there any way to get the text values of all the child nodes within the ul tag?
Input:
<ul>
<li class="type">Industry</li>
<li>Automotive</li>
<li>Parts </li>
<li>Tires</li>
</ul>
Output: Industry, Automotive, Parts, Tires.

This retrieves every text node whose parent is an element descendant of a ul element.
//ul/descendant::*/text()

You can use an XPath axis. An axis represents a relationship to the context node and is used to locate nodes relative to that node in the tree. XPath 1.0 defines 13 axes. The descendant axis selects all descendants of the context node (children, grandchildren, and so on), while the descendant-or-self axis selects the context node together with all of its descendants. For example:
//ul/descendant::*/text()
//ul/descendant-or-self::*/text()
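
As a rough illustration, here is a minimal Python sketch of the same extraction using only the standard library. It rests on a few assumptions: xml.etree.ElementTree implements only a subset of XPath and has no text() step, so itertext() stands in for //ul/descendant-or-self::*/text(), whitespace-only strings are filtered out by hand, and the variable names are made up for the example.

```python
import xml.etree.ElementTree as ET

# The input from the question, parsed as XML.
ul = ET.fromstring("""<ul>
<li class="type">Industry</li>
<li>Automotive</li>
<li>Parts </li>
<li>Tires</li>
</ul>""")

# itertext() walks the ul and all of its descendants in document order and
# yields their text, roughly what //ul/descendant-or-self::*/text() selects.
# The whitespace between the li elements is dropped by the filter.
texts = [t.strip() for t in ul.itertext() if t.strip()]

result = ", ".join(texts)
print(result)  # Industry, Automotive, Parts, Tires
```

With a full XPath engine the expressions above can be evaluated directly; this sketch only mirrors their effect.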

Related

XPath to text node whose ancestor has a descendant that contains specific text string

I'm trying to find the xpath to a text node that follows the immediately preceding text node in the DOM - but they aren't siblings and their xpath relationship can change.
I'm trying to find the dollar amount, which changes. There is a unique ID up top. The unique text and the dollar amount are the only two pieces of text within their closest div ancestor. I want to find the unique text, then move up to the closest div ancestor, then return the text string within that div ancestor that contains "$".
I want to make sure it's returning the CLOSEST div ancestor (not the FIRST div under uniqueID) because there are many div ancestors where this could be true.
I use ancestor and descendant because the number of elements between uniqueID and the text, and between uniqueID and the $ amount, changes.
My best guess (which doesn't work):
//text()[contains(.,"$")][ancestor::div[1]//*[text()[contains(.,"uniqueText")]][ancestor::div[@id="uniqueID"]]
I think that 'div[1]' is returning the top div though, not the closest div above the uniqueText.
<div id="unique">
<div>
<h4></h4>
<div>
<div>
<div>
<span>
<span>
<span>uniqueText
</span>
</span>
</span>
<div>
<div>
<p>$xxx</p>
</div>
</div>

Your question isn't entirely clear (and the HTML sample is invalid), but if I understand you correctly, you want to start from the text and get the $ amount, where both are under the closest common <div> ancestor and the $ amount is contained within a single <p> tag under that common ancestor. Try this on your actual HTML:
div[@id="unique"]//span[contains(./text()[1],"uniqueText")]/ancestor::div[1]/div[contains(.//p/text(),"$")]//p/text()
and see if it works.
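
To make the "closest div ancestor" idea concrete, here is a hedged Python sketch using only the standard library. Everything about the markup is an assumption: the sample in the question is not well-formed, so the closing tags below were added by guesswork, and the id="closest" attribute is a hypothetical marker added only so the result can be checked. ElementTree has no ancestor axis, so a parent map and an upward walk stand in for ancestor::div[1].

```python
import xml.etree.ElementTree as ET

# Hypothetical, well-formed variant of the sample (closing tags are assumed).
doc = ET.fromstring("""<div id="unique"><div><h4></h4><div><div>
  <div id="closest">
    <span><span><span>uniqueText</span></span></span>
    <div><div><p>$xxx</p></div></div>
  </div>
</div></div></div></div>""")

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in doc.iter() for child in parent}

def closest_div(el):
    """Walk upward to the nearest div ancestor, like ancestor::div[1]."""
    node = parent_of.get(el)
    while node is not None and node.tag != "div":
        node = parent_of.get(node)
    return node

# The span whose own text contains the marker string.
marker = next(s for s in doc.iter("span") if "uniqueText" in (s.text or ""))
block = closest_div(marker)

# Every p under that div whose text contains a dollar sign.
amounts = [p.text for p in block.iter("p") if p.text and "$" in p.text]
print(block.get("id"), amounts)  # closest ['$xxx']
```

The upward walk stops at the first div it meets, which is why the result is the innermost div rather than the one carrying id="unique".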

Removing empty nodes but keep nodes with image tags

I am trying to remove all the empty nodes, but the code also treats nodes that contain an image tag as empty. I need the nodes with an img tag to remain. I also don't want to keep nodes that contain only whitespace or other non-printable characters. This is my current code:
$empties = $xpath->query('//*[not((*))]');
foreach ($empties as $empty) {
    $empty->parentNode->removeChild($empty);
}
I need this to go:
<div class='blah'> </div>
and these to stay
<div class='blah'><img src='bla'/></div>
<div class='blah'>some text</div>

I'm not sure you've fully specified which nodes you want to stay, but the following XPath is consistent with your stated needs:
//*[not(self::img) and not(*) and not(text()[normalize-space()])]
(Building on Martin's comment.)
This will select for removal all elements that are not <img>, and have no element children, and have no direct text node children that contain more than just whitespace.
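
The three conditions can be mimicked in Python with the standard library, as a sketch rather than a drop-in replacement for the PHP code. The assumptions: ElementTree cannot evaluate this predicate itself, so a hypothetical helper is_empty() restates it in code, str.strip() only approximates normalize-space(), and the sample markup with ids d1 to d6 is illustrative.

```python
import xml.etree.ElementTree as ET

root = ET.fromstring("""<div id="d1">
<div id="d2"/>
<div id="d3" class='blah'><img src='bla'/></div>
<div id="d4" class='blah'>some text</div>
<div id="d5" class='blah'> </div>
<div id="d6" class='blah'>
</div>
</div>""")

def is_empty(el):
    # Mirrors: not(self::img) and not(*) and not(text()[normalize-space()])
    return el.tag != "img" and len(el) == 0 and not (el.text or "").strip()

# ElementTree elements do not know their parent, so build a lookup table.
parent_of = {child: parent for parent in root.iter() for child in parent}

removed = []
# Snapshot the tree with list() first, because removal changes it.
for el in list(root.iter()):
    if el is not root and is_empty(el):
        parent_of[el].remove(el)
        removed.append(el.get("id"))

remaining = [el.get("id") for el in root.iter("div")]
print(removed)    # ['d2', 'd5', 'd6']
print(remaining)  # ['d1', 'd3', 'd4']
```

Note that this predicate also removes d2, the element with no content at all, while the img and the div that wraps it survive.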

First, let's clear up the ambiguity by using a more comprehensive example:
<div id="d1">
<div id="d2"/>
<div id="d3" class='blah'><img src='bla'/></div>
<div id="d4" class='blah'>some text</div>
<div id="d5" class='blah'> </div>
<div id="d6" class='blah'>
</div>
</div>
Then
//*[not(*) and text()[not(normalize-space())]]
says: select elements that have no child elements but do have a child text node consisting only of whitespace.
For the above XML, it selects the d5 and d6 divs, not the img, and not the d1 through d4 divs (d2 is skipped because it has no text node at all).

following-sibling for any descendant confusion

For a html text
<html>
<body>
<div id="1">1</div>
<div id="2">1</div>
<div id="3">1</div>
</body>
</html>
I query
//following-sibling::div[3]
And the result is there
<div id="3">1</div>
But according to XPath specs
The following-sibling axis contains the context node's following
siblings, those children of the context node's parent that occur after
the context node in document order;
So what is the context node when that 3rd div is found? It seems that when // finds the first div, there is no 3rd div after it; the last accessible one should be [2]. If the context node is not a div but body or html, then the divs are not its siblings.

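The context node is the first text node (containing only whitespace) in the body element. // is short for /descendant-or-self::node()/, so every node in the document, text nodes included, is tried as a context node, and that whitespace text node is followed by all three div siblings.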
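
This can be checked with a small Python sketch based on xml.dom.minidom, which keeps whitespace text nodes in the tree. It does not evaluate XPath; the helper following_sibling_divs() is a hypothetical stand-in for the following-sibling::div step, written only for this illustration.

```python
from xml.dom import minidom

doc = minidom.parseString("""<html>
<body>
<div id="1">1</div>
<div id="2">1</div>
<div id="3">1</div>
</body>
</html>""")

def following_sibling_divs(node):
    """Collect the div elements that come after node under the same parent."""
    found = []
    sibling = node.nextSibling
    while sibling is not None:
        if sibling.nodeType == sibling.ELEMENT_NODE and sibling.tagName == "div":
            found.append(sibling)
        sibling = sibling.nextSibling
    return found

body = doc.getElementsByTagName("body")[0]
first_child = body.firstChild                    # whitespace-only text node
first_div = doc.getElementsByTagName("div")[0]

after_text = following_sibling_divs(first_child)
after_div = following_sibling_divs(first_div)

print(first_child.nodeType == first_child.TEXT_NODE)  # True
print([d.getAttribute("id") for d in after_text])     # ['1', '2', '3']
print([d.getAttribute("id") for d in after_div])      # ['2', '3']
```

Only the whitespace text node has a third following div sibling, which is why div[3] resolves from that node and not from the first div.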

Find all elements that has a specified nested child

Hey, I've parsed an HTML doc and need to find all elements that have a specified child (which may not be a direct child).
For example:
<center>
<table>
...
<a />
</center>
Find all "center" tags that have a nested link.
Thanks!

Use:
//center[.//a]
This selects all center elements in the document that have an a descendant.
And this:
//center[.//*/a]
selects all center elements in the document that have an a descendant that is not a direct child of that center element.

How about the following:
//center[element()//a]
This says to find all center elements that contain an a element that is a descendant of one of the center's direct element children. Note that element() is an XPath 2.0 kind test; in XPath 1.0, * plays the same role here.

Can't you use the descendant axis in the predicate?
//center[descendant::a]
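
The difference between "any a descendant" and "an a descendant below a child element" can be sketched in Python with the standard library. The assumptions: ElementTree predicates cannot contain a nested path such as .//a, so the test is written as a list comprehension instead, and the document with ids c1 to c3 is made up for the example.

```python
import xml.etree.ElementTree as ET

doc = ET.fromstring("""<root>
<center id="c1"><table><tr><td><a/></td></tr></table></center>
<center id="c2"><a/></center>
<center id="c3"><p>no link</p></center>
</root>""")

# Like //center[.//a] or //center[descendant::a]: an a at any depth.
any_depth = [c.get("id") for c in doc.iter("center")
             if c.find(".//a") is not None]

# Like //center[.//*/a]: an a that sits below one of center's child elements,
# so a link that is a direct child of center does not count.
nested_only = [c.get("id") for c in doc.iter("center")
               if any(child.find(".//a") is not None for child in c)]

print(any_depth)    # ['c1', 'c2']
print(nested_only)  # ['c1']
```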

Xpath - get only node content without other elements

I have a div element:
<div>
This is some text
<h1>This is a title</h1>
<div>Some other content</div>
</div>
What XPath expression should I use to get only the div's own content, without its child elements h1 and div?
//div[not(h1)&not(div)]
Something like that? I cannot figure it out

To get the string value of div use:
string(/div)
This is the concatenation of all text nodes that are descendants of the (top) div element.
To select all text node descendants of div use:
/div//text()
To get only the text nodes that are direct children of div use:
/div/text()
Finally, get the first (and hopefully only) non-whitespace-only text node child of div:
/div/text()[normalize-space()][1]
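
For comparison, here is roughly how those four results look in Python with the standard library. This is a sketch under assumptions: ElementTree has no text nodes as such, so an element's .text plus the .tail of each child stand in for /div/text(), itertext() stands in for /div//text(), and the final value is stripped for display whereas the XPath expression returns the text node with its surrounding whitespace.

```python
import xml.etree.ElementTree as ET

div = ET.fromstring("""<div>
This is some text
<h1>This is a title</h1>
<div>Some other content</div>
</div>""")

# Like /div//text(): every piece of text at any depth, in document order.
all_pieces = list(div.itertext())

# Like string(/div): the pieces joined into one string.
string_value = "".join(all_pieces)

# Like /div/text(): only the text that is a direct child of the outer div,
# i.e. its leading text plus the text that follows each child element.
direct_pieces = [div.text or ""] + [child.tail or "" for child in div]

# Like /div/text()[normalize-space()][1]: first piece that is not blank.
first_non_blank = next((t.strip() for t in direct_pieces if t.strip()), None)

print(first_non_blank)  # This is some text
```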

What xpath expression should I use to
only get the div content without his
child elements h1 and div
This XPath expression:
/div/node()[not(self::h1|self::div)]
It selects all child nodes of the root div element except its h1 and div child elements.

An expression like ./text() retrieves only the text nodes that are direct children of the context node.
Regards,
Nitin

You can use this XPath expression:
./div[1]/text()[1]
To test, I use this online tester: http://xpather.com/
