XPath for selecting inner text of an element

I need to select the following element on a webpage
<td colspan="1">
<span class="text-left">
<strong>QTY Total
</strong></span>
<span style="float: right;">
<strong>20.99</strong>
</span>
</td>
The value I need to extract is the 20.99 text, which is inside a child element.
I have tried the following to locate the element, and it returns nothing:
$x("//*[@style='float: right;' and contains(text(), '20.99')]")
I also tried to locate the element using the following:
$x("//*[@style='float: right;']")
But this locates two elements with the same style.
Does anybody know an XPath I can try which will locate the 20.99?

In your first XPath, the "20.99" text belongs to the child strong element, not to the span itself. So you have to reference that child in the contains(...) clause, like
//*[@style='float: right;' and contains(strong/text(), '20.99')]/strong
Probably an even better XPath would be searching explicitly for span elements
//span[@style='float: right;' and contains(strong/text(), '20.99')]/strong
which also works.
Both select the strong element child of a span element with the text "20.99".
The element I need to extract is the 20.99 text
To get the text() node of the strong tag, append /text() to the expressions:
//span[@style='float: right;' and contains(strong/text(), '20.99')]/strong/text()

Using text() is usually wrong: the predicate [contains(text(), 'xyz')] is almost always better written as [contains(., 'xyz')].
That's because . (which here is equivalent to string(.)) selects the content of the context element as a string regardless of any nested elements, comments etc, whereas text() only looks at immediate text node children.
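
The difference can also be checked outside the browser. Below is a minimal sketch using Python's standard xml.etree.ElementTree module. That module supports only a small XPath subset (no contains() and no text()), so the two notions are emulated by hand: the direct text-node children of the span versus its full string value. The whitespace-free markup is an assumption made to keep the example short.

```python
import xml.etree.ElementTree as ET

# Compact version of the table cell from the question (whitespace removed
# as an assumption, so that the results are easy to read).
cell = ET.fromstring(
    '<td colspan="1">'
    '<span class="text-left"><strong>QTY Total</strong></span>'
    '<span style="float: right;"><strong>20.99</strong></span>'
    '</td>'
)

span = cell.find("span[@style='float: right;']")

# Rough equivalent of span/text(): only text nodes that are direct
# children of the span (its leading text plus the tails of its children).
direct_text_nodes = [t for t in [span.text] + [c.tail for c in span] if t]

# Rough equivalent of string(.): all descendant text concatenated.
string_value = "".join(span.itertext())

print(direct_text_nodes)        # no direct text nodes on the span
print("20.99" in string_value)  # the value is found via the string value
```

Because the span has no text node of its own that contains "20.99", a predicate built on text() cannot match it, while one built on . can.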

Related

XPath with specific following sibling case

I have a structure that looks something like this
<p>
<br>
<b>Text to fetch </b>
<br>
"Some random text"
<b>Text not to fetch</b>
I need an XPath that fetches the following sibling of a br element only if there is no text between the br element and that sibling.
If I do something like this
//br/following-sibling::b/text()[1]
it fetches both "Text to fetch" and "Text not to fetch", while I only need "Text to fetch".
Another possible XPath:
//br/following-sibling::node()[normalize-space()][1][self::b]/text()
Brief explanation:
//br/following-sibling::node(): find all nodes that are following siblings of a br element, where the nodes are..
[normalize-space()]: not empty (and not whitespace-only), then..
[1]: for each br found, take only the first such node, then..
[self::b]: check whether that node is a b element, and if it is..
/text(): return the text node that is a child of the b element
Try the XPath below to avoid matching b nodes that have preceding sibling text:
//br/following-sibling::b[not(preceding-sibling::text()[1][normalize-space()])]/text()
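
The idea behind both expressions (accept a b only when nothing but whitespace separates it from the preceding br) can be sketched in plain Python. This is an emulation with the standard xml.etree.ElementTree module, not an XPath evaluation. The self-closing br tags and the closing p tag are assumptions needed to make the fragment parseable as XML, and b_directly_after_br is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET

# Well-formed version of the fragment from the question (assumed markup).
doc = ET.fromstring(
    '<p><br/><b>Text to fetch </b><br/>"Some random text"'
    '<b>Text not to fetch</b></p>'
)

def b_directly_after_br(parent):
    """Return the text of each <b> that follows a <br> with no text between."""
    results = []
    children = list(parent)
    for current, following in zip(children, children[1:]):
        # current.tail holds the text between current and its next sibling.
        gap = (current.tail or "").strip()
        if current.tag == "br" and following.tag == "b" and not gap:
            results.append(following.text)
    return results

fetched = b_directly_after_br(doc)
print(fetched)
```

Only the first b qualifies here, because the second br is followed by the quoted text before the next b appears.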

XPath difference between two similar path and other questions

I have to do some exercises, but
I don't really understand the difference between two similar paths.
I have this tree:
<b>
<t></t>
<a>
<n></n>
<p></p>
<p></p>
</a>
<a>
<n></n>
<p></p>
</a>
<a></a>
</b>
We assume that each leaf tag contains one text node.
I have to explain the difference between //a//text() and //a/text().
I see that //a//text() returns all text nodes, which seems right,
but why does //a/text() return only the text node of the last a node?
Another question:
why does //p[1] return the first p child node for each a node?
-> I get two results:
<b>
<t></t>
<a>
<n></n>
**<p></p>**
<p></p>
</a>
<a>
<n></n>
**<p></p>**
</a>
<a></a>
</b>
Why is the answer not the first p node of the whole document?
Thanks to all!
Difference between 1: //a//text() and 2: //a/text()
Let's break it down: //a selects all a elements, no matter where they are in the document. By contrast, /a would select only an a element at the root of the document.
If / comes after another step in an XPath expression, it selects only direct children of the nodes selected by the preceding step.
If // comes after another step in an XPath expression, it selects all descendants of the nodes selected by the preceding step, no matter how deeply they are nested.
Applying this to your two XPath expressions:
//a//text(): Select all a elements no matter where they are in the document, and for those elements select all text nodes anywhere beneath them.
//a/text(): Select all a elements no matter where they are in the document, and for those elements select only the text nodes that are their direct children. In your tree only the last a element has a text node as a direct child, which is why only that one is returned.
Why //p[1] returns for each "a node", the first "p" child node?
Suppose you were to write //a/p[1]; this would select the first p child element of any a element anywhere in the document. By writing //p[1] you omit an explicit parent element, but the predicate is still evaluated per parent: it selects every p that is the first p child of its own parent.
In this case there are two parent a elements, and the first p child of each is selected. To get the first p of the whole document, use parentheses: (//p)[1].
It would be good to search for a good introduction to XPath on your favorite search engine. I've always found this one from w3schools.com to be a good one.
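
Both points from this answer can be tried out with Python's standard xml.etree.ElementTree module, as a rough sketch. Its limited XPath dialect does support the position predicate [1], and it applies it per parent, just like //p[1]. It has no text() step, so the child and descendant text nodes are emulated with a hypothetical helper child_text_nodes and with itertext(). The leaf texts (t0, n1, p1, ...) are made up for illustration.

```python
import xml.etree.ElementTree as ET

# The tree from the question, with invented text in each leaf element.
doc = ET.fromstring(
    "<b><t>t0</t>"
    "<a><n>n1</n><p>p1</p><p>p2</p></a>"
    "<a><n>n2</n><p>p3</p></a>"
    "<a>a3</a></b>"
)

def child_text_nodes(elem):
    """Emulate elem/text(): text nodes that are direct children of elem."""
    candidates = [elem.text] + [child.tail for child in elem]
    return [t for t in candidates if t]

a_elements = doc.findall(".//a")

# Like //a/text(): only the last <a> has a text node of its own.
direct_text = [t for a in a_elements for t in child_text_nodes(a)]

# Like //a//text(): every text node anywhere below an <a>.
all_text = [t for a in a_elements for t in a.itertext()]

# Like //p[1]: the first <p> child of each parent.
first_p_per_parent = [p.text for p in doc.findall(".//p[1]")]

# Like (//p)[1]: the first <p> in the whole document.
first_p_overall = doc.find(".//p").text

print(direct_text, all_text, first_p_per_parent, first_p_overall)
```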

Need an XPath: the parent has multiple children, but I require only the parent's value

In the code below, the parent div has three children: a span, a script and another span. But I need the value of the parent div itself, which is "N/A". "N/A" is not stored in any attribute of the div; it is just the text of the parent div.
<div class="ah-text-align-right ah-font-xsmall" style="">
<span id="_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden" style="display:none">
<script type="text/javascript">
<span class="ah-float-left">
N/A
</div>
To get the parent element, you can append a double dot (..) to the XPath of the child element.
To get the text of an element, you can use the XPath text() node test, but depending on the XPath implementation in your environment and code, it might be unavailable. Note that text() selects only the text nodes that are direct children of the element; text inside child elements is not included.
In your case, if you are looking for the parent of a span with the ah-float-left class, the XPath should be something like the following:
//span[@class='ah-float-left']/..
To get the text of the parent, you'll need the following:
//span[@class='ah-float-left']/../text()
Note: looking elements up by class name may return a collection of elements, which in turn gives you a collection of parent elements and a collection of parent text nodes, which may not be desired. I would recommend looking the child element up by id, since XHTML requires element ids to be unique. Thus, an XPath for the parent div should rather look like the following:
//span[@id='_dcmanageinvestmentsportlet_WAR_ahdcmnginvportlet__FDROR_110hidden']/..
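
As a rough check of the parent step, here is a sketch with Python's standard xml.etree.ElementTree module, whose XPath subset includes .. but not text(). Several things are assumptions: the three children are taken to be empty and closed before "N/A" (the question shows them collapsed), the long id is shortened to a placeholder, and a body wrapper is added so that the div has a parent of its own.

```python
import xml.etree.ElementTree as ET

# Assumed well-formed version of the markup from the question.
html = """<body><div class="ah-text-align-right ah-font-xsmall">
<span id="hidden_field" style="display:none"></span>
<script type="text/javascript"></script>
<span class="ah-float-left"></span>
N/A
</div></body>"""

root = ET.fromstring(html)

# Step from the span up to its parent with "..".
parent = root.find(".//span[@class='ah-float-left']/..")

# Emulate parent/text(): text nodes that are direct children of the div,
# dropping the whitespace-only ones between the child elements.
candidates = [parent.text] + [child.tail for child in parent]
own_text = [t.strip() for t in candidates if t and t.strip()]

print(parent.tag, own_text)
```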

XPath / XQuery: find text in a node, but ignoring content of specific descendant elements

I am trying to find a way to search for a string within nodes, but excluding the content of some subelements of those nodes. Plain and simple, I want to search for a string in paragraphs of a text, excluding the footnotes which are children elements of the paragraphs.
For example,
My document being:
<document>
<p n="1">My text starts here/</p>
<p n="2">Then it goes on there<footnote>It's not a very long text!</footnote></p>
</document>
When I'm searching for "text", I would like the Xpath / XQuery to retrieve the first p element, but not the second one (where "text" is contained only in the footnote subelement).
I have tried the contains() function, but it retrieves both p elements.
Any help would be much appreciated :)
I want to search for a string in paragraphs of a text, excluding the footnotes which are children elements of the paragraphs
An XPath 1.0-only solution:
Use:
//p//text()[not(ancestor::footnote) and contains(.,'text')]
Against the following XML document (obtained from yours, but with a p added inside the footnote to make this more interesting):
<document>
<p n="1">My text starts here/</p>
<p n="2">Then it goes on there
<footnote>It's not a very long text!
<p>text</p>
</footnote>
</p>
</document>
this XPath expression selects exactly the wanted text node:
My text starts here/
An XPath 2.0 alternative, using the except operator:
//p[(.//text() except .//footnote//text())[contains(., 'text')]]
/document/p[text()[contains(., 'text')]] should do.
For the record, as a complement to the other answers, I've found this workaround that also seems to do the job:
//p[contains(child::text()|not(descendant::footnote), "text")]
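
The filtering idea behind these expressions (look at the text nodes of a paragraph, but skip everything inside a footnote) can be sketched in Python with the standard xml.etree.ElementTree module. This is an emulation, not an XPath evaluation, and text_outside is a hypothetical helper name. Unlike the XPath 1.0 expression above, which selects text nodes, the sketch reports the matching p elements by their n attribute.

```python
import xml.etree.ElementTree as ET

# The document from the question.
doc = ET.fromstring(
    '<document>'
    '<p n="1">My text starts here/</p>'
    '<p n="2">Then it goes on there'
    "<footnote>It's not a very long text!</footnote></p>"
    '</document>'
)

def text_outside(elem, skip="footnote"):
    """Yield the text nodes under elem that are not inside a <skip> element."""
    if elem.text:
        yield elem.text
    for child in elem:
        if child.tag != skip:
            yield from text_outside(child, skip)
        if child.tail:
            # The tail follows the child, so it belongs to elem itself.
            yield child.tail

matching = [
    p.get("n")
    for p in doc.findall("p")
    if any("text" in t for t in text_outside(p))
]
print(matching)
```

Only the first paragraph is reported, because in the second one the word "text" occurs only inside the footnote.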

XPATH filter tag-less children

Is there any way to specify that I want to select only tag-less child elements (in the following example - "text")?
<div>
<p>...</p>
"text"
</div>
The text() node test matches text nodes. Example: //div/text() matches all text nodes that are children of any div element.
Use:
/*/text()[normalize-space()]
This selects all text nodes that are children of the top element of the document and that do not consist only of white-space characters.
In the concrete example this will select only the text node with string value:
'
"text"
'
The XPath expressions:
/*/text()
or
/div/text()
both select two text nodes, the first of which contains only white-space and the second is the same text node as above:
'
"text"
'
select only tag-less child elements
To me this sounds like selecting all elements that don't have other elements as children. But then again, "text" in your example is not an element but a text node, so I'm not really sure what you want to select...
Anyway, here is a solution for selecting such elements.
//*[not(*)]
This selects all elements that don't have an element as a child. Replace the first * with an element name if you only want to select certain elements that don't have child elements. Also note that using // is generally slow, since it runs through the whole document. Consider using a more specific path when possible (like /div/*[not(*)] in this case).
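
The three selections discussed in this thread can be compared side by side with a small sketch in Python's standard xml.etree.ElementTree module. The module has no text() step and no not(*) predicate, so each selection is emulated by hand; the variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

# The example from the question, with the line breaks kept.
doc = ET.fromstring('<div>\n<p>...</p>\n"text"\n</div>')

# Like /*/text(): every text node that is a direct child of the top element.
child_texts = [t for t in [doc.text] + [c.tail for c in doc] if t]

# Like /*/text()[normalize-space()]: drop the whitespace-only text nodes.
non_blank_texts = [t for t in child_texts if t.strip()]

# Like //*[not(*)]: elements that have no element children.
leaf_elements = [e.tag for e in doc.iter() if len(e) == 0]

print(len(child_texts), non_blank_texts, leaf_elements)
```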
