xpath normalize-space with contains [duplicate] - xpath

I have an xpath string //*[normalize-space() = "some sub text"]/text()/.. which works fine if the text I am finding is in a node which does not have multiple text sub nodes, but if it does then it won't work, so I am trying to combine it with contains() as follows: //*[contains(normalize-space(), "some sub text")]/text()/.. which does work, but it always returns the body and html tags as well as the p tag which contains the text. How can I change it so it only returns the p tag?

It depends exactly what you want to match.
The most likely scenario is that you want to match some text if it appears anywhere in the normalized string value of the element, possibly split across multiple text nodes at different levels: for example any of the following:
<p>some text</p>
<p>There was some text</p>
<p>There was <b>some text</b></p>
<p>There <b>was</b> some text</p>
<p>There was <b>some</b> <!--italic--> <i>text</i></p>
<p>There was <b>some</b> text</p>
If that's the case, then use //p[contains(normalize-space(.), "some text")].
As you point out, using //* with this predicate will also match ancestor elements of the relevant element. The simplest way to fix this is by using //p to say what element you are looking for. If you don't know what element you are looking for, then in XPath 3.0 you could use
innermost(//*[contains(normalize-space(.), "some text")])
but if you have the misfortune not to be using XPath 3.0, then you could do (//*[contains(normalize-space(.), "some text")])[last()], though this doesn't do quite the same thing if there are multiple paragraphs with the required content.
If you don't want to match all of the above, but want to be more selective, then you need to explain your requirements more clearly.
Either way, use of text() in a path expression is generally a code smell, except in the rare cases where you want to select text in an element only if it is not wrapped in other tags.
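For readers stuck on XPath 1.0, the "innermost match" idea can be sketched in a host language. The following is a rough illustration only, using Python's standard xml.etree.ElementTree; its XPath subset has no contains() or normalize-space(), so both are emulated with plain string operations. The sample markup and the helper names normalized and innermost_matches are made up for this sketch. Within XPath 1.0 itself, a second predicate such as [not(*[contains(normalize-space(.), "some text")])] should have the same effect.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample page, made up for this illustration.
DOC = """<html><body>
<p>There was <b>some</b> <i>text</i></p>
<p>nothing to see here</p>
</body></html>"""


def normalized(el):
    # Rough stand-in for normalize-space(.): concatenate all descendant
    # text, collapse whitespace runs, and trim both ends.
    return " ".join("".join(el.itertext()).split())


def innermost_matches(root, needle):
    # Every element whose normalized string value contains the needle ...
    hits = [el for el in root.iter() if needle in normalized(el)]
    # ... minus those that have a matching descendant.
    return [
        el for el in hits
        if not any(d is not el and needle in normalized(d) for d in el.iter())
    ]


root = ET.fromstring(DOC)
all_hits = [el.tag for el in root.iter() if "some text" in normalized(el)]
inner = [el.tag for el in innermost_matches(root, "some text")]
print(all_hits)  # the ancestors html and body show up here as well
print(inner)     # only the deepest matching element remains
```

Unlike the [last()] workaround, this keeps every innermost match, so it still behaves sensibly when several paragraphs contain the text.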

Related

Possible to run two completely different x-path

Can anyone please help me here?
I want to run two XPath expressions together and store the values, but I am not sure whether that is possible.
One XPath fetches the city and the second fetches the state:
//div[(text()='city')]/following-sibling::div
//div[contains(text(),'state')]/following-sibling::div
As the XPath expressions show, the name of the city and of the state is given in the div that follows the "city" and "state" label div. I want to run both and capture the output as a string.
On a side note: both XPath expressions work fine for me.
<div>
<div>City</div>
<div>London</div>
</div>
<!-- In between: some other elements such as p, section, other divs -->
<div>
<div>state</div>
<div>England</div>
</div>
It sounds like you want to convert the results of the two XPath expressions to strings, and concatenate those strings. The expression below concatenates them (with a single space between) using the XPath concat function.
concat(
//div[(text()='city')]/following-sibling::div,
' ',
//div[contains(text(),'state')]/following-sibling::div
)
One other thing: note that in your example XML the text of the first div is "City" rather than "city". Make sure the strings in your XPath expression match the text exactly, because the expression 'City'='city' evaluates to false.
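To see how much the capitalisation matters, here is a small sketch using Python's standard xml.etree.ElementTree. That module has neither concat() nor the following-sibling axis, so the helper value_for (a made-up name) imitates them in Python; the root wrapper element and the p element in the sample are also assumptions added to make the fragment well-formed.

```python
import xml.etree.ElementTree as ET

# The question's fragment, wrapped in a hypothetical <root> element.
DOC = """<root>
<div><div>City</div><div>London</div></div>
<p>some other content</p>
<div><div>state</div><div>England</div></div>
</root>"""


def value_for(root, label):
    # Find a wrapper <div> that has a child <div> whose text is exactly
    # the label, then return the text of the <div> right after the label.
    for wrapper in root.findall(f".//div[div='{label}']"):
        kids = wrapper.findall("div")
        for label_div, value_div in zip(kids, kids[1:]):
            if label_div.text == label:
                return value_div.text or ""
    # Like string() applied to an empty node-set in XPath.
    return ""


root = ET.fromstring(DOC)
combined = value_for(root, "City") + " " + value_for(root, "state")
wrong_case = value_for(root, "city") + " " + value_for(root, "state")
print(repr(combined))
print(repr(wrong_case))
```

With the lowercase label the city part comes back empty, so the combined string silently loses the city instead of raising an error.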

How to write Xpath expressions to distinguish between results?

I am new to XPath expressions and need help with an issue.
Consider the following document:
<tbody><tr>
<td>By <strong>Bec</strong></td>
<td><strong>Great Support</strong></td>
</tr></tbody>
In this document I have to find the text inside the strong tags separately.
Following is my xpath expression:
//tbody//td//strong/text();
It evaluates output as expected:
Bec
Great Support
How can I write XPath expressions to distinguish between the results, i.e. Bec and Great Support?
It's rather unclear what you're trying to do, but the following should succeed in selecting them separately:
//tbody/tr/td[1]/strong
and
//tbody/tr/td[2]/strong
Note that the text() you had at the end is most likely not needed in this case.
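As a quick check of positional selection, the sketch below runs the same kind of path through Python's standard xml.etree.ElementTree, which supports td[1]-style predicates when the path is written relative to the root (hence the leading dot). The surrounding table element is an assumption added so that tbody is not the root itself.

```python
import xml.etree.ElementTree as ET

# The question's markup, wrapped in a hypothetical <table> element.
DOC = """<table><tbody><tr>
<td>By <strong>Bec</strong></td>
<td><strong>Great Support</strong></td>
</tr></tbody></table>"""

root = ET.fromstring(DOC)

# Each path selects the <strong> inside one specific cell.
first = root.find(".//tbody/tr/td[1]/strong").text
second = root.find(".//tbody/tr/td[2]/strong").text

# A path that skips the tr step finds nothing, because td is a child
# of tr, not of tbody.
skipped_tr = root.findall(".//tbody/td[1]/strong")

print(first)
print(second)
print(skipped_tr)
```

The element is selected and its text is read through the host API, which is why the trailing text() step is not needed.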
Not sure I understand 100%, but if you're trying to get the text of the first and the second strong tags, you can use position (1 based index)
//tbody/tr/td[position()=1]/strong/text()   (first text)
//tbody/tr/td[position()=2]/strong/text()   (second text)
This solution only applies to the current sample though, where your strong tags are inside either the first or second td tag.
Not sure this is what you're looking for... anyway, assuming you're asking to retrieve a node based on its text you can look up for text content by doing something like:
//tbody//td//strong/text()[.="Bec"]
PS
in [.=""] the dot is an alias for self::node() (thanks to JLRishe for pointing out the mistake).

XPath selector for matching multiple classes [duplicate]

I've been searching for the past 30 minutes or so, but I can't seem to find an answer to how to create an XPath selector that will match multiple classes.
After reading this: How can I match on an attribute that contains a certain string?
The closest solution I can find is:
//div[contains(@class,'atag') and contains(@class ,'btag')]
However, one of the comments suggests that it would also match:
<div class="Patagonia Halbtagsarbeit">
What XPath selector should I use to select a div with multiple classes?
Example:
<div class="fl badge bolded shadow">
I would suggest backing the xpath up to locate the div more specifically so that other divs with the same classes could not be selected instead. You can use FireBug's FirePath to get the absolute xpath.
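The false positive from the question comes from substring matching, and it goes away once class names are compared as whole whitespace-separated tokens. In XPath 1.0 the usual idiom for that is contains(concat(' ', normalize-space(@class), ' '), ' atag '), repeated once per class. The sketch below shows the same contrast in Python with the standard xml.etree.ElementTree; the sample document and the helper names substring_match and has_classes are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; the second div is the false positive from the question.
DOC = """<root>
<div class="fl badge bolded shadow">wanted</div>
<div class="Patagonia Halbtagsarbeit">false positive</div>
<div class="btag  atag extra">also wanted</div>
</root>"""


def substring_match(el, *wanted):
    # What contains(@class, 'x') does: a plain substring test.
    cls = el.get("class", "")
    return all(name in cls for name in wanted)


def has_classes(el, *wanted):
    # Token test: split on whitespace and require each name as a whole
    # token, which is what the concat/normalize-space idiom achieves.
    tokens = el.get("class", "").split()
    return all(name in tokens for name in wanted)


root = ET.fromstring(DOC)
divs = root.findall("div")
substring_hits = [d.text for d in divs if substring_match(d, "atag", "btag")]
token_hits = [d.text for d in divs if has_classes(d, "atag", "btag")]
badge_hits = [d.text for d in divs if has_classes(d, "badge", "shadow")]
print(substring_hits)
print(token_hits)
print(badge_hits)
```

The token test is also indifferent to the order of the class names and to extra spaces between them.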

xPath expression for nested nodes

I'm trying to come up with a complex xPath expression but I can't figure out how to do that. Imagine you have some HTML like this:
<span>
something1
<br>
something2
<br>
something3
</span>
Imagine that sometimes the second <br> and the subsequent "something3" are not present. I would like to create an xPath expression that takes all the span nodes and its content up to the first <br> so that I end up parsing just "something1". I don't know if this is possible, if not does anyone know a way to get that after having parsed all the <span> nodes?
I have to say that I'm using HtmlParser, which is a Java library which parses HTML and supports xPath expressions.
Thanks,
Masiar
I'm a bit confused by your description of the problem, but it sounds something like
//span/br[1]/preceding-sibling::text()
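If the post-processing route mentioned in the question is easier, the same "text before the first br" can be read after parsing. This is a sketch with Python's standard xml.etree.ElementTree rather than the Java HtmlParser library from the question. It assumes the markup has been made well-formed (br written as br/) and that the span elements sit inside a made-up div wrapper. In ElementTree's model, the text that precedes an element's first child is stored in that element's text attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical well-formed version of the question's markup.
DOC = """<div>
<span>
something1
<br/>
something2
<br/>
something3
</span>
<span>
only-one
<br/>
tail
</span>
<span>no break here</span>
</div>"""

root = ET.fromstring(DOC)

# Keep only spans whose first child element is a <br>, and take the text
# that comes before it. Spans without any <br> are skipped, just as the
# XPath expression would select nothing for them.
firsts = [
    (span.text or "").strip()
    for span in root.iter("span")
    if len(span) > 0 and span[0].tag == "br"
]
print(firsts)
```

The strip() call removes the line breaks around the text, which the XPath expression on its own would return unchanged.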

xpath expression to select text from link

My HTML file contains the following:
<a class="bf" title="Link to book" href="/book/229920/">book name</a>
Please help me construct an XPath expression to get the link text (book name).
I tried to use /a, but the expression evaluates without results.
If the context is the entire document you should probably use // instead of /. Also you may (not sure about that) need to get down one more level to retrieve the text.
I think it should look like this
//a/text()
EDIT: As Tomalak pointed out it's text() not text
Have you tried
//a
?
More specific is better:
//a[@class='bf' and starts-with(@href, '/book/')]
Note that this selects the <a> element. In your host environment it's easy to extract the text value of that node via standard DOM methods (like the .textContent property).
To select the actual text node, see the other answers in this thread.
It depends also on the rest of your document. If you use // in the beginning all the matching nodes will be returned, which might be too many results in case you have other links in your document.
Apart from that a possible xpath expression is //a/text().
The /a you tried only returns the a-tag itself, if it is the root element. To get the link text you need to append the /text() part.
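The points made in this thread can be tried out with a short sketch in Python's standard xml.etree.ElementTree. The surrounding html/body elements and the second link are assumptions added for illustration. ElementTree's XPath subset handles the [@class='bf'] predicate but not starts-with(), so the href check is done in Python, and the link text is read from the element instead of through a text() step.

```python
import xml.etree.ElementTree as ET

# Hypothetical page around the link from the question.
DOC = """<html><body>
<a class="nav" href="/home/">Home</a>
<a class="bf" title="Link to book" href="/book/229920/">book name</a>
</body></html>"""

root = ET.fromstring(DOC)

# /a would only match if the document's root element were <a>;
# here the root is <html>, so that expression finds nothing.
print(root.tag)

# A search at any depth finds every link, which may be too many.
all_links = root.findall(".//a")

# Narrow by class in the path, then by href prefix in Python.
book_links = [
    a for a in root.findall(".//a[@class='bf']")
    if a.get("href", "").startswith("/book/")
]
texts = [a.text for a in book_links]
print(len(all_links))
print(texts)
```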
