Contains(.,'text') function for matching text - xpath

I'm trying to select things in a table, and currently have the following expression
//*[@id='row']/tbody/tr[contains(., 'user2')]/td[contains(., 'user2')]
However, this is obviously a problem when there are users entered such as 'user25', because that also contains 'user2'. Can someone help me fix what's wrong with the following expressions in which I tried to match the text values exactly? (just the row for now)
//*[@id='row']/tbody/tr[text()='user2']
I tried normalizing space too, didn't seem to work
//*[@id='row']/tbody/tr[normalize-space(text())='user2']
If it will help here is the html of the page
<table id="row" class="gradientTable">
<td>
user2
</td>
<td>User2</td>
<td>User2</td>
<td>user2@mail.com</td>
<td>2</td>
<td>Student</td></tr>
<tr class="even">

The expression
//*[@id='row']/tbody/tr[.//text()[normalize-space(.)='user2']]
matches any <tr> for which any single descendant text node has the exact content user2 (after space normalization).
Note that this won't match anything in your example html. That example seems to be broken, because there's only one <tr> there, and it has no content that we can see.
Addendum:
You asked, "how exactly does .//text()[] work"?
. selects the context node (which in the above case is a tr element).
//text() selects any text node that is a descendant (of the aforementioned tr element).
[...] gives a predicate that "filters" what the preceding expression selects. So in this case it filters all text nodes that are descendants of the context tr, keeping only those whose space-normalized text content is 'user2'.
All this, as a predicate for tr, means to filter the tr elements, keeping only those for which there is at least one descendant text node whose space-normalized text content is 'user2'.
As Michael Kay pointed out, that may or may not be exactly what you want, depending on whether you want to match a table cell that contains user2 spread across b or i elements.
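To make the predicate concrete, here is a minimal sketch in Python. The standard library's xml.etree.ElementTree implements only a subset of XPath (no text() or normalize-space()), so the sketch emulates tr[.//text()[normalize-space(.)='user2']] in plain Python rather than running it. The table, normalize_space and rows_with_exact_text_node are made up for illustration and are not taken from the asker's page.

```python
import xml.etree.ElementTree as ET

# Hypothetical table modelled on the question, not the asker's real page.
TABLE = """
<table id="row"><tbody>
<tr><td>
user2
</td><td>User2</td></tr>
<tr><td>user25</td><td>User25</td></tr>
<tr><td>user<b>2</b></td><td>split</td></tr>
</tbody></table>
"""

def normalize_space(s):
    # Close to XPath normalize-space(): trim, then collapse whitespace runs.
    return " ".join(s.split())

def rows_with_exact_text_node(root, wanted):
    # itertext() yields each descendant text node in document order,
    # which plays the role of .//text() here.
    return [tr for tr in root.iter("tr")
            if any(normalize_space(t) == wanted for t in tr.itertext())]

root = ET.fromstring(TABLE)
matched = rows_with_exact_text_node(root, "user2")
print(len(matched))  # 1
```

Only the first row matches. The 'user25' row is rejected because the comparison is exact, and the third row is rejected because 'user' and '2' are separate text nodes, which is the case Michael Kay's remark is about.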
Addendum 2:
Can someone help me fix what's wrong with the following expressions in
which I tried to match the text values exactly? (just the row for now)
//*[@id='row']/tbody/tr[text()='user2']
What this expression matches is tr elements that have a direct child (not grandchild) text node whose value is exactly 'user2', e.g. <tr>textNode1<td>...</td>user2</tr>. Since text in tables is usually in a td element instead of directly under a tr, the above expression typically matches nothing.
//*[@id='row']/tbody/tr[normalize-space(text())='user2']
Aside from space normalization, this expression also collapses the generality of the = comparison. In other words... The previous XPath expression asks whether the tr element has any text node child whose value is user2; but this one only asks whether the tr element's first text node child has a value user2.
Why? Because the normalize-space() function takes a single string value as its argument. So if you supply text() as the argument, and there are several text() children, you are supplying a node-set (or sequence in XPath 2.0). The node-set gets converted to a string by taking the string-value of the first node in the node-set.
To get a general comparison back, with normalization, you would use
//*[@id='row']/tbody/tr[text()[normalize-space(.)='user2']]
(The . argument is the default anyway, but I prefer making it explicit.) Again, this will only work with text nodes that are direct children of tr, so you'll probably want a descendant axis in there:
//*[@id='row']/tbody/tr[.//text()[normalize-space(.)='user2']]

If you are trying to find the table cells (td elements) that contain the exact value "user2", then you want
//*[@id='row']/tbody/tr/td[. = 'user2']
People often misuse "contains" here because they think it has the same meaning as in the English sentence above, "a node contains a value". But that's what "=" does in XPath; the XPath contains() function tests whether the content of the node has a substring equal to "user2".
Don't use text() here. The text() expression selects individual text nodes. But your content isn't necessarily all part of the same text node, for example it might be "user<b>2</b>".
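As a rough illustration, the [.='value'] predicate is one of the few that Python's standard xml.etree.ElementTree supports (Python 3.7 or later), and it compares the whole string-value just as the XPath above does. The row below is invented for the example; the whitespace-tolerant variant is an emulation of td[normalize-space(.)='user2'], since ElementTree has no normalize-space().

```python
import xml.etree.ElementTree as ET

# Hypothetical row: an exact cell, a longer name, a cell split by <b>,
# and a cell padded with spaces.
ROW = ET.fromstring(
    "<tr>"
    "<td>user2</td>"
    "<td>user25</td>"
    "<td>user<b>2</b></td>"
    "<td> user2 </td>"
    "</tr>"
)

# Exact comparison against the string-value of each td.
exact = ROW.findall("td[.='user2']")

# Emulated td[normalize-space(.)='user2']: join all descendant text,
# then trim and collapse whitespace before comparing.
tolerant = [td for td in ROW.iter("td")
            if " ".join("".join(td.itertext()).split()) == "user2"]

print(len(exact), len(tolerant))  # 2 3
```

The split cell matches in both cases because the string-value joins 'user' and '2'; the padded cell only matches once the whitespace is normalized.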

Related

How are these two XPath expressions different?

I'm parsing a website using XPath. I've got two queries, one which finds the node I'm looking for:
//td[.//text()[contains(., "Date Filed:")]]
And one which doesn't:
//td[contains(.//text(), "Date Filed:")]
I don't understand how these are different. I'd read them both to mean, "Find td nodes which have a descendant text node containing Date Filed."
Can anybody explain how these are different?
Here's the HTML, though I don't think it's relevant to the question:
<td width="40%" valign="top">
<br><br><br><br><br>
<b>Date Filed:</b> 11/13/2008<br>
<b>Jury Demand: </b> No<br><br>
<br><b>Date Terminated: </b><br>
<br><b>Date Reopened: </b><br>
<br><b>Does this action raise an issue of constitutionality?: </b>Y<br>
</td>
(Don't look at me that way. I didn't make the website, the U.S. Gov't did.)
That is how string conversion works in XPath:
In the second query, contains(.//text(), "Date Filed:"), you call the contains() function. It accepts two arguments of type string, but your first argument, .//text(), is a node-set, so the string() function is called internally to convert the node-set to a string. In this case string(.//text()) returns the value of only the first text node. If you change your second query to //td[contains(., "Date Filed:")], it will select the wanted td.
In XPath 1.0, if you supply a node-set to a function like contains() that expects a string, it uses the string-value of the first node in the node-set (in document order).
In XPath 2.0 and later versions, if you supply a node-set to a function like contains() that expects a string, the node-set is atomized, and if the result contains more than one string (which will normally be the case when more than one node is selected), then you get a type error XPTY0004.
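The XPath 1.0 behaviour can be sketched in plain Python. This is an emulation, not a real XPath evaluation: it uses the standard library's ElementTree only to list the descendant text nodes, and the cell is a shortened copy of the one in the question with <br> written as <br/> so that it parses as XML.

```python
import xml.etree.ElementTree as ET

# Shortened copy of the question's cell, made well-formed for the XML parser.
TD = ET.fromstring("""<td width="40%" valign="top">
<br/><br/>
<b>Date Filed:</b> 11/13/2008<br/>
<b>Jury Demand: </b> No<br/>
</td>""")

# Descendant text nodes in document order, standing in for .//text().
text_nodes = list(TD.itertext())

# contains(.//text(), "Date Filed:") in XPath 1.0: only the first node counts,
# and here that node is the newline before the first <br/>.
first_only = "Date Filed:" in (text_nodes[0] if text_nodes else "")

# .//text()[contains(., "Date Filed:")]: every text node is tested.
any_node = any("Date Filed:" in t for t in text_nodes)

# contains(., "Date Filed:"): the whole string-value is tested.
whole_value = "Date Filed:" in "".join(text_nodes)

print(first_only, any_node, whole_value)  # False True True
```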
When you ask questions about XSLT or XPath on StackOverflow, please always say which version you are talking about.

Xpath Expression Interpretation

Can someone please explain what the below Xpath expressions mean?
//node()[not(*)][not(normalize-space())]
//node()[not(*)][not(normalize-space())][not(boolean(@Key))]
//node()[not(text())]
I understand //node() means any node, but not sure with the following expressions.
//node()[not(*)][not(normalize-space())]
All element, text, comment, and processing-instruction nodes, anywhere in the document, that do not have a child element node and whose string value is either empty or consists entirely of whitespace
//node()[not(*)][not(normalize-space())][not(boolean(@Key))]
As above, with the extra condition that there is no @Key attribute. The last predicate is badly written: it could be shortened to [not(@Key)] without changing its meaning.
//node()[not(text())]
All element, text, comment, and processing-instruction nodes, anywhere in the document, that do not have a child text node.
Updated (thanks to @Michael Kay's comment)
First one:
//node() all nodes in the document (including text, comment and processing-instruction nodes, but not attributes)
[not(*)] which do not have any child element nodes
[not(normalize-space())] which do not have any text content (besides whitespace).
Second one:
Same as the first one, with one additional predicate:
[not(boolean(@Key))] the node has no Key attribute
Update:
For the third one, have a look at e.g. this
In your example this will also ignore nodes with any text content (even whitespace)

How to match br tag in XPath text() function

I have a following element.
driver = Selenium::WebDriver.for :phantomjs
driver.xpath("/html/body/form/table/tbody/tr[14]/td/table/tbody/tr/td/table/tbody/tr[1]/td[1]/font").text
=> "unique\ntext"
But I don't want to rely on unstable table layout, so I decided to use text() function in xpath like:
driver.xpath("//font[text()='unique\ntext']")
=> nil
But as you see, I couldn't find the element by the text() function. The original text is unique<br>text.
How can I match the <br> tag by using XPath?
There is no id or name attributes that I can use.
The text() test selects any text nodes. In this example there are two such nodes, before and after the <br>. It is not the same as the text method or the string value of the parent node.
One way of selecting what you want could be like this:
driver.xpath("//font[ . ='unique\ntext']")
You might need to add extra newlines before or after the text. Note that this relies on Ruby converting \n into an actual newline character before passing the query to the XPath processor, so you need to be careful about getting your quotes right. This compares the string-value of the node, which for an element is the concatenation of all the descendant text nodes, which is what you want.
A better solution might be to use the normalize-space() function here (as long as the unique aspect of the text doesn’t depend on the newlines).
Try:
driver.xpath("//font[normalize-space()='unique text']")
Note that all leading and trailing whitespace in the target text has been removed, and any internal whitespace is changed to a single space character.

Selenium: How to locate a node using exact text match

I want to locate an element on a web page using its text.
I know there is a function named contains to do so, for example:
tr[contains(.,'hello')]/td
But the problem is that if I have two elements named hello and hello1, this function does not work properly.
Is there any other function like contains that does an exact string match for locating elements?
tr[.//text()='hello']/td
This will select all td child elements of all tr elements that have a descendant text node whose value is exactly 'hello'. Such an XPath still sounds odd to me.
I believe that this makes more sense:
tr/td[./text()='hello']
because it selects only the td that contains the text.
does that help?
It all depends on what your HTML actually contains, but your tr[contains(.,'hello')]/td XPath selector means "the first cell of the first row that contains the string 'hello' anywhere within it" (or, more accurately, "the first TD element in the TR element that contains the string 'hello' anywhere within it", since Selenium has no idea what the elements involved really do). That's why it's getting the wrong result when there are rows containing "hello" and "hello1" - both contain "hello".
The selector tr[. ='hello']/td would be more accurate, but it's a little unusual (because HTML TR elements aren't supposed to contain text - the text is supposed to be in TH or TD elements within the TR), and it probably won't work (because text in any other cells would break the comparison). You probably want tr[td[.='hello']]/td, which means "the first TD element contained in the TR element that contains a TD element that has the string 'hello' as its complete text".
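A small sketch of the difference, using Python's standard xml.etree.ElementTree on an invented two-row table. ElementTree's limited XPath subset spells tr[td[.='hello']] as tr[td='hello'] (Python 3.7 or later); it has no contains(), so the substring test is emulated in plain Python.

```python
import xml.etree.ElementTree as ET

# Hypothetical table with one "hello" row and one "hello1" row.
TABLE = ET.fromstring(
    "<table>"
    "<tr><td>hello</td><td>first row</td></tr>"
    "<tr><td>hello1</td><td>second row</td></tr>"
    "</table>"
)

# Emulated tr[contains(., 'hello')]: a substring test on the row's string-value.
loose = [tr for tr in TABLE.iter("tr") if "hello" in "".join(tr.itertext())]

# Exact match on a cell: rows that have a td whose full text is 'hello',
# then all td children of those rows.
cells = TABLE.findall("tr[td='hello']/td")

print(len(loose))               # 2
print([c.text for c in cells])  # ['hello', 'first row']
```

Note that findall returns every td of the matching row; the "first TD" wording above reflects how Selenium's single-element lookup picks the first result.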
Well, your problem is that you are searching for text in the tr (which is not correct anyway), and this causes a problem for the contains function, which expects a single string rather than a list of text nodes. Try this location path instead; it should retrieve what you want.
//tr/td[contains(./text(),"hello")]
This location path will retrieve a set of nodes, over which you have to iterate to get the text. You can try appending
/text()
but this will produce (at least in my test) a string that is a concatenation of all the matched strings.
I had the same problem. I had a list of elements; one was named "Image" and another was named "Text and Image". After reading all the posts here, none of the suggestions worked for me, so I tried the following, and it worked:
// Assumes an existing WebDriver `driver` and a String `componentName`.
List<WebElement> elementList =
        driver.findElements(By.xpath("//*[text()='" + componentName + "']"));
for (WebElement element : elementList) {
    // Re-check the rendered text so that only an exact match is clicked.
    if (element.getText().equals(componentName)) {
        element.click();
    }
}

What is the difference between normalize-space(.) and normalize-space(text())?

I was writing an XPath expression, and I had a strange error which I fixed, but what is the difference between the following two XPath expressions?
"//td[starts-with(normalize-space(),'Posted Date:')]"
and
"//td[starts-with(normalize-space(text()),'Posted Date:')]"
Mainly, what will the first XPath expression catch? I was getting a lot of strange results. What does text() change in the matching? Also, is there a difference between normalize-space() and normalize-space(.)?
Well, the real question is: what's the difference between . and text()?
. is the current node. And if you use it where a string is expected (i.e. as the parameter of normalize-space()), the engine automatically converts the node to the string value of the node, which for an element is all the text nodes within the element concatenated. (Because I'm guessing the question is really about elements.)
text() on the other hand only selects text nodes that are the direct children of the current node.
So for example given the XML:
<a>Foo
<b>Bar</b>
lish
</a>
and assuming <a> is your current node, normalize-space(.) will return Foo Bar lish, but normalize-space(text()) will not, because text() returns a node-set of two text nodes (Foo and lish), whereas normalize-space() expects a single string. In XPath 1.0 the node-set is converted using only its first node, so the result is Foo; in XPath 2.0 and later it is a type error.
To cut a long story short, if you want to normalize all the text within an element, use . (the current node). If you want to select a specific text node, use text(), but always remember that despite its name, text() returns a node-set, and in XPath 1.0 only its first node is used when it is converted to a string.
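For the <a> example, the two arguments can be compared with a short emulation in Python. In XPath 1.0 a node-set passed where a string is expected is converted using the string-value of its first node (XPath 2.0 and later raise a type error instead), and that is the rule this sketch imitates. It uses the standard library's ElementTree only to get at the text nodes; normalize_space is a stand-in written for the example, not a library function.

```python
import xml.etree.ElementTree as ET

A = ET.fromstring("<a>Foo\n<b>Bar</b>\nlish\n</a>")

def normalize_space(s):
    # Close to XPath normalize-space(): trim, then collapse whitespace runs.
    return " ".join(s.split())

# normalize-space(.): the string-value joins every descendant text node.
whole = normalize_space("".join(A.itertext()))

# text(): only the text nodes that are direct children of <a>.
direct = [t for t in [A.text] + [child.tail for child in A] if t]

# XPath 1.0 conversion rule: take the first node of the node-set.
first = normalize_space(direct[0]) if direct else ""

print(whole)  # Foo Bar lish
print(first)  # Foo
```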

Resources