How to match br tag in XPath text() function - ruby

I have the following element.
driver = Selenium::WebDriver.for :phantomjs
driver.xpath("/html/body/form/table/tbody/tr[14]/td/table/tbody/tr/td/table/tbody/tr[1]/td[1]/font").text
=> "unique\ntext"
But I don't want to rely on an unstable table layout, so I decided to use the text() function in XPath like:
driver.xpath("//font[text()='unique\ntext']")
=> nil
But as you see, I couldn't find the element by the text() function. The original text is unique<br>text.
How can I match the <br> tag by using XPath?
There are no id or name attributes that I can use.

The text() test selects any text nodes. In this example there are two such nodes, before and after the <br>. It is not the same as the text method or the string value of the parent node.
One way of selecting what you want could be like this:
driver.xpath("//font[ . ='unique\ntext']")
You might need to add extra newlines before or after the text. Note that this relies on Ruby converting \n into an actual newline character before passing the query to the XPath processor, so you need to be careful about getting your quotes right. This compares the string-value of the node, which for an element is the concatenation of all its descendant text nodes, which is what you want.
A better solution might be to use the normalize-space() function here (as long as the unique aspect of the text doesn’t depend on the newlines).
Try:
driver.xpath("//font[normalize-space()='unique text']")
Note that all leading and trailing whitespace in the target text has been removed, and any internal whitespace is changed to a single space character.
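To make the difference concrete, here is a minimal Python sketch that emulates the XPath 1.0 notions of child text nodes, string-value, and normalize-space() using only the standard library. The markup is an assumption (the question does not show the page source): it assumes the source has a line break right after the <br>, because the <br> element itself contributes no character to the string-value. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def child_text_nodes(elem):
    """Emulate text(): the text nodes that are direct children of elem."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]

def string_value(elem):
    """Emulate the XPath string-value: all descendant text, concatenated."""
    return "".join(elem.itertext())

def normalize_space(s):
    """Emulate normalize-space(): collapse runs of XPath whitespace, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Assumed markup: the source has a newline right after the <br/>.
font = ET.fromstring("<font>unique<br/>\ntext</font>")

print(child_text_nodes(font))               # two separate text nodes
print(repr(string_value(font)))             # what [. = '...'] compares against
print(normalize_space(string_value(font)))  # what [normalize-space() = '...'] compares against
```

If the source were unique<br>text with no whitespace around the <br>, the string-value would be uniquetext, and neither comparison would match; in that case check the actual page source before choosing the literal.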

Related

Xpath Expression Interpretation

Can someone please explain what the below Xpath expressions mean?
//node()[not(*)][not(normalize-space())]
//node()[not(*)][not(normalize-space())][not(boolean(@Key))]
//node()[not(text())]
I understand //node() means any node, but not sure with the following expressions.
//node()[not(*)][not(normalize-space())]
All element, text, comment, and processing-instruction nodes, anywhere in the document, that do not have a child element node and whose string value is either empty or consists entirely of whitespace
//node()[not(*)][not(normalize-space())][not(boolean(@Key))]
As above, with the extra condition that there is no @Key attribute. The last predicate is badly written: it could be shortened to [not(@Key)] without changing its meaning.
//node()[not(text())]
All element, text, comment, and processing-instruction nodes, anywhere in the document, that do not have a child text node.
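As a rough illustration, the first and third expressions can be emulated in Python with the standard library. This is only a sketch restricted to element nodes (ElementTree does not expose text, comment, or processing-instruction nodes as objects, so those matches are not shown), and the sample document and helper names are made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document.
doc = ET.fromstring(
    "<root>"
    "<a>text</a>"
    "<b>   </b>"
    "<c/>"
    "<d><e/></d>"
    "</root>"
)

def is_empty_leaf(elem):
    """Element part of //node()[not(*)][not(normalize-space())]."""
    has_child_element = len(elem) > 0
    has_text = "".join(elem.itertext()).strip(" \t\r\n") != ""
    return not has_child_element and not has_text

def has_child_text_node(elem):
    """True if elem has at least one direct child text node (even whitespace)."""
    return bool(elem.text) or any(child.tail for child in elem)

# Elements with no child elements and only whitespace (or nothing) as content.
empty_leaves = [e.tag for e in doc.iter() if is_empty_leaf(e)]

# Element part of //node()[not(text())]: no direct child text node at all.
no_text_children = [e.tag for e in doc.iter() if not has_child_text_node(e)]

print(empty_leaves)      # ['b', 'c', 'e']
print(no_text_children)  # ['root', 'c', 'd', 'e']
```

Note how b passes the first test (its content is only whitespace) but fails the second, because a whitespace-only text node is still a text node.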
Updated (thanks to @Michael Kay's comment)
First one:
//node() all nodes in the document (including text, comment and processing instruction but not attributes)
[not(*)] which does not have any child element nodes
[not(normalize-space())] which does not have any text content (other than whitespace).
Second one:
Same as first one but additional:
[not(boolean(@Key))] the node has no attribute Key
Update:
For the third one have a look to e.g. this
In your example this will also ignore nodes with any text content (even white space)

Contains(.,'text') function for matching text

I'm trying to select things in a table, and currently have the following expression
//*[@id='row']/tbody/tr[contains(., 'user2')]/td[contains(., 'user2')]
However, this is obviously a problem when there are users such as 'user25', because that also contains 'user2'. Can someone help me fix what's wrong with the following expressions in which I tried to match the text values exactly? (just the row for now)
//*[@id='row']/tbody/tr[text()='user2']
I tried normalizing space too, but it didn't seem to work:
//*[@id='row']/tbody/tr[normalize-space(text())='user2']
If it helps, here is the HTML of the page:
<table id="row" class="gradientTable">
<td>
user2
</td>
<td>User2</td>
<td>User2</td>
<td>user2@mail.com</td>
<td>2</td>
<td>Student</td></tr>
<tr class="even">
The expression
//*[@id='row']/tbody/tr[.//text()[normalize-space(.)='user2']]
matches any <tr> for which any single descendant text node has the exact content user2 (after space normalization).
Note that this won't match anything in your example html. That example seems to be broken, because there's only one <tr> there, and it has no content that we can see.
Addendum:
You asked, "how exactly does .//text()[] work"?
. selects the context node (which in the above case is a tr element).
//text() selects any text node that is a descendant (of the aforementioned tr element).
[...] gives a predicate that "filters" what the preceding expression selects. So in this case it filters all text nodes that are descendants of the context tr, keeping only those whose space-normalized text content is 'user2'.
All this, as a predicate for tr, means to filter the tr elements, keeping only those for which there is at least one descendant text node whose space-normalized text content is 'user2'.
As Michael Kay pointed out, that may or may not be exactly what you want, depending on whether you want to match a table cell that contains user2 spread across b or i elements.
Addendum 2:
Can someone help me fix what's wrong with the following expressions in
which I tried to match the text values exactly? (just the row for now)
//*[@id='row']/tbody/tr[text()='user2']
What this expression matches is tr elements that have a direct child (not grandchild) text node whose value is exactly 'user2', e.g. <tr>textNode1<td>...</td>user2</tr>. Since text in tables is usually in a td element instead of directly under a tr, the above expression typically matches nothing.
//*[@id='row']/tbody/tr[normalize-space(text())='user2']
Aside from space normalization, this expression also collapses the generality of the = comparison. In other words... The previous XPath expression asks whether the tr element has any text node child whose value is user2; but this one only asks whether the tr element's first text node child has a value user2.
Why? Because the normalize-space() function takes a single string value as its argument. So if you supply text() as the argument, and there are several text() children, you are supplying a node-set (or sequence in XPath 2.0). The node-set gets converted to a string by taking the string-value of the first node in the node-set.
To get a general comparison back, with normalization, you would use
//*[@id='row']/tbody/tr[text()[normalize-space(.)='user2']]
(The . argument is the default anyway, but I prefer making it explicit.) Again, this will only work with text nodes that are direct children of tr, so you'll probably want a descendant axis in there:
//*[@id='row']/tbody/tr[.//text()[normalize-space(.)='user2']]
If you are trying to find the table cell (td) elements that contain the exact value "user2", then you want
//*[@id='row']/tbody/tr/td[. = 'user2']
People often misuse "contains" here because they think it has the same meaning as in the English sentence above, "a node contains a value". But that's what "=" does in XPath; the XPath contains() function tests whether the content of the node has a substring equal to "user2".
Don't use text() here. The text() expression selects individual text nodes. But your content isn't necessarily all part of the same text node, for example it might be "user<b>2</b>".
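The predicates discussed in this thread behave differently on slightly different cells. The following Python sketch emulates them with the standard library; the sample row is hypothetical (it is not the asker's page), and the helper names are invented for the illustration.

```python
import re
import xml.etree.ElementTree as ET

def string_value(elem):
    """Emulate the XPath string-value: all descendant text, concatenated."""
    return "".join(elem.itertext())

def normalize_space(s):
    """Emulate normalize-space() for XPath whitespace (space, tab, CR, LF)."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

# Hypothetical row: padded text, a longer name, split markup, plain text.
row = ET.fromstring(
    "<tr>"
    "<td>\n  user2\n</td>"
    "<td>user25</td>"
    "<td>user<b>2</b></td>"
    "<td>user2</td>"
    "</tr>"
)
cells = list(row)

def matching(predicate):
    """Indexes of the td cells for which the emulated predicate holds."""
    return [i for i, td in enumerate(cells) if predicate(td)]

# td[contains(., 'user2')]
contains_user2 = matching(lambda td: "user2" in string_value(td))
# td[. = 'user2']
equals_user2 = matching(lambda td: string_value(td) == "user2")
# td[normalize-space(.) = 'user2']
normalized_equals = matching(lambda td: normalize_space(string_value(td)) == "user2")
# td[.//text()[normalize-space(.) = 'user2']]
single_text_node = matching(
    lambda td: any(normalize_space(t) == "user2" for t in td.itertext())
)

print(contains_user2)     # [0, 1, 2, 3]
print(equals_user2)       # [2, 3]
print(normalized_equals)  # [0, 2, 3]
print(single_text_node)   # [0, 3]
```

Which variant is right depends on whether surrounding whitespace and inline markup such as b or i should count as part of the value.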

What is the difference between normalize-space(.) and normalize-space(text())?

I was writing an XPath expression, and I had a strange error which I fixed, but what is the difference between the following two XPath expressions?
"//td[starts-with(normalize-space(),'Posted Date:')]"
and
"//td[starts-with(normalize-space(text()),'Posted Date:')]"
Mainly, what will the first XPath expression match? I was getting a lot of strange results. What difference does text() make to the matching? Also, is there a difference between normalize-space() and normalize-space(.)?
Well, the real question is: what's the difference between . and text()?
. is the current node. And if you use it where a string is expected (i.e. as the parameter of normalize-space()), the engine automatically converts the node to the string value of the node, which for an element is all the text nodes within the element concatenated. (Because I'm guessing the question is really about elements.)
text() on the other hand only selects text nodes that are the direct children of the current node.
So for example given the XML:
<a>Foo
<b>Bar</b>
lish
</a>
and assuming <a> is your current node, normalize-space(.) will return Foo Bar lish, but normalize-space(text()) will not: text() returns a node-set of two text nodes (Foo and lish). In XPath 1.0 the node-set is converted to a string by taking the string-value of its first node only, so the result is Foo; in XPath 2.0 passing a sequence of more than one item is a type error.
To cut a long story short, if you want to normalize all the text within an element, use . (the current node). If you want to select a specific text node, use text(), but always remember that despite its name, text() returns a node-set, and in XPath 1.0 only its first node is used when it is converted to a string.
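A small Python sketch of the same example, emulating XPath 1.0 conversion rules with the standard library. Under XPath 1.0 a node-set passed where a string is expected is converted by taking the string-value of its first node; XPath 2.0 processors raise a type error instead. The helper names are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

def child_text_nodes(elem):
    """Emulate text(): the text nodes that are direct children of elem."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]

def string_value(elem):
    """Emulate the XPath string-value: all descendant text, concatenated."""
    return "".join(elem.itertext())

def xpath1_string(node_set):
    """XPath 1.0 node-set to string: the first node's value, or '' if empty."""
    return node_set[0] if node_set else ""

def normalize_space(s):
    """Emulate normalize-space() for XPath whitespace (space, tab, CR, LF)."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

a = ET.fromstring("<a>Foo\n<b>Bar</b>\nlish\n</a>")

# normalize-space(.)
whole_element = normalize_space(string_value(a))
# normalize-space(text()) under XPath 1.0 rules
first_text_only = normalize_space(xpath1_string(child_text_nodes(a)))

print(whole_element)    # Foo Bar lish
print(first_text_only)  # Foo
```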

How to match text sequences that continue through child nodes (e.g. with sgml-style markup)?

<bits>
<thing>Match this please</thing>
<thing>Don't match this</thing>
<thing>Match <b>this</b> please</thing>
</bits>
An expression like this:
//thing[text()='Match this please']
will locate the first 'thing' but not the third, because the phrase is distributed through a child node.
Is there an expression that would match the first and the third 'thing' in my example?
Try:
//thing[string()='Match this please']
jsfiddle:
http://jsfiddle.net/ZG9n3/2/
Please check the reference to see if this is going to work for you:
http://www.w3.org/TR/xpath/#function-string
Is there an expression that would
match the first and the third 'thing'
in my example?
You mean: Is there an expression that would select the first and the third element named thing, based on their string value.
Use:
/*/thing[. = 'Match this please']
The predicate compares the string value of the context node to the string "Match this please".
By definition, the string value of an element is the concatenation (in document order) of all of its descendant text nodes.
Note: Always try to avoid the // abbreviation because its use may incur big inefficiency. Whenever the structure of an XML document is known, use a chain of specific location steps.
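For a quick check outside the browser, Python's standard ElementTree supports a small XPath subset that includes the [.='text'] predicate (Python 3.7 or later), which compares against the complete text content including descendants. The sketch below runs it on the question's sample data; the text()= comparison is emulated by hand because ElementTree has no text() node test, and the variable names are made up.

```python
import xml.etree.ElementTree as ET

bits = ET.fromstring(
    "<bits>"
    "<thing>Match this please</thing>"
    "<thing>Don't match this</thing>"
    "<thing>Match <b>this</b> please</thing>"
    "</bits>"
)
things = list(bits)

def child_text_nodes(elem):
    """Emulate text(): the text nodes that are direct children of elem."""
    nodes = [elem.text] + [child.tail for child in elem]
    return [t for t in nodes if t]

# thing[. = 'Match this please'], evaluated by ElementTree itself.
by_string_value = bits.findall("thing[.='Match this please']")
by_string_value_idx = [things.index(t) for t in by_string_value]

# thing[text() = 'Match this please'], emulated: some child text node is equal.
by_text_node_idx = [
    i for i, t in enumerate(things)
    if any(node == "Match this please" for node in child_text_nodes(t))
]

print(by_string_value_idx)  # [0, 2]
print(by_text_node_idx)     # [0]
```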

Locating the node by value containing whitespaces using XPath

I need to locate the node within an xml file by its value using XPath.
The problem arises when the node to find contains a value with whitespace inside.
For example:
<Root>
<Child>value</Child>
<Child>value with spaces</Child>
</Root>
I can not construct the XPath locating the second Child node.
Simple XPath /Root/Child perfectly works for both children, but /Root[Child=value with spaces] returns an empty collection.
I have already tried masking spaces with %20, &#20;, &nbsp; and using single and double quotes.
Still no luck.
Does anybody have an idea?
Depending on your exact situation, there are different XPath expressions that will select the node, whose value contains some whitespace.
First, let us recall that any one of these characters is "whitespace":
&#x9; -- the Tab
&#xA; -- newline
&#xD; -- carriage return
' ' or &#x20; -- the space
If you know the exact value of the node, say it is "Hello World" with a space, then the most direct XPath expression:
/top/aChild[. = 'Hello World']
will select this node.
The difficulties with specifying a value that contains whitespace, however, come from the fact that we see all whitespace characters just as ... well, whitespace, and don't know if it is a group of spaces or a single tab.
In XPath 2.0 one may use regular expressions and they provide a simple and convenient solution. Thus we can use an XPath 2.0 expression as the one below:
/*/aChild[matches(., "Hello\sWorld")]
to select any child of the top node, whose value is the string "Hello" followed by whitespace followed by the string "World". Note the use of the matches() function and of the "\s" pattern that matches whitespace.
In XPath 1.0 a convenient test if a given string contains any whitespace characters is:
not(string-length(.) = string-length(translate(., ' &#x9;&#xA;&#xD;', '')))
Here we use the translate() function to eliminate any of the four whitespace characters, and compare the length of the resulting string to that of the original string.
So, if in a text editor a node's value is displayed as
"Hello World",
we can safely select this node with the XPath expression:
/*/aChild[translate(., ' &#x9;&#xA;&#xD;', '') = 'HelloWorld']
In many cases we can also use the XPath function normalize-space(), which from its string argument produces another string in which leading and trailing whitespace is removed and every run of whitespace within the string is replaced by a single space.
In the above case, we will simply use the following XPath expression:
/*/aChild[normalize-space() = 'Hello World']
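The three techniques above (a regular expression, translate(), and normalize-space()) can be tried out on plain strings. This Python sketch only emulates the XPath functions; the function names are hypothetical, and the sample value assumes the visible gap in "Hello World" is actually a tab.

```python
import re

XPATH_WS = " \t\n\r"  # the four XPath/XML whitespace characters

def translate_remove(s, chars):
    """Emulate translate(s, chars, ''): delete every character listed in chars."""
    return s.translate({ord(c): None for c in chars})

def contains_whitespace(s):
    """Emulate not(string-length(.) = string-length(translate(., ws, '')))."""
    return len(s) != len(translate_remove(s, XPATH_WS))

def normalize_space(s):
    """Emulate normalize-space(): collapse whitespace runs, then trim."""
    return re.sub(r"[ \t\r\n]+", " ", s).strip(" ")

value = "Hello\tWorld"  # assumed: looks like "Hello World" in an editor

print(value == "Hello World")                           # False: tab, not space
print(contains_whitespace(value))                       # True
print(translate_remove(value, XPATH_WS))                # HelloWorld
print(normalize_space(value) == "Hello World")          # True
print(bool(re.search(r"Hello[ \t\n\r]World", value)))   # True, like matches()
```

The plain equality test fails while the other three succeed, which is the reason to prefer them when the exact whitespace characters are unknown.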
Try either this:
/Root/Child[normalize-space(text())='value without spaces']
or
/Root/Child[contains(text(),'value without spaces')]
or (since it looks like your test value may be the issue)
/Root/Child[normalize-space(text())=normalize-space('value with spaces')]
I haven't actually executed any of these, so the syntax may be wonky.
Locating the Attribute by value containing whitespaces using XPath
I have an input element whose value contains whitespace.
For example:
<input type="button" value="Import Selected File">
I solved this by using this xpath expression.
//input[contains(@value,'Import') and contains(@value,'Selected') and contains(@value,'File')]
Hope this will help you guys.
"x0020" worked for me on a Jackrabbit-based CQ5/AEM repository in which the property names had spaces. The following would work for a property "Record ID":
[(jcr:contains(jcr:content/@Record_x0020_ID, 'test'))]
did you try #x20 ?
I googled this, and the second link suggests the following:
try to replace the space using "x0020"
This seems to have worked for the person there.
All of the above solutions didn't really work for me.
However, there's a much simpler solution.
When you create the XmlDocument, make sure you set the PreserveWhitespace property to true:
XmlDocument xmldoc = new XmlDocument();
xmldoc.PreserveWhitespace = true;
xmldoc.Load(xmlCollection);
