Nokogiri: why is this an invalid XPath?

//br/preceding-sibling::normalize-space(text())
I am getting an "invalid XPath expression" error with Nokogiri.

normalize-space() is a function, so it cannot be used as a location step after an axis such as preceding-sibling::.
A location step has to select a node-set.
Maybe you mean:
//br/preceding-sibling::*
Alternatively, you could use normalize-space() in a predicate, inside square brackets. Think of the predicate as a filter or selector on the node-set. So you can do this:
//br/preceding-sibling::*[normalize-space()='Fred']
In English that translates to "all elements that are preceding siblings of a <br>, and whose (normalized) text is 'Fred'". In this document:
<html>
<p>
<h2>Fred</h2>
<br/>
</p>
<table>
<tr>
<td>
<br/>
</td>
</tr>
</table>
</html>
...the xpath expression selects the <h2> node.
I figured this out with the free XpathVisualizer tool available on CodePlex.

Related

How to select the specific sibling of an ancestor using XPath

I have the following HTML structure:
<p>
<!-- Span can be any level deep -->
<span>
Some text
</span>
</p>
<!-- Any number of different elements between span and table -->
<p></p>
<div></div>
<table>
<tr>
<td></td>
</tr>
</table>
Using Nokogiri and a custom XPath function, I can select the <span> element whose content matches a regex. I have to do it this way because Nokogiri uses XPath 1.0, which has no matches() function:
@doc.xpath("//span[regex_match(text(), '/some text/i')]")
Having the span node selected, how do I select the table that is visually following the span?
Use the contains() function to match the text, then use following::table to find the table that follows this span tag.
@doc.xpath("//span[contains(text(), 'Some text')]/following::table")

Finding first matching sibling element while traversing the DOM

I am trying to create an XPath expression that finds the first matching sibling further down the DOM, given an initial sibling (note: the initial siblings will be Tom and Steve). For example, I want to find 'jerry1' under the 'Tom' tr. I have looked into the following-sibling axis, but I'm not sure it's the best approach for this. Any ideas?
<tr>
<a title="Tom"/>
</tr>
<tr>
<a title="jerry1"/>
</tr>
<tr>
<a title="jerry2"/>
</tr>
<tr>
<a title="jerry3"/>
</tr>
<tr>
<a title="Steve"/>
</tr>
<tr>
<a title="jerry1"/>
</tr>
<tr>
<a title="jerry2"/>
</tr>
<tr>
<a title="jerry3"/>
</tr>
following-sibling will work. This will select the a node with the title "jerry1":
//a[@title='Tom']/../following-sibling::tr[1]/a
The /.. step traverses up to Tom's parent <tr>, following-sibling::tr[1] moves to the next <tr>, and /a finally selects the <a> node within that.
The following XPath worked for me:
(//a[@title='Tom']/parent::*/following-sibling::tr/a[@title='jerry1'])[1]
It selects the first a with title jerry1 in a tr that follows the tr whose a child has the title Tom.
Start at a[@title='Tom'], go to the parent tr with /parent::*, then select all following sibling tr nodes with /following-sibling::tr and, within them, the child nodes matching /a[@title='jerry1']. Because this would select two jerry1 nodes and only the first one after Tom is wanted, wrap the XPath in () and choose the first match with [1].
The following XPath statement finds the first tr element that has an a with the @title "jerry1" and that is a following-sibling of the tr element that has an a with the @title "Tom":
//tr[a/@title='Tom']/following-sibling::tr[a/@title='jerry1'][1]

xpath with multiple contains statements do not function correctly

I have the HTML shown below and I am trying to access it with Selenium. If I use
//*[contains(text(),'Add OfficeContract (Portal)')]
it finds several matches (the full HTML has more occurrences). I want to find one specific instance, but when I try
//*[contains(text(),'Add OfficeContract (Portal)') and contains(text(),'7121995')]
no matches are found. Simply using
//*[contains(text(),'7121995')]
finds all sorts of matches (the HTML is full of that string).
HTML code:
<tr class="pd" valign="top"><br>
<td> </td><br>
<td nowrap="">SQAAUTO</td><br>
<td nowrap="">01/30/2014 9:47:48 AM</td><br>
<td><br>
<b>Add OfficeContract (Portal)</b><br>
<br><br>
Office Id 7121995<br>
<br><br>
Contract ID added: "8976504"<br>
<br><br>
Term Date added: "12/31/9999"<br>
<br><br>
</td><br>
</tr>
I believe the issue here is that the two strings are not found together in the same element (based on your sample).
For the above xpath to return a result you would need an element like this:
<b>Add OfficeContract (Portal) 7121995</b>

XPath: how to return empty values

I have an XPath like the following:
"//<path to some table>/*/td[1]/text()"
and it returns text values of all non-empty tds, for example:
<text1>, <text2>, <text3>
The problem is that between the nodes containing those values there may be some empty td elements.
What I want is a result that contains some marker for those empty values, for example:
<text1>,<>, <>, <text2>, <text3>, <>
or
<text1>,<null>, <null>, <text2>, <text3>, <null>
I tried the following:
"//<path to some table>/*/string(td[1]/text())"
but it returns undefined.
Of course, I could just fetch the whole node and then process it in my code (stripping all the unnecessary parts), but maybe there is a better way?
HTML example for this case:
<html>
<body>
<table class="tablesorter">
<tbody>
<tr class="tr_class">
<td>text1</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text2</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text3</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
</tbody>
</table>
</body>
</html>
Simply select the td elements rather than their text() child nodes. With the path changed to //<path to some table>/*/td[1], or maybe //<path to some table>/*/td, you get a node-set of td elements whether they are empty or not. You can then read the string content of each node, either with XPath (evaluate string(.) for each element node) or with a host-environment property such as textContent in the W3C DOM or text in the MSXML DOM. That way the empty strings are included.
If you use XPath 2.0 or XQuery, you can select //<path to some table>/*/td/string(.) directly to get a sequence of string values. A function call in the last step is not supported in XPath 1.0, however; there you select the td element nodes and then read the string value of each in a separate step.
Or do you mean that you want only the td[1] elements that have text, dropping the ones without? If so, you can use this XPath:
//td[1][string-length(text()) > 0]

XPath - Locate node using its flattened descendant text

I have this HTML:
<tr>
<td>
Some
<strong>
text
</strong>
<em>
and more
</em>
</td>
...
</tr>
I need to locate the td element whose text is "Some text and more". I know that I can get this text with this XPath expression:
//td//text()
But I cannot find a way to locate the td element itself. I tried this:
//td[//text()='Some text and more']
but I get errors. Do you know a working XPath expression for that?
Firstly, XPath uses forward slashes, never backslashes.
Secondly, I believe this may be the XPath you need:
//td[normalize-space(.) = 'Some text and more']
Could you give that a try?
