XPath cannot select right nodes


Here is an example:
<html>
<table>
<tbody>
<tr>
<td>07 Oct 13</td>
<td>a</td>
</tr>
<tr>
<td>07 Sep 13</td>
<td>b</td>
</tr>
<tr>
<td>07 Sep 13</td>
<td>c</td>
</tr>
</tbody>
</table>
</html>
I need to select the td[2] elements, but only for rows whose td[1] date has not appeared before. In this example the result must be {a, b}, because rows "b" and "c" have the same date and only the first of them counts. So far I can only get the unique dates:
//table//td[(position() = 1 and not(. = preceding::*/td))]
Output: {07 Oct 13, 07 Sep 13}
But how can I get only td[2] elements?

As I wrote earlier in a comment, I would select the matching parent elements and, from there, get the child nodes of interest.
After some experimenting, this XPath should do what you want:
//tr[td[1] and not(./td[1] = preceding::*/td)]/td[2]
It selects every tr whose first td has not been matched by an earlier td, and then takes the second td of each such tr.
With that XPath I get your desired output.
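The same "keep the first row per date" logic can also be sketched outside XPath. The following is a minimal illustration using only Python's standard library, assuming the markup is well-formed XML as in the question; the function name first_rows_per_date and the HTML constant are made up for this example.

```python
import xml.etree.ElementTree as ET

# Sample markup copied from the question (assumed to be well-formed XML).
HTML = """<html><table><tbody>
<tr><td>07 Oct 13</td><td>a</td></tr>
<tr><td>07 Sep 13</td><td>b</td></tr>
<tr><td>07 Sep 13</td><td>c</td></tr>
</tbody></table></html>"""


def first_rows_per_date(markup):
    """Return the second-cell text of each row whose first-cell date is new."""
    root = ET.fromstring(markup)
    seen = set()
    values = []
    for row in root.iter("tr"):  # rows in document order
        cells = row.findall("td")
        if len(cells) < 2:
            continue
        date = (cells[0].text or "").strip()
        if date not in seen:  # mirrors not(td[1] = an earlier td)
            seen.add(date)
            values.append(cells[1].text)
    return values


print(first_rows_per_date(HTML))  # ['a', 'b']
```

Unlike the XPath above, this sketch compares each date only against earlier first cells, not against every earlier td.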

Related

Xpath: Wildcards for descendant nodes not working

Desired output: 3333
<tbody>
<tr>
<td class="name">
<p class="desc">Intel</p>
</td>
</tr>
Other tr tags
<tr>
<td class="tel">
<p class="desc">3333</p>
</td>
</tr>
</tbody>
I want to select the last tr tag after the tr tag that has "Intel" in its p tag:
//tbody//tr[td[p[contains(text(),'Intel')]]]/following-sibling::tr[position()=last()]//p/text()
The above works, but I don't want to reference td and p explicitly. I tried the wildcards ? and *, but they don't work:
//tbody//tr[?[?[contains(text(),'Intel')]]]/following-sibling::tr[position()=last()]//p/text()

"...which contains a text node equal to 'Intel'"
//tbody/tr[.//text() = 'Intel']/following-sibling::tr[last()]/td/p/text()
"...which contains only the string 'Intel', once you remove all insignificant white-space"
//tbody/tr[normalize-space() = 'Intel']/following-sibling::tr[last()]/td/p/text()
I think the key takeaway here is that you can use descendant paths (//) inside predicates, as long as you pay attention to context and make them relative to the current node (.//).
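To illustrate the idea of testing all descendant text of a row rather than naming td and p, here is a rough standard-library Python sketch. It is not the XPath engine's behaviour exactly: it strips whitespace before comparing and stops at the first matching row. The middle row stands in for the question's "Other tr tags" placeholder and is invented, as are the names MARKUP and last_row_text_after.

```python
import xml.etree.ElementTree as ET

# The middle row is a hypothetical stand-in for "Other tr tags".
MARKUP = """<tbody>
<tr><td class="name"><p class="desc">Intel</p></td></tr>
<tr><td class="other"><p class="desc">skip me</p></td></tr>
<tr><td class="tel"><p class="desc">3333</p></td></tr>
</tbody>"""


def last_row_text_after(markup, needle):
    """Find the first row containing `needle` as a text node and return
    the <p> texts of the last row that follows it."""
    tbody = ET.fromstring(markup)
    rows = tbody.findall("tr")
    for i, row in enumerate(rows):
        # Roughly tr[.//text() = needle]: look at every descendant text node.
        if any(text.strip() == needle for text in row.itertext()):
            following = rows[i + 1:]  # following-sibling::tr
            if following:
                return [p.text for p in following[-1].iter("p")]  # tr[last()]
    return []


print(last_row_text_after(MARKUP, "Intel"))  # ['3333']
```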

Xpath - How to select related cousin data

<html>
<table border="1">
<tbody>
<tr>
<td>
<table border="1">
<tbody>
<tr>
<th>aaa</th>
<th>bbb</th>
<th>ccc</th>
<th>ddd</th>
<th>eee</th>
<th>fff</th>
</tr>
<tr>
<td>111</td>
<td>222</td>
<td>333</td>
<td>444</td>
<td>555</td>
<td>666</td>
</tr>
</tbody>
</table>
</td>
</tr>
</tbody>
</table>
</html>
How can I select specific related cousin data using XPath? The desired output would be:
<th>aaa</th>
<th>ccc</th>
<th>fff</th>
<td>111</td>
<td>333</td>
<td>666</td>
The most important aspect of the XPath is that I want to be able to include or exclude certain <th> tags and their corresponding <td> tags.
Based on the answers so far, the closest I have is:
//th[not(contains(text(), "ddd"))] | //tr[2]/td[not(position()=4)]
Is there any way to avoid the explicit position()=4 and instead reference the corresponding th tag?

Using XPath 3.0 you can structure that into
let $th := //table/tbody/tr[1]/th,
$filteredTh := $th[not(. = ("bbb", "ddd", "eee"))],
$pos := $filteredTh!index-of($th, .)
return ($filteredTh, //table/tbody/tr[position() gt 1]/td[position() = $pos])

I'm not sure that this is the best solution, but you might try
//th[not(.="bbb") and not(.="ddd") and not(.="eee")] | //tr[2]/td[not(position()=index-of(//th, "bbb")) and not(position()=index-of(//th, "ddd")) and not(position()=index-of(//th, "eee"))]
or shorter version
//th[not(.=("bbb", "ddd", "eee"))]| //tr[2]/td[not(position()=(index-of(//th, "bbb"), index-of(//th, "ddd"),index-of(//th, "eee")))]
that returns
<th>aaa</th>
<th>ccc</th>
<th>fff</th>
<td>111</td>
<td>333</td>
<td>666</td>
You can avoid complicated XPath expressions and get the required output with Python + Selenium features instead (driver is assumed to be an existing WebDriver instance; current Selenium versions use find_elements(By.XPATH, ...) in place of the removed find_elements_by_xpath):
from selenium.webdriver.common.by import By

# Get list of th elements
th_elements = driver.find_elements(By.XPATH, '//th')
# Get list of td elements
td_elements = driver.find_elements(By.XPATH, '//tr[2]/td')
# Get indexes of required th elements - [0, 2, 5]
ok_index = [i for i, th in enumerate(th_elements) if th.text not in ('bbb', 'ddd', 'eee')]
for i in ok_index:
    print(th_elements[i].text)
for i in ok_index:
    print(td_elements[i].text)
Output is
aaa
ccc
fff
111
333
666
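If you need an XPath 1.0 solution (sequence literals such as ("bbb", "ddd", "eee") are XPath 2.0 syntax, so each comparison has to be spelled out with or):
//th[not(.="bbb" or .="ddd" or .="eee")] | //tr[2]/td[not(position()=count(//th[.="bbb"]/preceding-sibling::th)+1 or position()=count(//th[.="ddd"]/preceding-sibling::th)+1 or position()=count(//th[.="eee"]/preceding-sibling::th)+1)]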

XPath: Find first occurrence in children and siblings

So I have some HTML that looks like this:
<tr class="a">
<td>...</td>
<td>...</td>
</tr>
<tr>
<td>....</td>
<td class="b">A</td>
</tr>
<tr>....</tr>
<tr class="a">
<td class="b">B</td>
<td>....</td>
</tr>
<tr>
<td class="b">Not this</td>
<td>....</td>
</tr>
I basically want to find the first instance of a td with class b following a tr with class a. The problem is that it could be either in a child of that tr or in the next tr after it.
I can get the second case with:
//tr[@class="a"]//td[@class="b"]
But that misses the first case, because the td is in a sibling row, not a descendant. Ideas?

For the 2nd case (the td is a descendant of the tr):
//tr[@class="a"]//td[@class="b"][1]
For the 1st case (the td follows the tr, and the tr has no matching td of its own, so it does not fall into the 2nd case):
//tr[@class="a" and not(.//td[@class="b"])]/following::td[@class="b"][1]
Combining the two XPath queries with the union operator (|) yields the expected output:
//tr[@class="a"]//td[@class="b"][1] | //tr[@class="a" and not(.//td[@class="b"])]/following::td[@class="b"][1]
Output:
Element='<td class="b">A</td>'
Element='<td class="b">B</td>'
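The two cases can also be handled in a single pass over the document, which may be easier to follow than the union. Below is a minimal standard-library Python sketch of that idea. The wrapping <table> root is an assumption (the fragment in the question has no single root element), and the names MARKUP and first_b_after_each_a are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Fragment from the question, wrapped in an assumed <table> root.
MARKUP = """<table>
<tr class="a"><td>...</td><td>...</td></tr>
<tr><td>....</td><td class="b">A</td></tr>
<tr><td>....</td></tr>
<tr class="a"><td class="b">B</td><td>....</td></tr>
<tr><td class="b">Not this</td><td>....</td></tr>
</table>"""


def first_b_after_each_a(markup):
    """Return the text of the first td.b at or after each tr.a."""
    root = ET.fromstring(markup)
    results = []
    waiting = False  # True once a tr.a was seen and no td.b has followed yet
    for el in root.iter():  # iter() walks elements in document order
        if el.tag == "tr" and el.get("class") == "a":
            waiting = True
        elif waiting and el.tag == "td" and el.get("class") == "b":
            results.append(el.text)
            waiting = False
    return results


print(first_b_after_each_a(MARKUP))  # ['A', 'B']
```

Because a child td is visited right after its parent tr in document order, the "child" and "following row" cases need no separate handling here.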

Xpath: howto return empty values

I have an XPath like the following:
"//<path to some table>/*/td[1]/text()"
and it returns the text values of all non-empty tds, for example:
<text1>, <text2>, <text3>
The problem is that between the nodes containing those values there can be some empty td elements.
What I want is a result that contains some marker for those empty values, for example:
<text1>,<>, <>, <text2>, <text3>, <>
or
<text1>,<null>, <null>, <text2>, <text3>, <null>
I tried the following:
"//<path to some table>/*/string(td[1]/text())"
but it returns undefined.
Of course, I could just get the whole node and then work with it in my code (cutting out all unnecessary info), but maybe there is a better way?
HTML example for this case:
html example for that case:
<html>
<body>
<table class="tablesorter">
<tbody>
<tr class="tr_class">
<td>text1</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text2</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td>text3</td>
<td>{some text}</td>
</tr>
<tr class="tr_class">
<td></td>
<td>{some text}</td>
</tr>
</tbody>
</table>
</body>
</html>

Simply select the td elements themselves, not their text() child nodes. With the path changed to //<path to some table>/*/td[1] (or maybe //<path to some table>/*/td) you get a node-set of td elements, whether they are empty or not. You can then access the string contents of each node, either with XPath (evaluate string(.) for each element node) or with a host environment method, e.g. textContent in the W3C DOM or text in the MSXML DOM. That way the empty strings will be included.
If you use XPath 2.0 or XQuery, you can directly select //<path to some table>/*/td/string(.) to get a sequence of string values. That approach, with a function call in the last step, is not supported in XPath 1.0; there you have to select the td element nodes and then access the string value of each in a separate step.
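As a concrete illustration of "select the elements, then take each string value in a separate step", here is a small sketch using Python's standard library. It assumes the table is well-formed XML; the {some text} cells are replaced by a dummy value, and the names MARKUP and first_cell_strings are made up for the example.

```python
import xml.etree.ElementTree as ET

# Table from the question; second-column content replaced by a dummy "x".
MARKUP = """<html><body><table class="tablesorter"><tbody>
<tr class="tr_class"><td>text1</td><td>x</td></tr>
<tr class="tr_class"><td></td><td>x</td></tr>
<tr class="tr_class"><td>text2</td><td>x</td></tr>
<tr class="tr_class"><td>text3</td><td>x</td></tr>
<tr class="tr_class"><td></td><td>x</td></tr>
</tbody></table></body></html>"""


def first_cell_strings(markup):
    """Select every first td (empty or not), then compute its string value."""
    root = ET.fromstring(markup)
    cells = root.findall(".//tr/td[1]")  # element nodes, not text() nodes
    # "".join(itertext()) plays the role of XPath's string(.)
    return ["".join(td.itertext()) for td in cells]


print(first_cell_strings(MARKUP))  # ['text1', '', 'text2', 'text3', '']
```

The empty strings keep their positions, so they can be mapped to whatever placeholder the caller needs.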

Do you mean you want only the td[1] elements that have text, and want to get rid of the ones without text? If so, you can use this XPath (the comparison is > 0 so that single-character values are kept):
//td[1][string-length(text()) > 0]

Find all preceding sibling nodes until one is found with a specific child node attribute

I would like to get all table rows after a specific row identifier (an attribute on the row column) until that specific row identifier is found.
Here is the html I'm trying to parse:
<tr>
<td colspan="4">
<h3>Header 1</h3>
</td>
</tr>
<tr>
<td>Item desc - Header 1</td>
<td>more info</td>
<td>30</td>
<td>500</td>
</tr>
<tr>
<td colspan="4">
<h3>Header 2</h3>
</td>
</tr>
<tr>
<td>Item desc - header 2</td>
<td>other</td>
<td>4</td>
<td>49</td>
</tr>
<tr>
<td>Item 2 desc - header 2</td>
<td>other 2</td>
<td>65</td>
<td>87</td>
</tr>
I want to be able to grab the item under header 1 and stop when it finds header 2; then the items under header 2 and stop when it finds a header 3; etc.
Is this possible under xpath? I can't get it to only find the TR nodes until it finds a child node with a specific attribute (of colspan="4").

This is not possible under XPath 1.0 alone. You somehow have to fix the header tr, because you are trying to find all its following siblings whose first preceding header tr is that original one. Without a reference to the original header, there is nothing to compare against. But you are probably working in some kind of host language that you can use to remember the value.
For example, in xsh:
for my $x in //tr[td/@colspan="4"] {
    echo ($x/td/h3) ;
    for $x/following-sibling::tr[count(td)=4
        and preceding-sibling::tr[count(td)=1][1]=$x]
        echo " " (td) ;
}
Output:
Header 1
Item desc - Header 1 more info 30 500
Header 2
Item desc - header 2 other 4 49
Item 2 desc - header 2 other 2 65 87
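The same "remember the current header in the host language" approach can be sketched in Python with the standard library only. The wrapping <table> root is an assumption (the question shows only the rows), and the names MARKUP and rows_by_header are hypothetical.

```python
import xml.etree.ElementTree as ET

# Rows from the question, wrapped in an assumed <table> root.
MARKUP = """<table>
<tr><td colspan="4"><h3>Header 1</h3></td></tr>
<tr><td>Item desc - Header 1</td><td>more info</td><td>30</td><td>500</td></tr>
<tr><td colspan="4"><h3>Header 2</h3></td></tr>
<tr><td>Item desc - header 2</td><td>other</td><td>4</td><td>49</td></tr>
<tr><td>Item 2 desc - header 2</td><td>other 2</td><td>65</td><td>87</td></tr>
</table>"""


def rows_by_header(markup):
    """Group item rows under the most recent header row (td colspan=4)."""
    root = ET.fromstring(markup)
    groups = {}
    current = None  # the header we are currently collecting rows for
    for row in root.findall("tr"):
        header_cell = row.find("td[@colspan='4']")
        if header_cell is not None:
            current = "".join(header_cell.itertext()).strip()
            groups[current] = []
        elif current is not None:
            groups[current].append([td.text for td in row.findall("td")])
    return groups


for header, items in rows_by_header(MARKUP).items():
    print(header, items)
```

Each header collects rows until the next header row appears, which is the "stop when it finds header 2" behaviour asked for.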

This might give you what you're looking for, not the most orthodox means though:
//*/tr/td[not(child::h3)]/ancestor::tr
This will give you all the <tr> rows that contain a <td> without an <h3> child, i.e. the rows that aren't header blocks.
And you can specify the header with:
//*/tr/td[not(child::h3/text()='Header 1')]/ancestor::tr
Or a more general:
//*/tr/td[not(child::h3[contains(text(),'Header')])]/ancestor::tr
