What XPath expression is able to fetch rows from a table regardless of presence of implicit tbody tag in DOM? [duplicate] - xpath

As in this Stack Overflow answer, imagine that you need to select a particular table and then all of its rows. Because HTML is permissive, all three of the following are legal markup:
<table id="foo"><tr>...</tr></table>
<table id="foo"><tbody><tr>...</tr></tbody></table>
<table id="foo"><tr>...</tr><tbody><tr>...</tr></tbody></table>
You are worried about tables nested in tables, and so don't want to use an XPath like
table[@id="foo"]//tr.
If you could specify your desired XPath as a regex, it might look something like:
table[@id="foo"](/tbody)?/tr
In general, how can you specify an XPath expression that allows an optional element in the hierarchy of a selector?
To be clear, I'm not trying to solve a real-world problem or select a specific element of a specific document. I'm asking for techniques to solve a class of problems.

I don't see why you can't use this:
//table[@id='foo']/tr|//table[@id='foo']/tbody/tr
If you want one expression without node set union:
//tr[(.|parent::tbody)[1]/parent::table[@id='foo']]
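As a rough illustration of the union approach, here is a Python sketch that uses only the standard library's xml.etree.ElementTree. That module implements a limited XPath subset without the | operator, so the two branches are run as separate queries and concatenated. The helper name select_rows and the sample markup are made up for this example, and the markup is parsed as XML, so no implicit tbody is inserted by the parser.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample documents covering the three markup variants,
# plus a nested table that must not contribute rows.
VARIANTS = {
    "plain": "<div><table id='foo'><tr><td>a</td></tr></table></div>",
    "tbody": "<div><table id='foo'><tbody><tr><td>a</td></tr></tbody></table></div>",
    "mixed": "<div><table id='foo'><tr><td>a</td></tr>"
             "<tbody><tr><td>b</td></tr></tbody></table></div>",
    "nested": "<div><table id='foo'><tr><td>"
              "<table><tr><td>inner</td></tr></table>"
              "</td></tr></table></div>",
}


def select_rows(root, table_id):
    """Emulate //table[@id=ID]/tr | //table[@id=ID]/tbody/tr."""
    direct = root.findall(f".//table[@id='{table_id}']/tr")
    wrapped = root.findall(f".//table[@id='{table_id}']/tbody/tr")
    # Unlike a real XPath union, the result is not re-sorted into document order.
    return direct + wrapped


for name, markup in VARIANTS.items():
    rows = select_rows(ET.fromstring(markup), "foo")
    print(name, len(rows))
```

Because both paths use child steps below the table, rows of the inner table in the "nested" variant are not picked up.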

In XPath 2.0, the optional step can be expressed as (tbody|.).
//table[@id="foo"]/(tbody|.)/tr
The pipe (|) denotes the union of two node sets; the dot (.) denotes the identity step, which returns just what the previous step returned.
This can be expanded to include more optional elements at once:
//table[@id="foo"]/(thead|tbody|tfoot|.)/tr

Use:
//table[@id="foo"]/*[self::tbody or self::thead or self::tfoot]/tr
|
//table[@id="foo"]/tr
This selects any tr element that is a child of a table whose id attribute is "foo", or a child of a tbody, thead, or tfoot element that is itself a child of such a table.

Related

how can I obtain a td with no value?

I have a table in which some records sometimes don't have a value.
I am using these XPath expressions:
//table/tbody/tr/td[not(td[string-length(normalize-space(text()))=0])]
//td[not(td[string-length(normalize-space(text()))=0])]
but they select the whole table. How can I select only the td elements that are empty?
Thank you for all the help :)
Let's keep things simple. If you want to select tds without text try:
//table/tbody/tr/td[not(text())]
For completeness, here are two alternatives for selecting empty td elements (the first removes the unnecessary parts of your XPath expression: normalize-space(), text(), and the td[] inside the predicate):
//td[string-length()=0]
//td[.=""]
The first XPath will look for td elements where the content length is equal to 0.
The second XPath will look for td elements which contain nothing.
But judging by your XPath attempts, it seems you want to select td elements that are non-empty. If that's the case, just add a not() inside the predicate:
//td[not(string-length()=0)]
//td[not(.="")]
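The predicates discussed here differ in edge cases: not(text()) checks for the absence of direct text-node children, string-length()=0 and .="" check the whole string value including descendants, and normalize-space() additionally ignores whitespace. The sketch below emulates each test in plain Python on top of xml.etree.ElementTree, which does not implement these XPath functions itself. The helper names and the sample row are invented for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical row: truly empty, whitespace only, text in a child, direct text.
ROW = ET.fromstring(
    "<tr><td></td><td>  </td><td><b>x</b></td><td>y</td></tr>"
)


def has_no_text_child(td):
    """Rough equivalent of td[not(text())]: no direct text-node children."""
    return not td.text and not any(child.tail for child in td)


def string_value_is_empty(td):
    """Rough equivalent of td[string-length()=0] and td[.=""]."""
    return "".join(td.itertext()) == ""


def is_blank(td):
    """Rough equivalent of td[normalize-space()=""]: whitespace is ignored."""
    return "".join(td.itertext()).strip() == ""


for td in ROW:
    print(has_no_text_child(td), string_value_is_empty(td), is_blank(td))
```

Only the first cell passes all three tests; the whitespace-only cell passes just the normalize-space() variant, and the cell whose text lives in a child element passes just the not(text()) variant.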

XPath only get nodes from table when another node exists

I have a specific problem concerning XPath.
Say I can get a column from the table using this query:
//div[@id="someid"]/table/tbody/tr/td[9]/text()
However, I want to only get this column when another specific node exists.
I tried using:
//div[@id="someid"]/table/tbody/tr/td[9 and boolean(//a/span[@title='specifictitle'])]
This however does not work as it returns all items in the table.
I have a few specific limitations:
- //div[@id="someid"]/table/tbody/tr is static and cannot be changed.
- The td contains no other info concerning what column it is in.
Thanks in advance!
There are two approaches.
First, as a direct condition within square brackets:
//div[@id="someid"]/table/tbody/tr[//a/span[@title='specifictitle']]/td[9]/text()
This approach is simpler, and the position of the other node does not matter.
It is also the approach that fulfills the OP's requirement that the query start with //div[@id="someid"]/table/tbody/tr.
* You can put the condition [//a/span[@title='specifictitle']] on basically any element in the query (it could also go after tbody or table, etc.).
Second, using axes (for example, ancestor).
There are two cases, depending on the position of your element within the HTML code:
1) anchor element "before" your div with "someid":
//a/span[@title='specifictitle']//div[@id="someid"]/table/tbody/tr/td[9]/text()
2) anchor element "after" your div with "someid":
//a/span[@title='specifictitle']/ancestor::div[@id="someid"]/table/tbody/tr/td[9]/text()
In both cases the XPath query will not return a result if //a/span[@title='specifictitle'] does not exist, which is what you needed, if I understood correctly.
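An absolute path inside a predicate is evaluated against the whole document, so it is true either for every row or for none. The same effect can be sketched procedurally in Python with xml.etree.ElementTree: check for the marker node first, then fetch the column. The function name ninth_column and the two sample documents are assumptions made for illustration.

```python
import xml.etree.ElementTree as ET


def ninth_column(root):
    """Return the text of td[9] in each row, but only if the marker span exists."""
    # Document-wide existence check, like the predicate
    # [//a/span[@title='specifictitle']].
    marker = root.find(".//a/span[@title='specifictitle']")
    if marker is None:
        return []
    cells = root.findall(".//div[@id='someid']/table/tbody/tr/td[9]")
    return [cell.text for cell in cells]


# Hypothetical documents: one row of nine cells, with and without the marker.
row_cells = "".join(f"<td>c{i}</td>" for i in range(1, 10))
table = f"<div id='someid'><table><tbody><tr>{row_cells}</tr></tbody></table></div>"
with_marker = ET.fromstring(
    f"<body><a><span title='specifictitle'/></a>{table}</body>"
)
without_marker = ET.fromstring(f"<body>{table}</body>")

print(ninth_column(with_marker))
print(ninth_column(without_marker))
```

Splitting the check from the selection also makes it easy to report why nothing was returned, which a single expression cannot do.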

xpath - get count of rows based on some text

I'm writing XPath expressions to select all the links under each category in the left sidebar of the following page:
http://www.indexmundi.com/commodities/
I want to select the links under each category one by one. I've written the following XPath, and it somehow selects the links under the first category (Commodity Price Indices), but I was wondering how to select the links under the other categories. I want to add a check on h3 so that, if its text is Energy, it counts and selects all the rows before that; then, if the h3 text is Beverages, it counts and selects all rows between Energy and Beverages.
.//*[@id='dlCommodities']/tbody/tr[position()< count(following-sibling::tr/td/h3)-1]/td/a
Here is another xpath:
.//*[@id='dlCommodities']/tbody/tr[preceding-sibling::tr/td/h3[. = 'Energy'] and following-sibling::tr/td/h3[. = 'Beverages']]/td/a
It is fulfilling the second requirement i.e. select rows between specific headings but it is missing one node.
Please help me fix these xpaths or suggest a better one.
Thanks
I understand your actual problem as: Find all links that belong to a given category. For doing so, find the category, and then retrieve all elements before the next category.
You might remove the newlines if you prefer, I added them for readability.
//tr[td/h3="Energy"]/(self::tr, following-sibling::tr[
. << //tr[td/h3="Energy"]/following-sibling::tr[td/h3][1]
])
If you do not have an XPath 2.0 compatible processor, you cannot use the << operator, which tests node order (the current node must precede the next category). An XPath 1.0 solution is even slightly shorter, but in my opinion less readable:
//tr[td/h3="Energy"] | //tr[td/h3="Energy"]/following-sibling::tr[
./preceding-sibling::tr[td/h3][1][td/h3="Energy"] and not(td/h3)
]
Both queries will select all nodes of a category; to count them wrap them into count(...).
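When neither expression is convenient, the same grouping can be done procedurally: walk the rows in order and start a new group whenever a row contains an h3. A Python sketch with xml.etree.ElementTree follows. The sample table only imitates the layout described in the question (heading rows with td/h3, link rows with td/a); the link texts and the function name links_by_category are placeholders.

```python
import xml.etree.ElementTree as ET

# Hypothetical miniature of the sidebar table.
TABLE = ET.fromstring(
    "<table id='dlCommodities'><tbody>"
    "<tr><td><h3>Energy</h3></td></tr>"
    "<tr><td><a href='#'>Link A</a></td></tr>"
    "<tr><td><a href='#'>Link B</a></td></tr>"
    "<tr><td><h3>Beverages</h3></td></tr>"
    "<tr><td><a href='#'>Link C</a></td></tr>"
    "</tbody></table>"
)


def links_by_category(table):
    """Map each h3 heading to the link texts in the rows that follow it."""
    groups = {}
    current = None
    for row in table.findall("tbody/tr"):
        heading = row.find("td/h3")
        if heading is not None:
            # A heading row starts a new category.
            current = heading.text
            groups[current] = []
        elif current is not None:
            groups[current].extend(a.text for a in row.findall("td/a"))
    return groups


groups = links_by_category(TABLE)
print(groups["Energy"], len(groups["Energy"]))
```

Rows that appear before the first heading are ignored in this sketch; adjust that branch if the first category has no h3 of its own.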

XPath concatenate table row cells

I am trying to extract the concatenated cells from a HTML table for each row using XPath. For example, if I have a table like
<table>
<tr><th>FirstName</th><th>LastName</th><th>Title</th></tr>
<tr><td>First1</td><td>Last1</td><td>Title1</td></tr>
<tr><td>First2</td><td>Last2</td><td>Title2</td></tr>
<tr><td>First3</td><td>Last3</td><td>Title3</td></tr>
</table>
I want to extract this data so that I get the full name of the person in each row
First1 Last1
First2 Last2
First3 Last3
I can get each column separately and then merge them in my code later, but prefer to get this done in a single XPath query. I have tried to use concat, but can't figure out where to use the concat.
Thanks in advance.
The concatenation you tried only concatenates the XPath expressions, not the nodes. If you want to select more than one node, you should use | between them.
//tr//td[1] | //tr//td[2]
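Since the union returns the cells as separate nodes, the joining still has to happen in the host language (in XPath 1.0, concat() yields a single string, not one string per row). Here is a minimal Python sketch using xml.etree.ElementTree on the table from the question; the function name full_names is an assumption.

```python
import xml.etree.ElementTree as ET

TABLE = ET.fromstring(
    "<table>"
    "<tr><th>FirstName</th><th>LastName</th><th>Title</th></tr>"
    "<tr><td>First1</td><td>Last1</td><td>Title1</td></tr>"
    "<tr><td>First2</td><td>Last2</td><td>Title2</td></tr>"
    "<tr><td>First3</td><td>Last3</td><td>Title3</td></tr>"
    "</table>"
)


def full_names(table):
    """Join the first two td cells of every data row with a space."""
    names = []
    for row in table.findall("tr"):
        cells = row.findall("td")
        if len(cells) >= 2:  # the header row has th cells only, so it is skipped
            names.append(f"{cells[0].text} {cells[1].text}")
    return names


print("\n".join(full_names(TABLE)))
```

Iterating row by row keeps the first and last name of each person paired, which the flat node list from the union does not guarantee on its own.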

better selenium xpath is expecting

I'm trying to create an XPath expression that will work with Selenium, using the following HTML snippet.
Below is a table containing various rows; each new row gets a uniquely generated id (in the following snippet, for example, that id is 1000).
Selenium created the following expressions when the row with id 1000 was added to the table. However, instead of using the id, I want to create the XPath using the third data element in the row, which is (MyName) in the HTML snippet.
A possible suggestion is to not use xpath whenever possible.
http://saucelabs.com/blog/index.php/2011/05/why-css-locators-are-the-way-to-go-vs-xpath/
You need to convert the places in the XPath where it refers to the row by its ID to its relative position in the table.
In all of your XPaths, you would change tr[@id='1000'] to tr[3].
Your first example XPath would look like this:
//tr[3]/td[1]/a[1]/img //tr[@id='1000']/td[1]/span/a/img
Your second example would follow similarly:
//tr[3]/td[1]/span/a/img
As would your third:
//tr[3]/td[1]/a[2]/img
Hopefully you are now able to change the rest of them.
