XPath expression to find URL and data

I want to get the value of every table cell and the href value of every link (<a>) within the table given below.
Being new to XPath, I am finding it difficult to write the expression, although understanding what an existing XPath expression does is the easier part.
The expected output:
http://a.com/ data for a 526735 Z
http://b.com/ data for b 522273 B
http://c.com/ data for c 513335 B
<table class="dataTable">
<tbody>
<tr>
<td><a href="http://a.com/">data for a</a></td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td><a href="http://b.com/">data for b</a></td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
<tr>
<td><a href="http://c.com/">data for c</a></td>
<td class="numericalColumn">513335</td>
<td class="numericalColumn">B</td></tr>
</tbody>
</table>

You'll need two things: an XPath query that locates the wanted nodes, and a second one that outputs the text the way you want it. Since you don't give more information about the language you're using, here is some pseudocode:
foreach node in document.select("//table[@class='dataTable']//tr[td/a/@href]")
    write node.select("concat(td/a/@href, ' ', .)")
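For instance, here is a minimal concrete sketch with Python and lxml (neither the library nor the embedded sample markup is given in the question; both are assumptions):

from lxml import html

# Hypothetical sample mirroring the markup above
page_source = """
<table class="dataTable"><tbody>
<tr><td><a href="http://a.com/">data for a</a></td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
</tbody></table>
"""

doc = html.fromstring(page_source)
# First select only the rows that actually contain a link,
# then let concat() build "href + space + row text" for each row
for row in doc.xpath("//table[@class='dataTable']//tr[td/a/@href]"):
    print(row.xpath("concat(td/a/@href, ' ', normalize-space(.))"))

Each printed line has the "url data number letter" shape shown in the expected output above.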

This site has a great free tool for building XPath Expressions (XPath Builder):
http://www.bubasoft.net/

Use this XPath: //tr/td/a/@href | //tr//text()

correct way to scrape this table (using scrapy / xpath)

Given a table with an unknown number of <tr> rows but always three <td> cells per row, where the first cell is sometimes struck through (<s>) and that strikethrough should be captured as an additional item with value 0 or 1:
<table id="my_id">
<tr>
<td>A1</td>
<td>A2</td>
<td>A3</td>
</tr>
<tr>
<td><s>B1</s></td>
<td>B2</td>
<td>B3</td>
</tr>
...
</table>
Scraping should yield [[A1,A2,A3,0],[B1,B2,B3,1], ...]. I currently try something along these lines:
my_xpath = response.xpath("//table[@id='my_id']")
for my_cell in my_xpath.xpath(".//tr"):
    print('record 0:', my_cell.xpath(".//td")[0])
    print('record 1:', my_cell.xpath(".//td")[1])
    print('record 2:', my_cell.xpath(".//td")[2])
And in principle it works (e.g. by adding a pipeline after add_xpath()), but I am sure there is a more natural and elegant way to do this.
Try contains():
my_xpath = response.xpath("//table[contains(@id, 'my_id')]").getall()
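A slightly more compact variant that also captures the strikethrough flag, assuming the same Scrapy response object as in the question (the variable names are mine):

rows = []
for tr in response.xpath("//table[@id='my_id']//tr"):
    cells = tr.xpath("./td//text()").getall()    # e.g. ['A1', 'A2', 'A3']
    struck = 1 if tr.xpath("./td[1]/s") else 0   # 1 when the first cell is wrapped in <s>
    rows.append(cells + [struck])
# rows -> [['A1', 'A2', 'A3', 0], ['B1', 'B2', 'B3', 1], ...]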

XPath: returning the index of specific tag inside a set of tags with the same type

Here is an excerpt of my xml:
<table>
...
<tbody>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
<tr>
<td></td>
<td></td>
<td></td>
</tr>
</tbody>
</table>
I know how to find a specific <tr> tag.
Is it possible to determine that <tr> tag's index or ordinal number inside the <tbody> tag? I guess it's possible to loop through the table, but the table is quite large and that would take a lot of time.
Is it possible to get this index/ordinal number with a single XPath statement?
I've used the following XPath expression:
//tbody//td[text()='findMe']/../following-sibling::tr
This expression selects the tr nodes that come after the row containing the 'findMe' text, which is useful because it lets you obtain the number of tr nodes below it.
However, before relying on this XPath you should verify that the 'findMe' string is actually present, because if it is absent the expression simply returns an empty result (a count of 0). The following expression works fine as that validation:
//tbody//td[text()='findMe']
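If what you actually need is the 1-based position of the matching row itself, a single count() over the preceding siblings gives it. A small sketch with Python and lxml (the library is an assumption; only the XPath matters):

from lxml import etree

# Hypothetical document with the searched text in the second row
doc = etree.fromstring(
    "<table><tbody>"
    "<tr><td>x</td></tr>"
    "<tr><td>findMe</td></tr>"
    "<tr><td>y</td></tr>"
    "</tbody></table>"
)
# count(preceding-sibling::tr) + 1 = 1-based index of the row containing 'findMe'
index = doc.xpath("count(//tbody/tr[td[text()='findMe']]/preceding-sibling::tr) + 1")
print(int(index))  # 2

The same caveat applies: if 'findMe' is absent, count() over an empty selection is 0 and the expression reports 1, so the validation step above is still needed.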

Filtering Elements in XPath based on their descendants' text

Suppose I have a table with the following rows,
...
<tr>
<th title="Library of Quintessential Memes">LQM:</th>
</tr>
<tr>
<th title="Library of Boring Books">LBB:</th>
</tr>
...
I would like to select all <tr> elements whose first <th> child's text starts with "L". How can I do this using XPath selectors?
Use the starts-with function:
//tr[starts-with(th[1],"L")]
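For reference, a quick check of that expression with Python and lxml (the library and the extra non-matching row are assumptions):

from lxml import html

snippet = """
<table>
<tr><th title="Library of Quintessential Memes">LQM:</th></tr>
<tr><th title="Library of Boring Books">LBB:</th></tr>
<tr><th>Other:</th></tr>
</table>
"""
doc = html.fromstring(snippet)
# th[1] is the first <th> child of each row; its string value is what starts-with() tests
rows = doc.xpath('//tr[starts-with(th[1], "L")]')
print([row.xpath("string(th[1])") for row in rows])  # ['LQM:', 'LBB:']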

XPath matching text in a table - Ruby - Nokogiri

I have a table that looks like this
<table cellpadding="1" cellspacing="0" width="100%" border="0">
<tr>
<td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td>
</tr>
<tr style="background-color: #D8E4F6;vertical-align: top;">
<td nowrap="nowrap"><b>Bill Date</b></td>
<td nowrap="nowrap"><b>Bill Amount</b></td>
<td nowrap="nowrap"><b>Bill Due Date</b></td>
<td nowrap="nowrap"><b>Bill (PDF)</b></td>
</tr>
</table>
I am trying to create the XPath to find this table, where it contains the text Bill Statement Detail. I want the entire table and not just the td.
Here is what I have tried so far:
page.parser.xpath('//table[contains(text(),"Bill")]')
page.parser.xpath('//table/tbody/tr[contains(text(),"Bill Statement Detail")]')
Any help is appreciated, thanks!
Your first XPath example is the closest, in that you're selecting table. The second example, if it ever matched, would select tr; it will not work mainly because, according to your example, the text you want is in a b node, not directly in a tr node.
This solution is as vague as I could make it, because of the *. If the target text will always be under b, change descendant::* to descendant::b:
//table[contains(descendant::*, 'Bill Statement Detail')]
This is as specific, given the example, as I can make:
//table[tr[1]/td/b[. = 'Bill Statement Detail']]
You might want
//table[contains(descendant::text(),"Bill Statement Detail")]
The suggested expressions don't work well if the matching text is not in the first row. See the related post Find a table containing specific text.
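One robust variant in that spirit (my suggestion, not taken from the answers above) is to test the string value of the whole table with contains(., ...). A small sketch with Python and lxml, even though the question uses Nokogiri, since the XPath itself is identical there:

from lxml import html

# Reduced copy of the question's markup, with the text deliberately not in the first row
source = """
<table cellpadding="1" width="100%">
<tr><td>something else</td></tr>
<tr><td><b class="white">Bill Statement Detail</b></td></tr>
</table>
"""
doc = html.fromstring(source)
# contains(., ...) tests the concatenated text of the whole table,
# so the position of the matching row no longer matters
tables = doc.xpath('//table[contains(., "Bill Statement Detail")]')
print(len(tables))  # 1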

Two levels of contains in an XPath

I have this XPath:
//tr[contains(td, 'Europe')]
which was working when I had this:
<tr>
<td></td>
<td>Europe</td>
<td></td>
</tr>
but now I have this:
<tr>
<td></td>
<td><a>Europe</a></td>
<td></td>
</tr>
How can I select it with an XPath now, based on the fact that "Europe" is in there somewhere?
I tried:
//tr[contains(a, "Europe")]
and
//tr[contains(text(), "Europe")]
and many other silly things without any success.
//tr[contains(td, 'Europe')]
This should work with both markups, because fn:contains() casts both of its arguments to strings.
I do see a problem with a different markup where there can be more than one td element; for that case you should use:
//tr[td[contains(.,'Europe')]]
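A quick way to convince yourself that this last expression covers both cases, sketched with Python and lxml (the library choice is mine, not the answer's):

from lxml import html

# Both markup variants from the question
old = "<table><tr><td></td><td>Europe</td><td></td></tr></table>"
new = "<table><tr><td></td><td><a>Europe</a></td><td></td></tr></table>"

for source in (old, new):
    doc = html.fromstring(source)
    # td[contains(., 'Europe')] checks the full text of each cell,
    # including text nested inside <a>
    print(len(doc.xpath("//tr[td[contains(., 'Europe')]]")))  # prints 1 for both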
