XPath matching text in a table - Ruby - Nokogiri - xpath

I have a table that looks like this
<table cellpadding="1" cellspacing="0" width="100%" border="0">
<tr>
<td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td>
</tr>
<tr style="background-color: #D8E4F6;vertical-align: top;">
<td nowrap="nowrap"><b>Bill Date</b></td>
<td nowrap="nowrap"><b>Bill Amount</b></td>
<td nowrap="nowrap"><b>Bill Due Date</b></td>
<td nowrap="nowrap"><b>Bill (PDF)</b></td>
</tr>
</table>
I am trying to write an XPath expression that finds this table when it contains the text Bill Statement Detail. I want the entire table, not just the td.
Here is what I have tried so far:
page.parser.xpath('//table[contains(text(),"Bill")]')
page.parser.xpath('//table/tbody/tr[contains(text(),"Bill Statement Detail")]')
Any Help is appreciated
Thanks!

Your first XPath example is the closest in that you're selecting table. The second example, if it ever matched, would select tr—this one will not work mainly because, according to your example, the text you want is in a b node, not a tr node.
This solution is as general as I could make it, because of the *. If the target text will always be under b, change descendant::* to descendant::b:
//table[contains(descendant::*, 'Bill Statement Detail')]
This is as specific as I can make it, given the example:
//table[tr[1]/td/b = 'Bill Statement Detail']

You might want
//table[contains(descendant::text(),"Bill Statement Detail")]

The suggested expressions don't work well if the matching text is not in the first row. See the related post "Find a table containing specific text".
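In XPath 1.0, contains() converts a node-set argument to the string value of its first node only, which is why the expressions above depend on where the text sits. A minimal sketch of one way around that is to test each descendant b on its own inside a nested predicate. The markup and the id attributes below are made up for illustration, and REXML (shipped with Ruby) is used only so the sketch runs without gems; with Nokogiri the same expression string would go to page.parser.xpath.

```ruby
require 'rexml/document'

# Hypothetical two-table document: the phrase sits in the SECOND row
# of the second table, so first-node matching would miss it.
xml = <<~XML
  <root>
    <table id="other"><tr><td><b>Something else</b></td></tr></table>
    <table id="bills">
      <tr><td><b>Header</b></td></tr>
      <tr><td><b class="white">Bill Statement Detail</b></td></tr>
    </table>
  </root>
XML

doc = REXML::Document.new(xml)

# The inner predicate is evaluated once per descendant b, so the row
# in which the text appears does not matter.
xpath = "//table[descendant::b[contains(., 'Bill Statement Detail')]]"

tables = REXML::XPath.match(doc, xpath)
matched_ids = tables.map { |table| table.attributes['id'] }
puts matched_ids.inspect
```

Each element of tables is the whole table element, so its rows and cells are still available from the match.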

Related

XPath to select specific text inside of text block

I am trying to figure out a way to pull specific values out of a big long text block.
So far I have //td[@class="PadLeft10"] which returns me a big long value starting with the company name and ending with the "View More Info" piece.
I am trying to break my results up into segments, so for example I want my code to look for the words "Primary Contact:" and then return the text that follows that, ending at the <br/>.
I need to get the Company Name, which is always the first bit of text, then the Primary Contact, then the Address, then the Phone and Fax, then the Website, and the Organization type.
The problem is that not every record has all the values. As you can see, the second entry has the address and website, but the first one doesn't.
I am using the Dataminer Chrome Plugin, for anyone familiar with that. It has separate xpath for rows and columns, so I am going to try to make a bunch of different columns that correspond to each of the fields that I am looking for.
Any direction would be greatly appreciated.
<td align="left" valign="top" width="2%">
<script>
if (0 == 1) document.write('<img src="https://website.com" border="0" alt=""/>');
</script>
<br/><br/></td>
<td class="PadLeft10" align="left" valign="top" width="32%" style="padding-left: 15px;">
<span style="font-weight: bold;font-size: 12pt;"><br/>Company Name Here</span><br/>Primary Contact: Mr. Eric Cartman <br/>Phone: (555) 555-5555<br/>Fax: (333) 333-3333<span style="text-decoration: underline;color: #0000ff"></span><br/>Organization Type: Distributor Branch
<br/>
» View More Info<br/>
<br/>
</td>
<td align="left" valign="top" width="2%">
<script>
if (0 == 1) document.write('<img src="https://website.com" border="0" alt=""/>');
</script>
<br/><br/></td>
<td class="PadLeft10" align="left" valign="top" width="32%" style="padding-left: 15px;">
<span style="font-weight: bold;font-size: 12pt;"><br/>Other Company</span><br/>Primary Contact: Mr. Jimmy Valmer<br/>100 N Ohio St 2rd Fl<br/>Rochester, IN 54225<br/>United States<br/>Phone: (888) 888-8888<br/>Fax: (999) 999-9999<span style="text-decoration: underline;color: #0000ff"><br/>Web Site: http://www.companywebsite.com</span><br/>Organization Type: Financial Service
<br/>
» View More Info<br/>
<br/>
</td>
</tr>
<tr>
I am new to XPath, but the least I can say is this: if you control the HTML, you really should make it more structured,
for example: Primary Contact:<span class='primaryContact'>..</span> (an id would work as well).
Otherwise, you can select the text with an expression such as (adjust as needed) //td[@class="PadLeft10"]//child::span//following-sibling::text()[1], split the result on ':' and go from there, but that is only a do-it-yourself workaround.
Any direction would be greatly appreciated.
As far as a direction, the sections within table cell that you mention are neither nested DOM items, nor sibling-type DOM nodes. Those are sequential html elements that require special processing.
<br/>Company Name Here</span>
<br/>Primary Contact: Mr. Eric Cartman
<br/>Phone: (555) 555-5555
<br/>...
Both xpath and regex can be leveraged for such a case.
You can select the text node you're looking for using a predicate and the contains function:
//td[@class="PadLeft10"]/text()[contains(., "Primary Contact:")]
Then you can get the substring using the substring-after function:
substring-after(
//td[@class="PadLeft10"]/text()[contains(., "Primary Contact:")],
'Primary Contact:'
)
And remove leading and trailing whitespace using normalize-space:
normalize-space(
substring-after(
//td[@class="PadLeft10"]/text()[contains(., "Primary Contact:")],
'Primary Contact:'
)
)
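The expressions above pull out one field at a time. Since not every record has every field, another option is to read each cell's direct text nodes once and split them on the label in the host language. The following is only a sketch under assumptions: the markup is a cut-down, well-formed version of the two cells in the question, and it uses Ruby with REXML rather than the Dataminer plugin, so it illustrates the idea rather than the plugin's column setup.

```ruby
require 'rexml/document'

# Hypothetical, well-formed cut-down of the two cells from the question.
xml = <<~XML
  <table><tr>
    <td class="PadLeft10"><span><br/>Company Name Here</span><br/>Primary Contact: Mr. Eric Cartman <br/>Phone: (555) 555-5555<br/>Fax: (333) 333-3333<br/>Organization Type: Distributor Branch<br/></td>
    <td class="PadLeft10"><span><br/>Other Company</span><br/>Primary Contact: Mr. Jimmy Valmer<br/>100 N Ohio St<br/>Phone: (888) 888-8888<br/>Organization Type: Financial Service<br/></td>
  </tr></table>
XML

doc = REXML::Document.new(xml)

records = REXML::XPath.match(doc, "//td[@class='PadLeft10']").map do |td|
  # The company name is the first text inside the leading span.
  fields = { 'Company Name' => td.elements['span'].text.strip }

  # td.texts returns the cell's direct text nodes, the same nodes
  # that td/text() selects.
  td.texts.each do |text_node|
    label, value = text_node.value.split(':', 2)
    next if value.nil? # no label on this line (address line or whitespace)

    fields[label.strip] = value.strip
  end
  fields
end

records.each { |record| puts record.inspect }
```

A field that is missing from a record is simply absent from its hash, so the second record here has no Fax key.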

How to specify Xpath that return whole table without last row?

Here is the code of a table. I need to extract whole table without last row.
Whole table:
<table class="product-content__table">
<tr><th class="product-content__th">Состав</th><td>нержавеющая сталь, натуральная кожа </td></tr>
<tr><th class="product-content__th">Ширина</th><td>2 см</td></tr><tr><th class="product-content__th">Цвет</th><td>серый </td></tr>
<tr><th class="product-content__th">Страна производства</th><td>Россия </td></tr><tr><th class="product-content__th">Сезон</th><td>Мульти </td></tr>
<tr><th class="product-content__th">Коллекция</th><td>Весна-лето </td></tr>
<tr><th class="product-content__th">Артикул</th><td itemprop="sku">RO003DMCMA98</td></tr>
</table>
I need to extract whole table without this row:
<tr><th class="product-content__th">Артикул</th><td itemprop="sku">RO003DMCMA98</td></tr>
I need all tags including table tag.
XPath can only select nodes that are present in your input. If there is a table element in your input with five rows, and you want a table element with four rows, then there is no such table element in your input so you cannot select it with XPath. If you want to get a node that differs from any node in your input, you need XSLT or XQuery.
<td> is a sibling of <th>, not a child, so you don't actually need th in your XPath. And you want to filter out the last tr within the table, rather than the last td within each tr:
//table[@class="product-content__table"]//tr[position() < last()]/td
Remove the trailing /td if you want a list of <tr> elements instead of <td> elements.
This works:
//table//tr[position()<last()]
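To tie the two answers together: the expression selects the rows, while getting a table element without its last row means changing the tree in the host language and serialising it again. A rough sketch in Ruby with REXML follows; the English cell values are stand-ins for the question's data, and with Nokogiri the same XPath strings should apply.

```ruby
require 'rexml/document'

# Hypothetical shortened table; the last row is the one to drop.
xml = <<~XML
  <table class="product-content__table">
    <tr><th>Width</th><td>2 cm</td></tr>
    <tr><th>Colour</th><td>grey</td></tr>
    <tr><th>SKU</th><td itemprop="sku">RO003DMCMA98</td></tr>
  </table>
XML

doc = REXML::Document.new(xml)
table_path = "//table[@class='product-content__table']"

# Select every row except the last one.
rows = REXML::XPath.match(doc, "#{table_path}//tr[position() < last()]")
headers = rows.map { |tr| tr.elements['th'].text }
puts headers.inspect

# XPath cannot return a table that differs from the input, so detach
# the last row from the tree and serialise what is left.
last_row = REXML::XPath.first(doc, "#{table_path}//tr[last()]")
last_row.remove
html = doc.root.to_s
puts html
```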

Xpath to match following sibling in another node

This is my html code:
<tr>
<th class="left_cont"><strong>Hello world</strong></th>
<td class="right_cont padding_left16px"><strong>Hi There</strong></td>
</tr>
Now, to select the text Hello world, I used:
//strong[contains(text(),'Hello world')]
This works fine for me.
Now I need to select the text Hi There relative to the Hello world text.
I need to do something like this, but I can't figure it out:
//strong[contains(text(),'Hello world')]/following-sibling::strong
This doesn't work for me.
The elements with a sibling relationship are the parents of the <strong> elements, not the <strong> elements themselves, so you can try it this way:
//*[strong[contains(.,'Hello world')]]/following-sibling::*[strong]/strong
Or if you are sure parents involved are always <th> and <td> :
//th[strong[contains(.,'Hello world')]]/following-sibling::td[strong]/strong
The second <strong> element is not actually a sibling of the first one, but the <th> and <td> elements that wrap them are siblings. So you could probably use
//strong[contains(text(),'Hello world')]/../following-sibling::td/strong
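A small sketch of the difference, assuming the row from the question is wrapped in a table so it parses as XML. It uses Ruby with REXML purely for illustration and compares the attempt from the question with the parent-then-sibling form.

```ruby
require 'rexml/document'

xml = <<~XML
  <table><tr>
    <th class="left_cont"><strong>Hello world</strong></th>
    <td class="right_cont padding_left16px"><strong>Hi There</strong></td>
  </tr></table>
XML

doc = REXML::Document.new(xml)

# The attempt from the question: the first strong has no sibling
# strong inside its th, so nothing is selected.
failed = REXML::XPath.match(
  doc, "//strong[contains(., 'Hello world')]/following-sibling::strong"
)
puts failed.size

# Step up to the enclosing th, across to the sibling td, then down.
match = REXML::XPath.first(
  doc, "//strong[contains(., 'Hello world')]/../following-sibling::td/strong"
)
puts match.text
```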

XPATH - Ruby - Nokogiri - Nodeset

I have a NodeSet of a table that looks similar to this:
<table cellpadding="1" cellspacing="0" width="100%" border="0">
<tr>
<td colspan="9" class="csoGreen"><b class="white">Bill Statement Detail</b></td>
</tr>
<tr>
<td><b>Bill Date</b></td>
<td><b>Bill Amount</b></td>
<td><b>Bill Due Date</b></td>
<td><b>Bill (PDF)</b></td>
</tr>
<tr vAlign="top">
<td>blahA</td>
<td>blahB</td>
<td>blahC</td>
<td>View Bill</td>
</tr>
Now I plan on looping through each onclick in the table.
I've been attempting to loop through the NodeSet unsuccessfully.
I ended up with many failed attempts, but I imagine it would end up looking something like this:
doc_list.each_element ("//a[td/text()='onclick']/@href") do | |
#here I want to scan and save BlahA into a Variable
end
You want to iterate through everything with an onclick? Maybe:
doc.css('*[onclick]').each do |el|
puts el[:onclick]
end
Edit: what you probably really want is the first td of every row, starting with row 3. In that case:
table.css('td[1]')[2..-1].each do |td|
puts td.text
end
The key to doing this efficiently is not in your question, but in your comment "I want to extract the first td in the tr where there is an onclick".
This expression does exactly that:
doc.xpath('//tr[td/a/@onclick]/td[1]/text()')
In fact this will give you the set of all such matches. No iteration needed.
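As a hedged illustration of that expression: the table in the question shows no a elements, so the links and the openBill handler below are assumptions made up for the sketch. It uses REXML so it runs without gems, and selects the td elements rather than their text nodes so the values can be read with text.

```ruby
require 'rexml/document'

# Hypothetical rows; the onclick handlers are invented for the sketch.
xml = <<~XML
  <table>
    <tr><td><b>Bill Date</b></td><td><b>Bill (PDF)</b></td></tr>
    <tr><td>blahA</td><td><a href="#" onclick="openBill(1)">View Bill</a></td></tr>
    <tr><td>blahD</td><td><a href="#" onclick="openBill(2)">View Bill</a></td></tr>
  </table>
XML

doc = REXML::Document.new(xml)

# Rows that contain a link with an onclick attribute, then the first
# cell of each such row. The header row has no link and is skipped.
cells = REXML::XPath.match(doc, '//tr[td/a/@onclick]/td[1]')
values = cells.map { |td| td.text }
puts values.inspect
```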

xpath expression to find url and data

I want to get the values in every row of the table, and the href value for every <a> within the table given below.
Being new to XPath, I am finding it difficult to write the XPath expression.
However, understanding what an XPath expression does is somewhat easier.
The expected output:
http://a.com/ data for a 526735 Z
http://b.com/ data for b 522273 Z
http://c.com/ data for c 513335 Z
<table class="dataTable">
<tbody>
<tr>
<td>data for a</td>
<td class="numericalColumn">526735</td>
<td class="numericalColumn">Z</td></tr>
<tr>
<td>data for b</td>
<td class="numericalColumn">522273</td>
<td class="numericalColumn">B</td></tr>
<tr>
<td>data for c</td>
<td class="numericalColumn">513335</td>
<td class="numericalColumn">B</td></tr>
</tbody>
</table>
You'll need two things: an XPath query which locates the wanted nodes and a second which outputs the text as you want it. Since you don't give more information about the languages you're using I'm putting together some pseudocode:
foreach node in document.select("//table[@class='dataTable']//tr[td/a/@HREF]")
    write node.select("concat(td/a/@HREF,' ',.)")
This site has a great free tool for building XPath Expressions (XPath Builder):
http://www.bubasoft.net/
Use this XPath: //tr/td/a/@HREF | //tr//text()
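For a concrete version of the pseudocode, here is a sketch in Ruby with REXML. The markup in the question shows no links, so it assumes each first cell wraps its text in an a element. XPath name tests are case-sensitive, so the sketch uses lower-case href to match the sample markup.

```ruby
require 'rexml/document'

# Hypothetical markup: the links are assumed, they are not visible
# in the question's table. The last row has no link on purpose.
xml = <<~XML
  <table class="dataTable"><tbody>
    <tr><td><a href="http://a.com/">data for a</a></td><td class="numericalColumn">526735</td><td class="numericalColumn">Z</td></tr>
    <tr><td><a href="http://b.com/">data for b</a></td><td class="numericalColumn">522273</td><td class="numericalColumn">B</td></tr>
    <tr><td>no link here</td><td class="numericalColumn">0</td><td class="numericalColumn">B</td></tr>
  </tbody></table>
XML

doc = REXML::Document.new(xml)

# First query: locate the rows that contain a link.
rows = REXML::XPath.match(doc, "//table[@class='dataTable']//tr[td/a/@href]")

# Second step: build one output line per row.
lines = rows.map do |tr|
  link = tr.elements['td/a']
  numbers = REXML::XPath.match(tr, "td[@class='numericalColumn']").map { |td| td.text }
  "#{link.attributes['href']} #{link.text} #{numbers.join(' ')}"
end

puts lines
```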

Resources