What XPath selector will retrieve both of these prices and nothing else?

I have the following two td elements
<TD ALIGN="LEFT" VALIGN="top" WIDTH="150" STYLE="font-size: 11px; font-family: arial" HEIGHT="65"> <B><i>Brand </i></B><BR>Title<BR>
$65.00
</TD>
<TD ALIGN="LEFT" VALIGN="top" WIDTH="35"> </TD><TD ALIGN="LEFT" VALIGN="top" WIDTH="150" STYLE="font-size: 11px; font-family: arial" HEIGHT="65"> <B><i>Brand </i></B><BR>Title<BR>
<span style="color: #999999; font-weight: normal;"><strike>$212.00</strike></span> <B>$127.20</B>
</TD>
I want to retrieve the final price from both ($65.00 and $127.20). I can use
//td/br[last()]/following-sibling::text()[1]|//td/br[last()]/following-sibling::b[1]
to return
[0] =>
$65.00
[1] => &nbsp;&nbsp;
[2] => $127.20
where [1] is the &nbsp; preceding the second price. Is there an XPath that will retrieve only
[0] =>
$65.00
[1] => $127.20
?

In both cases the text in question is the final non-empty text node descendant. That's how you'd describe it in English; here's how to say it in XPath:
//td/descendant::text()[normalize-space()][last()]
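To check the expression, here is a minimal sketch in Python using lxml. That choice is an assumption, since the question does not say which XPath engine is in use. The markup is a trimmed copy of the two cells, wrapped in a table row so the parser keeps the td elements.

```python
from lxml import html

# Trimmed copy of the two cells from the question, wrapped in <table><tr>.
SNIPPET = """
<table><tr>
<td><b><i>Brand </i></b><br>Title<br>
$65.00
</td>
<td><b><i>Brand </i></b><br>Title<br>
<span><strike>$212.00</strike></span>&nbsp;&nbsp;<b>$127.20</b>
</td>
</tr></table>
"""

doc = html.fromstring(SNIPPET)

# For each td: keep text nodes that are not whitespace-only, then take the last.
nodes = doc.xpath("//td/descendant::text()[normalize-space()][last()]")

# The matched text nodes still carry surrounding newlines, so strip them.
prices = [n.strip() for n in nodes]
print(prices)
```

The trailing newline after the closing b tag is whitespace-only, so the normalize-space() predicate drops it before last() is applied.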

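I haven't tried this, but it seems to me that in this particular case,
//td/br[last()]/following-sibling::*[last()]
may work for the second cell, because the B node containing $127.20 is the last of its element siblings. Note that * matches only element nodes, so it will not return the bare text node containing $65.00; for that you need a text() or node() test, as in the answer above.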


How to use XPath to select following-sibling

<tr>
<td width="120" align="right" class="tit">application:</td>
<td style="width:345px;word-break:break-all;">
<a style="text-decoration: underline; color: #0066ff; cursor: pointer"
href="javascript:_search('pa', '<font color=red>PartA</font>PartB');"> <font color=red>PartA</font>PartB</a>;
</td>
</tr>
<tr>
In the code above, I search using part of the name (PartA). How can I get the whole name, which combines PartA and PartB? With the code below I get only PartA, or nothing:
html.xpath('//td[contains(text(),"application")]/following-sibling::td[1]/a/font/text()')[0]
html.xpath('//td[contains(text(),"application")]/following-sibling::td[1]/a/text()')[0]
Your second XPath returns the whitespace-only text node that precedes the font element. Filter it out as follows:
//td[contains(text(),"application")]/following-sibling::td[1]/a/text()[normalize-space()]
Output: PartB
To select the two items directly, you can use:
//td[contains(text(),"application")]/following-sibling::td[1]//text()[normalize-space()]
To combine them:
concat(//td[contains(text(),"application")]/following-sibling::td[1]/a/font/text(),//td[contains(text(),"application")]/following-sibling::td[1]/a/text()[normalize-space()])
or
string(//td[contains(text(),"application")]/following-sibling::td[1]/a)
Output: PartAPartB
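A minimal sketch of the string-value approach in Python with lxml, which the html.xpath calls in the question suggest is in use. The markup is a simplified stand-in: the href is replaced by "#", and the variable names are made up for illustration.

```python
from lxml import html

# Simplified stand-in for the row in the question (href shortened to "#").
SNIPPET = """
<table><tr>
<td class="tit">application:</td>
<td><a href="#"> <font color="red">PartA</font>PartB</a>;</td>
</tr></table>
"""

doc = html.fromstring(SNIPPET)
cell = '//td[contains(text(),"application")]/following-sibling::td[1]'

# Non-blank text nodes anywhere under the link, in document order.
parts = doc.xpath(cell + "/a//text()[normalize-space()]")

# String value of the link; normalize-space() trims the leading blank.
whole = doc.xpath("normalize-space(" + cell + "/a)")

print(parts, whole)
```

Scoping the text-node query to the a element keeps the stray ";" that follows the link out of the result.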

HTML Email : Outlook puts down the text after space

I am building an HTML email.
The issue I am facing is that when I view the output of my code in Outlook, the "Register Online" text wraps:
"Register" appears on one line and "Online" on the next.
<table cellspacing="0" cellpadding="0" border="0" style=";border-collapse: collapse;mso-table-lspace: 0pt;mso-table-rspace: 0pt; background: transparent;">
<tbody><tr>
<td valign="middle" height="40" align="center" class="main-bg-color" style=" background: #ffee00;color: black;display: block;padding-left: 20px;padding-right: 20px;!important; width:100px; cursor: pointer;">
<div class="modtxt"><span class="wrap_textbox"><a style="color: black;text-align: center; display:block; text-decoration: none;-webkit-text-size-adjust: none;font-size: 10px;line-height: 40px;text-transform:uppercase;font-family: \'proxima_novasemibold\', Arial, sans-serif;" href="http://www.hubilo.com/widget/webpanel/login.php?event=c1d1b1dc8d40c37429a8fd1f627c5c5e"><span style="font-weight:100;">Register Online</span></a></span></div>
</td>
</tr>
</tbody></table>
How can I solve it?
Thank You.
I'm not entirely sure what you want to do, but if the goal is to make sure that "Register Online" never breaks onto two lines, then the easy solution for Outlook is to use a non-breaking space character (&nbsp;) rather than a regular space:
REGISTER&nbsp;ONLINE
This should solve that particular issue.

Getting attributed html element

I'm trying to get table with content of MMEL codes from this site and I'm trying to accomplish it with CSS Selectors.
What I've got so far is:
require_relative 'sources/Downloader'
require 'nokogiri'
html_content = Downloader.download_page('http://www.s-techent.com/ATA100.htm')
parsed_html = Nokogiri::HTML(html_content)
tmp = parsed_html.css("tr[*]")
puts tmp.text
I'm getting an error while trying to select this tr with an attribute. How can I get this table in a simple form? I want to parse it to JSON. It would be nice to get it in sections and process each one in an .each block.
EDIT:
It would be nice if I could get things in blocks like this (look at the page's source):
<TR><TD WIDTH="10%" VALIGN="TOP" ROWSPAN=5>
<B><FONT FACE="Arial" SIZE=2><P ALIGN="CENTER">11</B></FONT></TD>
<TD WIDTH="40%" VALIGN="TOP" COLSPAN=2>
<B><FONT FACE="Arial" SIZE=2><P>PLACARDS AND MARKINGS</B></FONT></TD>
<TD WIDTH="50%" VALIGN="TOP">
<FONT FACE="Arial" SIZE=2><P ALIGN="LEFT">All procurable placards, labels, etc., shall be included in the illustrated Parts Catalog. They shall be illustrated, showing the part number, Legend and Location. The Maintenance Manual shall provide the approximate Location (i.e., FWD -UPPER -RH) and illustrate each placard, label, marking, self -illuminating sign, etc., required for safety information, maintenance significant information or by government regulations. Those required by government regulations shall be so identified.</FONT></TD>
</TR>
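This should print all the TRs that start at line 96 of the page source. There are three tables on that page, and doc.css("table")[1] (the second one, since the index is zero-based) has all the text you need:
require 'nokogiri'
require 'open-uri'
# URI.open comes from open-uri; a bare open(url) no longer fetches URLs on Ruby 3.0+
doc = Nokogiri::HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
doc.css("table")[1].css("tr").each do |tr|
  puts tr      #=> prints the row's HTML, TR tags included
  puts tr.text #=> prints only the text
end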
For instance:
puts doc.css("table")[1].css("tr")[2]
prints the following:
<tr>
<td valign="TOP" colspan="3">
<b><font face="Arial" size="2"><p align="CENTER">GROUP DEFINITION - AIRCRAFT</p></font></b>
</td>
<td valign="TOP">
<font face="Arial" size="2"><p align="LEFT">The complete operational unit. Includes dimensions and areas, lifting and shoring, leveling and weighing, towing and taxiing, parking and mooring, required placards, servicing.</p></font>
</td>
</tr>
You could do the same using xpath also:
Below is the content from the first table of the webpage given in the post by OP:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri.HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
doc.xpath('(//table)[1]/tr').each do |tr|
puts tr.to_html(:encoding => 'utf-8')
end
Output:
<tr>
<td width="33%" valign="MIDDLE" colspan="2">
<p><img src="S-Tech-Logo-Blue2.gif" width="274" height="127"></p>
</td>
<td width="67%" valign="MIDDLE">
<b><i><font face="Arial" color="#0000ff">
<p align="CENTER"><big>AIRCRAFT PARTS MANUFACTURING ASSISTANCE (PMA)</big><br><big>DAR SERVICES</big></p></font></i></b>
</td>
</tr>
Now, if you want to collect the last table rows, then do:
require 'nokogiri'
require 'open-uri'
doc = Nokogiri.HTML(URI.open('http://www.s-techent.com/ATA100.htm'))
p doc.xpath('(//table)[3]/tr').to_a.size # => 1
doc.xpath('(//table)[3]/tr').each do |tr|
puts tr.to_html(:encoding => 'utf-8')
end
Output:
<tr>
<td width="40%" valign="TOP" height="10">
<p align="CENTER"><b><font face="Arial" size="2" color="#0000ff">149 AZALEA CIRCLE • LIMERICK, PA 19468-1330</font></b></p>
</td>
<td width="30%" valign="TOP" height="10">
<p align="CENTER"><b><font face="Arial" size="2" color="#0000ff">610-495-6898 (Office) • 484-680-0507 (Cell)</font></b></p>
</td>
<td width="110%" valign="TOP" height="10">
<p align="CENTER"><b><font face="Arial" size="2">E-mail S-Tech</font></b></p>
</td>
</tr>
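Since the goal in the question is JSON, the row-by-row idea can be carried one step further. The following is only a sketch, written in Python with lxml rather than Nokogiri, and run against two abridged rows from the markup quoted above instead of the live page. SAMPLE, rows and as_json are names made up for illustration.

```python
import json
from lxml import html

# Two rows abridged from the markup quoted above.
SAMPLE = """
<table>
<tr><td><b>11</b></td><td><b>PLACARDS AND MARKINGS</b></td>
<td>All procurable placards, labels, etc.</td></tr>
<tr><td><b>GROUP DEFINITION - AIRCRAFT</b></td>
<td>The complete operational unit.</td></tr>
</table>
"""

doc = html.fromstring(SAMPLE)

rows = []
for tr in doc.xpath("//tr"):
    # text_content() flattens nested tags; split/join collapses whitespace.
    cells = [" ".join(td.text_content().split()) for td in tr.xpath("./td")]
    rows.append(cells)

as_json = json.dumps(rows, indent=2)
print(as_json)
```

Each row becomes a list of cell strings. Grouping rows into sections (for example by the ROWSPAN on the chapter-number cell) would need extra logic that depends on the real page layout.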

XPath expression that matches the bgcolor attribute of a TR element

I'm working with HtmlAgilityPack against table rows that have been generated without a name or id. Instead, I need to select based on the value of the row's bgcolor attribute.
I understand that this XPath will return the cells of all rows where the name attribute equals display:
foreach (HtmlNode cell in doc.DocumentElement.SelectNodes("//tr[@name='display']/td"))
Given the snippet below, what expression will select all td elements in rows where bgcolor="#FFFFFF"?
I've tried: SelectNodes("//tr[@bgcolor='#FFFFFF']/td")
<tr bgcolor="#EAF2FA">
<td colspan="2">
<font style="font-family: sans-serif; font-size:12px;"><strong>Name</strong></font>
</td> </tr>
<tr bgcolor="#FFFFFF">
<td width="20"> </td>
<td>
<font style="font-family: sans-serif; font-size:12px;">Steve</font>
</td> </tr>
thx
bgcolor is weird; I find that using contains() fixes the problem.
This will work:
SelectNodes("//*//tr[contains(@bgcolor, 'FFFFFF')]/td")
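For comparison, here is a small sketch of both predicates in Python with lxml. It says nothing about how HtmlAgilityPack itself behaves; it only shows what the two XPath expressions select on a cut-down copy of the rows from the question.

```python
from lxml import html

# Cut-down copy of the two rows from the question.
SNIPPET = """
<table>
<tr bgcolor="#EAF2FA"><td colspan="2"><font><strong>Name</strong></font></td></tr>
<tr bgcolor="#FFFFFF"><td width="20"> </td><td><font>Steve</font></td></tr>
</table>
"""

doc = html.fromstring(SNIPPET)

# Exact match on the attribute value, leading '#' included.
exact = doc.xpath("//tr[@bgcolor='#FFFFFF']/td")

# Substring match, as suggested in the answer.
loose = doc.xpath("//tr[contains(@bgcolor, 'FFFFFF')]/td")

names = [td.text_content().strip() for td in loose]
print(len(exact), names)
```

Both forms compare the attribute value case-sensitively, so a row written as bgcolor="#ffffff" would match neither.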

getting value using xpath, ruby

I need to get the value 9,70 from the following code, but am unable to do so. The comma is part of the number, not a delimiter, so the whole number is needed as one string. id="cheapest wine" is unique, but my query keeps returning an error.
<tr class="chartTableHeader">
<tr class="chartTableRow">
<td class="chartTableColFirst" style="height: 19px">
<td class="chartTableCol" style="height: 19px">
<td class="chartTableCol" style="height: 19px">
<span id="cheapest wine">9,70</span>
</td>
<td class="chartTableCol" style="height: 19px">
<td class="chartTableCol" style="height: 19px">
<td class="chartTableCol" style="height: 19px">
Using Nokogiri, and assuming that your html is formatted properly, you can get the value as follows:
require 'nokogiri'
xml = <<-EOF
<root>
<span id="cheapest wine">9,70</span>
</root>
EOF
doc = Nokogiri::XML(xml)
doc.xpath('//span[@id="cheapest wine"]').each do |span|
  puts span.inner_text
end
Here the key is the XPath query: //span[@id="cheapest wine"], which selects the span nodes whose id is "cheapest wine" (being an id, there should be only one).
Use the following XPath expression:
number(
  translate(tr[@class='chartTableRow']/td/span[@id='cheapest wine'],
            ',',
            '.'
  )
)
where the current node from which the XPath expression is evaluated is the parent of the XML fragment shown in your question.
The XPath expression above evaluates to 9.7
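Both answers can be tried side by side. This is a sketch in Python with lxml, not the asker's Ruby setup, and it runs on a cleaned-up one-row version of the table. It uses an absolute path (//tr) instead of relying on a particular context node.

```python
from lxml import html

# Cleaned-up, one-row version of the table from the question.
SNIPPET = """
<table>
<tr class="chartTableRow">
<td class="chartTableCol"><span id="cheapest wine">9,70</span></td>
</tr>
</table>
"""

doc = html.fromstring(SNIPPET)
span = "//tr[@class='chartTableRow']/td/span[@id='cheapest wine']"

# string() yields the text as written, comma included.
as_text = doc.xpath("string(" + span + ")")

# translate() swaps the comma for a dot so number() can parse it.
as_number = doc.xpath("number(translate(" + span + ", ',', '.'))")

print(as_text, as_number)
```

Use the string form when the value must stay exactly as displayed ("9,70"), and the number form when it will be used in arithmetic.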
