weird encode character in email - ruby

I have a mustache template parsed via ruby and then render it by marking it html_safe against email body but resultant HTML has some weird encode character embedded in it, for example
<body style=3D"min-width:640px;margin: 0 0 0 0;" bgcolor=3D"#f6f6f6" link==3D"#000000" vlink=3D"#000000" alink=3D"#000000" text=3D"#000000">
<br />
<table width=3D"100%" border=3D"0" align=3D"center"
cellpadding=3D"0" c=
ellspacing=3D"0" bgcolor=3D"#f6f6f6">
<tr>
<td bgcolor=3D"#f6f6f6" style=3D"border-bottom: 0;">
<table width=3D"640" style=3D"min-width:640px;"
cellspacing=3D"0"=
cellpadding=3D"0" border=3D"0" align=3D"center">
<tbody>
<tr>
<td bgcolor=3D"#000000">
<table width=3D"640" bgcolor=3D"#000000" cellspacing=3D"0=
" cellpadding=3D"0" border=3D"0" align=3D"center">
<tbody>
<tr>
<td width=3D"600" height=3D"10" bgcolor=3D"#000000"=
style=3D"line-height:0px;font-size:0px;">
<div width=3D"1" height=3D"10" alt=3D"" style=3D"=
display:block; border:0;"></div>
Why these character remains even after marking string as html safe? Am I missing something.
Mustache template is regular HTML template with mustache syntax in it that are to be replaced dynamically

That's quoted-printable style where it's similar to how things are escaped in a URL. You're probably used to %20 but here =20 is the same thing.
Since = is part of the escaping, like in HTML & becomes & and in a URL % becomes %25, = must be encoded as =3D.
HTML just so happens to use a lot of = characters so you'll see the =3D sigil all over.

Related

XPATH start scraping after certain word

I am trying to get the location from this html using XPATH. So what I want to say is [in human terms] "when you see Location: grab the next piece of text then stop.
<td width="670">
<h1>Accor Vacation Club - SOLD</h1>
<h2>All Australia, Australia</h2>
<p class="property_number">Property ref: 002</p>
<h3 class="cl2">Description</h3><p class="xh-highlight">Resort: Accor Vacation Club. <br>Location: Australia. <br>Type of Ownership: Points. <br>Season: All. <br>Size of Unit: Studio. <br>Price: SOLD</p><p class="xh-highlight"> </p><p class="xh-highlight"><span style="font-size: 16pt">SOLD</span> </p>
<table width="100%" border="0" cellspacing="0" cellpadding="0" id="photorealestate">
<tbody><tr>
I got this far but can't seem to isolate that word:
//p[./preceding-sibling::h3[contains(., 'Description')]]
//p/text()[./preceding-sibling::h3[contains(., 'Description')]]
If you need to get "Australia" as output you can use below expression
substring-after(//text()[starts-with(., 'Location')], 'Location: ')
This will select text node that starts with word "Location" and return sub-string preceded by "Location: "

Extract href from specific tr element using XPath

Given the following HTML code :
<tr>
<th scope="row" class="navbox-group">Family</th>
<td class="navbox-list navbox-even hlist" style="text-align:left;border-left-width:2px;border-left-style:solid;width:100%;padding:0px">
<div style="padding:0em 0.25em">
<ul>
<li>Andrew Parker Bowles <small>(first husband)</small></li>
<li>Tom Parker Bowles <small>(son)</small></li>
<li>Laura Lopes <small>(daughter)</small></li>
<li>Charles, Prince of Wales <small>(second husband)</small></li>
<li>Bruce Shand <small>(father)</small></li>
<li>Rosalind Shand <small>(mother)</small></li>
<li>Annabel Elliot <small>(sister)</small></li>
<li>Mark Shand <small>(brother)</small></li>
</ul>
</div>
</td>
</tr>
I want to get all the href within the tr element , but only from tr elements that contains :
<th scope="row" class="navbox-group">Family</th>
(Where th='Family')
I try to write the following XPath :
"//tr[#th='Family']//a/#href"
But I don't get any href.
Thanks a lot.
Shany
Try below XPath:
//tr[th="Family"]//#href
It should allow you to get list of links from tr that contains th with text "Family"

How can I search a table faster?

I am trying to search a table for specific a specific value using Ruby and Selenium-webdriver. I have a method that works but takes a lot of time for some reason. It is a one row table and the page HTML looks like this:
<div id="permitGridContainer">
<table id="calendar" class="items" style="width:430px;" name="calendar">
<thead>
<tbody>
<tr>
<td id="avail1" class="status r slct" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 0 of 4");">
<div class="permitStatus">R</div>
</td>
<td id="avail2" class="status r" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 0 of 4");">
<div class="permitStatus">R</div>
</td>
<td id="avail3" class="status a" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 89 of 99");">
<a onclick="javascript:setNewArrivalDate("Sun Sep 06 2015", 2);return false;" href="#">
A
<br>
<small>89</small>
</a>
</td>
<td id="avail4" class="status a" onmouseout="return nd();" onmouseover="return overlib("Available Quota<br>River Launches : 97 of 99");">
</tr>
</tbody>
</table>
</div>
... I shortened the table it has 14 columns.
I am looking for a column that has an Item available and I am checking the class for this, but the text also changes so there are other things I could look for.
This is the code I am using, but it visibly slow. I used puts statements to see the progress. My sense is that is has to do with time accessing the element. So I was hoping there is a better way to process the table quickly. Thank you.
for j in 1..days_to_check[i]
check_avail = driver.find_element(id: "avail#{j}")
check_availclass = check_avail.attribute ("class")
if check_availclass == "status a" or check_availclass == "status a slct"
#process if
end
Depending on your comment I would suggest to use the following xpath. I find this is often easier and feasible to use better xpath than looping though the html table
//td[(#class='status a') or (#class='status A')]
This xpath finds the class with status a or status A

Advice for replacing img tags with text in Ruby?

I'm trying to work out how to store an html table of drive stats in a database, but the developers have been a bit clever, and started using gifs to represent pass/fail/health stats
Here's a snippet of what I've got:
<tr class="status">
<td class="status"><img border="0" src="/tick_green.gif"></td>
<td class="status">8</td>
<td class="status">Ready</td>
<td class="status"><img border="0" src="/bar10.gif"></td>
<td class="status">SEAGATE ST3146807FC</td>
<td class="status">10000 RPM</td>
<td class="status">3HY61AG9</td>
<td class="status">XR12</td>
<td class="status">286749488</td>
<td class="status"> 28.0°C</td>
<td class="status" style="background-color: #00fa00"> 
</td>
**
And here's some of the ruby that I've written so far to strip the tags:
table = page.parser.xpath('//table/caption[contains(.,"Drive")]/..')
table.xpath('//table//tr').each do |row|
row.xpath('td').each do |cell|
puts cell.to_html.gsub(/<a[^>]+>/,'').gsub(/<td[^>]+>/,'').gsub(/<\/td[^>]*>/,'').gsub(/<\/a[^>]*>/,'')
#puts cell.text
end
end
I can now get semi-rational output
<img border="0" src="/tick_green.gif">
15
Ready
<img border="0" src="/bar10.gif">
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
 
But I want to replace a couple of other cell elements with other bits
For example, the tick_green can also be '/cross_red.gif' or '/caution.gif' which I want to replace with regular text, likewise, the img bar10.gif, I want to replace with just text of '10'
Is it best to come up with a whole bunch of values for all of my special cases?
I'd do some 'gsub'iing.
E.g.:
example = <<-STRING
<img border="0" src="/tick_green.gif">
15
Ready
<img border="0" src="/bar10.gif">
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
 
STRING
replace = Hash.new("#unknown")
replace['tick_green.gif'] = "[OK]"
replace['bar10.gif'] = "[10]"
regex = /<img [^>]* src="\/(.*)">/
result = example.gsub(regex) { replace[$1] }
Somehow the I'd like to replace the $1 with a named backreference, but don't know how yet.
http://ruby-doc.org/core-1.9.3/String.html#method-i-gsub
edit: result from above
[OK]
15
Ready
[10]
SEAGATE ST3146807FC
10000 RPM
3HY61ASW
XR12
286749488
29.0°C
 
A case statement will clean that up a little but:
row.css('td').each do |td|
img = td.at('img')
puts case
when img && img[:src][/bar(\d+)\.gif/] then $1
when img && img[:src][/tick_green/] then 'ok'
else td.text.strip
end
end

How do I parse a plain HTML table with Nokogiri?

I'd like to parse a HTML page with the Nokogiri. There is a table in part of the page which does not use any specific ID. Is it possible to extract something like:
Today,3,455,34
Today,1,1300,3664
Today,10,100000,3444,
Yesterday,3454,5656,3
Yesterday,3545,1000,10
Yesterday,3411,36223,15
From this HTML:
<div id="__DailyStat__">
<table>
<tr class="blh"><th colspan="3">Today</th><th class="r" colspan="3">Yesterday</th></tr>
<tr class="blh"><th>Qnty</th><th>Size</th><th>Length</th><th class="r">Length</th><th class="r">Size</th><th class="r">Qnty</th></tr>
<tr class="blr">
<td>3</td>
<td>455</td>
<td>34</td>
<td class="r">3454</td>
<td class="r">5656</td>
<td class="r">3</td>
</tr>
<tr class="bla">
<td>1</td>
<td>1300</td>
<td>3664</td>
<td class="r">3545</td>
<td class="r">1000</td>
<td class="r">10</td>
</tr>
<tr class="blr">
<td>10</td>
<td>100000</td>
<td>3444</td>
<td class="r">3411</td>
<td class="r">36223</td>
<td class="r">15</td>
</tr>
</table>
</div>
As a quick and dirty first pass I'd do:
html = <<EOT
<div id="__DailyStat__">
<table>
<tr class="blh"><th colspan="3">Today</th><th class="r" colspan="3">Yesterday</th></tr>
<tr class="blh"><th>Qnty</th><th>Size</th><th>Length</th><th class="r">Length</th><th class="r">Size</th><th class="r">Qnty</th></tr>
<tr class="blr">
<td>3</td>
<td>455</td>
<td>34</td>
<td class="r">3454</td>
<td class="r">5656</td>
<td class="r">3</td>
</tr>
<tr class="bla">
<td>1</td>
<td>1300</td>
<td>3664</td>
<td class="r">3545</td>
<td class="r">1000</td>
<td class="r">10</td>
</tr>
<tr class="blr">
<td>10</td>
<td>100000</td>
<td>3444</td>
<td class="r">3411</td>
<td class="r">36223</td>
<td class="r">15</td>
</tr>
</table>
</div>
EOT
# Today Yesterday
# Qnty Size Length Length Size Qnty
# 3 455 34 3454 5656 3
# 1 1300 3664 3545 1000 10
# 10 100000 3444 3411 36223 15
require 'nokogiri'
doc = Nokogiri::HTML(html)
Use CSS to find the start of the table, and define some places to hold the data we're capturing:
table = doc.at('div#__DailyStat__ table')
today_data = []
yesterday_data = []
Loop over the rows in the table, rejecting the headers:
table.search('tr').each do |tr|
next if (tr['class'] == 'blh')
Initialize arrays to capture the pertinent data from each row, selectively push the data into the appropriate array:
today_td_data = [ 'Today' ]
yesterday_td_data = [ 'Yesterday' ]
tr.search('td').each do |td|
if (td['class'] == 'r')
yesterday_td_data << td.text.to_i
else
today_td_data << td.text.to_i
end
end
today_data << today_td_data
yesterday_data << yesterday_td_data
end
And output the data:
puts today_data.map{ |a| a.join(',') }
puts yesterday_data.map{ |a| a.join(',') }
> Today,3,455,34
> Today,1,1300,3664
> Today,10,100000,3444
> Yesterday,3454,5656,3
> Yesterday,3545,1000,10
> Yesterday,3411,36223,15
Just to help you visualize what's going, at the exit from the "tr" loop, the today_data and yesterday_data arrays are arrays-of-arrays looking like:
[["Today", 3, 455, 34], ["Today", 1, 1300, 3664], ["Today", 10, 100000, 3444]]
Alternatively, instead of looping over the "td" tags and sensing the class for the tag, I could have grabbed the contents of the "tr" and then used scan to grab the numbers and sliced the resulting array into "today" and "yesterday" arrays:
tr_data = tr.text.scan(/\d+/).map{ |i| i.to_i }
today_td_data = [ 'Today', *tr_data[0, 3] ]
yesterday_td_data = [ 'Yesterday', *tr_data[3, 3] ]
In real-world development, like at work, I'd use that instead of what I first wrote because it's succinct.
And notice that I didn't use XPath. It's very doable in Nokogiri to use XPath and accomplish this, but for simplicity I prefer CSS accessors. XPath would have allowed accessing individual "td" tag contents, but it also would begin to look like line-noise, which is something we want to avoid when writing code, because it impacts maintenance. I could also have used CSS to drill down to the correct "td" tags like 'tr td.r', but I don't think it would improve the code, it would just be an alternate way of doing it.

Resources