Parsing HTML document - ruby

I am trying to parse the following HTML using Ruby and Nokogiri:
<div class="vevent">
<table width="750"><tr>
<td width="25"> </td>
<td valign="top" width="200">
<font size="2" face="sans-serif">
<font color="black"><b>June 30, 2015</b></font>
<br>
<span class="dtstart"><span class="value-title" title="2015-06-30"></span></span><br><span class="summary"><font color="#92161" size="3"><b>Band Concert</b></font></span>
<br><font color="#333333">Event</font><br>
<br>
<br>
<br clear="left">Have a question? email us.<br>
<br></font>
</td>
<td valign="top" width="10"></td>
<td valign="top">
<br clear="left"><font color="#92161">111 Main Street</font><br>
<font color="#92161">Mainstreet, Ohio 55111</font>
<a rel="nofollow" href="http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=%221700+111+MainStreet+NE+Mainstreet,+Ohio+55111%22" target="_blank"><font size="1" face="sans-serif">map link</font></a><br><br>
<font color="#92161"><font size="2" face="sans-serif">Telephone:</font> 3305551000</font><br><br>
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
<br>
</td>
</tr></table>
</div>
I am trying to grab the last bit of text:
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
Here is my code thus far:
events = doc.css("div.vevent")
events.collect do |row|
row.css("td")[3]
end
This gets me to the fourth td (index 3), which contains the text I am looking for:
<td valign="top">
<br clear="left"><font color="#92161">111 Main Street</font><br>
<font color="#92161">Mainstreet, Ohio 55111</font>
<a rel="nofollow" href="http://maps.google.com/maps?f=q&source=s_q&hl=en&geocode=&q=%221700+111+MainStreet+NE+Mainstreet,+Ohio+55111%22" target="_blank"><font size="1" face="sans-serif">map link</font></a><br><br>
<font color="#92161"><font size="2" face="sans-serif">Telephone:</font> 3305551000</font><br><br>
Visit our website for complete information.<br><br>
Enjoy a summer evening concert on Main Street at 8pm. Doors and cash bar open at 7pm.<br><br>Look for more details and ticket sales to be released soon on our website<br> <br><br>
<br>
</td>
However, once there, if I call text on that td I get all the text inside the td. I only want the last bit that is not inside any element. I tried using XPath and parent so that I could say "just give me the text that is directly inside the td (not nested inside another element)", but I couldn't get that to work. Does anyone have any ideas on this?

Try this code: doc.css('td')[3].css('> text()').to_s.strip

I suggest using XPath, which is more flexible.
If I understand you correctly, you would like the following:
"I only want the last bit that is not inside any element"
So, try this XPath:
//table//td[last()]/text()
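To illustrate what "direct text nodes only" means, here is a minimal sketch. It is an assumption-laden stand-in rather than the asker's setup: it uses REXML from Ruby's standard distribution instead of Nokogiri so it runs without extra gems, the markup is a shortened, well-formed version of the question's table (REXML needs <br/> rather than <br>), and it picks the last td in Ruby and reads its child text nodes, which is what td/text() selects.

```ruby
require 'rexml/document'

# Shortened, well-formed stand-in for the question's table (an assumption:
# the real page is not valid XML and would need Nokogiri's HTML parser).
xml = <<~'XML'
  <table><tr>
    <td>ignored cell</td>
    <td>
      <font>111 Main Street</font><br/>
      <font><font>Telephone:</font> 3305551000</font><br/><br/>
      Visit our website for complete information.<br/><br/>
      Enjoy a summer evening concert on Main Street at 8pm.<br/>
    </td>
  </tr></table>
XML

doc = REXML::Document.new(xml)

# Take the last <td>, then keep only its direct child text nodes.
# Text nested inside <font> is not a direct child, so it is skipped.
last_td = REXML::XPath.match(doc, '//td').last
direct_text = last_td.texts.map { |t| t.value.strip }.reject(&:empty?)

puts direct_text
```

The whitespace-only nodes between the br tags are dropped by the strip/reject step; with Nokogiri the same filtering would be needed on the node set returned by the XPath above.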

Related

Office 365 Exchange + Office 365 Outlook: Warning banner is not displaying colours

I'm working on a warning banner to be displayed to our users, warning them not to click the links or attachments in a suspicious email, so that they are alerted to possible phishing or spoofing. This is done using the "prepend a disclaimer" rule in Exchange Online.
I've followed a tutorial on prepending this banner to an email, but Outlook doesn't seem to render the background colours on the table. It just displays the text content.
Code is here:
<!-- Yellow caution banner -->
<table border=0 cellspacing=0 cellpadding=0 align="left" width="100%">
<tr>
<!-- Remove the next line if you don't want the Yellow bar on the left side -->
<td bgcolor="#ffb900" style="background-color:#ffb900;padding:5pt 2pt 5pt 2pt"></td>
<td width="100%" bgcolor="#fff8e5" cellpadding="7px 6px 7px 15px" style="background-color:#fff8e5; padding:5pt 4pt 5pt 12pt; word-wrap:break-word; font-family:sans-serif">
<div style="color:#222222;">
<span style="color:#222; font-weight:bold;">Caution:</span>
This is an external email and has a suspicious subject or content. Please do not click on any links or download any files unless you know the sender and you are expecting this message. If you are unsure, please contact the IT Helpdesk.
</div>
</td>
</tr>
</table>
<br />
I'm trying to get it to look like this:
(https://i.stack.imgur.com/yltQ7.png)
But I receive this instead.
(https://i.stack.imgur.com/3fiZx.png)
It doesn't seem to matter whether dark mode is enabled or not. As far as I know, HTML is enabled in Outlook.
Thanks in advance for any help with this.
It didn't work for me either. Try this:
<table border=0 cellspacing=0 cellpadding=0 align="left" width="100%">
<tr>
<!-- Remove the next line if you don't want the Yellow bar on the left side -->
<td style="background:#ffb900;padding:5pt 2pt 5pt 2pt"></td>
<td width="100%" cellpadding="7px 6px 7px 15px" style="background:#fff8e5;padding:5pt 4pt 5pt 12pt;word-wrap:break-word">
<div style="color:#222222;">
<span style="color:#222; font-size:13px; font-weight:bold;">CAUTION:</span>
<span style="color:#222; font-size:13px;">This email is from an external source. Do not click links or open attachments unless you recognize the sender and know the content is safe. When in doubt, contact Nova IT department.</span>
</div>
</td>
</tr>
</table>

regex using dynamic input in Jmeter(regex extractor)

I have a question about the JMeter Regular Expression Extractor. I am trying to implement a scenario but have not been able to get it working. Details below.
Requirement:
In JMeter I have defined a user-defined variable: String VAR = KZ
Now I am trying to use the Regex Extractor so that the regex matches the VAR value in the HTML response (shown below) and fetches the span class name, as I need to set the checkbox ON for KZ.
The checkbox ON functionality must be driven by the user-defined variable. That is, I don't want to hardcode the class name; instead, based on the user-defined variable (which will be the td value, KZ in this example), I have to fetch the class name using the Regex Extractor. Could someone please help with how to proceed?
Below is HTML Code:
<tr class="trClass">
<td style="width: 13.5%;">
<span class="checkbox"><input id="ctl00ctl94" type="checkbox" name="$ctl95$"
onclick="return validatecheck();" /></span>
</td>
<td style="width: 41.2%;"> KZ </td>
<td style="width: 0%; display: none;"> 5581357 </td>
<td style="width: 32%;"> 06/03/2018 2:22:38 PM </td>
</tr>
<tr class="trClass">
<td style="width: 13.5%;">
<span class="checkbox"><input id="ctl00ctl95" type="checkbox" name="$ctl95$"
onclick="return validatecheck();" /></span>
</td>
<td style="width: 41.2%;"> TM </td>
<td style="width: 0%; display: none;"> 5581358 </td>
<td style="width: 32%;"> 06/03/2018 2:22:38 PM </td>
</tr>
<tr class="trClass">
<td style="width: 13.5%;">
<span class="checkbox"><input id="ctl00ctl96" type="checkbox" name="$ctl96$"
onclick="return validatecheck();" /></span> </td>
<td style="width: 41.2%;">TR </td>
<td style="width: 0%; display: none;"> 5581359 </td>
<td style="width: 32%;"> 06/03/2018 2:22:38 PM </td>
</tr>
Using regular expressions to parse HTML is not the best idea, as:
they are hard to develop and/or maintain
they are very sensitive to markup changes and hence fragile, i.e. if the order of attributes changes or something moves to a new line, it will simply break your regex
So I would recommend going for another post-processor which can work with the DOM directly, for instance the XPath Extractor.
The relevant XPath query, which will fetch the class name of the span above the KZ text, would be something like:
//td[contains(text(),'KZ')]/preceding::*/span/@class
Of course you can substitute KZ with the JMeter Variable reference, i.e.
//td[contains(text(),'${VAR}')]/preceding::*/span/@class
However you will not be able to test your queries using XPath Tester mode of the View Results Tree listener, you will have to go for Debug Sampler instead to visualize the resulting variable.
Check out XPath Tutorial and Using the XPath Extractor in JMeter guide to get familiarized with XPath language.
Also be aware that according to JMeter project main page:
JMeter is not a browser, it works at protocol level. As far as web-services and remote services are concerned, JMeter looks like a browser (or rather, multiple browsers); however JMeter does not perform all the actions supported by browsers. In particular, JMeter does not execute the Javascript found in HTML pages.
So I don't believe fetching the span class name will solve your problem. Most probably you will need to send the underlying input name as a parameter, so you should be looking for:
//td[contains(text(),'KZ')]/preceding::*/span/input/@name
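To sanity-check the idea outside JMeter, here is a rough Ruby sketch of a row-scoped lookup: find the tr whose td text equals the variable, then read the name of the input in that same row. This is not JMeter configuration. It assumes REXML from Ruby's standard distribution, a trimmed and well-formed copy of two rows from the question (KZ and TR), and a local variable var standing in for ${VAR}.

```ruby
require 'rexml/document'

# Trimmed, well-formed copy of two rows from the question (KZ and TR).
xml = <<~'XML'
  <table>
    <tr class="trClass">
      <td><span class="checkbox"><input id="ctl00ctl94" type="checkbox" name="$ctl95$"/></span></td>
      <td> KZ </td>
      <td> 5581357 </td>
    </tr>
    <tr class="trClass">
      <td><span class="checkbox"><input id="ctl00ctl96" type="checkbox" name="$ctl96$"/></span></td>
      <td>TR </td>
      <td> 5581359 </td>
    </tr>
  </table>
XML

var = 'KZ' # stands in for the JMeter variable ${VAR}

doc = REXML::Document.new(xml)

# Find the row that has a cell whose trimmed text equals the variable.
row = REXML::XPath.match(doc, '//tr').find do |tr|
  tr.elements.to_a('td').any? { |td| td.text.to_s.strip == var }
end

# Read the checkbox input that lives in that same row.
input = REXML::XPath.first(row, 'td/span/input')
input_name = input.attributes['name']

puts input_name
```

Scoping the lookup to the matching row avoids picking up inputs from earlier rows, which is worth checking for when the preceding axis is used on a table with many rows.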

Click on deeply nested div in Watir

I am trying to click on a checkbox (which is rendered as a div) in Firefox using Watir. I have code left to me by the previous tester, and this code works in Chrome, but not in Firefox.
Here is what I have.
Here is the code that leads to this div:
<div class="x-grid3-body" style="width:515px;" id="ext-gen201">
<div class="x-grid3-row x-grid3-row-first" style="width:515px;">
<table class="x-grid3-row-table" border="0" cellspacing="0" cellpadding="0" style="width:515px;">
<tbody>
<tr>
<td class="x-grid3-col x-grid3-cell x-grid3-td-0 x-grid3-cell-first " style="width: 158px;" tabindex="0">
<div class="x-grid3-cell-inner x-grid3-col-0 x-unselectable" unselectable="on">Organisation</div>
</td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-1 " style="width: 298px;" tabindex="0">
<div class="x-grid3-cell-inner x-grid3-col-1 x-unselectable" unselectable="on">Catch Software</div>
</td>
<td class="x-grid3-col x-grid3-cell x-grid3-td-scopeCheckColumn x-grid3-cell-last x-grid3-check-col-td" style="width: 53px;" tabindex="0">
<div class="x-grid3-cell-inner x-grid3-col-scopeCheckColumn x-unselectable" unselectable="on">
<div class="x-grid3-check-col x-grid3-cc-scopeCheckColumn"> </div>
</div>
</td>
</tr>
</tbody>
</table>
</div>
</div>
What I need essentially is to click on this div
<div class="x-grid3-check-col x-grid3-cc-scopeCheckColumn"> </div>
I can't attach screenshots (rating is low), but this div is the checkbox that I need to click.
This piece of code works in Chrome:
@browser.div(:class => 'x-box-inner', :index => 1).table(:class => 'x-grid3-row-table').td(:text => 'Organisation').parent.td(:index => 2)
But in Firefox I can see that Watir just clicks on the whole parent div (I can see the selection appear in the browser), not on the checkbox div.
Thank you.
First off, there is only one element in the HTML you've shown that has the class 'x-grid3-cc-scopeCheckColumn', and it is a div, so is there a reason you can't do:
@browser.div(class: 'x-grid3-cc-scopeCheckColumn').click
If not, you can simplify your element location drastically to get the td you want, since there is only one td that has the text 'Organisation', and .parent would only get the td above it, but it looks like you want the tr tag above that, so perhaps you want:
@browser.td(text: 'Organisation').parent.parent.td(index: 2)
Using an XPath worked for me in Firefox:
@browser.div(:xpath, "//*[@id='ext-gen201']/div/table/tbody/tr/td[3]/div/div").click
I was able to verify it with .flash in place of .click since your div just contains a space and there was nothing to otherwise see.

Getting href by looking for another tag

I'm trying to extract href links from a web page that has multiple rows like below but I only want the a href of the ones that have the <b> block
<tr bgcolor="#ffffff">
<td>
<a href="?6384593.html" style="background-color: transparent;">
<span class="ts">
<font size="1">
<font color="#006633">
</font>
</font>
<b>Lee Swanson Research Update</b>
<font color="#7777CC"> - Swanson Health Products</font></span>
</a>
</td>
</tr>
In this case I use the xpath expression "//b" to find the bold tag, but what I specifically want is the a href link. Is this possible with xpath?
//b/ancestor::a[1]/@href
Try something like the above.
The XPath below will work:
//a[.//*[local-name(.)='b']]/@href
Update, as @Jens Erat suggested:
//a[.//b]/@href
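For anyone who wants to see the ancestor lookup spelled out, here is a small sketch of what //b/ancestor::a[1]/@href does: start at each b, walk up to the nearest enclosing a, and read its href. It assumes REXML from Ruby's standard distribution and a cut-down, well-formed copy of the row from the question; the second row and its href "?0000000.html" are made up purely to show that rows without a b are skipped.

```ruby
require 'rexml/document'

# First row is a cut-down copy of the question's markup; the second row
# (href "?0000000.html") is hypothetical and has no <b> element.
xml = <<~'XML'
  <table>
    <tr><td><a href="?6384593.html"><span class="ts"><b>Lee Swanson Research Update</b><font> - Swanson Health Products</font></span></a></td></tr>
    <tr><td><a href="?0000000.html"><span class="ts">Row without a bold title</span></a></td></tr>
  </table>
XML

doc = REXML::Document.new(xml)

# For every <b>, climb the parent chain until the nearest <a> is found
# (the equivalent of ancestor::a[1]), then collect its href attribute.
hrefs = REXML::XPath.match(doc, '//b').map do |b|
  node = b.parent
  node = node.parent until node.nil? || node.name == 'a'
  node && node.attributes['href']
end.compact

puts hrefs
```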

Firefox displays 3 columns in a table, IE8 only 2

Would love some help here... Firefox displays the last column in the table (an image they click on to edit their email address, it's a link), and IE8 displays nothing for the last column (doesn't even appear to display a column!) I've left out other rows in the table, but similar stuff happens.
Anyone know why?
<table class="profile-display">
<tr>
<td style="text-align: right; color: red;"> Email address: </td>
<td class="profile-content"> <?php echo("$evar"); ?> </td>
<td> <a href="profile_change.php?edit=13"
<img src="../images/writegreen.png" class="profile-edit" alt="Edit"
title="Edit Email Address"
border="0" />
</a>
</td>
</tr>
</table>
Your <a> tag is missing its >. That will cause a browser to not recognize the end of the tag until the first > it sees, which is the end of the img tag. Frankly, I'm surprised that Firefox shows the img.
Edit: Other common causes of this problem are missing quotes and misspelled tags.
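If you want a quick way to spot this kind of mistake, here is a rough heuristic sketched in Ruby: flag any tag that starts with "<name" and runs into another "<" before reaching its own ">". The constant and method names are made up for this example, and the check is only an approximation; it will give false positives if an attribute value legitimately contains "<". An HTML validator is the more thorough option.

```ruby
# Heuristic: a tag that opens with "<name" and reaches another "<"
# before its own ">" is probably missing the closing ">".
UNCLOSED_TAG = /<([a-zA-Z][a-zA-Z0-9]*)\b[^<>]*(?=<)/

# Returns the names of tags that look unterminated.
def unclosed_tags(html)
  html.scan(UNCLOSED_TAG).flatten
end

# The markup from the question: the <a> tag never gets its ">".
broken = <<~'HTML'
  <td> <a href="profile_change.php?edit=13"
  <img src="../images/writegreen.png" class="profile-edit" alt="Edit"
  title="Edit Email Address"
  border="0" />
  </a>
  </td>
HTML

# Same markup with the ">" added after the href attribute.
fixed = broken.sub('edit=13"', 'edit=13">')

puts unclosed_tags(broken).inspect
puts unclosed_tags(fixed).inspect
```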

Resources