I'm trying to interact with a site that has a large table that looks something like this:
<table id="tblID" cellspacing="1" cellpadding="0" border="0" width="100%">
<tbody><tr valign="top" align="left">
<td colspan="6" height="10"></td>
</tr>
<tr><td bgcolor="#666666" height="1" colspan="6"><img src="img.gif" width="1" height="1" border="0"></td></tr>
<tr class="GridHeader" valign="top" align="left">
<td width="60"><b>Select</b></td>
<td width="210" colspan="2"><b>Account Type</b></td>
<td width="160" colspan="2"><b>Number</b></td>
<td width="200"><b>Account known as</b></td>
</tr>
<tr align="left" valign="middle">
<td class="normal"><input type="radio" id="radButton" name="radButton" value="1233399,1515636"></td>
<td class="normal">ACCTYPE</td>
<td class="normal" width="10"> </td>
<td class="normal">ACCNUMBER</td>
<td class="normal" width="10"> </td>
<td class="normal">ACCNAME</td>
</tr>
<tr align="left" valign="top">
<td height="1" colspan="6" bgcolor="#cccccc"><img src=".img.gif" width="1" height="1" border="0"></td>
</tr>
<tr align="left" valign="middle">
<td class="normal"><input type="radio" id="radButton" name="radButton" value="2263763,2777747"></td>
<td class="normal">ACCTYPE</td>
<td class="normal" width="10"> </td>
<td class="normal">ACCNUMBER</td>
<td class="normal" width="10"> </td>
<td class="normal">ACCNAME</td>
</tr>
This goes on for many hundreds of rows.
The aim of my code is to search the rows by ACCNUMBER and select the associated radio button. It works, but it takes a LONG time.
My Ruby code so far is this:
require 'watir'
require 'nokogiri'
html = browser.html
doc = Nokogiri::HTML(html)
*csv import and loop stuff*
bkacc = CSV[0]
*nokogiri go fast!*
rows = doc.css("table[id='tblID'] tbody tr")
rows.each do |row|
target = row.text[bkacc]
if !target.nil?
cells = row.css("td[class='normal']")
#pushme = cells[0].css('input')[0]['value']
end
end
*watir goes slow*
browser.table(:id,"tblMaintenance").tbody.trs(:valign,"middle").find do |tr|
temp = tr.td(index: 0).radio(:id => "radButton").attribute_value("value")
if temp == #pushme
tr.td(index: 0).radio(:id => "radButton").set
break
end
end
*other commands and loop to next line in csv*
Finding the :value of the button I want to push is very fast with Nokogiri, but once it is found, using that :value to locate and set the button with Watir is very slow.
My question is: how can I speed this up? I thought that perhaps I could by using Mechanize, but the syntax escapes me. I'm still very new to Ruby, so I am probably missing some basic knowledge.
Assuming that the combination of the id and value attributes is unique for each radio button, you could locate the radio button directly. With the value found by Nokogiri stored in a variable (called pushme here), the entire Watir loop could be replaced by just:
browser.radio(:id => "radButton", :value => pushme).set
Alternatively, I would try replacing both loops with the following. Finding the row will be a touch slower than the Nokogiri approach, but the code would be a lot simpler.
row = browser.table(id: 'tblID').tr(text: /#{bkacc}/)
row.radio.set
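If the per-row search itself is the bottleneck, one option is to build the account-number-to-value lookup once, outside the CSV loop, and then pass each value to the direct radio locator. What follows is a minimal sketch, assuming a simplified, well-formed copy of the table. It uses REXML from Ruby's standard distribution so that it runs without a browser; the account data (SAVINGS, 111111 and so on) and the name value_by_account are made up for illustration. With Nokogiri the same idea would apply to browser.html.

```ruby
require 'rexml/document'

# Simplified, well-formed stand-in for the page's table (sample data is invented).
xml = <<~XML
  <table id="tblID">
    <tr valign="middle">
      <td class="normal"><input type="radio" name="radButton" value="1233399,1515636"/></td>
      <td class="normal">SAVINGS</td>
      <td class="normal"/>
      <td class="normal">111111</td>
      <td class="normal"/>
      <td class="normal">First account</td>
    </tr>
    <tr valign="top"><td colspan="6"/></tr>
    <tr valign="middle">
      <td class="normal"><input type="radio" name="radButton" value="2263763,2777747"/></td>
      <td class="normal">CHEQUE</td>
      <td class="normal"/>
      <td class="normal">222222</td>
      <td class="normal"/>
      <td class="normal">Second account</td>
    </tr>
  </table>
XML

doc = REXML::Document.new(xml)

# Walk the data rows once and remember each radio value by account number.
value_by_account = {}
REXML::XPath.each(doc, "//table[@id='tblID']/tr[@valign='middle']") do |tr|
  cells = tr.elements.to_a('td')
  radio = cells[0].elements['input']
  next unless radio
  value_by_account[cells[3].text.to_s.strip] = radio.attributes['value']
end

puts value_by_account['222222']

# Inside the CSV loop each lookup is then a hash access. With a live Watir
# session (not available in this sketch) the click could look like:
#   browser.radio(value: value_by_account[bkacc]).set
```

The table is parsed only once, so the cost per CSV line drops to a hash lookup plus a single direct element lookup in the browser.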
I have a file containing a form with the following HTML code:
<label for="subject">Subject</label>* : <input name="subject" id="subject" type="text">
<br>
<label for="message">Message</label>* : <textarea type="text" name="message" id="message"></textarea>
<br>
<input name="name" id="name" value="" type="hidden">
<input value="Submit Ticket" onclick="submitTicket()" type="button">
After I submit the form, the respective ticket will be in a table which has the following HTML code:
<table class="list" width="100%">
<tbody><tr class="messagelist">
<th>#</th>
<th>ID</th>
<th>Name</a></th>
<th>Subject</a></th>
<th>Owner</a></th>
<th>Priority</a></th>
</tr>
<tr class="list_row">
<td>1.</td>
<td>14</td>
<td class="name">X</td>
<td class="subject">Test1</td>
<td class="owner">AB</td>
<td class="priority">High</td>
</tr>
<tr class="list_row">
<td>2.</td>
<td>22</td>
<td class="name">Y</td>
<td class="subject">Test2</td>
<td class="owner">CD</td>
<td class="priority">Low</td>
</tr>
<tr class="list_row">
<td>3.</td>
<td>31</td>
<td class="name">Z</td>
<td class="subject">Test3</td>
<td class="owner">EF</td>
<td class="priority">Medium</td>
</tr>
<tr class="list_row">
<td>4.</td>
<td>42</td>
<td class="name">A</td>
<td class="subject">Test4</td>
<td class="owner">GH</td>
<td class="priority">High</td>
</tr>
<tr class="list_row">
<td>5.</td>
<td>34</td>
<td class="name">B</td>
<td class="subject">Test5</td>
<td class="owner">IJ</td>
<td class="priority">Low</td>
</tr>
<tr class="list_row">
<td>6.</td>
<td>43</td>
<td class="name">C</td>
<td class="subject">Test6</td>
<td class="owner">KL</td>
<td class="priority">Medium</td>
</tr>
</tbody></table>
I am writing Ruby code for the above form and ticket table. I want to verify that the submitted ticket appears in the table by its subject, Test1, and then click the ID link in the Test1 row.
Could anyone please help me with how to do this?
Here is what I tried:
require 'watir'
browser.tds(:class, 'list_row').each do |tds_row|
if tds_row.text =~ /Test1/
tds_row.a(:href, 'index.html').click
end
end
You can iterate over the table rows; each element of a row is a table cell, so you can examine the text of each cell. If the text of the fourth cell is the one you're looking for (i.e. "Test1"), click the link in the second cell and break out of the iteration. Here's a contrived example:
require 'watir'
b = Watir::Browser.new :chrome
b.goto("http://some_url")
b.button(:value => "Submit Ticket").click
b.trs.each do |tr|
if tr[3].text == "Test1"
tr[1].a.click
break
end
end
b.close
It's still clunky and fragile, so it might be better to target the cells based on their class attributes.
Also, in your Watir example, it looks like you are trying to locate table cells based on the parent row's class attribute, i.e. browser.tds(:class, 'list_row') instead of browser.trs(:class, 'list_row').
Given that the HTML is well marked-up, iterating through the rows is more complicated than it has to be. It would be easier to find the specific subject cell and then navigate to its parent row.
# Find the cell with the specific subject text
subject = browser.td(class: 'subject', text: 'Test1')
# Get the row
row = subject.parent
# Click the tracking code link
row.link(class: 'trackingcode').click
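The same cell-then-parent navigation can be tried offline. This is a small sketch, assuming a trimmed, well-formed copy of the ticket table and using REXML from Ruby's standard distribution rather than Watir, so no browser is needed. The helper name id_for_subject is hypothetical, and it reads the text of the ID cell instead of clicking a link.

```ruby
require 'rexml/document'

# Trimmed copy of the ticket table from the question.
xml = <<~XML
  <table class="list">
    <tr class="list_row">
      <td>1.</td><td>14</td><td class="name">X</td>
      <td class="subject">Test1</td><td class="owner">AB</td><td class="priority">High</td>
    </tr>
    <tr class="list_row">
      <td>2.</td><td>22</td><td class="name">Y</td>
      <td class="subject">Test2</td><td class="owner">CD</td><td class="priority">Low</td>
    </tr>
  </table>
XML

# Find the subject cell first, then step up to its row and read the ID cell.
def id_for_subject(doc, subject_text)
  subject = REXML::XPath.match(doc, "//td[@class='subject']")
                        .find { |td| td.text == subject_text }
  return nil unless subject

  row = subject.parent
  row.elements.to_a('td')[1].text
end

doc = REXML::Document.new(xml)
puts id_for_subject(doc, 'Test1')
```

Starting from the cell means the code does not depend on the row's position in the table, only on the class of the subject cell.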
I'd like to get items from this table:
<table style="margin: auto;width: 800px" id="myTable" class="tablesorter">
<thead>
<tr class="TableHeader">
<th >Game</th><th>Icon</th><th>Achievement</th>
<th>Achievers</th>
<th>Value</th>
</tr>
</thead>
<tbody>
<tr>
<td><img alt="Logo" src="http://cdn.akamai.steamstatic.com/steamcommunity/public/images/apps/440/07385eb55b5ba974aebbe74d3c99626bda7920b8.jpg" width=133 height=50 ></td>
<td> <table>
<tr>
<td class="AchievementBox" style="background-color: #347C17">
<a href="Steam_Achievement_Info.php?AchievementID=169&AppID=440"> <img alt="Icon" src="http://cdn.akamai.steamstatic.com/steamcommunity/public/images/apps/440/924764eea604817d3c14de9640ae6422c7cdfb7a.jpg" height='50' width='50'>
</a> </td>
</tr>
</table>
</td>
<td style="text-align: left" >Race for the Pennant<br>Run 25 kilometers.</td>
<td style="text-align: right">35505</td><td style="text-align: right">1.3</td>
The table has an id myTable so what I'd like to do is this:
go inside <tbody>
for each <tr> in table:
do something; maybe go inside <td> or get a link from <href>
I have:
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://astats.astats.nl/astats/TopListAchievements.php?DisplayType=2")
puts page.body
This prints the page but how do I actually iterate through the table rows?
Using a CSS selector to print the text and href attribute values:
require 'nokogiri'
doc = Nokogiri::HTML(page.body)
doc.css('table#myTable tbody td[3] a').each {|a|
puts a.text, a[:href]
}
Hello and hopefully thanks for the help.
Honestly I am not very experienced at XPath and I am hoping a guru out there will have a quick answer for me.
I am scraping a web page for data. The defining aspect of the data I want is that it is contained in a row <tr> that has 7 <td> elements. Each <td> element has one of the pieces of data I need to import. I am using the HTML Agility Pack on CodePlex to grab the data, but I can't seem to figure out how to define the query.
Contained in the web page is a section like this:
<table border="0" cellpadding="3" cellspacing="1" width="100%">
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td class="dataHdrText02" valign="top" width="50" align="center"><nobr>SYMBOL</nobr></td>
<td class="dataHdrText02" valign="top" align="center">PERIOD</td>
<td class="dataHdrText02" valign="top" align="center" width="*">EVENT TITLE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ESTIMATE</td>
<td class="dataHdrText02" valign="top" align="center">EPS ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center">PREV. YEAR ACTUAL</td>
<td class="dataHdrText02" valign="top" align="center"><nobr>DATE/TIME (ET)</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO </nobr></td>
<td align="center">Q4 2011</td>
<td align="left" width="*">Q4 2011 CISCO Systems Inc Earnings Release</td>
<td align="center">$ 0.38 </td>
<td align="center">n/a </td>
<td align="center">$ 0.43 </td>
<td align="center"><nobr>10-Aug-11</nobr></td>
</tr>
<tr class="bgWhite">
<td align="center" width="50"><nobr>CSCO </nobr></td>
<td align="center">Q3 2011</td>
<td align="left" width="*">Q3 2011 Cisco Systems Earnings Release</td>
<td align="center">$ 0.37 </td>
<td align="center">$ 0.42 </td>
<td align="center">$ 0.42 </td>
<td align="center"><nobr>11-May-11 AMC</nobr></td>
</tr>
<tr class="bgWhite" xmlns:msxsl="urn:schemas-microsoft-com:xslt">
<td align="center" colspan="7"><img src="/format/cb/images/spacer.gif" width="1" height="4"></td>
</tr>
</table>
My goal is to grab the earnings event data and place it into a database for analysis. My original thought was to grab all <tr> elements with 7 <td> elements then work with that data. Any advice or alternative suggestions would be welcome.
This should do it for you.
//tr[count(td)=7]
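As a quick check of that expression, here is a small sketch using REXML from Ruby's standard distribution on a trimmed, well-formed copy of the table (the nobr wrappers are dropped for brevity). It is only an illustration in Ruby, not HTML Agility Pack code, and the variable names are hypothetical. Note that the 7-cell header row matches as well, so the sketch uses it for the keys.

```ruby
require 'rexml/document'

# Trimmed, well-formed copy of the earnings table from the question.
xml = <<~XML
  <table>
    <tr class="bgWhite">
      <td>SYMBOL</td><td>PERIOD</td><td>EVENT TITLE</td><td>EPS ESTIMATE</td>
      <td>EPS ACTUAL</td><td>PREV. YEAR ACTUAL</td><td>DATE/TIME (ET)</td>
    </tr>
    <tr class="bgWhite">
      <td>CSCO</td><td>Q4 2011</td><td>Q4 2011 CISCO Systems Inc Earnings Release</td>
      <td>$ 0.38</td><td>n/a</td><td>$ 0.43</td><td>10-Aug-11</td>
    </tr>
    <tr class="bgWhite">
      <td>CSCO</td><td>Q3 2011</td><td>Q3 2011 Cisco Systems Earnings Release</td>
      <td>$ 0.37</td><td>$ 0.42</td><td>$ 0.42</td><td>11-May-11 AMC</td>
    </tr>
    <tr class="bgWhite"><td colspan="7"/></tr>
  </table>
XML

doc = REXML::Document.new(xml)

# Keep only rows with exactly seven td children; the spacer row has one.
rows = REXML::XPath.match(doc, '//tr[count(td)=7]')

# The first matching row is the header; pair it with each data row.
header, *data = rows.map do |tr|
  tr.elements.to_a('td').map { |td| td.text.to_s.strip }
end
events = data.map { |cells| header.zip(cells).to_h }

events.each { |event| puts event.inspect }
```

Each resulting hash is keyed by the column title, which may be a convenient shape for inserting into a database.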
I am trying to extract the name, ID, Phone, Email, Gender, Ethnicity, DOB, Class, Major, School and GPA from a page I am parsing with Nokogiri.
I tried several different XPath expressions, but everything I try grabs much more than I want:
<span class="subTitle"><b>Recruit Profile</b></span>
<br><table border="0" width="100%"><tr>
<td>
<table bgcolor="#afafaf" border="0" cellpadding="0" width="100%">
<tr>
<td>
<table bgcolor="#cccccc" border="0" cellpadding="2" cellspacing="2" width="100%">
<tr>
<td bgcolor="#dddddd"><b>Name</b></td>
<td bgcolor="#dddddd">Some Person</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>EDU ID</b></td>
<td bgcolor="#dddddd">A12345678</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Phone</b></td>
<td bgcolor="#dddddd">123-456-7890</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Address</b></td>
<td bgcolor="#dddddd">1234 Somewhere Dr.<br>City ST, 12345</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Email</b></td>
<td bgcolor="#dddddd">someone#email.com</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Gender</b></td>
<td bgcolor="#dddddd">Female</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Ethnicity</b></td>
<td bgcolor="#dddddd">Unknown</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Date of Birth</b></td>
<td bgcolor="#dddddd">Jan 1st, 1901</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Class</b></td>
<td bgcolor="#dddddd">Sophomore</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>Major</b></td>
<td bgcolor="#dddddd">Biology</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>School</b></td>
<td bgcolor="#dddddd">University of Somewhere</td>
</tr>
<tr>
<td bgcolor="#dddddd"><b>GPA</b></td>
<td bgcolor="#dddddd">0.00</td>
</tr>
<tr>
<td bgcolor="#dddddd" valign="top"><b>Availability</b></td>
<td bgcolor="#dddddd">
<table border="0" cellspacing="0" cellpadding="0">
<tr>
I assume that there will be many "Recruit Profile" spans that are followed by tables that wrap up all the details. The following method takes your entire HTML page, finds just those spans, and for each of them it finds the following table and then finds the fields you want anywhere below that table:
require 'nokogiri'
# Pass in or set the array of labels you want to use
# Returns an array of hashes mapping these labels to the values
def recruits_details(html,fields=%W[Name #{"EDU ID"} Phone Email Gender])
doc = Nokogiri::HTML(html)
recruit_labels = doc.xpath('//span[b[text()="Recruit Profile"]]')
recruit_labels.map do |recruit_label|
recruit_table = recruit_label.at_xpath('following-sibling::table')
Hash[ fields.map do |field_label|
label_td = recruit_table.at_xpath(".//td[b[text()='#{field_label}']]")
[field_label, label_td.at_xpath('following-sibling::td/text()').text ]
end ]
end
end
require 'pp'
pp recruits_details(html_string)
#=> [{"Name"=>"Some Person",
#=> "EDU ID"=>"A12345678",
#=> "Phone"=>"123-456-7890",
#=> "Email"=>"someone#email.com",
#=> "Gender"=>"Female"}]
An XPath expression like .//foo[bar[text()="jim"]] means:
Find a 'foo' element anywhere under the current node
...but only if it has a 'bar' element as a child
...but only if that 'bar' element has the text "jim" as its content
An XPath expression like following-sibling::... means: find any elements that are siblings after the current node and that match the expression ...
The XPath expression .../text() selects the Text node; the text method is then used to extract the value (the actual string) of that text node.
Nokogiri's xpath method returns a NodeSet (an array-like collection) of all nodes matching the expression, while the at_xpath method returns only the first matching node.
I am using Watij to automate my UI testing. The web page has many tables, and I need to find the one with a width of 95%. It contains many rows. I have to find each row by its text, for example "running first UI test on local" as shown below, and get the value "Complete" from its last td. I am not able to get the value; instead I get the Watij address. Please let me know how I can do this.
<table width=95%>
<tr>
<th align="left">
<span id="lblHeaderComponent" style="font-size:10pt;font-weight:bold;">Component</span>
</th>
<th align="left">
<span id="lblHeaderServer" style="font-size:10pt;font-weight:bold;">Server</span>
</th>
<th align="left">
<span id="lblHeaderStatus" style="font-size:10pt;font-weight:bold;">
</span>
</th>
</tr>
<tr>
<td align="left"
nowrap="nowrap" style="font-size:12px;">running first UI test on local</td>
<td align="left" style="font-size:12px;">Google</td>
<td align="left" style="font-size:12px;">
<a style='color:#336600;'>Complete</a>
</td>
</tr>
<tr>
<td align="left"
style="border-top:1px solid #cfcfcf;border-bottom:1px solid #cfcfcf;"
colspan="3"
style="font-size:12px; color:#ff3300;">
</td>
</tr>
<tr>
<td align="left" nowrap="nowrap" style="font-size:12px;">running second UI test on local</td>
<td align="left" style="font-size:12px;">Google</td>
<td align="left" style="font-size:12px;">
<a style='color:#336600;'>Complete</a>
</td>
</tr>
</table>
You can try an xpath visualizer like this one to assist you in getting the right expression. It lets you see the results visually.
Using XPath on HTML assumes the HTML is XHTML - in other words it must be well-formed XML.
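To see why well-formedness matters, here is a sketch with REXML, a strict XML parser from Ruby's standard distribution. Watij itself is a Java library, so this only illustrates the XPath side of the problem. It assumes a trimmed copy of the table, and the helper name status_for is hypothetical. The markup as posted, with the unquoted width=95%, is rejected, while a quoted version can be queried for the status of a named row.

```ruby
require 'rexml/document'

# Return the status text (third cell) of the row whose first cell matches test_name.
def status_for(xml, test_name)
  doc = REXML::Document.new(xml)
  row = REXML::XPath.match(doc, "//table[@width='95%']/tr").find do |tr|
    first_cell = tr.elements['td']
    first_cell && first_cell.text.to_s.strip == test_name
  end
  return nil unless row

  cells = row.elements.to_a('td')
  link = cells[2] && cells[2].elements['a']
  link && link.text.strip
end

well_formed = <<~XML
  <table width="95%">
    <tr>
      <td>running first UI test on local</td>
      <td>Google</td>
      <td><a style="color:#336600;">Complete</a></td>
    </tr>
  </table>
XML

# Same markup, but with the unquoted attribute from the question.
as_posted = well_formed.sub('width="95%"', 'width=95%')

puts status_for(well_formed, 'running first UI test on local')

begin
  status_for(as_posted, 'running first UI test on local')
rescue REXML::ParseException
  puts 'not well-formed: the strict parser rejects the unquoted attribute'
end
```

Lenient HTML parsers repair this kind of markup before XPath is applied, which is one reason an expression can behave differently from tool to tool.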