How can I use Capybara to check that the correct items are listed? - ruby

I'm trying to use Capybara to test that a list contains the correct items. For example:
<table id="rodents">
<tr><td class="rodent_id">1</td><td class="rodent_name">Hamster</td></tr>
<tr><td class="rodent_id">2</td><td class="rodent_name">Gerbil</td></tr>
</table>
This list should contain ids 1 and 2, but should not include 3.
What I'd like is something like:
ids = ? # get the contents of each row's first cell
ids.should include(1)
ids.should include(2)
ids.should_not include(3)
How might I do something like that?
I'm answering with a couple of unsatisfactory solutions I've found, but I'd love to see a better one.

Here is a slightly simplified expression:
rodent_ids = page.all('table#rodents td.rodent_id').map(&:text)
From there, you can do your comparisons.
rodent_ids.should include(1)
rodent_ids.should include(2)
rodent_ids.should_not include(3)

Looking for specific rows and ids
A bad solution:
within ('table#rodents tr:nth-child(1) td:nth-child(1)') do
page.should have_content #rodent1.id
end
within ('table#rodents tr:nth-child(2) td:nth-child(1)') do
page.should have_content #rodent1.id
end
page.should_not have_selector('table#rodents tr:nth-child(3)')
This is verbose and ugly, and doesn't really say that id 3 shouldn't be in the table.

Gathering the ids into an array
This is what I was looking for:
rodent_ids = page.all('table#rodents td:nth-child(1)').map{|td| td.text}
From there, I can do:
rodent_ids.should include(1)
rodent_ids.should include(2)
rodent_ids.should_not include(3)
Or just:
rodent_ids.should eq(%w[1 2])

Using has_table?
A bad solution:
page.has_table?('rodents', :rows => [
['1', 'Hamster'],
['2', 'Gerbil']
]
).should be_true
This is very clear to read, but:
It's brittle. If the table structure or text changes at all, it fails.
If it fails, it just says it expected false to be true; I don't know an easy way to compare what the table really looks like with what's expected, other than print page.html
The has_table? method might get removed at some point

Related

How to retrieve string using XPath without returning null errors

I'm trying to write "Private Equity Group; USA" to a file.
"Private Equity Group" prints fine, but I get an error for the "USA" portion
TypeError: null is not an object (evaluating 'style.display')"
HTML code:
<div class="cl profile-xsmall">
<div class="cl profile-small-bold">Private Equity Group</div>
USA
</div>
The XPath for "USA" is:
//*[#id="addrDiv-Id"]/div/div[3]/text()
I get the error when I print the XPath or have it in an if statement:
if (internet.has_xpath?('//*[#id="addrDiv-Id"]/div/div[3]/text()')){
file.puts "#{internet.find(:xpath, '//*[#id="addrDiv-Id"]/div/div[3]/text()')}"
}
Capybara is not a general purpose xpath library - it is a library aimed at testing, and therefore is element centric. The xpaths used need to refer to elements, not text nodes.
if (internet.has_xpath?('//*[#id="addrDiv-Id"]/div/div[3]')){
file.puts internet.find(:xpath, '//*[#id="addrDiv-Id"]/div/div[3]').text
}
although using XPath at all for this is just a bad idea. Whenever possible default to CSS, it's easier to read, and faster for the browser to process - something like
if (internet.has_css?('#addrDiv-Id > div > div:nth-of-type(3)')){
file.puts internet.find('#addrDiv-Id" > div > div:nth-of-type(3)').text
}
or if the HTML allows it (I don't know without seeing more of the HTML)
if (internet.has_css?('#addrDiv-id .cl.profile-xsmall')){
file.puts internet.find('#addrDiv-id .cl.profile-xsmall').text
}
or even cleaner if it works for your use case
file.puts internet.first('#addrDiv-id .cl.profile-xsmall')&.text
Another way to do it :
xml = %{<div class="cl profile-xsmall">
<div class="cl profile-small-bold">Private Equity Group</div>
USA</div>}
require 'rexml/document'
doc = REXML::Document.new xml
print(REXML::XPath.match(doc, 'normalize-space(string(//div[#class="cl profile-xsmall"]))'))
Output :
["Private Equity Group USA"]
I'd say the HTML isn't well-formed, using span would have been better, but this works:
require 'nokogiri'
doc = Nokogiri::HTML(<<EOT)
<div class="cl profile-xsmall">
<div class="cl profile-small-bold">Private Equity Group</div>
USA
</div>
EOT
div = doc.at('.profile-small-bold')
[div.text.strip, div.next_sibling.text.strip].join(' ')
# => "Private Equity Group USA"
which can be reduced to:
[div, div.next_sibling].map { |n| n.text.strip }.join(' ')
# => "Private Equity Group USA"
The problem is that you have two nested divs, with "USA" trailing, so it's important to point to the inner node which has the main text you want. Then "USA" is in the following text node, which is accessible using next_sibling:
div.next_sibling.class # => Nokogiri::XML::Text
div.next_sibling # => #<Nokogiri::XML::Text:0x3c "\n USA\n">
Note, I'm using CSS selectors; They're easier to read, which is echoed by the Nokogiri documentation. I have no proof they're faster, and, because Nokogiri uses libxml to process both, there's probably no real difference worth worrying about, so use whatever makes more sense, and run benchmarks if you're curious.
You might be tempted to use text against the div class="cl profile-xsmall" node, but don't be sucked into that, as it's a trap:
doc.at('.profile-xsmall').text # => "\n Private Equity Group\n USA\n"
doc.at('.profile-xsmall').text.gsub(/\s+/, ' ').strip # => "Private Equity Group USA"
text will return a string of the text nodes after they're concatenated together. In this particular, rare case, it results in a somewhat usable result, however, usually you'll get something like this:
doc = Nokogiri::HTML('<div><p>foo</p><p>bar</p></div>')
doc.at('div').text # => "foobar"
doc.search('p').text # => "foobar"
Once those text nodes have been concatenated it's really difficult to take them apart again. Nokogiri's documentation talks about this:
Note: This joins the text of all Node objects in the NodeSet:
doc = Nokogiri::XML('<xml><a><d>foo</d><d>bar</d></a></xml>')
doc.css('d').text # => "foobar"
Instead, if you want to return the text of all nodes in the NodeSet:
doc.css('d').map(&:text) # => ["foo", "bar"]
The XPath for "USA" is:
//*[#id="addrDiv-Id"]/div/div[3]/text()
Um, no, not according to the HTML you gave us. But, let's pretend.
Using an absolute path to a node is a good way to write fragile selectors. It takes only a small change in the HTML to break your access to the node. Instead, find way-points to skip through the HTML to find the node you want, taking advantage of CSS and XPath to search downward through the DOM.
Typically, a selector like yours is generated by a browser, which isn't a good source to trust. Often browsers do fixups on malformed HTML, which changes it from what Nokogiri or a parser would see, resulting in a non-existing target, or the browser presents the HTML after JavaScript has had a change to run, which can move nodes, hide them, add new ones, etc.
Instead of trusting the browser, use curl, wget or nokogiri at the command-line to dump the file and look at it using a text editor. Then you'll be seeing it just as Nokogiri sees it, prior to any fixups or mangling.

scrapy xpath : selector with many <tr> <td>

Hello I want to ask a question
I scrape a website with xpath ,and the result is like this:
[u'<tr>\r\n
<td>address1</td>\r\n
<td>phone1</td>\r\n
<td>map1</td>\r\n
</tr>',
u'<tr>\r\n
<td>address1</td>\r\n
<td>telephone1</td>\r\n
<td>map1</td>\r\n
</tr>'...
u'<tr>\r\n
<td>address100</td>\r\n
<td>telephone100</td>\r\n
<td>map100</td>\r\n
</tr>']
now I need to use xpath to analyze this results again.
I want to save the first to address,the second to telephone,and the last one to map
But I can't get it.
Please guide me.Thank you!
Here is code,it's wrong. it will catch another thing.
store = sel.xpath("")
for s in store:
address = s.xpath("//tr/td[1]/text()").extract()
tel = s.xpath("//tr/td[2]/text()").extract()
map = s.xpath("//tr/td[3]/text()").extract()
As you can see in scrappy documentation to work with relative XPaths you have to use .// notation to extract the elements relative to the previous XPath, if not you're getting again all elements from the whole document. You can see this sample in the scrappy documentation that I referenced above:
For example, suppose you want to extract all <p> elements inside <div> elements. First, you would get all <div> elements:
divs = response.xpath('//div')
At first, you may be tempted to use the following approach, which is wrong, as it actually extracts all <p> elements from the document, not only those inside <div> elements:
for p in divs.xpath('//p'): # this is wrong - gets all <p> from the whole document
This is the proper way to do it (note the dot prefixing the .//p XPath):
for p in divs.xpath('.//p'): # extracts all <p> inside
So I think in your case you code must be something like:
for s in store:
address = s.xpath(".//tr/td[1]/text()").extract()
tel = s.xpath(".//tr/td[2]/text()").extract()
map = s.xpath(".//tr/td[3]/text()").extract()
Hope this helps,

accessing the text value of last nested <tr> element with no id or class hooks

I need to access the value of the 10th <td> element in the last row of a table. I can't use an ID as a hook because only the table has an ID. I've managed to make it work using the code below. Unfortunately, its static. I know I will always need the 10th <td> element, but I won't ever know which row it needs to be. I just know it needs to be the last row in the table. How would I replace "tr[6]" with the actual last <tr> dynamically? (this is probably really easy, but this is literally my first time doing anything with ruby).
page = Nokogiri::HTML(open(url))
test = page.css("tr[6]").map { |row|
row.css("td[10]").text}
puts test
You want to do:
page.at("tr:last td:eq(10)")
If you do not need to do anything else with the page you can actually make this a single line with
test = Nokogiri::HTML(open(url)).search("tr").last.search("td")[10].text
Otherwise (this will work):
page = Nokogiri::HTML(open(url))
test = page.search("tr").last.search("td")[10].text
puts test
Example:(Used a large table from another question on StackOverflow)
Nokogiri::HTML(open("http://en.wikipedia.org/wiki/Richard_Dreyfuss")).search('table')[1].search('tr').last.search('td').children.map{|c| c.text}.join(" ")
#=> "2013 Paranoia Francis Cassidy"
Is there a particular reason you want an Array with 1 element? My example will return a string but you could easily modify it to return an Array.
You can use CSS pseudo class selectors for this:
page.css("table#the-table-id tr:last-of-type td:nth-of-type(10)")
This first selects the <table> with the appropriate id, then selects the last <tr> child of that table, and then selects the 10th <td> of that <tr>. The result is an array of all matching elements, if youexpect there to be only one you could use at_css instead.
If you prefer XPath, you could use this:
page.xpath("//table[#id='the-table-id']/tr[last()]/td[10]")

Hpricot: How to extract inner text without other html subelements

I'm working on a vim rspec plugin (https://github.com/skwp/vim-rspec) - and I am parsing some html from rspec. It looks like this:
doc = %{
<dl>
<dt id="example_group_1">This is the heading text</dt>
Some puts output here
</dl>
}
I can get the entire inner of the using:
(Hpricot.parse(doc)/:dl).first.inner_html
I can get just the dt by using
(Hpricot.parse(doc)/:dl).first/:dt
But how can I access the "Some puts output here" area? If I use inner_html, there is way too much other junk to parse through. I've looked through hpricot docs but don't see an easy way to get essentially the inner text of an html element, disregarding its html children.
I ended up figuring out a route by myself, by manually parsing the children:
(#context/"dl").each do |dl|
dl.children.each do |child|
if child.is_a?(Hpricot::Elem) && child.name == 'dd'
# do stuff with the element
elsif child.is_a?(Hpricot::Text)
text=child.to_s.strip
puts text unless text.empty?
end
end
Note that this is bad HTML you have there. If you have control over it, you should wrap the content you want in a <dd>.
In XML terms what you are looking for is the TextNode following the <dt> element. In my comment I showed how you can select this node using XPath in Nokogiri.
However, if you must use Hpricot, and cannot select text nodes using it, then you could hack this by getting the inner_html and then stripping out the unwanted:
(Hpricot.parse(doc)/:dl).first.inner_html.sub %r{<dt>.+?</dt>}, ''

Brain storming a method of validating a HTML table's content with Watir-webdriver

I am trying to work out a method to check the content of an HTML table with Watir-webdriver. Basically I want to validate the table contents against a saved valid table (CSV file) and they are the same after a refresh or redraw action.
Ideas I've come up with so far are to:
Grab the table HTML and compare that as a string with the baseline value.
Iterate through each cell and compare the HTML or text content.
Generate a 2D array representation on the table contents and do an array compare.
What would be the fastest/best approach? Do you have insights on how you handled a similar problem?
Here is an example of the table:
<table id="attr-table">
<thead>
<tr><th id="attr-action-col"><input type="checkbox" id="attr-action-col_box" class="attr-action-box" value=""></th><th id="attr-scope-col"></th><th id="attr-workflow-col">Status</th><th id="attr-type-col"></th><th id="attr-name-col">Name<span class="ui-icon ui-icon-triangle-1-n"></span></th><th id="attr-value-col">Francais Value</th></tr></thead>
<tbody>
<tr id="attr-row-209"><td id="attr_action_209" class="attr-action-col"><input type="checkbox" id="attr_action_209_box" class="attr-action-box" value=""></td><td id="attr_scope_209" class="attr-scope-col"><img src="images/attrib_bullet_global.png" title="global"></td><td id="attr_workflow_209" class="attr-workflow-col"></td><td id="attr_type_209" class="attr-type-col"><img src="images/attrib_text.png"></td><td id="attr_name_209" class="attr-name-col">Name of: Catalogue</td><td id="attr_value_209" class="attr-value-col"><p class="acms ws-editable-content lang_10">2010 EI-176</p></td></tr>
<tr id="attr-row-316"><td id="attr_action_316" class="attr-action-col"><input type="checkbox" id="attr_action_316_box" class="attr-action-box" value=""></td><td id="attr_scope_316" class="attr-scope-col"><img src="images/attrib_bullet_global.png" title="global"></td><td id="attr_workflow_316" class="attr-workflow-col"></td><td id="attr_type_316" class="attr-type-col"><img src="images/attrib_text.png"></td><td id="attr_name_316" class="attr-name-col">_[Key] Media key</td><td id="attr_value_316" class="attr-value-col"><p class="acms ws-editable-content lang_10"><span class="acms acms-choice" contenteditable="false" id="568">163</span></p></td></tr>
<tr id="attr-row-392"><td id="attr_action_392" class="attr-action-col"><input type="checkbox" id="attr_action_392_box" class="attr-action-box" value=""></td><td id="attr_scope_392" class="attr-scope-col"><img src="images/attrib_bullet_global.png" title="global"></td><td id="attr_workflow_392" class="attr-workflow-col"></td><td id="attr_type_392" class="attr-type-col"><img src="images/attrib_numeric.png"></td><td id="attr_name_392" class="attr-name-col">_[Key] Numéro d'ordre</td><td id="attr_value_392" class="attr-value-col"><p class="acms ws-editable-content lang_10">2</p></td></tr>
</tbody>
</table>
Just one idea I came up with. I used Hash and Class object instead of 2D array.
foo.csv
209,global,text.Catalogue,2010 EI-176
392,global,numeric,Numéro d'ordre,2
require 'csv'
expected_datas = CSV.readlines('foo.csv').map do |row|
{
:id => row[0],
:scope => row[1],
:type => row[2],
:name => row[3],
:value => row[4]
}
end
class Data
attr_reader :id,:scope,:type,:name,:value
def initialize(tr)
id = tr.id.slice(/attr-row-([0-9]+)/,1)
scope = tr.td(:id,/scope/).img.src.slice(/attr_bullet_(.+?).png/,1)
type = tr.td(:id,/type/).img.src.slice(/attrib_(.+?).png/,1)
name = tr.td(:id,/name/).text
value = tr.td(:id,/value/).text
end
end
browser = Watir::Browser.new
browser.goto 'foobar'
datas = browser.table(:id,'attr-table').tbody.trs.map{|tr| Data.new(tr)}
datas.zip(expected_datas).each do |data,expected_data|
Data.instance_methods(false).each do |method|
data.send(method).should == expected_data[method.to_sym]
end
end
# something action (refresh or redraw action)
browser.refresh
after_datas = browser.table(:id,'attr-table').tbody.trs.map{|tr| Data.new(tr)}
datas.zip(after_datas).each do |data,after_data|
Data.instance_methods(false).each do |method|
data.send(method).should == after_data.send(method)
end
end
What level of detail do you want the mismatch(es) reported with? I think that might well define the approach you want to take.
For example if you just want to know if there's a mismatch, and don't care where, then comparing arrays might be easiest.
If the order of the rows could vary, then I think comparing Hashes might be best
If you want each mismatch reported individually then iterating by row and column would allow you to report discrete errors, especially if you build a list of differences and then do your assert at the very end based on number of differences found
You could go for exact match
before_htmltable <=> after_htmltable
Or you could strip whitespace
before_htmltable.gsub(/\s+/, ' ') <=> after_htmltable.gsub(/\s+/, ' ')
I would think that creating the array then comparing each element would be more expensive.
Dave

Resources