I'm using Cucumber to test a PHP app and it's working quite well, actually.
I have a Cucumber feature that uses the following step to check for the presence of a link in a page:
Then /^I should see a link that contains "(.+)"$/ do |link|
  assert !!(response_body =~
    /<a ([\w\.]*="[\w\.]*" )*href="(http:\/\/)?([\w\.\/]*)?(#{link})/m), response_body
end
Now, this works but it's butt ugly and complicated.
Originally I tried using the xpath thing:
response_body.should have_xpath("//a[@href=\"#{link}\"]")
But then if I check for a link to 'blah.com' it won't match 'http://blah.com', which kind of defeats the whole purpose of the test. Hence the reason I switched to regex.
So is there a simpler way to write the test which doesn't rely on complicated regular expressions?
Cheers.
EDIT:
After lots of hair-pulling... I did find a less messy way to find images on my page:
response_body.should include(image)
Where the image string is set to something like 'myimage.png' - of course, this will break if the actual text 'myimage.png' is on the page and not the image.
There must be a better way. I was considering Hpricot to see if I can parse the html and pull out the attribute I want to test, then test that with a regex but that all seems so... bloated.
Something like this should work:
response_body.should have_css("a[href*='#{link}']")
See this for details:
http://www.w3.org/TR/css3-selectors/#attribute-substrings
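For example (a hypothetical snippet, just to illustrate the substring semantics of *=), both of the following anchors would satisfy the same assertion:

# Matches <a href="http://blah.com">...</a>
# and also <a href="blah.com/about">...</a>
response_body.should have_css("a[href*='blah.com']")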
EDIT:
Looks like the equivalent method for webrat is have_selector, so:
response_body.should have_selector("a[href*='#{link}']")
Then /^I should see the image "(.+)"$/ do |image|
  response_body.should have_selector("img[src*='#{image}']")
end

Then /^I should see a link that contains "(.+)"$/ do |link|
  response_body.should have_selector("a[href*='#{link}']")
end
Thanks AlistairH - your advice worked! :)
I still don't understand why it's searching HTML with CSS selector syntax, but maybe that was just a design choice the guy who wrote it made because it looked easier than regex...? I don't know.
It seems like OpenURI and Net::HTTP perform very similar tasks. Can anyone give examples of where one would be more useful than the other? I don't have specific code that I'm referring to; I'm more wondering about general use cases for each. I know this is a short question; I will fill in the blanks upon request. Thanks.
The reason they look like they perform similar tasks is OpenURI is a wrapper for Net::HTTP, Net::HTTPS, and Net::FTP.
Usually, unless you feel you need a lower level interface, using OpenURI is better as you can get by with less code. Using OpenURI you can open a URL/URI and treat it as a file.
See: http://www.ruby-doc.org/stdlib-1.9.3/libdoc/open-uri/rdoc/OpenURI.html
and http://ruby-doc.org/stdlib-1.9.3//libdoc/net/http/rdoc/Net.html
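For instance, here is a minimal sketch of that file-like interface (the URL is just a placeholder):

require 'open-uri'

# open yields an object that behaves like a read-only File,
# plus a few HTTP extras such as content_type.
open('http://www.example.com') do |f|
  puts f.content_type  # e.g. "text/html"
  puts f.read          # the response body
end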
I just found out that open does follow redirections, while Net::HTTP doesn't, which is an important difference.
For example, open('http://www.stackoverflow.com') { |content| puts content.read } will display the proper HTML after following the redirection, while Net::HTTP.get(URI('http://www.stackoverflow.com')) will just return the redirect message that is served with the 302 status.
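A sketch contrasting the two (Net::HTTP.get only returns the body, so get_response is used here to make the status code visible):

require 'open-uri'
require 'net/http'

# OpenURI follows the redirect and yields the final page:
open('http://www.stackoverflow.com') { |content| puts content.read }

# Net::HTTP hands you the redirect response itself:
response = Net::HTTP.get_response(URI('http://www.stackoverflow.com'))
puts response.code        # => "301" or "302"
puts response['location'] # where it wanted to send you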
I've set screenshots to be taken when a scenario fails, but my HTML report shows the same screenshot on all failed scenarios. Can anyone help and let me know how I can get unique screenshots taken for each failed scenario?
Here is my code in my env.rb:
After do |scenario|
  if scenario.failed?
    @browser.driver.save_screenshot("screenshot.png")
    embed("screenshot.png", "image/png")
  end
end
You are saving the screenshot to the same file each time (i.e. overwriting the previous screenshot). The report also links all the images to the same place. This is why you get the same image everywhere.
You need to provide a unique name for the screenshot.
For example, you could timestamp (with date and time) the images:
After do |scenario|
  if scenario.failed?
    screenshot_file = "screenshot-#{Time.now.strftime('%Y%m%d-%H%M%S')}.png"
    @browser.driver.save_screenshot(screenshot_file)
    embed(screenshot_file, "image/png")
  end
end
Justin has the answer, but while timestamps are fine, they make it harder, when looking at the files, to know which one was for which scenario. When I run tests while creating and debugging, I often don't even look at the HTML report; I just look at the generated screenshot, so in that case it is doubly useful to have more logical names.
So the code I use looks like this; it embeds using the scenario name:
After do |scenario|
  if scenario.failed?
    screenshot = "./FAILED_#{scenario.name.gsub(' ', '_').gsub(/[^0-9A-Za-z_]/, '')}.png"
    @browser.driver.save_screenshot(screenshot)
    encoded_img = @browser.driver.screenshot_as(:base64)
    embed("data:image/png;base64,#{encoded_img}", 'image/png')
  end
end
An even more robust approach, dealing with scenario outlines and appending a timestamp, is described in this 'Two Four One' blog posting. So far I've not needed to go that far, but I may pull in stuff from it to deal with scenario outlines.
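In the meantime, here is a sketch that at least keeps outline rows (which all share one scenario name) from overwriting each other, using only the calls already shown above:

After do |scenario|
  if scenario.failed?
    # Sanitized name plus a timestamp: readable, and unique per outline row.
    safe_name = scenario.name.gsub(' ', '_').gsub(/[^0-9A-Za-z_]/, '')
    screenshot = "./FAILED_#{safe_name}_#{Time.now.strftime('%Y%m%d-%H%M%S')}.png"
    @browser.driver.save_screenshot(screenshot)
    embed(screenshot, 'image/png')
  end
end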
For those using RSpec, there is a really nice implementation of HtmlFormatter in the watir-rspec project.
I have to work with some really ugly-looking markup and I am running it through Tidy in Ruby. For the most part it works great, except that it lumps a ton of hidden inputs that are in the markup onto one line. I know there is a setting for column wrap, but it would be nicer if it just put sibling inputs on separate lines. It is important because it would simplify debugging: you could look at the markup and quickly see the info in those hidden inputs.
I have yet to find a tool that does this. So is there anything out there or am I being foolish?
I should also add that a lot of the issues stem from the bad markup I get initially, and there is nothing I can do to clean it up before it gets to me. I tried Nokogiri-pretty to clean it up and it was so close to being perfect, but it turned script tags into self-closing tags, which is no good.
Right now I am settling for Tidying the source and then (I know this is terrible) gsub(/<input[^>]*>/, '\0' + "\n"). I love the fact that I had to concat the capture with the newline.
Tidy tends to be problematic in Ruby. It has been reported to leak memory, it isn't 1.9 compatible, etc. However, you may be able to skip Tidy altogether by using Nokogiri and the nokogiri-pretty gem.
Assuming you have a Nokogiri doc:
require 'nokogiri-pretty'
puts doc.human
In addition to other tidying, all <input> tags will be on their own line and properly indented.
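Put together, a self-contained sketch (the markup here is made up):

require 'nokogiri'
require 'nokogiri-pretty'

html = '<form><input type="hidden" name="a"><input type="hidden" name="b"></form>'
doc = Nokogiri::HTML(html)

# human is the method nokogiri-pretty mixes into Nokogiri nodes;
# it returns indented, human-readable markup.
puts doc.human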
Nokogiri can do that easily enough:
doc.css('input').each{|input| input.before "\n"}
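In context, that one-liner amounts to something like this (the sample markup is hypothetical):

require 'nokogiri'

doc = Nokogiri::HTML('<form><input type="hidden"><input type="hidden"></form>')

# Insert a text node holding a newline before each <input>,
# so sibling inputs land on separate lines when serialized.
doc.css('input').each { |input| input.before "\n" }
puts doc.to_html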
I would like to crawl a popular site (say Quora) that doesn't have an API and get some specific information and dump it into a file - say either a csv, .txt, or .html formatted nicely :)
E.g. return only a list of the 'Bios' of Quora users whose publicly available information lists the occupation 'UX designer'.
How would I do that in Ruby?
I have a moderate enough level of understanding of how Ruby & Rails work. I just completed a Rails app - mainly all written by myself. But I am no guru by any stretch of the imagination.
I understand RegExs, etc.
Your best bet would be to use Mechanize. It can follow links, submit forms, anything you will need, web client-wise. By the way, don't use regexes to parse HTML. Use an HTML parser.
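A minimal Mechanize sketch (the URL and selectors are placeholders; older versions spell the class WWW::Mechanize):

require 'mechanize'

agent = Mechanize.new
page = agent.get('http://www.example.com')

# Pages expose Nokogiri-style search, so CSS and XPath both work:
page.search('a').each { |link| puts link['href'] }

# Following links and submitting forms is just as direct, e.g.:
# page.link_with(text: 'Next').click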
If you want something more high level, try wombat, which is this gem I built on top of Mechanize and Nokogiri. It is able to parse pages and follow links using a really simple and high level DSL.
I know the answer has been accepted, but Hpricot is also very popular for parsing HTML.
All you have to do is take a look at the HTML source of the pages and try to find an XPath or CSS expression that matches the desired elements, then use something like:
doc.search("//p[@class='posted']")
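Each match is an element you can drill into, along these lines (assuming doc is an Hpricot or Nokogiri document):

doc.search("//p[@class='posted']").each do |post|
  puts post.inner_text
end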
Mechanize is awesome. If you're looking to learn something new though, you could take a look at Scrubyt: https://github.com/scrubber/scrubyt. It looks like Mechanize + Hpricot. I've never used it, but it seems interesting.
Nokogiri is great, but I find the output messy to work with. I wrote a Ruby gem to easily create classes off HTML: https://github.com/jassa/hyper_api
The HyperAPI gem uses Nokogiri to parse HTML with CSS selectors.
E.g.
Post = HyperAPI.new_class do
  string title: 'div#title'
  string body: 'div#body'
  string author: '#details .author'
  integer comments_count: '#extra .comment' do
    size
  end
end
# => Post

post = Post.new(html_string)
# => #<Post title: 'Hi there!', body: 'This blog post will talk about...', author: 'Bob', comments_count: 74>
I've written a scrubyt extractor based on the 'learning' technique - that is, specifying the current text on the page and getting it to work out the XPath expressions itself. However, I now want to export the extractor so that it can be used even when the page has changed.
The documentation for scrubyt seems to be all over the place now, but from what I can find I should be able to put the line extractor.export(__FILE__) and it should work. It doesn't - I just get an error saying that export was given the wrong number of arguments (it should have 0). I've tried it without any arguments and it still fails.
I would ask on the scrubyt forum, but it seems like no-one's been there for ages!
Any ideas what to do here?
Just had the same problem and tried "puts google_data.export()" (trying to get some stuff from Google).
This gave me the following:
=== Extractor tree ===
export() is not working at the moment, due to the removal of
ParseTree, ruby2ruby and RubyInline.
For now, in case you are using examples, you can replace them by hand
based on the output below.
So if your pattern in the learning extractor looks like
book "Ruby Cookbook"
and you see the following below:
[book] /table[1]/tr/td[2]
then replace "Ruby Cookbook" with "/table[1]/tr/td[2]" (and all the
other XPaths) and you are ready!
[link] /body/div/div/div/div/div/ol/li/h3/a
which gave me the XPath I was looking for.
scrubyt version is 0.4.06
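To make that replacement concrete, here is a hypothetical before/after (the DSL details are from memory of scrubyt's examples, so treat this as a sketch):

# Learning extractor: patterns are identified by example text on the page.
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?q=ruby'
  link 'Ruby Programming Language'
end

# Hand-exported equivalent: the example text is replaced with the XPath
# printed by export(), so it survives changes to the page text.
google_data = Scrubyt::Extractor.define do
  fetch 'http://www.google.com/search?q=ruby'
  link '/body/div/div/div/div/div/ol/li/h3/a'
end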