How to set a "base URL" for Webrat, Mechanize - ruby

I would like to specify a base URL so I don't have to always specify absolute URLs. How can I specify a base URL for Mechanize to use?

To accomplish the previously proffered answer using Webrat, you can do the following e.g. in your Cucumber env.rb:
require 'webrat'
Webrat.configure do |config|
  config.mode = :mechanize
end
World do
  session = Webrat::Session.new
  session.extend(Webrat::Methods)
  session.extend(Webrat::Matchers)
  session.visit 'http://yoursite/yourbasepath/'
  session
end
To make it more robust, such as for use in different environments, you could do:
ENV['CUCUMBER_HOST'] ||= 'yoursite'
ENV['CUCUMBER_BASE_PATH'] ||= '/yourbasepath/'
# Webrat
require 'webrat'
Webrat.configure do |config|
  config.mode = :mechanize
end
World do
  session = Webrat::Session.new
  session.extend(Webrat::Methods)
  session.extend(Webrat::Matchers)
  session.visit('http://' + ENV['CUCUMBER_HOST'] + ENV['CUCUMBER_BASE_PATH'])
  session
end
Note that if you're using Mechanize, Webrat will also fail to follow your redirects because it won't interpret the current host correctly. To work around this, you can add session.header('Host', ENV['CUCUMBER_HOST']) to the above.
To make sure the right paths are used everywhere for visiting and matching, add ENV['CUCUMBER_BASE_PATH'] + to the beginning of your path_to method in paths.rb, if you use it. It should look like this:
def path_to(page_name)
ENV['CUCUMBER_BASE_PATH'] +
case page_name
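The fragment above is cut off, so here is a minimal, self-contained sketch of the same idea. The page names and the path segments they map to are hypothetical placeholders, not taken from the original paths.rb. The segments are written without a leading slash because the base path in this answer already ends with one.

```ruby
# Default taken from the answer above; override it per environment.
ENV['CUCUMBER_BASE_PATH'] ||= '/yourbasepath/'

def path_to(page_name)
  # The trailing + continues the expression onto the case statement.
  ENV['CUCUMBER_BASE_PATH'] +
    case page_name
    when /the home page/ then ''         # hypothetical page name
    when /the login page/ then 'login'   # hypothetical page name
    else
      raise ArgumentError, "No path mapping for #{page_name.inspect}"
    end
end

puts path_to('the login page')
```

Raising on an unknown page name keeps a typo in a step definition from silently visiting the base path.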

For Mechanize, the first URL you specify will be considered the base URL. For example:
require "rubygems"
require "mechanize"
agent = Mechanize.new
agent.get("http://some-site.org")
# Subsequent requests can now use the relative path:
agent.get("/contact.html")
This way you only specify the base URL once.
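As far as I understand, Mechanize resolves a relative path against the URL of the page it last fetched, following the usual URL resolution rules. The sketch below is not Mechanize code. It only illustrates those rules with the standard library's URI.join, and the base URL with a /docs/ directory is an assumed example.

```ruby
require 'uri'

# Hypothetical "current page" that a relative path would be resolved against.
base = 'http://some-site.org/docs/index.html'

# A leading slash resolves from the site root.
absolute = URI.join(base, '/contact.html').to_s

# No leading slash resolves relative to the current directory.
relative = URI.join(base, 'contact.html').to_s

puts absolute  # http://some-site.org/contact.html
puts relative  # http://some-site.org/docs/contact.html
```

Using a leading slash, as in the answer above, gives the same result no matter which page the agent visited last.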

Related

Ruby-Rspec: Initialize PageObjects once in spec_helper instead of each spec

I have variables set for XPaths in a file called PageObjects. In each spec I run, I initialize the page objects with "p = PageObjects.new". However, I would like to initialize "p = PageObjects.new" once in "spec_helper.rb" instead of in each spec.
This still gives me "error: uninitialized constant PageObject"...
require 'selenium-webdriver'
require 'yaml'
require 'rspec/retry'
require 'pry'
require 'bundler/setup'
p = PageObject.new
RSpec.configure do |config|
config.default_sleep_interval = 1
config.default_retry_count = 4
config.verbose_retry = false
config.display_try_failure_messages = true
config.exceptions_to_retry = [Net::ReadTimeout, Capybara::ElementNotFound]
end
Is there a way to achieve my goal by initializing PageObject once inside spec_helper rather than in each spec?
RSpec helpers seem to be the perfect solution for you.
Define the helper in helper.rb:
module Helpers
  def p
    @page_object ||= PageObject.new
  end
end
Configure RSpec to include it:
RSpec.configure do |c|
  c.include Helpers
end
You can then use the p method, which returns the PageObject:
specify do
  expect(p).to be_a(PageObject)
  expect(p.object_id).to eq(p.object_id)
end
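The memoization in that helper can be tried outside RSpec. In the sketch below, PageObject and FakeExample are hypothetical stand-ins, and the helper is named page_objects rather than p so that Kernel#p stays available. Because the value lives in an instance variable and RSpec runs each example in a fresh instance, the object is built once per example rather than once per suite.

```ruby
# Stand-in for the real page-object class (hypothetical).
class PageObject
end

module Helpers
  # Builds the object on first use and returns the same one afterwards.
  def page_objects
    @page_objects ||= PageObject.new
  end
end

# Stand-in for an RSpec example instance (hypothetical).
class FakeExample
  include Helpers
end

first_example = FakeExample.new
second_example = FakeExample.new

same_within_example = first_example.page_objects.equal?(first_example.page_objects)
shared_across_examples = first_example.page_objects.equal?(second_example.page_objects)

puts same_within_example     # true
puts shared_across_examples  # false
```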
You effectively want your test database to be maintained between tests. This is dangerous for a number of reasons, the most obvious being previous tests will affect future ones. As you're dealing with the same PageObject you will need to reset it between tests.
Putting that to one side, the options for enabling / disabling this can be found at:
https://relishapp.com/rspec/rspec-rails/docs/transactions, namely:
When you run rails generate rspec:install, the spec/rails_helper.rb
file includes the following configuration:
RSpec.configure do |config|
config.use_transactional_fixtures = true
end
The name of this setting is a bit misleading. What it really means
in Rails is "run every test method within a transaction." In the
context of rspec-rails, it means "run every example within a
transaction."
The idea is to start each example with a clean database, create
whatever data is necessary for that example, and then remove that data
by simply rolling back the transaction at the end of the example.
Disabling transactions
If you prefer to manage the data yourself, or
using another tool like database_cleaner to do it for you, simply tell
RSpec to tell Rails not to manage transactions:
RSpec.configure do |config|
  config.use_transactional_fixtures = false
end

poltergeist doesn't seem to wait for phantomjs to load in capybara

I'm trying to get some rspec tests run using a mix of Capybara, Selenium, Capybara/webkit, and Poltergeist. I need it to run headless in certain cases and would rather not use xvfb to get webkit working. I am okay using selenium or poltergeist as the driver for phantomjs. The problem I am having is that my tests run fine with selenium and firefox or chrome but when I try phantomjs the elements always show as not found. After looking into it for a while and using page.save_screenshot in capybara I found out that the phantomjs browser wasn't loaded up when the driver told it to find elements so it wasn't returning anything. I was able to hack a fix to this in by editing the poltergeist source in <gem_path>/capybara/poltergeist/driver.rb as follows
def visit(url)
  if @started
    sleep_time = 0
  else
    sleep_time = 2
  end
  @started = true
  browser.visit(url)
  sleep sleep_time
end
This is obviously not an ideal solution, and it doesn't work with Selenium as the driver for PhantomJS. Is there any way I can tell the driver to wait for PhantomJS to be ready?
UPDATE:
I was able to get it to run by changing where I included the Capybara::DSL. I added it to the RSpec.configure block as shown below.
RSpec.configure do |config|
config.include Capybara::DSL
I then passed the page object to all classes I created for interacting with the webpage ui.
An example class would now look like this
module LoginUI
  require_relative 'webpage'
  class LoginPage < WebPages::Pages
    def initialize(page, values = {})
      super(page)
    end
    def visit
      browser.visit(login_url)
    end
    def login(username, password)
      set_username(username)
      set_password(password)
      sign_in_button
    end
    def set_username(username)
      edit = browser.find_element(@selectors[:login_edit])
      edit.send_keys(username)
    end
    def set_password(password)
      edit = browser.find_element(@selectors[:password_edit])
      edit.send_keys(password)
    end
    def sign_in_button
      browser.find_element(@selectors[:sign_in_button]).click
    end
  end
end
Webpage module looks like this
module WebPages
  require_relative 'browser'
  class Pages
    def initialize(page)
      @page = page
      @browser = Browser::Browser.new
    end
    def browser
      @browser
    end
    def sign_out
      browser.visit(sign_out_url)
    end
  end
end
The Browser module looks like this
module Browser
  class Browser
    include Capybara::DSL
    def refresh_page
      page.evaluate_script("window.location.reload()")
    end
    def submit(locator)
      find_element(locator).click
    end
    def find_element(hash)
      page.find(hash.keys.first, hash.values.first)
    end
    def find_elements(hash)
      page.find(hash.keys.first, hash.values.first, match: :first)
      page.all(hash.keys.first, hash.values.first)
    end
    def current_url
      return page.current_url
    end
  end
end
While this works I don't want to have to include the Capybara::DSL inside RSpec or have to include the page object in the classes. These classes have had some things removed for the example but show the general structure. Ideally I would like to have the Browser module include the Capybara::DSL and be able to handle all of the interaction with Capybara.
Your update completely changes the question so I'm adding a second answer. There is no need to include the Capybara::DSL in your RSpec configure if you don't call any Capybara methods from outside your Browser class, just as there is no need to pass 'page' to all your Pages classes if you limit all Capybara interaction to your Browser class. One thing to note is that the page method provided by Capybara::DSL is just an alias for Capybara.current_session so technically you could just always call that.
You don't show in your code how you're handling any assertions/expectations on the page content - so depending on how you're doing that you may need to include Capybara::RSpecMatchers in your RSpec config and/or your WebPages::Pages class.
Your example code has a couple of issues that immediately pop out, firstly your Browser#find_elements (assuming I'm reading your intention for having find first correctly) should probably just be
def find_elements(hash)
page.all(hash.keys.first, hash.values.first, minimum: 1)
end
Secondly, your LoginPage#login method should have an assertion/expectation on a visual change that indicates login succeeded as its final line (verify some message is displayed/logged in menu exists/ etc), to ensure the browser has received the auth cookies, etc before the tests move on. What that line looks like depends on exactly how you're architecting your expectations.
If this doesn't answer your question, please provide a concrete example of what exactly isn't working for you since none of the code you're showing indicates any need for Capybara::DSL to be included in either of the places you say you don't want it.
Capybara doesn't depend on visit having completed; instead, the finders and matchers retry for up to a specified period of time until they succeed. You can lengthen this period by increasing the value of Capybara.default_max_wait_time. The only methods that don't wait by default are first and all, but they can be made to wait/retry by specifying any of the count options:
first('.some_class', minimum: 1) # will wait up to Capybara.default_max_wait_time seconds for the element to exist on the page.
That said, you should always prefer find over first/all whenever possible.
If increasing the maximum wait time doesn't solve your issue, add an example of a test that fails to your question.
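The retry behaviour described above can be pictured as a polling loop with a deadline. The sketch below is plain Ruby and is not Capybara's actual implementation. The wait_until helper, its parameters, and the simulated "element" are all assumptions made for illustration.

```ruby
# Re-runs the block until it returns a truthy value or the timeout elapses.
def wait_until(timeout: 2.0, interval: 0.05)
  deadline = Process.clock_gettime(Process::CLOCK_MONOTONIC) + timeout
  loop do
    result = yield
    return result if result
    if Process.clock_gettime(Process::CLOCK_MONOTONIC) >= deadline
      raise "condition not met within #{timeout} seconds"
    end
    sleep interval
  end
end

# Simulates an element that only "appears" on the third lookup.
attempts = 0
found = wait_until(timeout: 1.0) do
  attempts += 1
  attempts >= 3 ? :element : nil
end

puts found     # element
puts attempts  # 3
```

A loop like this returns as soon as the condition holds, so raising the timeout only costs time when the element never shows up. A fixed sleep, like the one patched into the driver in the question, always pays the full delay.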

How do I scrape data from a page that loads specific data after the main page load?

I have been using Ruby and Nokogiri to pull data from a URL similar to this one from the hollister website: http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358
My script looks like this right now:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
page = Nokogiri::HTML(open("http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358"))
puts page.css("h3[data-property=GLB_ORDERNUMBERSYMBOL]")[0].text
My problem is that the Hollister page has some sort of asynchronous loading of data, such that when my script checks the area of the page with order specific data for a page element, it doesn't exist yet. I.E., the <h3> with data-property=GBL_ORDERNUMBERSYMBOL doesn't yet exist, but in the browser if you let it load for another ten seconds, the DOM and HTML change to reflect the specific order details.
What is the best way to capture this data that loads after the fact? I have tried using the watir-webdriver, but not sure what I would need to do to make that one work either.
Try installing Capybara-webkit (make sure you have QtWebKit installed, otherwise the gem install would fail). This will give you a headless solution. Then try this:
require 'capybara-webkit'
require 'capybara/dsl'
require 'nokogiri'
require 'open-uri'
url = 'http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358'
#change the capybara config to DSL and to use webkit
include Capybara::DSL
Capybara.current_driver = :webkit
visit(url)
doc = Nokogiri::HTML.parse(body)
Then parse the body as you normally would. To suppress the error messages, try this:
Capybara.register_driver :webkit do |app|
  Capybara::Driver::Webkit.new(app, :stdout => nil)
end
I am not sure how to do it with Open-URI, but if you want to use Watir-Webdriver, the following works.
require 'watir-webdriver'
b = Watir::Browser.new
b.goto('http://www.hollisterco.com/webapp/wcs/stores/servlet/TrackDetail?storeId=10251&catalogId=10201&langId=-1&URL=TrackDetailView&orderNumber=1316358')
puts b.h3(:class, 'order-num').when_present.text
Note that a when_present() is performed on the h3 tag. What this means is that the script will wait for the h3 to appear before trying to get its text. If you know there are parts that take time to load, adding an explicit wait usually solves the problem.
Following @benaneesh's answer, I had to make slight modifications to get it to work in my Ruby script and not show the unknown URL messages:
require 'capybara-webkit'
require 'capybara/dsl'
require 'nokogiri'
require 'open-uri'
include Capybara::DSL
Capybara.current_driver = :webkit
Capybara::Webkit.configure do |config|
config.block_unknown_urls
config.allow_url("*mysite.com")
end
#... rest of code

Ruby: expand shorten urls the hard way

Is there a way to open URLs in Ruby and output the redirected URL?
For example, convert http://bit.ly/l223ue to http://paper.li/CoyDavidsonCRE/1309121465
I find that there are more URL shortener services than gems can keep up with, so I'm asking for the hard (but robust) way, instead of using a gem that connects to some API.
Here is a lengthen method
This has very little error handling but it might help you get started.
You could wrap lengthen with a begin rescue block that returns nil or attempt to retry it later. Not sure what you are trying to build but hope it helps.
require 'uri'
require 'net/http'
def lengthen(url)
  uri = URI(url)
  # request_uri keeps the query string and is "/" when the URL has no path
  Net::HTTP.new(uri.host, uri.port).get(uri.request_uri)['location']
end
irb(main):008:0> lengthen('http://bit.ly/l223ue')
=> "http://paper.li/CoyDavidsonCRE/1309121465"
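Some short links redirect more than once, and the answer suggests adding error handling. One possible extension is sketched below under stated assumptions: the expand name, the limit parameter, and the injectable fetch callable are all hypothetical and not part of the original answer. The fetch parameter exists only so the loop can be exercised without network access.

```ruby
require 'uri'
require 'net/http'

# Follows Location headers until a non-redirect response or the hop limit.
def expand(url, limit: 5, fetch: ->(uri) { Net::HTTP.get_response(uri) })
  current = URI(url)
  limit.times do
    response = fetch.call(current)
    location = response['location']
    # Stop at the first response that is not a 3xx with a Location header.
    return current.to_s unless response.code.to_s.start_with?('3') && location
    # Location may be relative, so resolve it against the current URL.
    current = URI.join(current.to_s, location)
  end
  raise "more than #{limit} redirects while expanding #{url}"
end

# Usage (performs real HTTP requests):
#   puts expand('http://bit.ly/l223ue')
```

The hop limit guards against redirect loops. A caller can rescue the error and return nil or retry later, as the answer suggests.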

Use ruby mechanize to get data from foursquare

I am trying to use ruby and Mechanize to parse data on foursquare's website. Here is my code:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
page = agent.get('https://foursquare.com')
page = agent.click page.link_with(:text => /Log In/)
form = page.forms[1]
form.F12778070592981DXGWJ = ARGV[0]
form.F1277807059296KSFTWQ = ARGV[1]
page = form.submit form.buttons.first
puts page.body
But when I run this code, the following error pops up:
C:/Ruby192/lib/ruby/gems/1.9.1/gems/mechanize-2.0.1/lib/mechanize/form.rb:162:in
`method_missing': undefined method `F12778070592981DXGWJ='
for #<Mechanize::Form:0x2b31f70> (NoMethodError)
from four.rb:10:in `<main>'
I checked and found that the two form field names, "F12778070592981DXGWJ" and "F1277807059296KSFTWQ", change every time I open Foursquare's webpage.
Has anyone had the same problem before, where the field names change every time you open a webpage? How should I solve this problem?
Our project is about parsing the data on foursquare. So I need to be able to login first.
Mechanize is useful for sites which don't expose an API, but Foursquare has an established REST API already. I'd recommend using one of the Ruby libraries, perhaps foursquare2. These libraries abstract away things like authentication, so you just have to register your app and use the provided keys.
Instead of indexing the form fields by their name, just index them by their order. That way you don't have to worry about the name that changes on each request:
form.fields[0].value = ARGV[0]
form.fields[1].value = ARGV[1]
...
However like dwhalen said, using the REST API is probably a much better way. That's why it's there.
