Browser url not returning new url - ruby

I am experimenting with RSpec and Watir to do some TDD and have come across a problem I can't seem to get past. I want Watir to click a link (target="_blank") and then get the URL of the newly loaded page. Watir clicks the link, but when I try to get the URL I receive the old URL, not the current one. The Watir docs seem to indicate that the Browser url method returns the current URL. I found a blog post that seems to solve this by having Watir execute some JavaScript to get the current URL, but this isn't working for me. Is there any way to get the current URL after a link click with Watir?
<!-- the html -->
<a href="http://www.linkedin.com" target="_blank">LinkedIn</a>
# The RSpec code
it "should load LinkedIn" do
  browser.link(:href => "http://www.linkedin.com").click
  browser.url.should == "http://www.linkedin.com"
end

Because of target="_blank", the link loads in a new browser window, so you need to switch to that window before asserting the URL:
it "should load LinkedIn" do
  browser.link(:href => "http://www.linkedin.com").click
  browser.window(:title => /.*LinkedIn.*/).use do
    browser.url.should == "http://www.linkedin.com"
  end
end
See: http://watirwebdriver.com/browser-popups/ for more examples
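A minimal sketch of the same idea pulled into two helpers, so the spec does not depend on the exact string the browser reports (for example a trailing slash or a redirect to https). The helper names url_of_window and same_host? are hypothetical; the only Watir calls assumed are browser.window(:title => ...), use with a block, and browser.url, as used in the answer above.

```ruby
require 'uri'

# Hypothetical helper: switches to the window whose title matches, reads the
# URL inside the block, and returns it. The URL is captured in a local
# variable so the helper does not rely on the return value of `use`.
def url_of_window(browser, title_pattern)
  url = nil
  browser.window(:title => title_pattern).use { url = browser.url }
  url
end

# Hypothetical helper: compares only the host part of two URLs, so a trailing
# slash or an http/https difference does not make the comparison fail.
def same_host?(actual_url, expected_url)
  URI.parse(actual_url).host == URI.parse(expected_url).host
end
```

In the spec, the assertion could then read same_host?(url_of_window(browser, /LinkedIn/), "http://www.linkedin.com").should == true.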

Related

Mechanize 'link_with' producing a different URL

I accessed a page that has this link:
<a class="portletpage-portlet-title is-active" tabindex="0" title="Registration" data-ppid="registration_WAR_registration" href="#registration">Registration</a>
The page is served over SSL. The link's href attribute is #registration. I am trying to follow this link to get to the URL:
www.redacted.com/#registration
Here is my code:
agent.get('*redacted*') do |page|
  page.form_with(:action => '*redacted*') do |f|
    f.field_with(:id => 'username').value = get_username()
    f.field_with(:id => 'password').value = get_password()
  end.click_button
end
agent.page.link_with(:text => 'Registration').click
When it clicks on the link, it produces the following error:
`fetch': 404 => Net::HTTPNotFound for https://*redacted*/group/1403104853945/academics?p_p_id=registration_WAR_uofsregistration&p_p_state=maximized -- unhandled response (Mechanize::ResponseCodeError)
from /home/mike/.rvm/gems/ruby-2.4.1/gems/mechanize-2.7.5/lib/mechanize.rb:464:in `get'
from /home/mike/.rvm/gems/ruby-2.4.1/gems/mechanize-2.7.5/lib/mechanize.rb:348:in `click'
from /home/mike/.rvm/gems/ruby-2.4.1/gems/mechanize-2.7.5/lib/mechanize/page/link.rb:30:in `click'
from u-of-s-scraper.rb:34:in `<main>'
and comes up with the URL:
www.redacted.com/group/1403104853945/academics?p_p_id=registration_WAR_uofsregistration&p_p_state=maximized
I'm not sure where Mechanize is getting the URL. The link has an attribute data-ppid, which appears to be contributing to the URL. Can anyone provide some insight?
It turns out that the page is written using Liferay's Portlets. Unfortunately, Portlets are not directly URL accessible, so I am currently investigating a different means of scraping the page - potentially with Selenium or PhantomJS.
data-ppid is a data attribute, which is meant to be handled by JavaScript. The change of URL is probably due to some JavaScript code on the client side (and a redirect on the server side).
Links whose href starts with # are "named links" or "bookmark links" (fragment links): they don't load another page, they just jump you to a spot on the current one.
In other words, there's no reason to ever "follow" a link like that with mechanize.
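To illustrate the point about # links with Ruby's standard URI library: resolving a fragment-only href against the page URL gives back the same path, and the fragment is not part of what is requested from the server. This is a sketch only; the page URL is a hypothetical placeholder for the redacted one, and resolve_href is a hypothetical helper name.

```ruby
require 'uri'

# Hypothetical helper: resolves an href against the URL of the page it is on,
# the way a browser resolves a relative link.
def resolve_href(page_url, href)
  URI.join(page_url, href)
end

# Placeholder URL, not the real site.
resolved = resolve_href('https://www.example.com/group/academics', '#registration')

puts resolved              # https://www.example.com/group/academics#registration
puts resolved.fragment     # registration
puts resolved.request_uri  # /group/academics (the part sent to the server)
```

Since the request path is unchanged, anything different that happens in a real browser on click comes from JavaScript, which Mechanize does not run.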

Mechanize suddenly can't login anymore

I have been trying to log in to my Airbnb account to scrape all my reservations. With the following code I was able to log in and print the status of each reservation, but all of a sudden it seems I have been blocked by Airbnb. The script used to return all of the reservations' statuses nicely lined up. After running it repeatedly for about 6 hours while trying to get it to work, it suddenly stopped returning the /my_reservations page.
When I called
puts page.body
it kept returning the /log_in page
I can still access my account manually through the browser though. Does anyone know anything about this? Any advice is greatly appreciated.
require 'mechanize'
require 'nokogiri'
agent = Mechanize.new { |a| a.user_agent_alias = "Mac Firefox" }
page = agent.get('http://www.airbnb.com/my_reservations')
form = agent.page.form_with(action: "/authenticate")
form.field_with(name: "email").value = "my_email@provider.com"
form.field_with(name: "password").value = "my_password"
form.submit
puts page.search('.label').text

Webdriver ruby, URL with same path not working in two of seven sites

I use the same method to log a user out of each site. The method goes to the same URL that the "logout" button opens when you click it. I took this approach because the button is inside a dropdown, and calling @driver.get ENV['base_url'] + "logout" is easier than opening the dropdown.
The method works on 5 of the 7 sites. Stranger still, if I copy and paste the logout URL manually, the behaviour is as expected and the user is logged off, but the same action via WebDriver does not work on some of the sites, even though the sites are identical to each other.
ENV['base_url'] = 'http://lucyvideo.com.co/'
@driver.get ENV['base_url']
# (log the user in here)
@driver.get ENV['base_url'] + "logout"
I've solved it. It seems that WebDriver didn't like the URLs without "www". I don't know why; I tried adding it and it worked.
So, instead of "http://lucyvideo.com.co/"
I tried: "http://www.lucyvideo.com.co/"
and it is working fine.

Screenshot of the URL section of the browser

I want to capture a screenshot of the browser's URL bar.
browser.screenshot.save('tdbank.png')
This saves only the page content inside the browser, but I want to capture the URL bar at the top of the browser window. Any suggestions?
Sometimes the URL shows http and sometimes https. I want to capture this in a screenshot and archive it. I know I could get it with
url = browser.url
and then do some comparison, but I need this for legal purposes, so it has to be done by taking a screenshot.
Thanks in advance.
If you're on Windows, you could use the win32screenshot gem. For example:
require 'watir-webdriver'
require 'win32/screenshot'
b = Watir::Browser.new # using firefox as default browser
b.goto('http://www.example.org')
Win32::Screenshot::Take.of(:window, :title => /Firefox/).write("image.bmp")
b.close

Clicking link with JavaScript in Mechanize

I have this:
<a class="top_level_active" href="javascript:Submit('menu_home')">Account Summary</a>
I want to click that link but I get an error when using link_with.
I've tried:
bot.click(page.link_with(:href => /menu_home/))
bot.click(page.link_with(:class => 'top_level_active'))
bot.click(page.link_with(:href => /Account Summary/))
The error I get is:
NoMethodError: undefined method `[]' for nil:NilClass
That's a JavaScript link. Mechanize will not be able to click it, since it does not evaluate JavaScript. Sorry!
Try to find out what happens in your browser when you click that link. Does it create a POST or GET request? What parameters are sent to the server? Once you know that, you can emulate the same action in your Mechanize script. Chrome dev tools or Firebug will help here.
If that doesn't work, try switching to a library that supports JavaScript evaluation. I've used watir-webdriver to great success, but you could also try phantomjs, casperjs, pjscrape, or other tools.
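As a starting point for the emulation approach, here is a sketch that pulls the argument out of the javascript:Submit('...') href with a regular expression. The helper name submit_argument is hypothetical. What the page's Submit function actually does with that value (which form it posts, and to which URL) is not known from the question and has to be read from the browser's network tab.

```ruby
# Hypothetical helper: returns the quoted argument of a javascript:Submit('...')
# href, or nil if the href has a different shape (or is nil).
def submit_argument(href)
  match = href.to_s.match(/\Ajavascript:Submit\('([^']*)'\)\z/)
  match && match[1]
end

puts submit_argument("javascript:Submit('menu_home')")  # menu_home
```

With Mechanize, the href would come from something like page.link_with(:text => 'Account Summary').href, and the extracted value would then feed whatever request the dev tools show the browser making.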
The first two should have worked, so print out the hrefs to make sure the link is really there:
puts page.links.map(&:href)
Remember that just because you can see it in your browser does not mean it appears in the response. It could have been sent as an Ajax update.
You can also do this, which I think is cleaner:
page.link_with(:href => /menu_home/).click
However, I don't think clicking that link will do what you want, since it's JavaScript.
Here's a way to handle it. Assume your page returns this content:
puts page.body
<HTML><SCRIPT LANGUAGE="JavaScript"><!--
top.location="http://www.example.com/pages/myaccount/dashboard.aspx?";
// --></SCRIPT>
<NOSCRIPT>Javascript required.</NOSCRIPT></HTML>
We know it's coming so we know what to check for:
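# Capture the URL assigned to top.location in the inline script.
link_search = %r{top\.location="([^"]+)"}
match = page.body.match(link_search)
# Follow the redirect target only if the script was actually present.
page = agent.get(match[1]) if match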
