watir-webdriver change proxy while keeping browser open - ruby

I am using the Watir-Webdriver library in Ruby to check some pages. I know I can connect through a proxy using
profile = Selenium::WebDriver::Firefox::Profile.new # create a new profile
profile.proxy = Selenium::WebDriver::Proxy.new(     # proxy settings for the profile
  :http => proxyadress,
  :ftp => nil,
  :ssl => nil,
  :no_proxy => nil
)
browser = Watir::Browser.new :firefox, :profile => profile # browser window with this profile
browser.goto "http://www.example.com"
browser.close
However, when I want to connect to the same page multiple times using different proxies, I have to create a new browser for every proxy, and loading (and unloading) the browser takes quite some time.
So, my question: is there any way, using webdriver in Ruby, to change the proxy address Firefox connects through while keeping the browser open?

If you want to test whether a page is blocked when accessed through a proxy server, you can do that with a headless library. I recently had success using Mechanize; you can probably use net/http as well.
I am still not sure why you need to change the proxy server within the current session.
require 'mechanize'

session = Mechanize.new
session.set_proxy(host, port, user, pass)
session.user_agent_alias = 'Mac Safari'
session.agent.robots = true # observe robots.txt rules
response = session.get(url)
puts response.code
You need to supply the proxy host/port/user/pass (user and pass are optional) and the URL. If you get an exception, the response code is probably not a friendly one.
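To check the same page through several proxies without restarting anything, you can reuse a single Mechanize session and switch the proxy between requests. A minimal sketch, where the proxy list and URL are hypothetical:

require 'mechanize'

proxies = [['203.0.113.1', 8080], ['203.0.113.2', 8080]] # hypothetical proxy list
session = Mechanize.new

proxies.each do |host, port|
  session.set_proxy(host, port) # switch proxies on the same session
  begin
    response = session.get('http://www.example.com')
    puts "#{host}:#{port} => #{response.code}"
  rescue Mechanize::ResponseCodeError => e
    puts "#{host}:#{port} => #{e.response_code}"
  end
end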

You may need to use an OS level automation tool to automate going through the FF menus to change the setting as a user would.
For Windows users there is the option of either the new RAutomation tool, or AutoIt. Both can be used to automate things at the OS UI level, which would let you go into the browser settings and change the proxy there.
Still, if you are checking a larger number of sites, I'd think the overhead of changing the proxy settings would not be that much compared to all of the site navigation, waiting for pages to load, etc.
Unless you are currently taking a 'row traverse' approach and changing proxy settings multiple times for each site you are checking? If that's the case, I would move towards more of a by-column method (if we presume each column is a proxy and each row is a site): fire up the browser with one proxy, check all the sites, then change the proxy and re-check all the sites. That way you'd only change the proxy settings once for each proxy, which should not add much overhead to your script.
It might mean a little more work with storing and then reporting results at the end (if you had been writing them out a line at a time) but that's what hashes or arrays are for.
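A minimal sketch of that by-column approach, assuming proxies and sites are arrays you supply:

results = Hash.new { |h, k| h[k] = {} }

proxies.each do |proxy_address|
  profile = Selenium::WebDriver::Firefox::Profile.new
  profile.proxy = Selenium::WebDriver::Proxy.new(:http => proxy_address)
  browser = Watir::Browser.new :firefox, :profile => profile

  sites.each do |url|
    browser.goto url
    results[proxy_address][url] = browser.title # or whatever check you need
  end

  browser.close # one browser start/stop per proxy, not per site
end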

Related

Blacklist URLs with headless Chrome

I'm trying to block URLs in my specs, to achieve something like what I had when using capybara_webkit:
Capybara::Webkit.configure do |config|
  config.block_url("*google*")
  config.allow_url('*my_website.com')
end
After reading this article, I tried to do something like:
require 'webmock/rspec'

module WebmockConfig
  def self.default_disabled_urls
    [
      '*google*'
    ]
  end
end
WebMock.disable_net_connect!(allow_localhost: true)
WebMock.disable_net_connect!(allow: WebmockConfig.default_disabled_urls)
but I'm getting
Real HTTP connections are disabled. Unregistered request: POST http://127.0.0.1/session
even though that should be solved by WebMock.disable_net_connect!(allow_localhost: true).
When running the specs without WebMock.disable_net_connect!(allow: WebmockConfig.default_disabled_urls), everything is working fine.
The capybara-webkit whitelisting/blacklisting affects requests made by the browser, whereas WebMock can only affect requests made by your app. This makes WebMock useless for what you want: it wouldn't actually stop your browser from loading anything from Google, etc. To do that while using the selenium driver, you need a programmable proxy like puffing-billy, which lets you customize the responses for any matching requests the browser makes.
To configure a driver using headless Chrome and puffing_billy, you could do something like:
Capybara.register_driver :headless_chrome do |app|
  browser_options = ::Selenium::WebDriver::Chrome::Options.new
  browser_options.headless!
  browser_options.add_argument("--proxy-server=#{Billy.proxy.host}:#{Billy.proxy.port}")
  Capybara::Selenium::Driver.new(app, browser: :chrome, options: browser_options)
end
Whether you need any other options depends on your system config, etc., but you should be able to tell by looking at your current driver registration.
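Once the browser is routed through Billy, you decide per-URL what happens in your specs. A sketch based on the puffing_billy README (the whitelist entries here are assumptions for this example):

require 'billy/capybara/rspec'

Billy.configure do |c|
  # Only these hosts are allowed through to the real network.
  c.whitelist = ['my_website.com', 'localhost', '127.0.0.1']
end

# In a spec: serve an empty body for a blocked URL instead of hitting Google.
proxy.stub('http://www.google.com/').and_return(text: '')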
The allow_localhost: true setting is overwritten by the second call with allow: WebmockConfig.default_disabled_urls. You have to call WebMock.disable_net_connect! once with both settings, or add 'localhost' and '127.0.0.1' entries to self.default_disabled_urls.
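Combining both settings in a single call would look like:

WebMock.disable_net_connect!(
  allow_localhost: true,
  allow: WebmockConfig.default_disabled_urls
)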

capybara - :9443 appears in the URL that I am visiting

I would like to visit a URL like https://latest.www.abc.com/def
However, when I run it, the URL becomes https://latest.www.abc.com:9443/def
How can I omit the :9443 and be able to visit https://latest.www.abc.com/def exactly?
Thanks!!
If you're only visiting external sites (not testing a local app) then set Capybara.run_server = false, which will stop Capybara from starting a server and trying to insert the port of that server into URLs.
If you are testing a local app and also need to visit external sites, then make sure you haven't set Capybara.always_include_port to true (it defaults to false), OR explicitly specify the desired port in the visit command:
visit('https://latest.www.abc.com:443/def')
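In the first case, the setup is a one-liner wherever you configure Capybara (always_include_port shown only for completeness, since false is already the default):

Capybara.run_server = false          # don't start a local app server
Capybara.always_include_port = false # the default; don't inject a port into URLs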

Simplest method of enforcing HTTPS for Heroku Ruby Sinatra app

I have an app I created on Heroku which is written in Ruby (not rails) and Sinatra.
It is hosted on the default herokuapp domain so I can address the app with both HTTP and HTTPS.
The app requests user credentials which I forward on to an HTTPS call so the forwarding part is secure.
I want to ensure my users always connect securely to my app so the credentials aren't passed in clear text.
Despite lots of research, I've not found a solution to this simple requirement.
Is there a simple solution that doesn't require converting my app to Ruby on Rails or otherwise restructuring it?
Thanks,
Alan
I use a helper that looks like this:
def https_required!
  if settings.production? && request.scheme == 'http'
    headers['Location'] = request.url.sub('http', 'https')
    halt 301, "https required\n"
  end
end
I can then add it to any single route I want to force to HTTPS, or use it in a before filter to enforce it on a set of URLs:
before "/admin/*" do
https_required!
end
Redirect in a Before Filter
This is untested, but it should work. If not, or if it needs additional refinement, it should at least give you a reasonable starting point.
before do
  redirect request.url.sub('http', 'https') unless request.secure?
end
See Also
Filters
Request Object
RackSsl::Enforcer
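If you'd rather use middleware than a hand-rolled filter, the RackSsl::Enforcer gem linked above handles the redirect for you. A minimal config.ru sketch, assuming a classic Sinatra app living in app.rb:

# config.ru
require 'rack/ssl-enforcer'
require './app'

use Rack::SslEnforcer if ENV['RACK_ENV'] == 'production'
run Sinatra::Application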

Watir-webdriver doesn't store all cookies

When I go to the following link in Firefox (v12), the browser on my Ubuntu machine lets me log in normally.
https://r.espn.go.com/members/v3_1/login?language=en&forwardUrl=&appRedirect=http%3A%2F%2Fgames.espn.go.com
However, if I use watir-webdriver, I get the message: "Cookies must be enabled in order to login."
Here is the code to reproduce this issue with Watir:
require 'watir-webdriver'

browser = Watir::Browser.new
browser.goto "https://r.espn.go.com/members/v3_1/login?language=en&forwardUrl=&appRedirect=http%3A%2F%2Fgames.espn.go.com"
You will notice that the browser displays the "Cookies must be enabled" error message below the "email address or member name" field. When I looked at the stored cookies, I noticed that not all of the cookies saved in normal mode are available. I compared this by searching for "go.com" in the stored cookies.
Any idea what would cause the discrepancy in cookies stored between the two modes, using the same browser?
Thanks!
There is no problem or discrepancy with watir-webdriver. What is happening here is a result of how the website is coded.
The page you are accessing (https://r.espn.go.com/members/v3_1/login?language=en&forwardUrl=&appRedirect=http%3A%2F%2Fgames.espn.go.com) is intended to be an overlay on http://espn.go.com. Whoever coded the site assumed that the overlay page would always be accessed after a hit to the main page. So, the main page (http://espn.go.com) sets a cookie in order to test whether your user agent has cookies enabled. The overlay page with the sign in form then checks to see if the test cookie is present and, if not, displays the warning you are seeing.
What is important to understand is that watir-webdriver defaults to a clean profile for each new browser instance. This means that the browser does not have any of your cookies, extensions, preferences or browsing history. Because the clean profile has never visited http://espn.go.com to receive the test cookie, the warning is being displayed.
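You can confirm this by listing the cookies the fresh profile actually has; a quick sketch:

browser.cookies.to_a.each do |cookie|
  puts "#{cookie[:name]} (#{cookie[:domain]})"
end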
There are two ways to avoid this warning:
You can visit the main page prior to the sign-in page, like so:
require 'watir-webdriver'
browser = Watir::Browser.new
browser.goto "espn.go.com"
browser.goto "https://r.espn.go.com/members/v3_1/login?language=en&forwardUrl=&appRedirect=http%3A%2F%2Fgames.espn.go.com"
Or, you can use your default Firefox profile, which (presumably) already has the test cookie:
require 'watir-webdriver'
browser = Watir::Browser.new :firefox, :profile => "default"
browser.goto "https://r.espn.go.com/members/v3_1/login?language=en&forwardUrl=&appRedirect=http%3A%2F%2Fgames.espn.go.com"
Hope that helps!

ruby mechanize in Facebook

I'm trying to click the Settings button on the home page, but when I do I get this page back:
#<WWW::Mechanize::Page
{url
#<URI::HTTP:0x1023c5fc0 URL:http://www.facebook.com/editaccount.php?ref=mb&drop>}
{meta}
{title nil}
{iframes}
{frames}
{links}
{forms}>
which is... kinda empty! Is there some problem with these iframes and frames, maybe?
As roja mentioned, following redirects might be what you need. Here's an example of how to do this:
@agent = Mechanize.new
@agent.redirect_ok = :all
@agent.follow_meta_refresh = :anywhere
Then you can pretty much ignore the fact that there are redirects involved - Mechanize will simply put you on the resulting page.
Facebook redirects me to https://register.facebook.com/editaccount.php, which I assume is the final destination. Assuming WWW::Mechanize is set up to follow HTTPS redirects, you should end up there too.
Much of Facebook, like most modern websites, is generated by JavaScript, which I think WWW::Mechanize is unable to cope with; this could be the source of your problem. I recommend trying to scrape while appending "?_fb_noscript=1" to the URLs you visit. This turns off much of Facebook's JavaScript and should enable a smoother ride for your little bot.
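For example, a sketch combining this with the redirect settings from above:

require 'mechanize'

agent = Mechanize.new
agent.redirect_ok = :all
agent.follow_meta_refresh = :anywhere

# _fb_noscript=1 asks Facebook to serve its non-JavaScript pages
page = agent.get('http://www.facebook.com/?_fb_noscript=1')
puts page.title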
(Do remember this is only an idea, and doubtless whatever you do is against Facebook's usage policy, which makes you a "baddy." I don't condone such badness and believe that baddies should be forced to go to bed early, etc... ad nauseam.)
