Do not wait for page to finish loading in selenium - ruby

As the title states, I'm trying to create a script that opens multiple tabs in a browser. At the moment the script seems to wait until each page has finished loading before moving on to a new tab. Is there a way to move on without waiting for the page to load? Relevant information seems to be hard to find online.
#!/usr/bin/env ruby
require 'selenium-webdriver'
driver = Selenium::WebDriver.for :firefox
# one URL per line; strip the trailing newline before passing it to get
File.foreach(ARGV[0]) do |line|
  host = line.strip
  next if host.empty?
  driver.get(host)
  driver.execute_script("window.open()")
  driver.switch_to.window(driver.window_handles.last)
end
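One way to get this behaviour, assuming selenium-webdriver 4.x and a W3C-compliant driver such as geckodriver, is the pageLoadStrategy capability: with :none, driver.get returns as soon as navigation has started instead of waiting for the page to load (:eager waits until document.readyState is "interactive", :normal until it is "complete"). The sketch below also swaps window.open() for switch_to.new_window(:tab); the helper names read_urls, build_driver and open_in_tabs are hypothetical.

```ruby
# Sketch, assuming selenium-webdriver 4.x: open every URL from a file in its
# own tab without waiting for each page to finish loading.

# Read one URL per line, dropping newlines and blank lines
# (driver.get would otherwise receive "host\n").
def read_urls(path)
  File.readlines(path, chomp: true).map(&:strip).reject(&:empty?)
end

# Hypothetical factory: :none makes get return once navigation has started;
# :eager waits for readyState "interactive", :normal (default) for "complete".
def build_driver(strategy = :none)
  require 'selenium-webdriver'
  options = Selenium::WebDriver::Firefox::Options.new
  options.page_load_strategy = strategy
  Selenium::WebDriver.for(:firefox, options: options)
end

# Load the first URL in the current tab and each remaining URL in a new tab.
def open_in_tabs(driver, urls)
  urls.each_with_index do |url, index|
    driver.switch_to.new_window(:tab) unless index.zero?
    driver.get(url)
  end
end
```

Usage would be along the lines of open_in_tabs(build_driver, read_urls(ARGV[0])). Because nothing waits for the pages, any later interaction with a tab needs its own explicit wait.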

Related

How to extract JS rendered HTML using Selenium-webdriver and nokogiri?

Consider two web pages, one and two. Site number two is easy to scrape using Nokogiri because it doesn't use JS. Site number one, however, cannot be scraped using just Nokogiri. I googled and searched far and wide and found that if I loaded the page with an automated web browser, I could scrape the rendered HTML. I have the following code:
require 'selenium-webdriver'
# creates an instance
driver = Selenium::WebDriver.for :chrome
# opens an existing webpage
driver.get 'http://www.bigstub.com/search.aspx'
# wait is used to let the webpage load up and let the JS render
wait = Selenium::WebDriver::Wait.new(:timeout => 5)
My question: I want to let the page load and stop waiting as soon as my desired class appears. For example, if I set the timeout to 10 seconds and want to stop as soon as the class .title-holder is found, how would I write this code?
Pseudocode:
wait until rendered_source_page.include?("title-holder"), timing out after 10 seconds. I just don't know how to write it.
UPDATE:
Regarding the headless question, Selenium lets you pass a headless argument through the browser options:
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless')
driver = Selenium::WebDriver.for :chrome, options: options
For my next question, to make sure the JS-rendered HTML was fully loaded before scraping, I set a 5-second timeout:
wait = Selenium::WebDriver::Wait.new(:timeout => 5)
wait.until { /title-holder/.match(driver.page_source) }
wait.until polls for up to 5 seconds until the block returns a truthy value, here until the text title-holder appears in page_source (the rendered HTML), and raises Selenium::WebDriver::Error::TimeoutError if it never does. This solved all my questions.
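Matching a regex against page_source finds the text title-holder anywhere in the HTML, not necessarily as a class. A sketch that waits for an actual element with that class instead, assuming selenium-webdriver: Wait#until retries the block (ignoring NoSuchElementError by default) until it returns a truthy value or the timeout passes. The method names are hypothetical, and the wait object is passed in so the caller picks the timeout.

```ruby
# Sketch: wait for an element matching a CSS selector, then return the
# rendered HTML (e.g. to hand to Nokogiri). `wait` is expected to be a
# Selenium::WebDriver::Wait, for example Selenium::WebDriver::Wait.new(timeout: 10).

# Returns the first matching element; Wait#until raises
# Selenium::WebDriver::Error::TimeoutError if it never appears.
def wait_for_css(driver, selector, wait)
  wait.until { driver.find_element(css: selector) }
end

# Wait for the selector, then return the JS-rendered page source.
def rendered_source(driver, selector, wait)
  wait_for_css(driver, selector, wait)
  driver.page_source
end
```

For the question above this would be called as rendered_source(driver, '.title-holder', Selenium::WebDriver::Wait.new(timeout: 10)), and the returned string can be parsed with Nokogiri::HTML.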
I am assuming you are running Selenium on a server, so first install Xvfb:
sudo apt-get install xvfb
Then install Firefox:
sudo apt-get install firefox
Add the following two gems to your Gemfile. You need the headless gem because you want to run Selenium WebDriver on a server without a display; it starts and stops Xvfb for you.
# Gemfile
gem 'selenium-webdriver'
gem 'headless'
Code for scraping:
require 'headless'
require 'selenium-webdriver'
headless = Headless.new
headless.start
driver = Selenium::WebDriver.for :firefox
driver.navigate.to 'http://example.com'
wait = Selenium::WebDriver::Wait.new(:timeout => 30)
# scraping code comes here
Clean up when you are done so that you don't run out of memory:
driver.quit
headless.destroy
Hope this helps.
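If the scraping code raises, the driver.quit and headless.destroy lines above are never reached. A sketch that uses ensure so both always run, assuming the headless and selenium-webdriver gems from the Gemfile; the method name and the injectable display/browser factories are hypothetical (they default to the real gems and make the cleanup logic easy to exercise without a browser).

```ruby
# Sketch: run a block against a Firefox session inside Xvfb, always cleaning up.
def with_headless_firefox(url,
                          display: -> { require 'headless'; Headless.new },
                          browser: -> { require 'selenium-webdriver'; Selenium::WebDriver.for(:firefox) })
  headless = display.call
  headless.start
  driver = nil
  begin
    driver = browser.call
    driver.navigate.to(url)
    yield driver
  ensure
    # Runs whether the block returns normally or raises
    driver&.quit
    headless.destroy
  end
end
```

A call such as with_headless_firefox('http://example.com') { |driver| driver.title } returns the block's value after the browser and the virtual display have been shut down.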

How can I save a web page in Watir?

Using Ruby and Watir, can I save a web page the same way as right-clicking and choosing "Save Page As"?
I need to save the current web page from a script.
Yes, you can do it with Watir. Just open a page and write browser.html to any destination you want:
require 'watir'
b = Watir::Browser.new :phantomjs # I am using phantomjs for scripted browsing
b.goto 'http://google.com'
File.open('/tmp/google', 'w') {|f| f.write b.html }
I don't know about Watir, but with Selenium WebDriver you can use the page_source method.
Check out the docs for it here:
http://selenium.googlecode.com/git/docs/api/rb/Selenium/WebDriver/Driver.html#page_source-instance_method
Using this, you should get the whole source.
You can then save the source by writing it to a new file. I haven't tried this, but you can check it out:
require 'selenium-webdriver'
driver = Selenium::WebDriver.for(:firefox)
driver.get(URL_of_page_to_save)
file = File.new(filename, "w")
file.puts(driver.page_source)
file.close
Note that this saves only the HTML source; images, stylesheets and scripts referenced by the page are not downloaded.
Hope this helped a bit!
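Both answers write the page's current HTML to a file. A small sketch that covers either library, assuming a Watir::Browser (which exposes html) or a Selenium::WebDriver::Driver (which exposes page_source); save_page is a hypothetical name.

```ruby
# Sketch: save the current page's rendered HTML to `path` and return the path.
# Watir::Browser responds to #html; Selenium::WebDriver::Driver to #page_source.
def save_page(browser, path)
  html = browser.respond_to?(:html) ? browser.html : browser.page_source
  File.write(path, html)
  path
end
```

With the Watir answer this becomes save_page(b, '/tmp/google.html'); File.write opens, writes and closes the file in one call, so no handle is left open.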

How to use same browser window for automated test using selenium-webdriver (ruby)?

I am automating test cases for a website using selenium-webdriver and Cucumber in Ruby. I need each feature to run in a particular order and in the same browser window. At the moment, each feature creates a new window to run its tests in. In some test cases this behavior is desired, but in many it is not. From my research so far, there are mixed answers about whether it is possible to drive the same browser window with Selenium throughout test cases. Most answers I have found were for other languages and were workarounds specific to one browser (I am developing my tests against IE but will be expected to run them in other browsers). I am working in Ruby, and from what I have read it seems as though I'd have to make a class for the page? I'm confused as to why I would have to do this or how that helps.
My env.rb file:
require 'selenium-webdriver'
require 'rubygems'
require 'nokogiri'
require 'rspec/expectations'
Before do
  @driver ||= Selenium::WebDriver.for :ie
  @accept_next_alert = true
  @driver.manage.timeouts.implicit_wait = 30
  @driver.manage.timeouts.script_timeout = 30
  @verification_errors = []
end
After do
  #@driver.quit
  #@verification_errors.should == []
end
Some information I've gathered so far of people with similar problems:
https://code.google.com/p/selenium/issues/detail?id=18
Is there any way to attach an already running browser to selenium webdriver in java?
Please ask me questions if anything about my question is unclear. I have many more tests to create, but I do not want to move on if my foundation is sloppy and missing requested capabilities. (If you notice any other issues in my code, please point them out in a comment.)
The Before hook is run before each scenario. This is why a new browser is opened each time.
Do the following instead (in the env.rb):
require "selenium-webdriver"
driver = Selenium::WebDriver.for :ie
accept_next_alert = true
driver.manage.timeouts.implicit_wait = 30
driver.manage.timeouts.script_timeout = 30
verification_errors = []
Before do
  @driver = driver
end
at_exit do
  driver.quit
end
In this case, a browser will be opened at the start (before any tests). Then each test will grab that browser and continue using it.
Note: while it is usually okay to reuse the browser across tests, you should be careful about tests that need to run in a specific order (i.e. that become dependent on each other). Dependent tests can be hard to debug and maintain.
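The shared browser from this answer can also be packaged in a module, so hooks and step definitions get one lazily created browser that is quit when the process exits. A sketch, assuming selenium-webdriver and Cucumber's Before hook; SharedBrowser and its factory setter are hypothetical names, and the default factory uses IE as in the question.

```ruby
# Sketch for features/support/env.rb: one browser for the whole test run.
module SharedBrowser
  class << self
    # Optional override, e.g. SharedBrowser.factory = -> { Selenium::WebDriver.for :chrome }
    attr_writer :factory

    # Create the browser on first use, register a single at_exit hook to quit
    # it, and return the same instance on every later call.
    def driver
      @driver ||= begin
        browser = (@factory || default_factory).call
        at_exit { browser.quit }
        browser
      end
    end

    private

    def default_factory
      lambda do
        require 'selenium-webdriver'
        Selenium::WebDriver.for :ie
      end
    end
  end
end

# In env.rb (Cucumber hook):
# Before { @driver = SharedBrowser.driver }
```

Compared with a top-level local variable, the module keeps the browser reachable from any support file, and nothing is launched until the first scenario asks for it.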
I had a similar problem when creating a spec_helper file. I did the following (simplified for a locally run Firefox), and it works very reliably. RSpec will use the same browser window for all it blocks in your _spec.rb file.
RSpec.configure do |config|
  config.before(:all) do
    @driver = Selenium::WebDriver.for :firefox
  end
  config.after(:all) do
    @driver.quit
  end
end
If you switch from :all to :each, RSpec instead creates a new browser instance for each it block. Both are useful depending on the circumstance.
The answers above solve the problem but do not answer the question "How do I connect to an existing session?".
Since this is not officially supported, I managed it by monkey-patching two methods:
require 'selenium-webdriver'
# monkey-patch 2 methods
module Selenium
  module WebDriver
    class Driver
      # Be able to set the bridge
      def set_bridge_to(b)
        @bridge = b
      end
      # bridge is a private method, so create a public version
      def public_bridge
        @bridge
      end
    end
  end
end
caps = Selenium::WebDriver::Remote::Capabilities.chrome
driver = Selenium::WebDriver.for(
  :remote,
  url: "http://chrome:4444/wd/hub",
  desired_capabilities: caps
)
used_bridge = driver.public_bridge
driver.get('https://www.google.com')
# opens a new unused chrome window
driver2 = Selenium::WebDriver.for(
  :remote,
  url: "http://chrome:4444/wd/hub",
  desired_capabilities: caps
)
driver2.close # close unused chrome window
driver2.set_bridge_to(used_bridge)
driver2.title # => "Google"
Sadly, I have not yet gotten this to work between two Resque jobs; I will update this answer once I get it working for my own use case.

Open a new window with Ruby

I want to open a new window using the openWindow() method that I can see in the rdoc, but whenever I attempt to run my code, I am told that the method does not exist.
require 'rubygems'
require 'selenium-webdriver'
$browser = Selenium::WebDriver.for :firefox #I've tried chrome too to the same effect
$browser.navigate.to("http://google.com")
$browser.openWindow("http://cnet.com","ASDF") #This doesn't work.
$browser.open_window("http://cnet.com","ASDF") #This doesn't work either.
It would be greatly appreciated if someone could set the record straight on how to use this.
As detailed in this article, the correct way to use the API is:
@driver.get 'http://the-internet.herokuapp.com/windows'
main_window = @driver.window_handle
@driver.find_element(css: '.example a').click
windows = @driver.window_handles
windows.each do |window|
  if main_window != window
    @new_window = window
  end
end
@driver.switch_to.window(main_window)
@driver.title.should_not =~ /New Window/
@driver.switch_to.window(@new_window)
@driver.title.should =~ /New Window/
This code does the following:
Load the page
Get the window handle for the current window
Take an action that opens a new window
Get the window handle for the new window
Switch between the windows as needed
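The loop above keeps whichever handle is not the main one, which assumes exactly two windows are open. A more general sketch, assuming selenium-webdriver: record the handles before the action, diff them afterwards and switch to the new one. switch_to_new_window is a hypothetical name; if the browser opens the window asynchronously, the handle list may need to be polled with a Selenium::WebDriver::Wait.

```ruby
# Sketch: run the block (which should open a window), then switch to the
# window it created and return that window's handle.
def switch_to_new_window(driver)
  before = driver.window_handles
  yield
  new_handle = (driver.window_handles - before).first
  raise 'the action did not open a new window' unless new_handle
  driver.switch_to.window(new_handle)
  new_handle
end
```

For the example page this would read switch_to_new_window(@driver) { @driver.find_element(css: '.example a').click }, after which @driver.title should match /New Window/.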
I am not sure whether you can use the openWindow method, but to open a new window you will have to start a new instance of your Firefox browser, so try something like this:
$browser = Selenium::WebDriver.for :firefox
$browser.navigate.to("http://google.com")
$browser_new = Selenium::WebDriver.for :firefox
$browser_new.navigate.to("http://cnet.com")
I don't know Selenium, but according to your own question the name of the method is open_window not openWindow.
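Selenium::WebDriver::Driver has no openWindow or open_window method. In selenium-webdriver 4.x, driver.switch_to.new_window(:window) (or :tab) opens a new top-level window in the same session and switches to it, so a second browser instance is not strictly needed. A sketch, with open_in_new_window as a hypothetical helper:

```ruby
# Sketch, assuming selenium-webdriver 4.x: load `url` in a new window (or tab)
# of the same session, switch back, and return the new window's handle.
def open_in_new_window(driver, url, type: :window)
  original = driver.window_handle
  driver.switch_to.new_window(type) # also switches to the new window
  driver.get(url)
  new_handle = driver.window_handle
  driver.switch_to.window(original)
  new_handle
end
```

With the question's globals this would be open_in_new_window($browser, "http://cnet.com"); the returned handle can later be passed to $browser.switch_to.window.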

How to avoid launching the Firefox GUI when scraping a web page with JavaScript

I am trying to scrape a web page with a lot of JavaScript. With the help of pguardiano I have this piece of code in Ruby:
require 'rubygems'
require 'watir-webdriver'
require 'csv'
@browser = Watir::Browser.new
@browser.goto 'http://www.oddsportal.com/matches/soccer/'
CSV.open('out.csv', 'w') do |out|
  @browser.trs(:class => /deactivate/).each do |tr|
    out << tr.tds.map(&:text)
  end
end
The scraping runs repeatedly in the background, roughly once an hour. I have no experience with Ruby, and in particular with web scraping, so I have a couple of questions.
How can I avoid opening a new Firefox session every time, with all its CPU and RAM consumption?
Is it possible to use the Firefox engine without its GUI?
You can try a headless option.
require 'watir-webdriver'
require 'headless'
headless = Headless.new
headless.start
b = Watir::Browser.start 'www.google.com'
puts b.title
b.close
headless.destroy
An alternative is to use the Selenium server. A third alternative is to use a scraper like Kapow.
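Two separate things help with the CPU and RAM use: reusing one browser for every hourly run instead of launching Firefox each time, and starting it headless. A sketch assuming a recent watir gem (the successor to watir-webdriver), whose Browser.new accepts headless: true; the method names, output path and one-hour interval are illustrative.

```ruby
# Sketch: scrape the odds table once an hour with one long-lived headless browser.
require 'csv'

URL = 'http://www.oddsportal.com/matches/soccer/'

# Return one array of cell texts per <tr> whose class matches /deactivate/.
def scrape_rows(browser)
  browser.goto URL
  browser.trs(class: /deactivate/).map { |tr| tr.tds.map(&:text) }
end

def write_csv(rows, path)
  CSV.open(path, 'w') { |out| rows.each { |row| out << row } }
end

# Start the browser once and reuse it for every run; close it on exit.
def run_hourly(path: 'out.csv', interval: 3600)
  require 'watir'
  browser = Watir::Browser.new :firefox, headless: true
  loop do
    write_csv(scrape_rows(browser), path)
    sleep interval
  end
ensure
  browser&.close
end
```

Keeping the session alive trades a little idle memory for not paying Firefox's start-up cost every hour; if the page leaks memory over time, closing and reopening the browser every few runs is a simple middle ground.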
