How do I write a Ruby web crawler that uses Chrome?

I have a Ruby web crawler that currently runs in Firefox. How do I switch it over to Chrome instead?
def open_browser
  tweaked_profile = Selenium::WebDriver::Firefox::Profile.new
  tweaked_profile['nglayout.initialpaint.delay'] = 0
  tweaked_profile.assume_untrusted_certificate_issuer = false
  tweaked_profile['permissions.default.image'] = 2
  tweaked_profile['network.proxy.type'] = 1
  tweaked_profile['network.proxy.http'] = 'ec2proxy.csnzoo.com'
  tweaked_profile['network.proxy.http_port'] = 8080
  driver = Selenium::WebDriver.for :firefox, :profile => tweaked_profile
  $browser = Watir::Browser.new(driver)
end
Should I just ditch Watir and go with chromedriver, or will Watir work for this?

Check out http://watirwebdriver.com/chrome/, which has this example:
profile = Selenium::WebDriver::Chrome::Profile.new
...
b = Watir::Browser.new :chrome, :profile => profile
Also, these SO questions provide alternatives for crawling sites: "Web crawler in ruby" and "What are some good Ruby-based web crawlers?"
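As a rough sketch of how the Firefox settings from the question might map onto Chrome: the helper below is hypothetical and only builds plain data, so it runs without a browser. The `--proxy-server` switch is a Chrome command-line flag; the image-blocking preference name is an assumption based on commonly used Chrome prefs, so verify it against your Chrome version.

```ruby
# Hypothetical helper: translate the Firefox-style proxy and image settings
# from the question into Chrome command-line switches and preferences.
def chrome_settings(proxy_host:, proxy_port:, block_images: true)
  args = ["--proxy-server=#{proxy_host}:#{proxy_port}"]
  prefs = {}
  # Assumed Chrome preference name; 2 is intended to mean "block images",
  # mirroring permissions.default.image = 2 in the Firefox profile.
  prefs['profile.managed_default_content_settings.images'] = 2 if block_images
  { args: args, prefs: prefs }
end

settings = chrome_settings(proxy_host: 'ec2proxy.csnzoo.com', proxy_port: 8080)
puts settings[:args].inspect
puts settings[:prefs].inspect

# With selenium-webdriver and watir loaded, the result might be applied like this:
#   options = Selenium::WebDriver::Chrome::Options.new
#   settings[:args].each { |arg| options.add_argument(arg) }
#   settings[:prefs].each { |name, value| options.add_preference(name, value) }
#   driver = Selenium::WebDriver.for :chrome, options: options
#   $browser = Watir::Browser.new(driver)
```

Keeping the settings in a plain hash makes them easy to inspect before a browser is ever launched.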

Related

How to work with a proxy in Firefox with Selenium WebDriver

My code works with the Chrome browser, but I have never worked with Firefox. An Ubuntu server normally works only with Firefox, so my question is: how can I use a proxy with Firefox through the proxy_chain_rb gem?
I think my Chrome code will work in Firefox if someone shows me how to set Firefox options. My problem is that I don't know how to use Firefox options, and the manuals are old. How can I convert my Chrome code to Firefox?
Code
require 'watir'
require 'proxy_chain_rb'
require 'selenium-webdriver'
time2 = Time.now
file = File.new("report.json", "a:UTF-8")
myuseragent = File.readlines("user_agents.txt", chomp: true).sample
options = Selenium::WebDriver::Chrome::Options.new
options.add_emulation(user_agent: (myuseragent))
options.add_argument('--headless')
puts "Work started: " + time2.inspect
u_proxy = File.readlines("proxy.txt", chomp: true).sample
real_proxy = u_proxy
server = ProxyChainRb::Server.new
generated_proxy = server.start(real_proxy)
proxy = {
  http: generated_proxy,
  ssl: generated_proxy
}
caps = Selenium::WebDriver::Remote::Capabilities.chrome(:proxy => proxy)
driver = Selenium::WebDriver.for :chrome, :desired_capabilities => caps, options: options
driver.execute_script('return navigator.userAgent')
driver.get "https://raskruty.ru/"
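One possible way to carry the same idea over to Firefox is to express the proxy and user agent as Firefox preferences. The sketch below is a hypothetical helper, not a tested port of the code above: it assumes the generated proxy is a URL of the form `http://host:port`, and it uses the `network.proxy.*` and `general.useragent.override` preference names. It only builds a hash, so it runs without Firefox installed.

```ruby
require 'uri'

# Hypothetical helper: build Firefox preferences for a manual HTTP/HTTPS proxy
# and an overridden user agent.
def firefox_proxy_prefs(proxy_url, user_agent)
  uri = URI.parse(proxy_url.strip)
  {
    'network.proxy.type'         => 1, # 1 = manual proxy configuration
    'network.proxy.http'         => uri.host,
    'network.proxy.http_port'    => uri.port,
    'network.proxy.ssl'          => uri.host,
    'network.proxy.ssl_port'     => uri.port,
    'general.useragent.override' => user_agent.strip
  }
end

# Example values only; the address and agent string are made up.
prefs = firefox_proxy_prefs("http://127.0.0.1:8000\n", "ExampleAgent/1.0\n")
prefs.each { |name, value| puts "#{name} = #{value.inspect}" }

# With selenium-webdriver loaded, the preferences might be applied like this:
#   options = Selenium::WebDriver::Firefox::Options.new
#   options.add_argument('-headless')
#   prefs.each { |name, value| options.add_preference(name, value) }
#   driver = Selenium::WebDriver.for :firefox, options: options
```

The helper strips its inputs because lines read from a file with `File.readlines` keep their trailing newline unless `chomp: true` is passed.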

Selenium Webdriver - set preferred browser language DE

I have a problem setting the preferred (accepted) language within headless Chrome using Selenium WebDriver and Ruby. I use the following WebDriver settings:
Selenium::WebDriver::Chrome.driver_path = @config[<path to the Chrome Driver>]
options = Selenium::WebDriver::Chrome::Options.new
options.add_argument('--headless')
options.add_argument('--disable-translate')
options.add_argument("--lang=de")
The driver is then initialized with:
@selenium_driver = Selenium::WebDriver.for :chrome, options: options
Everything works fine, but on some pages Chrome returns English content even when I navigate to the German page URL (e.g. page.de). In these cases the Chrome driver returns the English content because of an internal redirect to page.de/en. I do not specify the en path in the URL I request.
I have tried to set the language using the Webdriver preference:
options.add_preference('accept_languages', 'de')
instead of add_argument, but it doesn't change the behavior.
Does anyone have an idea how to force a headless Chrome controlled by Selenium WebDriver from Ruby to request page content in a defined language or, as a less ideal workaround, to stop the redirect?
Any help greatly appreciated
Best
Krid
I found a solution that works for me. As in many cases, the problem was sitting in front of the screen and simply didn't work precisely enough ;-)
Instead of using
options.add_argument("--lang=de")
you have to use
options.add_argument("--lang=de-DE")
When I use an IETF language tag the code I initially posted works as expected.
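To make the finding above harder to forget, one could guard the switch with a small check. This is only a sketch with a hypothetical helper name; the pattern accepts simple language-region tags such as `de-DE` and is an assumption about what is sufficient here, not a full IETF language tag parser.

```ruby
# Hypothetical helper: build the --lang switch, insisting on a
# language-region tag (for example de-DE) rather than a bare language code.
def lang_argument(tag)
  unless tag.match?(/\A[a-z]{2,3}-[A-Z]{2}\z/)
    raise ArgumentError, "expected a language-region tag such as de-DE, got #{tag.inspect}"
  end
  "--lang=#{tag}"
end

puts lang_argument('de-DE')

# With selenium-webdriver loaded:
#   options.add_argument(lang_argument('de-DE'))
```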
I'm using this in my test_helper.rb, and it works fine for me.
Capybara.register_driver :selenium do |app|
  Chromedriver.set_version "2.36"
  desired_capabilities = Selenium::WebDriver::Remote::Capabilities.chrome(
    'chromeOptions' => {
      'prefs' => {
        'intl.accept_languages' => 'en-US'
      },
      args: ['disable-gpu', 'headless']
    }
  )
  Capybara::Selenium::Driver.new(app, { browser: :chrome, desired_capabilities: desired_capabilities })
end
Capybara.javascript_driver = :chrome
Capybara.default_driver = :selenium
This prefs hash inside an options hash did the trick for me. It's at the end of the driven_by :selenium line.
(Inside test/application_system_test_case.rb)
# frozen_string_literal: true
require 'test_helper'
require 'capybara/rails'
class ApplicationSystemTestCase < ActionDispatch::SystemTestCase
  driven_by :selenium, using: :chrome, screen_size: [1400, 1400], options: { prefs: { 'intl.accept_languages' => 'de,de-DE;q=0.9,en;q=0.1' } }
  # ...
end
2021-06-14 UPDATE:
The previous example produces this deprecation warning:
WARN Selenium [DEPRECATION] :prefs is deprecated. Use Selenium::WebDriver::Chrome::Options#add_preference instead.
IMO, the solution below is uglier, but I'm posting it for when it's fully deprecated and the original stops working.
class ApplicationSystemTestCase < ActionDispatch::SystemTestCase
  driven_by(:selenium,
    using: :chrome,
    screen_size: [1400, 1400],
    options: {
      options: Selenium::WebDriver::Chrome::Options.new(
        prefs: { 'intl.accept_languages' => 'de,de-DE;q=0.9,en;q=0.1' }
      )
    },
  )
end
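The weighted language list used in these examples can also be assembled from data instead of being typed by hand. A minimal sketch, with a hypothetical helper name; a `nil` weight means the tag is emitted without a q-value.

```ruby
# Hypothetical helper: join language tags and optional weights into the
# comma-separated form used for intl.accept_languages above.
def accept_languages(weights)
  weights.map { |tag, q| q.nil? ? tag : "#{tag};q=#{q}" }.join(',')
end

value = accept_languages('de' => nil, 'de-DE' => 0.9, 'en' => 0.1)
puts value # prints de,de-DE;q=0.9,en;q=0.1

# With selenium-webdriver loaded, the value might be passed on like this:
#   options = Selenium::WebDriver::Chrome::Options.new
#   options.add_preference('intl.accept_languages', value)
```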
You should be able to solve your problem by adding an experimental option:
options.add_option('prefs', {'intl.accept_languages': 'en,en_US'})
I'm sure it works with Python, but I've not tried it with Ruby: the approach is the correct one, but I'm not sure about the Ruby implementation.
You can find in this repository the code which handles your problem in Python code, and in this Q&A how to implement experimental_options in Ruby
This works for me:
options = Selenium::WebDriver::Firefox::Options.new
options.add_preference("intl.accept_languages", 'de-DE')
Capybara::Selenium::Driver.new(app, browser: :firefox, options: options)

Open a Selenium browser with my cookies

I'm trying to create an automated script that goes to a website (Yik Yak) and submits stuff. It needs to access my cookies so the site knows that I logged in before. Logging in requires entering a key sent to my phone, and I can't automate that.
require 'selenium-webdriver'
profileDir = File.absolute_path("/home/carson/.mozilla/firefox/237ie3yd.default")
profile = Selenium::WebDriver::Firefox::Profile.from_name profileDir
driver = Selenium::WebDriver.for :firefox, :profile => profile
driver.navigate.to "https://www.yikyak.com/nearby/new"
wait = Selenium::WebDriver::Wait.new(:timeout => 10)
element = driver.find_element(:class, 'form-control')
element.send_keys "Tessttt"
element.submit
It runs and opens Firefox, but it stops at the page where I have to enter the key my phone gets.
Any help?
default_profile = Selenium::WebDriver::Firefox::Profile.from_name "default"
default_profile.native_events = true
driver = Selenium::WebDriver.for(:firefox, :profile => default_profile)
Via Ruby Bindings
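Note that the answer passes a profile name ("default"), whereas the question passes a directory path to from_name. As far as I know, from_name looks the name up in Firefox's profiles.ini, so a path will not match. The sketch below is a hypothetical stand-alone parser, not the Selenium implementation; it shows how names map to directories in a profiles.ini of the usual shape. The sample text is made up apart from the directory name taken from the question.

```ruby
# Hypothetical helper: map profile names to their paths, given the text of a
# Firefox profiles.ini file.
def profile_paths(ini_text)
  sections = []
  ini_text.each_line do |line|
    line = line.strip
    if line.start_with?('[')
      sections << {} # a new [Section] starts
    elsif sections.any? && line.include?('=')
      key, value = line.split('=', 2)
      sections.last[key] = value
    end
  end
  sections.each_with_object({}) do |section, paths|
    paths[section['Name']] = section['Path'] if section['Name'] && section['Path']
  end
end

sample = <<~INI
  [General]
  StartWithLastProfile=1

  [Profile0]
  Name=default
  IsRelative=1
  Path=237ie3yd.default
INI

paths = profile_paths(sample)
puts paths.inspect
```

If you only have the directory, passing it to `Selenium::WebDriver::Firefox::Profile.new` may be the closer fit, since that constructor is documented as accepting an existing profile directory as a model; treat that as something to verify for your Selenium version.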

How to run multiple Firefox browsers in parallel with different proxies in Watir

The code below launches three Firefox browsers, each with different proxy settings. How can I launch all three browsers at the same time using threads in Watir?
require 'selenium-webdriver'
require 'rubygems'
require 'watir'
require 'rautomation'
require './CLReport.class'
require 'win32ole'
# TO INITIATE FIRST FIREFOX BROWSER
# THE PROXY DATA CAN BE parameterized from Excel sheet
profile = Selenium::WebDriver::Firefox::Profile.new
profile.proxy = Selenium::WebDriver::Proxy.new :http => 'myproxy.com:8080', :ssl => 'myproxy.com:8080'
$b1 = Watir::Browser.new :firefox, :profile => profile
$b1.goto("https://google.com")
# TO INITIATE SECOND FIREFOX BROWSER
# THE PROXY DATA CAN BE parameterized from Excel sheet
profile = Selenium::WebDriver::Firefox::Profile.new
profile.proxy = Selenium::WebDriver::Proxy.new :http => 'myproxy.com:8081', :ssl => 'myproxy.com:8081'
$b2 = Watir::Browser.new :firefox, :profile => profile
$b2.goto("https://google.com")
# TO INITIATE THIRD FIREFOX BROWSER
# THE PROXY DATA CAN BE parameterized from Excel sheet
profile = Selenium::WebDriver::Firefox::Profile.new
profile.proxy = Selenium::WebDriver::Proxy.new :http => 'myproxy.com:8082', :ssl => 'myproxy.com:8082'
$b3 = Watir::Browser.new :firefox, :profile => profile
$b3.goto("https://google.com")
Now my question is how to combine $b1, $b2, and $b3 behind a single browser object using threads, so that
a single call such as $browser.link(:text, "form application").click runs in all three browsers in parallel, instead of writing
$b1.link(:text, "form application").click
$b2.link(:text, "form application").click
$b3.link(:text, "form application").click
That is, a single line of code should work in all three Firefox browsers at the same time.
It is not possible, because $b1, $b2, and $b3 are instances of different browsers; you cannot make them equal. What you are doing is right. Alternatively, you can do something like this:
array = [$b1, $b2, $b3]
array.each do |browser|
  browser.link(:text, "form application").click
end
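The loop above still clicks in one browser after another. If the goal is to issue the action in all browsers at roughly the same time, one option is to give each browser its own thread. This is a sketch under assumptions: on_each_browser is a hypothetical helper, and FakeBrowser is a stand-in so the example runs without opening Firefox. Whether real Watir browsers behave well when driven from separate threads is something to check in your setup.

```ruby
# Hypothetical helper: run the same block against every browser, one thread
# per browser, and return the block results in the original order.
def on_each_browser(browsers, &block)
  threads = browsers.map do |browser|
    Thread.new(browser) { |b| block.call(b) }
  end
  threads.map(&:value) # value joins each thread and returns its result
end

# Stand-in for Watir::Browser, used only to keep the sketch self-contained.
FakeBrowser = Struct.new(:name) do
  def click_link(text)
    "#{name} clicked #{text}"
  end
end

browsers = [FakeBrowser.new('b1'), FakeBrowser.new('b2'), FakeBrowser.new('b3')]
results = on_each_browser(browsers) { |b| b.click_link('form application') }
puts results

# With the real browsers from the question, the call might look like this:
#   on_each_browser([$b1, $b2, $b3]) { |b| b.link(:text, "form application").click }
```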

Firefox 4 with watir webdriver: Need help using helperApps.neverAsk to save CSV without prompting

I learned how to use Firefox 4 with watir and webdriver (on Win7 x64), setting profile items. Example:
profile = Selenium::WebDriver::Firefox::Profile.new
profile["browser.download.useDownloadDir"] = true
profile["browser.download.dir"] = 'D:\\FirefoxDownloads'
profile["browser.helperApps.neverAsk.saveToDisk"] = "application/csv"
driver = Selenium::WebDriver.for :firefox, :profile => profile
browser = Watir::Browser.new(driver)
What I am trying to do with the example above is make CSV files always download to a specific directory, without ever being opened.
The code above succeeds in setting all the files automatically downloaded to the specified directory, but setting browser.helperApps.neverAsk.saveToDisk has no effect: I still get the open/save question.
After the script runs, the Firefox window is still open, and I enter the URL about:config.
I can see that browser.helperApps.neverAsk.saveToDisk was correctly set to application/csv, but in firefox/options/options/applications I don't see the entry for CSV files.
It seems that the menu setting, which is the one that actually takes effect, is not really bound to the about:config setting.
What am I doing wrong?
I've done some testing of this for you. Unfortunately, there doesn't seem to be a standard content type for CSV files. You can try passing a comma-separated list of content types; hopefully one of them works for you. For me it was application/octet-stream that did the trick...
require 'watir-webdriver'
require 'selenium-webdriver'
profile = Selenium::WebDriver::Firefox::Profile.new
profile["browser.download.useDownloadDir"] = true
profile["browser.download.dir"] = '/tmp'
profile["browser.helperApps.neverAsk.saveToDisk"] = "text/plain, application/vnd.ms-excel, text/csv, text/comma-separated-values, application/octet-stream"
driver = Selenium::WebDriver.for :firefox, :profile => profile
browser = Watir::Browser.new(driver)
browser.goto "http://altentee.com/test/test.csv"
In Firefox 6+, I couldn't get this to work without specifically setting the 'browser.download.folderList' value:
profile = Selenium::WebDriver::Firefox::Profile.new
profile['browser.download.folderList'] = 2 # custom location
profile['browser.download.dir'] = download_directory
profile['browser.helperApps.neverAsk.saveToDisk'] = "text/csv, application/csv"
b = Watir::Browser.new :firefox, :profile => profile
See: http://watirwebdriver.com/browser-downloads/
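Putting the two answers together, the download preferences can be built from a list of content types so that new types are easy to add. A minimal sketch with a hypothetical helper name; it only uses the preference names that appear in the answers above, and the directory is an example value.

```ruby
# Hypothetical helper: build the Firefox download preferences discussed above
# from a target directory and a list of content types.
def download_prefs(directory, mime_types)
  {
    'browser.download.folderList' => 2, # 2 = use the custom directory below
    'browser.download.dir' => directory,
    'browser.helperApps.neverAsk.saveToDisk' => mime_types.join(', ')
  }
end

prefs = download_prefs('/tmp', ['text/csv', 'application/csv', 'application/octet-stream'])
prefs.each { |name, value| puts "#{name} = #{value.inspect}" }

# With selenium-webdriver and watir loaded, the preferences might be applied like this:
#   profile = Selenium::WebDriver::Firefox::Profile.new
#   prefs.each { |name, value| profile[name] = value }
#   b = Watir::Browser.new :firefox, :profile => profile
```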

Resources