Nokogiri is not loading "browser" from Watir - ruby

I have this so far:
require 'watir-webdriver'
require 'date'
require 'nokogiri'
browser = Watir::Browser.start 'https://example/ViewReport.aspx'
browser.link(:text, 'Combined Employee Performance Report').click
today = Date.today
yesterday = today.prev_day.strftime('%m%d%Y')
t = browser.text_field :id => 'UC255_txtStart'
t.set yesterday
t = browser.text_field :id => 'UC255_txtEnd'
t.set yesterday
btn = browser.button :value, 'Run Report'
btn.exists?
btn.click
page = Nokogiri::HTML.parse('browser')
links = page.css("a")
puts links.length
When I try to parse browser, the variable that Watir is using for the site URI, it gives me a blank HTML page.

Problem
When you do
page = Nokogiri::HTML.parse('browser')
You are asking Nokogiri to parse the literal string 'browser', not the page loaded in the browser object.
Solution
What you actually want to parse is the HTML currently loaded in the browser.
To get the browser's HTML, use:
browser.html
So to parse it, you would do:
page = Nokogiri::HTML.parse(browser.html)

Related

Open filled-form page in ruby

I'm using mechanize to fill out a form, but I want to review it on the webpage before submission. The goal is to open a browser with the pre-filled form.
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('https://www.linuxtoday.com/contribute.html')
form = page.form_with :name => 'upload'
form.sub_name = "mbb"
form.email = "mbb#mbb.com"
form.author_name = "Mr Potatohead"
form.title = "Mr Potatohead writes Ruby"
form.link = "https://potato.potato"
form.body = "More text"
`open #{page.uri}`
Calling out to the operating system to open the site gives me, of course, an empty form. I don't see a page.open or similar method available. Is there a way to achieve this (using Mechanize or other gems)?
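That won't work: setting form fields in Mechanize doesn't even update Mechanize's own parsed DOM, and the browser you launch fetches a fresh copy of the page anyway.
If you want to review the form data, you can inspect form.request_data, which returns the encoded body Mechanize would submit.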
As others have mentioned in the comments, try Selenium. You'll need ChromeDriver (or geckodriver for Firefox) installed; here's an example with Chrome to get you started:
require 'selenium-webdriver'
require 'pry' # optional
driver = Selenium::WebDriver.for :chrome
driver.navigate.to 'https://www.linuxtoday.com/contribute.html'
form = driver.find_element(id: 'upload')
form.find_element(id: 'sub_name').send_keys 'mbb'
form.find_element(id: 'email').send_keys 'mbb#mbb.com'
binding.pry # or sleep 60
driver.quit
For more details, see the selenium-webdriver documentation.

How do I resolve an HTTP 500 error while web scraping with Mechanize in Ruby?

I want to retrieve my driving license number, issue_date, and expiry_date from this website (https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp). When I try to fetch them, I get the error Mechanize::ResponseCodeError: 500 => Net::HTTPInternalServerError for https://sarathi.nic.in:8443/nrportal/sarathi/DlDetRequest.jsp -- unhandled response.
This is the code I wrote to scrape it:
require 'mechanize'
require 'logger'
require 'nokogiri'
require 'open-uri'
require 'openssl'
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari 4'
Mechanize.new.get("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp")
page=agent.get('https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp') # opening home page.
page = agent.page.links.find { |l| l.text == 'Status of Licence' }.click # click the link.
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field.
page.form_with(:name=>"dlform").field_with(:name=>"javax.faces.ViewState").value="SUBMIT" #submit button value assigning.
page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp") #to specify the form i need.
agent.cookie_jar.clear!
gg=agent.submit page.forms.last #submitting my form
It isn't working because you clear the cookie jar before submitting the form, which throws away the session cookie the server set when you loaded the page, so the server can no longer match your submission to that session. I got it working simply by removing that line:
...
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field
form = page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp")
gg = agent.submit form, form.buttons.first
Note that you don't need to assign a value for the submit button (and you shouldn't overwrite the hidden javax.faces.ViewState field with one); instead, pass the submit button to agent.submit when you submit the form.

Using mechanize with watir + phantomjs

I'm trying to load the HTML generated by PhantomJS into a Mechanize object so that I can easily search it. I've tried the following, to no avail:
b = Watir::Browser.new :phantomjs
url = "www.google.com"
b.goto url
agent = Mechanize.new
#Following is not executed at same time...
#Error 1: lots of errors
page = agent.get(b.html)
#Error 2: `parse': wrong number of arguments (1 for 3) (ArgumentError)
page = agent.parse(b.html)
#Error 3 last ditch effort: undefined method `agent'
page = agent(b.html)
As I think it through, I'm beginning to wonder whether I can load an existing HTML string into Mechanize at all... I initially got onto this approach via: http://shane.in/2014/01/headless-web-scraping/ and http://watirmelon.com/2013/02/05/watir-webdriver-with-ghostdriver-on-osx-headless-browser-testing/
I was in the same situation. I've written a lot of code with Mechanize, so I didn't want to move to Nokogiri when using Watir. Here's how I did it:
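require 'watir'
require 'mechanize'
url = 'http://google.com' # placeholder: the page you want to load
b = Watir::Browser.new
b.goto(url)
html = b.html
a = Mechanize.new
# Mechanize::Page.new(uri, response, body, code, mech): wrap the rendered HTML in a Mechanize page
page = Mechanize::Page.new(nil, {'content-type'=>'text/html'}, html, nil, a)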
You can then use page to search for elements with the usual Mechanize methods, such as page.search or page.links.
require 'watir'
require 'nokogiri'
b = Watir::Browser.new :phantomjs
url = "http://google.com"
b.goto url
p Nokogiri::HTML(b.html)
You're probably better off just using Nokogiri for this, at least if you only need to search for some data in the page source.

Mechanize. Ruby. Can't get drop-down menu with dynamic content of hidden fields

I'm new to Ruby and Mechanize, so please help.
I'm trying to fill out a form with dynamic content, but I can't work out how to do it step by step.
This is my code:
#!/usr/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'mechanize'
require 'logger'
url = "https://visapoint.eu/visapoint2/disclaimer.aspx"
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.log = Logger.new(STDOUT)
page = agent.get(url)
page.encoding = 'utf-8'
# disclaimer.aspx page object
disclamer_page = agent.page
# Click the Accept button on disclaimer.aspx
accept_button = disclamer_page.form.button_with(:value => 'Accept')
action_page = disclamer_page.form.click_button(accept_button)
# Click the New Appointment button on Action.aspx
new_appointment_button = action_page.form.button_with(:name => 'ctl00$cphMain$btnNewAppointment_input')
form_page = action_page.form.click_button(new_appointment_button)
# Fill form on Form.aspx page
form_page.form['ctl00$cphMain$ddCitizenship'] = "Kazakhstan (Қазақстан)"
form_page.form['ctl00$cphMain$ddCountryOfResidence'] = "Kazakhstan (Қазақстан)"
form_page.form['ctl00$cphMain$ddEmbassy'] = "#Kazakhstan (Қазақстан) - Astana"
form_page.form['ctl00$cphMain$ddVisaType'] = "Long-stay visa for study"
pp form_page.form
formpage_next = agent.submit(form_page.form, form_page.form.buttons.last)
pp formpage_next
After submitting form_page, I expected the reloaded page to contain a captcha, but there is nothing there.

POST form parameters differ between Firefox and Ruby Mechanize

I am trying to figure out whether Mechanize sends the correct POST query.
I want to log in to a forum (see the HTML source and Mechanize log in my other question), but I only get the login page again. Looking into it, I can see that Firefox sends a POST with parameters like
auth_username=myusername&auth_password=mypassword&auth_login=Login but my script sends
auth_username=radek&auth_password=mypassword. Is that OK, or must the &auth_login=Login part be present?
When I tried to add it using login_form['auth_login'] = 'Login', I got the error gems/mechanize-0.9.3/lib/www/mechanize/page.rb:13:in `meta': undefined method `search' for nil:NilClass (NoMethodError)
It seems to me that auth_login is a form button, not a field (I don't know if that matters):
[#<WWW::Mechanize::Form
{name nil}
{method "POST"}
{action
"http://www.somedomain.com/login?auth_successurl=http://www.somedomain.com/forum/yota?baz_r=1"}
{fields
#<WWW::Mechanize::Form::Field:0x36946c0 #name="auth_username", #value="">
#<WWW::Mechanize::Form::Field:0x369451c #name="auth_password", #value="">}
{radiobuttons}
{checkboxes}
{file_uploads}
{buttons
#<WWW::Mechanize::Form::Button:0x36943b4
#name="auth_login",
#value="Login">}>
]
My script is as follows:
require 'rubygems'
require 'mechanize'
require 'logger'
agent = WWW::Mechanize.new {|a| a.log = Logger.new("loginYOTA.log") }
agent.follow_meta_refresh = true #Mechanize does not follow meta refreshes by default, we need to set that option.
page = agent.get("http://www.somedomain.com/login?auth_successurl=http://www.somedomain.com/forum/yota?baz_r=1")
login_form = page.form_with(:method => 'POST') #works
puts login_form.buttons.inspect
puts page.forms.inspect
STDIN.gets
login_form.fields.each { |f| puts "#{f.name} : #{f.value}" }
#STDIN.gets
login_form['auth_username'] = 'myusername'
login_form['auth_password'] = 'mypassword'
login_form['auth_login'] = 'Login'
STDIN.gets
page = agent.submit login_form
#Display message if logged in
puts page.parser.xpath("/html/body/div/div/div/table/tr/td[2]/div/strong").xpath('text()').to_s.strip
puts
puts page.parser.xpath("/html/body/div/div/div/table/tr/td[2]/div").xpath('text()').to_s.strip
output = File.open("login.html", "w") {|f| f.write(page.parser.to_html) }
You can find more code, the HTML and the log in my other related question, "log in with browser and then ruby/mechanize takes it over?"
The absence of one parameter from the POST, compared with Firefox's request, caused the Mechanize login to fail. Adding the missing parameter solved the problem, so it seems the web server requires auth_login=Login to be present in the POST body.
You can read how to add a new field to a Mechanize form in another question.
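To see the exact difference between the two POST bodies, the sketch below builds both with Ruby's standard Net::HTTP::Post#set_form_data. This is swapped in for Mechanize purely to show the encoding; nothing is sent, and the /login path and credentials are placeholders.

```ruby
require 'net/http'

# Body equivalent to what the script sent (no button pair).
without_button = Net::HTTP::Post.new('/login')
without_button.set_form_data('auth_username' => 'myusername',
                             'auth_password' => 'mypassword')

# Body equivalent to what Firefox sent: the clicked submit button's
# name=value pair (auth_login=Login) is included.
with_button = Net::HTTP::Post.new('/login')
with_button.set_form_data('auth_username' => 'myusername',
                          'auth_password' => 'mypassword',
                          'auth_login' => 'Login')

puts without_button.body
puts with_button.body
puts with_button.content_type
```

Because auth_login is a submit button, current Mechanize versions can add that pair for you when you pass the button to submit, for example agent.submit(login_form, login_form.button_with(name: 'auth_login')), instead of setting it as a field.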

Resources