Open filled-form page in Ruby

I'm using mechanize to fill out a form, but I want to review it on the webpage before submission. The goal is to open a browser with the pre-filled form.
require 'mechanize'
mechanize = Mechanize.new
page = mechanize.get('https://www.linuxtoday.com/contribute.html')
form = page.form_with :name => 'upload'
form.sub_name = "mbb"
form.email = "mbb#mbb.com"
form.author_name = "Mr Potatohead"
form.title = "Mr Potatohead writes Ruby"
form.link = "https://potato.potato"
form.body = "More text"
`open #{page.uri}`
Calling out to the operating system to open the site shows, of course, an empty form. I don't see a page.open or similar method available. Is there a way to achieve this (using mechanize or other gems)?

That won't work because setting form fields doesn't update the DOM; the values live only in the Mechanize form object.
If you want to review the form data, you can inspect form.request_data.
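As a rough illustration of reviewing the data without a browser: for an ordinary URL-encoded form, form.request_data is a query string, and Ruby's standard library can split such a string back into name/value pairs. The sketch below uses only the stdlib. The helper name review_fields and the sample string are assumptions for illustration; with Mechanize you would pass form.request_data instead. A multipart upload form produces a different body that this does not handle.

```ruby
require 'uri'

# Hypothetical helper: turn a URL-encoded body into readable "name: value" lines.
def review_fields(encoded_body)
  URI.decode_www_form(encoded_body).map { |name, value| "#{name}: #{value}" }
end

# Assumed sample standing in for form.request_data.
sample = URI.encode_www_form(
  'sub_name'    => 'mbb',
  'author_name' => 'Mr Potatohead',
  'title'       => 'Mr Potatohead writes Ruby'
)

puts review_fields(sample)
```

This only shows what would be sent; it does not render the page.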

As others have mentioned in the comments, try Selenium. You'll need the Chrome or Firefox driver installed. Here's an example with Chrome to get you started:
require 'selenium-webdriver'
require 'pry' # optional
driver = Selenium::WebDriver.for :chrome
driver.navigate.to 'https://www.linuxtoday.com/contribute.html'
form = driver.find_element(id: 'upload')
form.find_element(id: 'sub_name').send_keys 'mbb'
form.find_element(id: 'email').send_keys 'mbb#mbb.com'
binding.pry # or sleep 60
driver.quit
For more instructions see documentation

Related

How do I resolve an HTTP 500 error while web scraping with Mechanize in Ruby?

I want to retrieve my driving license number, issue_date, and expiry_date from this website ("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp"). When I try to fetch it, I get the error Mechanize::ResponseCodeError: 500 => Net::HTTPInternalServerError for https://sarathi.nic.in:8443/nrportal/sarathi/DlDetRequest.jsp -- unhandled response.
This is the code that I wrote to scrape:
require 'mechanize'
require 'logger'
require 'nokogiri'
require 'open-uri'
require 'openssl'
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari 4'
Mechanize.new.get("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp")
page=agent.get('https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp') # opening home page.
page = agent.page.links.find { |l| l.text == 'Status of Licence' }.click # click the link.
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field.
page.form_with(:name=>"dlform").field_with(:name=>"javax.faces.ViewState").value="SUBMIT" #submit button value assigning.
page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp") #to specify the form i need.
agent.cookie_jar.clear!
gg=agent.submit page.forms.last #submitting my form
It isn't working because you clear the cookies before submitting the form, which throws away the session state that goes with the input you provided. I got it working simply by removing that line:
...
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field
form = page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp")
gg = agent.submit form, form.buttons.first
Note that you do not need to set a value for the submit button; instead, pass the button to agent.submit when submitting the form.

Not able to login into rottentomatoes.com using mechanize

I am using the following code:
require 'rubygems'
require 'mechanize'
agent = Mechanize.new
agent.get("https://www.rottentomatoes.com/user/account/login/") do |login_page|
inside_page = login_page.form_with(:action => 'https://www.rottentomatoes.com/user/account/login/') do |f|
f.login_username = "random#mailinator.com"
f.login_password = "123456"
end.click_button
end
There isn't any issue with your code. The issue is how Rotten Tomatoes handles a login: they redirect back to the homepage via JavaScript in the HTML body. I added a single line to your code (and used my own credentials):
puts agent.page.body
The Result:
<script>
window.top.location='http://www.rottentomatoes.com/';
</script>
So you can either use their API or, if you want to execute the JavaScript and follow the redirect, use Watir or Selenium.

Using mechanize with watir + phantomjs

I'm trying to insert the HTML generated by PhantomJS into a Mechanize object so that I can easily search it. I've tried the following to no avail...
b = Watir::Browser.new :phantomjs
url = "www.google.com"
b.goto url
agent = Mechanize.new
#Following is not executed at same time...
#Error 1: lots of errors
page = agent.get(b.html)
#Error 2: `parse': wrong number of arguments (1 for 3) (ArgumentError)
page = agent.parse(b.html)
#Error 3 last ditch effort: undefined method `agent'
page = agent(b.html)
As I think it through I'm beginning to wonder if I can mechanize an existing html object... I initially got onto it via: http://shane.in/2014/01/headless-web-scraping/ & http://watirmelon.com/2013/02/05/watir-webdriver-with-ghostdriver-on-osx-headless-browser-testing/
I was in the same situation. I have written a lot of code with Mechanize, so I did not want to move to Nokogiri when using Watir. The code below is how I did it.
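require 'watir'
require 'mechanize'
url = "http://google.com" # example URL from the question; substitute your own
b = Watir::Browser.new
b.goto(url)
html = b.html
a = Mechanize.new
# Wrap the browser-rendered HTML in a Mechanize page (no URI, no response code)
page = Mechanize::Page.new(nil, {'content-type'=>'text/html'}, html, nil, a)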
You could use page to search for elements.
require 'watir'
require 'nokogiri'
b = Watir::Browser.new :phantomjs
url = "http://google.com"
b.goto url
p Nokogiri::HTML(b.html)
You are probably better off just using Nokogiri for this [that is, if you only need to search for some data in source].

Rails ruby-mechanize how to get a page after redirection

I want to collect manufacturers and their medicine details from http://www.mims.com/India/Browse/Alphabet/All?cat=Company&tab=company.
I am using the Mechanize gem to extract content from the HTML page, with the help of Ryan's tutorial.
I can log in successfully but cannot reach the destination page http://www.mims.com/India/Browse/Alphabet/All?cat=Company&tab=company.
This is what I have tried so far:
require 'rubygems'
require 'nokogiri'
require 'open-uri'
require 'mechanize'
agent = Mechanize.new
agent.user_agent = 'Individueller User-Agent'
agent.user_agent_alias = 'Linux Mozilla'
agent.get("https://sso.mims.com/Account/SignIn") do |page|
#login_page = a.click(page.link_with(:text => /Login/))
# Submit the login form
login_page = page.form_with(:action => '/') do |f|
f.SignInEmailAddress = 'username of mims'
f.SignInPassword = 'secret'
end.click_button
url = 'http://www.mims.com/India/Browse/Alphabet/A?cat=drug'
page = agent.get url # here checking authentication if success then redirecting to destination
p page
end
Note: I have shared a dummy login credential for your testing.
After I click the 'CompaniesBrowse Company Directory' link, the page redirects with the flash message "you are redirecting...", and the Mechanize gem captures this page.
Question:
1) How do I get the original page (after redirection)?
I found the cause: the MIMS site auto-submits a form from the page's onload callback to check authentication. Mechanize does not execute JavaScript, so that submission never happens.
Solution
Manually submitting the form twice solves this issue. Example:
url = 'http://www.mims.com/India/Browse/Alphabet/A?cat=drug'
page = agent.get url # here checking authentication if success then redirecting to destination
p page
page.form.submit
agent.page.form.submit
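The two manual submits could be generalized into a small loop, sketched below. This is only a sketch under assumptions: it guesses that the interstitial pages submit their form from an onload attribute (the real MIMS markup is not shown here), and the helper names auto_submit_page? and follow_auto_submits are made up for illustration. The page argument just needs to respond to body and form, as a Mechanize page does.

```ruby
# Heuristic (assumption): an interstitial page submits its form from an
# onload handler, e.g. <body onload="document.forms[0].submit()">.
def auto_submit_page?(html)
  !!(html =~ /onload\s*=\s*["'][^"']*submit\(\)/i)
end

# Keep submitting the first form while the page looks like an auto-submit
# interstitial, up to `limit` times, and return the page reached.
def follow_auto_submits(page, limit = 5)
  limit.times do
    break unless auto_submit_page?(page.body)
    page = page.form.submit
  end
  page
end
```

With Mechanize this would be called as follow_auto_submits(agent.get(url)). The limit guards against a page that keeps redirecting to itself.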

Why is Mechanize not following the link

I am trying to follow a link with Mechanize, but it does not seem to be working. The syntax appears to be correct. Am I referencing this incorrectly, or do I need to do something else?
Problem area
agent.page.links_with(:text => 'VG278H')[2].click
Full Code
require 'rubygems'
require 'mechanize'
require 'open-uri'
agent = Mechanize.new
agent.get ("http://icecat.biz/en/")
#Show all form fields belonging to the first form
form = agent.page.forms[0].fields
#Enter VG278H into the text box lookup_text, submit the data
agent.page.forms[0]["lookup_text"] = "VG278H"
agent.page.forms[0].submit #Results of this is stored in Mechanize agent.page object
#Call agent.page with our results and assign them to a variable page
page = agent.page
agent.page.links_with(:text => 'VG278H')[2].click
doc = page.parser
puts doc
You should grab a copy of Charles (http://www.charlesproxy.com/) or something that allows you to watch what happens when you submit the form from your browser. Anyway, your problem is that this part:
agent.page.forms[0]["lookup_text"] = "VG278H"
agent.page.forms[0].submit
returns an HTML fragment that looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><script>self.location.href="http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H"</script>
So you actually need to call this directly, or scrape out the self.location.href and have your agent perform a get:
page = agent.get("http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H")
If you were going to do that, this works:
require 'rubygems'
require 'mechanize'
require 'open-uri'
agent = Mechanize.new
agent.get("http://icecat.biz/en/")
page = agent.get("http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H")
page = page.links_with(:text => 'VG278H')[2].click
doc = page.parser
puts doc
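If you would rather scrape the redirect target out of the returned fragment than hard-code it, a minimal sketch follows. The helper name js_redirect_target is hypothetical, and the regular expression only covers simple quoted assignments such as self.location.href="..." or window.top.location='...'; other redirect styles would need their own pattern.

```ruby
# Hypothetical helper: pull the target URL out of a simple JavaScript redirect.
# Returns nil when no quoted location assignment is found.
def js_redirect_target(html)
  match = html.match(/\blocation(?:\.href)?\s*=\s*(["'])(.*?)\1/)
  match && match[2]
end

fragment = '<script>self.location.href="http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H"</script>'
puts js_redirect_target(fragment)
```

With Mechanize, the result could then be handed to the agent, for example agent.get(js_redirect_target(agent.page.body)), after checking that it is not nil.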
Happy scraping
