post form parameters difference between Firefox and Ruby Mechanize - ruby

I am trying to figure out whether Mechanize sends the correct POST query.
I want to log in to a forum (please see the HTML source and Mechanize log in my other question), but I only get the login page again. Looking into it, I can see that Firefox sends a POST with parameters like
auth_username=myusername&auth_password=mypassword&auth_login=Login but my script sends
auth_username=radek&auth_password=mypassword. Is that OK, or must the &auth_login=Login part be present?
When I tried to add it using login_form['auth_login'] = 'Login' I got an error: gems/mechanize-0.9.3/lib/www/mechanize/page.rb:13:in `meta': undefined method `search' for nil:NilClass (NoMethodError)
It seems to me that auth_login is a form button, not a field (I don't know if it matters).
[#<WWW::Mechanize::Form
{name nil}
{method "POST"}
{action
"http://www.somedomain.com/login?auth_successurl=http://www.somedomain.com/forum/yota?baz_r=1"}
{fields
#<WWW::Mechanize::Form::Field:0x36946c0 @name="auth_username", @value="">
#<WWW::Mechanize::Form::Field:0x369451c @name="auth_password", @value="">}
{radiobuttons}
{checkboxes}
{file_uploads}
{buttons
#<WWW::Mechanize::Form::Button:0x36943b4
@name="auth_login",
@value="Login">}>
]
My script is as follows:
require 'rubygems'
require 'mechanize'
require 'logger'
agent = WWW::Mechanize.new {|a| a.log = Logger.new("loginYOTA.log") }
agent.follow_meta_refresh = true #Mechanize does not follow meta refreshes by default, we need to set that option.
page = agent.get("http://www.somedomain.com/login?auth_successurl=http://www.somedomain.com/forum/yota?baz_r=1")
login_form = page.form_with(:method => 'POST') #works
puts login_form.buttons.inspect
puts page.forms.inspect
STDIN.gets
login_form.fields.each { |f| puts "#{f.name} : #{f.value}" }
#STDIN.gets
login_form['auth_username'] = 'myusername'
login_form['auth_password'] = 'mypassword'
login_form['auth_login'] = 'Login'
STDIN.gets
page = agent.submit login_form
#Display message if logged in
puts page.parser.xpath("/html/body/div/div/div/table/tr/td[2]/div/strong").xpath('text()').to_s.strip
puts
puts page.parser.xpath("/html/body/div/div/div/table/tr/td[2]/div").xpath('text()').to_s.strip
output = File.open("login.html", "w") {|f| f.write(page.parser.to_html) }
You can find more code, the HTML, and the log in my other related question, "log in with browser and then ruby/mechanize takes it over?"

The absence of one parameter in the POST, compared to what Firefox sends, caused Mechanize not to log in. Adding the missing parameter solved the problem, so it seems that the web server requires the &auth_login=Login parameter to be in the POST.
You can read how to add a new field to a Mechanize form in another question.
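To spot this kind of mismatch quickly, the two request bodies can be compared parameter by parameter. The following is only a sketch using Ruby's standard library; the two body strings are the ones quoted in the question, and the variable names are made up for illustration.

```ruby
require 'uri'

# Request bodies as quoted in the question: one captured from Firefox,
# one taken from the Mechanize log.
browser_body = 'auth_username=myusername&auth_password=mypassword&auth_login=Login'
script_body  = 'auth_username=radek&auth_password=mypassword'

# Decode each urlencoded body into a name => value hash.
browser_params = URI.decode_www_form(browser_body).to_h
script_params  = URI.decode_www_form(script_body).to_h

# Parameter names the browser sends but the script does not.
missing = browser_params.keys - script_params.keys
puts "Missing from script: #{missing.inspect}"
```

Here the only missing name is the submit button's, which matches the conclusion above.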

Related

How do i resolve an HTTP500 Error while web scraping with Mechanize in ruby?

I want to retrieve my driving license number, issue_date, and expiry_date from this website("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp"). When I try to fetch it, I get the error Mechanize::ResponseCodeError: 500 => Net::HTTPInternalServerError for https://sarathi.nic.in:8443/nrportal/sarathi/DlDetRequest.jsp -- unhandled response.
This is the code that I wrote to scrape:
require 'mechanize'
require 'logger'
require 'nokogiri'
require 'open-uri'
require 'openssl'
OpenSSL::SSL::VERIFY_PEER = OpenSSL::SSL::VERIFY_NONE
agent = Mechanize.new
agent.log = Logger.new "mech.log"
agent.user_agent_alias = 'Mac Safari 4'
Mechanize.new.get("https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp")
page=agent.get('https://sarathi.nic.in:8443/nrportal/sarathi/HomePage.jsp') # opening home page.
page = agent.page.links.find { |l| l.text == 'Status of Licence' }.click # click the link.
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field.
page.form_with(:name=>"dlform").field_with(:name=>"javax.faces.ViewState").value="SUBMIT" #submit button value assigning.
page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp") #to specify the form i need.
agent.cookie_jar.clear!
gg=agent.submit page.forms.last #submitting my form
It isn't working because you clear the cookies before submitting the form, which throws away the state built up while you filled in your input. I got it working simply by removing that call:
...
page.forms_with(:name=>"dlform").first.field_with(:name=>"dlform:DLNumber").value="TN38 20120001119" #user input to text field
form = page.form(:name=>"dlform",:action=>"/nrportal/sarathi/DlDetRequest.jsp")
gg = agent.submit form, form.buttons.first
Note that you do not need to set a value for the submit button; instead, pass the submit button as an argument when you submit the form.

Google login with mechanize on ruby

I'm trying to get to google play developer console using ruby. But first I have to login. I'm trying like this:
def try_post(url, body = {}, headers = {})
  unless @agent # This just creates a new Mechanize instance
    setup
  end
  puts 'Logging in'
  # Hardcoded for testing purposes
  @agent.get 'https://accounts.google.com/ServiceLogin?service=androiddeveloper&passive=1209600&continue=https://play.google.com/apps/publish/%23&followup=https://play.google.com/apps/publish/#identifier'
  form = @agent.page.forms.find { |f| f.form_node['id'] == "gaia_loginform" }
  unless form
    raise 'No login form'
  end
  form.field_with(:id => "Email").value = @config.email
  form.click_button
  form = @agent.page.forms.find { |f| f.form_node['id'] == "gaia_loginform" }
  unless form
    raise 'No login form'
  end
  form.field_with(:name => "Passwd").value = @config.password
  form.click_button
  if @agent.page.uri.host != "play.google.com"
    STDERR.puts "login failed? : uri = " + @agent.page.uri.to_s
    raise 'Google login failed'
  end
  # @agent.post(url, body)
end
However, this fails spectacularly. I tried a few other ways (populating Passwd-hidden, finding the field by id, and so on) but had no luck. I think the password does not get entered, because when I puts @agent.page.body after the final click_button I see the "enter password" text somewhere in the HTML.
What am I doing wrong and how can I fix it?
I've been digging around a bit more and found out that it's not that simple and I could not login with mechanize in any way.
So I ended up using Watir, which was fairly simple and straightforward. Here's an example:
browser.goto LOGIN_URL
browser.text_field(:id, 'Email').set @config.email
browser.button(:id, 'next').click
browser.text_field(:id, 'Passwd').wait_until_present
browser.text_field(:id, 'Passwd').set @config.password
browser.button(:id, 'signIn').click
# Here I wait until an element on my target page is visible and then continue
browser.link(:href, '#SOMETHING').wait_until_present
Hope it helps.

How to set the Referer header before loading a page with Ruby mechanize?

Is there a straightforward way to set custom headers with Mechanize 2.3?
I tried a former solution but get:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p|
p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main'
}
# ./mech.rb:30:in `<main>': undefined method `pre_connect_hooks' for nil:NilClass (NoMethodError)
The docs say:
get(uri, parameters = [], referer = nil, headers = {}) { |page| ... }
so for example:
agent.get 'http://www.google.com/', [], agent.page.uri, {'foo' => 'bar'}
alternatively you might like:
agent.request_headers = {'foo' => 'bar'}
agent.get url
You misunderstood the code you were copying. There was a newline in the example, but it disappeared in the formatting as it wasn't tagged as code. $agent contains nil since you're trying to use it before it has been initialized. You must initialize the object and then use it. Just try this:
$agent = Mechanize.new
$agent.pre_connect_hooks << lambda { |p| p[:request]['Referer'] = 'https://wwws.mysite.com/cgi-bin/apps/Main' }
For this question I noticed people seem to use:
page = agent.get("http://www.you.com/index_login/", :referer => "http://www.you.com/")
As an aside, now that I have tested this answer, it seems this was not the cause of my actual problem: every visit to the site I'm scraping requires going through the login sequence again, even seconds after the first logged-in visit, even though I always load and save the complete cookie jar in YAML format. But that would lead to another question, of course.
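Whichever of the Mechanize approaches above is used, the header is ultimately set on a request object. The following sketch shows the same assignment on a plain Net::HTTP request from the standard library, without sending anything over the network. The landing-page URL used as the referer is an assumption for illustration, and the exact arguments a Mechanize pre-connect hook receives may differ between Mechanize versions.

```ruby
require 'net/http'
require 'uri'

# Target URL taken from the question; nothing is actually sent here.
uri = URI('https://wwws.mysite.com/cgi-bin/apps/Main')

# Build the request object and set the Referer header on it,
# the same kind of assignment the hook performs on p[:request].
request = Net::HTTP::Get.new(uri)
request['Referer'] = 'https://wwws.mysite.com/' # hypothetical referring page

puts request['Referer']
```

Header names are case-insensitive, so request['referer'] reads back the same value.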

How to submit formstack form using ruby?

I have a form similar to THIS and want to submit data to it from a CSV file using Ruby. Here is what I have been trying to do:
require 'uri'
require 'net/http'
params = {
  'field15157482-first' => 'bip',
  'field15157482-last' => 'bop',
  'field15157485' => 'bip@bob.com',
  'field15157487' => 'option1',
  'fsSubmitButton1196962' => 'Submit'
}
x = Net::HTTP.post_form(URI.parse('http://www.formstack.com/forms/?1196833-GxMTxR20GK'), params)
I keep getting A valid form ID was not supplied. I have a hunch I am using the wrong URL but I don't know what to replace it with.
I would use the API, but I don't have access to the token, hence my stone-age approach. Any suggestions would be much appreciated.
The form uses hidden variables and cookies to attempt to maintain a "unique session". Fortunately, Mechanize makes handling 'sneaky' forms quite easy.
require "mechanize"
form_uri = "http://www.formstack.com/forms/?1196962-617Z6Foyif"
@agent = Mechanize.new
page = @agent.get form_uri
form = page.forms[0]
form.fields_with(:class => /fsField/).each do |field|
field.value = case field.name
when /first/ then "First Name"
when /last/ then "Last Name"
else "email@address.com"
end
end
page = form.submit form.buttons.first
puts
puts "=== Response Header"
puts
puts page.header
puts
puts "=== Response Body"
puts
puts page.body
Looking at the source of http://www.formstack.com/forms/?1196833-GxMTxR20GK and the example in your link, it appears that Formstack forms post to index.php and require a form ID to be passed in to identify which form is being submitted. Looking at the forms in both examples, you'll see a field similar to this:
<input type="hidden" name="form" value="1196833" />
Try adding the following to your params hash:
'form' => '1196833' # or other appropriate form value
You may also need to include the other hidden fields for a valid submit.
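As a rough sketch of that idea, the hidden inputs can be collected from the form's HTML and merged into the params hash before posting. The HTML snippet below is made up for illustration: only the "form" field comes from the page source quoted above, while the "viewkey" field is a hypothetical second hidden input. The regular expression assumes the attributes appear in exactly this order; for a real page a proper HTML parser (which is what Mechanize uses internally) is the safer choice.

```ruby
require 'uri'

# Sample markup. The "form" input is quoted from the answer above;
# "viewkey" is a hypothetical extra hidden field.
html = <<~HTML
  <input type="hidden" name="form" value="1196833" />
  <input type="hidden" name="viewkey" value="GxMTxR20GK" />
  <input type="text" name="field15157482-first" value="" />
HTML

# Collect name/value pairs of hidden inputs only (illustrative regex,
# assumes type, name, value appear in this order).
hidden = html.scan(/<input type="hidden" name="([^"]+)" value="([^"]*)"/).to_h

# Merge them with the visible fields from the question.
params = {
  'field15157482-first' => 'bip',
  'field15157482-last'  => 'bop'
}.merge(hidden)

puts URI.encode_www_form(params)
```

The resulting hash can then be passed to Net::HTTP.post_form in place of the original params.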

how to add new field to mechanize form (ruby/mechanize)

There is a public class method to add a field to a Mechanize form.
I tried ..
#login_form.field.new('auth_login','Login')
#login_form.field.new('auth_login','Login')
and both give me an error: undefined method "new" for #<WWW::Mechanize::Form::Field:0x3683cbc> (NoMethodError)
I tried login_form.field.new('auth_login','Login') which gives me an error
mechanize-0.9.3/lib/www/mechanize/page.rb:13:in `meta': undefined method `search' for nil:NilClass (NoMethodError)
but only at the time I submit the form. The field does not exist in the HTML source. I want to add it so that the POST query sent by my script contains auth_username=myusername&auth_password=mypassword&auth_login=Login. So far it sends only auth_username=radek&auth_password=mypassword, which might be why I cannot log in. Just my thought.
The script looks like
require 'rubygems'
require 'mechanize'
require 'logger'
agent = WWW::Mechanize.new {|a| a.log = Logger.new("loginYOTA.log") }
agent.follow_meta_refresh = true #Mechanize does not follow meta refreshes by default, we need to set that option.
page = agent.get("http://www.somedomain.com/login?auth_successurl=http://www.somedomain.com/forum/yota?baz_r=1")
login_form = page.form_with(:method => 'POST')
puts login_form.buttons.inspect
puts page.forms.inspect
#STDIN.gets
login_form.fields.each { |f| puts "#{f.name} : #{f.value}" }
login_form['auth_username'] = 'radeks'
login_form['auth_password'] = 'TestPass01'
#login_form['auth_login'] = 'Login'
#login_form.field.new('auth_login','Login')
#login_form.field.new('auth_login','Login')
#login_form.fields.each { |f| puts "#{f.name} : #{f.value}" }
#STDIN.gets
page = agent.submit login_form
#Display welcome message if logged in
puts page.parser.xpath("/html/body/div/div/div/table/tr/td[2]/div/strong").xpath('text()').to_s.strip
puts
puts page.parser.xpath("/html/body/div/div/div/table/tr/td[2]/div").xpath('text()').to_s.strip
output = File.open("login.html", "w") {|f| f.write(page.parser.to_html) }
The .inspect of the form looks like
[#<WWW::Mechanize::Form
{name nil}
{method "POST"}
{action
"http://www.somedomain.com/login?auth_successurl=http://www.somedomain.com/forum/yota?baz_r=1"}
{fields
#<WWW::Mechanize::Form::Field:0x36946c0 @name="auth_username", @value="">
#<WWW::Mechanize::Form::Field:0x369451c @name="auth_password", @value="">}
{radiobuttons}
{checkboxes}
{file_uploads}
{buttons
#<WWW::Mechanize::Form::Button:0x36943b4
@name="auth_login",
@value="Login">}>
]
I think what you're looking for is
login_form.add_field!(field_name, value = nil)
Here are the docs:
http://rdoc.info/projects/tenderlove/mechanize
The difference between this and the method WWW::Mechanize::Form::Field.new is not much, aside from the fact that there aren't many ways to add fields to a form. Here's how the add_field! method is implemented....you can see that it's exactly what you'd expect. It instantiates a Field object, then adds it to the form's 'fields' array. You wouldn't be able to do this in your code because the method "fields<<" is a private method inside "Form."
# File lib/www/mechanize/form.rb, line 65
def add_field!(field_name, value = nil)
fields << Field.new(field_name, value)
end
On a side note, according to the docs you should be able to do the first variation you proposed:
login_form['field_name']='value'
Hope this helps!
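To make the mechanism concrete, here is a self-contained sketch of what add_field! does, written with stand-in classes. SketchField and SketchForm are hypothetical names invented for this illustration; they are not the Mechanize classes, and build_query here is only a simplified assumption of how fields end up in the POST body.

```ruby
require 'uri'

# Stand-in types for illustration only; these are NOT the Mechanize classes.
SketchField = Struct.new(:name, :value)

class SketchForm
  attr_reader :fields

  def initialize(fields = [])
    @fields = fields
  end

  # Mirrors the add_field! shown above: build a field and append it.
  def add_field!(field_name, value = nil)
    fields << SketchField.new(field_name, value)
  end

  # Encode every field as an application/x-www-form-urlencoded body.
  def build_query
    URI.encode_www_form(fields.map { |f| [f.name, f.value] })
  end
end

form = SketchForm.new([SketchField.new('auth_username', 'myusername'),
                       SketchField.new('auth_password', 'mypassword')])
form.add_field!('auth_login', 'Login')
puts form.build_query
```

Once the extra field sits in the fields array, it is encoded like any other field, which is why the added auth_login shows up in the POST.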
Another way to add a new field is to do so at the time of posting the form:
page = agent.post( url, {'auth_username'=>'myusername', #existing field
'auth_password'=>'mypassword', #existing field
'auth_login'=>'Login'}) #new field
