Using Mechanize with Google Docs - ruby

I'm trying to use Mechanize to log in to Google Docs so that I can scrape something (not possible from the API), but I keep getting a 404 when trying to follow the meta redirect:
require 'rubygems'
require 'mechanize'
USERNAME = "..."
PASSWORD = "..."
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en&continue=http://docs.google.com/"
agent = Mechanize.new
login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = USERNAME
login_form.Passwd = PASSWORD
login_response_page = agent.submit(login_form)
redirect = login_response_page.meta[0].uri.to_s
puts "redirect: #{redirect}"
followed_page = agent.get(redirect) # throws a HTTPNotFound exception
pp followed_page
Can anyone see why this isn't working?

Andy, you're awesome!!
Your code helped me get my script working and log in to a Google account. I found your error after a couple of hours: it was about HTML escaping. As far as I can tell, Mechanize automatically escapes the URI it receives as a parameter to the 'get' method. So my solution is:
require 'mechanize'
require 'logger'

EMAIL = ".."
PASSWD = ".."
agent = Mechanize.new{ |a| a.log = Logger.new("mech.log")}
agent.user_agent_alias = 'Linux Mozilla'
agent.open_timeout = 3
agent.read_timeout = 4
agent.keep_alive = true
agent.redirect_ok = true
LOGIN_URL = "https://www.google.com/accounts/Login?hl=en"
login_page = agent.get(LOGIN_URL)
login_form = login_page.forms.first
login_form.Email = EMAIL
login_form.Passwd = PASSWD
login_response_page = agent.submit(login_form)
redirect = login_response_page.meta[0].uri.to_s
puts redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/"
followed_page = agent.get(redirect.split('&')[0..-2].join('&') + "&continue=https://www.google.com/adplanner")
pp followed_page
This works just fine for me. I replaced the continue parameter from the meta tag (which is already escaped) with a new one.
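The same idea without assuming that continue is the last query parameter, rebuilt with the standard URI helpers (just a sketch, untested against Google's current login flow):
require 'uri'

uri = URI(redirect)
# drop the already-escaped continue value so it cannot be escaped a second time,
# then append a plain replacement, exactly as in the snippet above
params = URI.decode_www_form(uri.query).reject { |name, _| name == 'continue' }
uri.query = URI.encode_www_form(params) + '&continue=https://www.google.com/adplanner'
followed_page = agent.get(uri.to_s)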

Related

In Mechanize (Ruby), how to login then scrape? [duplicate]

My aim: on Rails 3, get a PDF file from a site that requires you to log in before you can download it.
My method, using Mechanize:
Step 1: log in
Step 2: since I'm logged in, get the PDF link
The thing is, when I debug and click on the scraped link, I'm redirected to the login page instead of getting the file.
Here are the two checks I did on step 1:
(...)
search_results = form.submit
puts search_results.body
=> {"succes":true,"URL":"/sso/inscription/"}
Apparently the login succeeded.
puts agent.cookie_jar.jar
=> I could find the information about my session, so I guess that cookies are saved
Any hint about what I did wrong?
(Could be important: on the site, when you log in at "http://elwatan.com/sso/inscription/inscription_payant.php", you are redirected to the home page, elwatan.com.)
Below is my code:
require 'mechanize'

# step 1, login:
agent = Mechanize.new
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_pasword"
search_results = form.submit
# step 2, get the PDF:
@watan = {}
page.parser.xpath('//th/a').each do |link|
  puts @watan[link.text.strip] = link['href']
end
The agent variable retains the session and cookies.
So you first do your login, as you did, and then you write agent.get(---your-pdf-link-here--).
There is a small error in your example code: the result of the submit is stored in search_results, but then you keep using page to search for the links.
So in your case, I guess it should look like this (untested, of course):
# step 1, login:
agent = Mechanize.new
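# save PDF responses straight to disk instead of parsing them as pages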
agent.pluggable_parser.pdf = Mechanize::FileSaver
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_pasword"
page = form.submit
# step 2, get the PDF:
page.parser.xpath('//th/a').each do |link|
agent.get link['href']
end
The page variable doesn't update after a submit, link click, etc.
You need to either work with the page returned by submit:
agent = Mechanize.new
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_pasword"
page = form.submit
Or manually get a new page:
agent = Mechanize.new
page = agent.get("http://elwatan.com/sso/inscription/inscription_payant.php")
form = page.form_with(:id => 'form-login-page')
form.login = "my_mail"
form.password = "my_pasword"
form.submit
page2 = agent.get('http://...')

Mechanize. Ruby. Can't get drop-down menu with dynamic content of hidden fields

I'm not experienced with Ruby + Mechanize, just starting, so... please help.
I tried to fill out a form with dynamic content, but I can't work out how to do it step by step.
This is my code:
#!/usr/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'mechanize'
require 'logger'
url = "https://visapoint.eu/visapoint2/disclaimer.aspx"
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.log = Logger.new(STDOUT)
page = agent.get(url)
page.encoding = 'utf-8'
# Disclaimer.aspx page object
disclamer_page = agent.page
# Click the Accept button on Disclaimer.aspx
accept_button = disclamer_page.form.button_with(:value =>'Accept')
action_page = disclamer_page.form.click_button( accept_button )
# Click New Appointment button on Action.aspx
new_appointment_button = action_page.form.button_with(:name => 'ctl00$cphMain$btnNewAppointment_input')
form_page = action_page.form.click_button(new_appointment_button)
# Fill form on Form.aspx page
form_page.form['ctl00$cphMain$ddCitizenship'] = "Kazakhstan (Қазақстан)"
form_page.form['ctl00$cphMain$ddCountryOfResidence'] = "Kazakhstan (Қазақстан)"
form_page.form['ctl00$cphMain$ddEmbassy'] = "#Kazakhstan (Қазақстан) - Astana"
form_page.form['ctl00$cphMain$ddVisaType'] = "Long-stay visa for study"
pp form_page.form
formpage_next = agent.submit(form_page.form, form_page.form.buttons.last)
pp formpage_next
After I submit form_page, I expect the reloaded page with a captcha, but there is nothing there.
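One thing worth checking (untested sketch, building on the code above): dump exactly what Mechanize is about to post. ASP.NET pages like this submit hidden postback fields (__VIEWSTATE, __EVENTVALIDATION and friends), and drop-downs filled by JavaScript often expect option values rather than the visible labels:
form = form_page.form
# print every field name/value pair that will be submitted
form.fields.each { |f| puts "#{f.name} => #{f.value.inspect}" }
# list the allowed values for one of the drop-downs (only works if it is a real select list)
dd = form.field_with(:name => 'ctl00$cphMain$ddCitizenship')
pp dd.options.map { |o| [o.value, o.text] } if dd.respond_to?(:options)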

Using ruby to retrieve a document from a website

I have written a script in Ruby that navigates through a website and gets to a form page. Once the form page is filled out, the script hits the submit button and a dialog box opens asking where you want to save the file. I am having trouble getting this file. I have searched the web and can't find anything. How would I go about retrieving the file name of the document?
I would really appreciate it if someone could help me.
My code is below:
require 'mechanize'

browser = Mechanize.new
## CONSTANTS
LOGIN_URL = 'https://business.airtricity.com/ews/welcome.jsp'
HOME_PAGE_URL = 'https://business.airtricity.com/ews/welcome.jsp'
CONSUMPTION_REPORT_URL = 'https://business.airtricity.com/ews/touConsChart.jsp?custid=209495'
LOGIN = ""
PASS = ""
MPRN_GPRN_LCIS = "10000001534"
CONSUMPTION_DATE = "20/01/2013"
END_DATE = "27/01/2013"
DOWNLOAD = "DL"
### Login page
begin
login_page = browser.get(LOGIN_URL)
rescue Mechanize::ResponseCodeError => exception
login_page = exception.page
end
puts "+++++++++"
puts login_page.links
puts "+++++++++"
login_form = login_page.forms.first
login_form['userid'] = LOGIN
login_form['password'] = PASS
login_form['_login_form_'] = "yes"
login_form['ipAddress'] = "137.43.154.176"
login_form.submit
## home page
begin
home_page = browser.get(HOME_PAGE_URL)
rescue Mechanize::ResponseCodeError => exception
home_page = exception.page
end
puts "----------"
puts home_page.links
puts "----------"
# Consumption Report
begin
Report_Page = browser.get(CONSUMPTION_REPORT_URL)
rescue Mechanize::ResponseCodeError => exception
Report_Page = exception.page
end
puts "**********"
puts Report_Page.links
pp Report_Page
puts "**********"
Report_Form = Report_Page.forms.first
Report_Form['entity1'] = MPRN_GPRN_LCIS
Report_Form['start'] = CONSUMPTION_DATE
Report_Form['end'] = END_DATE
Report_Form['charttype'] = DOWNLOAD
Report_Form.submit
## Download Report
begin
browser.pluggable_parser.csv = Mechanize::Download
Download_Page = browser.get('https://business.airtricity.com/ews/touConsChart.jsp?custid=209495/meter_read_download_2013-1-20_2013-1-27.csv').save('Hello')
rescue Mechanize::ResponseCodeError => exception
Download_Page = exception.page
end
http://mechanize.rubyforge.org/Mechanize.html#method-i-get_file
Downloading a file from a URL is pretty straightforward with Mechanize:
browser = Mechanize.new
file_url = 'https://raw.github.com/ragsagar/ragsagar.github.com/c5caa502f8dec9d5e3738feb83d86e9f7561bd5e/.html'
downloaded_file = browser.get_file file_url
File.open('new_file.txt', 'w') { |file| file.write downloaded_file }
I've seen automation fail because of the browser agent. Perhaps you could try
browser.user_agent_alias = "Windows Mozilla"

Retrieve Google Checkout CSV (no API)

I'm trying to retrieve the Google Checkout report (Download data to spreadsheet (.csv)). Unfortunately I can't use the API (it's reserved for UK and US accounts only...!).
I have a script made with Mechanize and Ruby, but I get an error: "Net::HTTPBadRequest 1.1 400 Bad Request".
Here is my code:
require 'rubygems'
require 'mechanize'
require 'logger'
agent = Mechanize.new { |a| a.log = Logger.new(STDERR) }
agent.user_agent_alias = 'Mac Safari'
page = agent.get 'https://checkout.google.com/sell/orders'
form = page.forms.first
form.Email = 'email@gmail.com'
form.Passwd = 'password'
page = agent.submit(form, form.buttons.first)
form = page.forms.last
p form
form['start-date'] = "2012-11-16"
form['end-date'] = "2012-11-17"
form['column-style'] = "EXPANDED"
#form['_type'] = "order-list-request"
#form['date-time-zone'] = "America/Los_Angeles"
#form['financial-state'] = ""
#form['query-type'] = ""
p form
begin
page = agent.submit(form, form.buttons.first)
rescue Mechanize::ResponseCodeError => ex
puts ex.page.body
end
Thanks to pguardiario and Charles proxy, I found my error... There was a superfluous field!
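For anyone hitting the same 400: once you know which field is superfluous, you can drop it from the Mechanize form before submitting. A minimal sketch (the field name below is made up; compare the request Charles captures from the browser with what Mechanize sends to find the real one):
form.delete_field!('superfluous-field-name') # hypothetical name, yours will differ
page = agent.submit(form, form.buttons.first)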

login vk.com net::http.post_form

I want to log in to vk.com or m.vk.com using plain Net::HTTP in Ruby, but my code doesn't work.
require 'net/http'
email = "qweqweqwe@gmail.com"
pass = "qeqqweqwe"
userUri = URI('m.vk.com/index.html')
Net::HTTP.get(userUri)
res = Net::HTTP.post_form(userUri, 'email' => email, 'pass' => pass)
puts res.body
First of all, you need to change userUri to the following:
userUri = URI('https://login.vk.com/?act=login')
Which is where the vk site expects your login parameters.
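For illustration, the corrected call could look like this (same field names as in the question, untested):
require 'net/http'
require 'uri'

email = "qweqweqwe@gmail.com"
pass = "qeqqweqwe"
userUri = URI('https://login.vk.com/?act=login')
res = Net::HTTP.post_form(userUri, 'email' => email, 'pass' => pass)
puts res.code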
I'm not very familiar with vk, but you probably need a way to handle the session cookie, both receiving it and providing it for future requests. Can you elaborate on what you're doing after login?
Here is the net/http info for cookie handling:
# Headers
res['Set-Cookie'] # => String
res.get_fields('set-cookie') # => Array
res.to_hash['set-cookie'] # => Array
puts "Headers: #{res.to_hash.inspect}"
This kind of task is exactly what Mechanize is for. Mechanize handles redirects and cookies automatically. You can do something like this:
require 'mechanize'
agent = Mechanize.new
url = "http://m.vk.com/login/"
page = agent.get(url)
form = page.forms[0]
form['email'] = "qweqweqwe@gmail.com"
form['pass'] = "qeqqweqwe"
form.submit
puts agent.page.body
