Switching IP while parsing a site with Ruby Mechanize - ruby

Is there any way to change or hide the IP address my requests are sent from while I'm parsing a website with my Ruby Mechanize program? I want to avoid being banned by the site's server.
I've seen sites that change IP addresses, like http://www.newipnow.com/, but I can't figure out how to use one in my program.
Here is my code:
require 'rubygems'
require 'mechanize'
require 'nokogiri'
require 'logger'
Encoding.default_external = Encoding::UTF_8
Encoding.default_internal = Encoding::UTF_8
agent = Mechanize.new do |a|
  a.ssl_version = 'SSLv3'
  a.verify_mode = OpenSSL::SSL::VERIFY_NONE
  a.user_agent_alias = 'Windows Mozilla'
end
authrization = agent.get("http://vk.com/")
vk_form = authrization.forms.first
vk_form.email = 'myaccount'
vk_form.pass = 'mypassword'
authrization = agent.submit(vk_form, vk_form.buttons.first)

Yes, you can set a proxy like this:
agent.set_proxy host, port, user, pass
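As a rough sketch of how set_proxy could be used to switch between several proxies: the helper below picks one entry from a list and applies it to the agent. The proxy hosts, ports, and credentials, and the names PROXIES and apply_proxy, are made up for illustration; substitute proxies you actually have access to.

```ruby
# Hypothetical proxy list: these hosts, ports, and credentials are placeholders.
PROXIES = [
  { host: 'proxy1.example.com', port: 8080, user: nil,  pass: nil },
  { host: 'proxy2.example.com', port: 3128, user: 'me', pass: 'secret' }
].freeze

# Applies the proxy at position `index` (wrapping around the list) to `agent`.
# `agent` only needs to respond to set_proxy(host, port, user, pass),
# as a Mechanize agent does. Returns the proxy that was chosen.
def apply_proxy(agent, proxies, index)
  proxy = proxies[index % proxies.size]
  agent.set_proxy(proxy[:host], proxy[:port], proxy[:user], proxy[:pass])
  proxy
end
```

With the agent from the question, one might call apply_proxy(agent, PROXIES, n) before each batch of requests and increase n every time, so that consecutive batches go out through different proxies.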

Related

Ruby mechanize Form

Is there any way to write the output of the available forms to a file? For example:
require 'mechanize'
agent = Mechanize.new
page = agent.get("http://www.google.com")
search = page.form_with(:action => "/search")
I want to store the result that irb shows for "search" in a file.
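One possible approach, sketched with the standard library only: pretty-print the object into a file handle instead of the console. The helper name dump_to_file and the file name used below are assumptions for illustration, not part of Mechanize.

```ruby
require 'pp'

# Writes a pretty-printed dump of `object` to the file at `path`,
# overwriting any existing content. `dump_to_file` is a hypothetical helper.
def dump_to_file(object, path)
  File.open(path, 'w') { |file| PP.pp(object, file) }
end
```

For the form in the question, this could be called as dump_to_file(search, 'search_form.txt').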

Mechanize. Ruby. Can't get drop-down menu with dynamic content of hidden fields

I'm not experienced with Ruby and Mechanize, I'm just starting out, so please help.
I tried to fill out a form with dynamic content, but I can't work out how to do it step by step.
Here is my code:
#!/usr/bin/env ruby
# encoding: utf-8
require 'rubygems'
require 'mechanize'
require 'logger'
url = "https://visapoint.eu/visapoint2/disclaimer.aspx"
agent = Mechanize.new
agent.user_agent_alias = 'Mac Safari'
agent.log = Logger.new(STDOUT)
page = agent.get(url)
page.encoding = 'utf-8'
# disclaimer.aspx page object
disclamer_page = agent.page
# Click the Accept button on disclaimer.aspx
accept_button = disclamer_page.form.button_with(:value =>'Accept')
action_page = disclamer_page.form.click_button(accept_button)
# Click the New Appointment button on Action.aspx
new_appointment_button = action_page.form.button_with(:name => 'ctl00$cphMain$btnNewAppointment_input')
form_page = action_page.form.click_button(new_appointment_button)
# Fill form on Form.aspx page
form_page.form['ctl00$cphMain$ddCitizenship'] = "Kazakhstan (Қазақстан)"
form_page.form['ctl00$cphMain$ddCountryOfResidence'] = "Kazakhstan (Қазақстан)"
form_page.form['ctl00$cphMain$ddEmbassy'] = "#Kazakhstan (Қазақстан) - Astana"
form_page.form['ctl00$cphMain$ddVisaType'] = "Long-stay visa for study"
pp form_page.form
formpage_next = agent.submit(form_page.form, form_page.form.buttons.last)
pp formpage_next
After submitting form_page, I expected the reloaded page with a captcha, but there is nothing there.

Why is Mechanize not following the link

I am trying to follow a link with Mechanize, but it does not seem to be working. The syntax appears to be correct. Am I referencing this incorrectly, or do I need to do something else?
Problem area
agent.page.links_with(:text => 'VG278H')[2].click
Full Code
require 'rubygems'
require 'mechanize'
require 'open-uri'
agent = Mechanize.new
agent.get("http://icecat.biz/en/")
#Show all form fields belonging to the first form
form = agent.page.forms[0].fields
#Enter VG278H into the text box lookup_text, submit the data
agent.page.forms[0]["lookup_text"] = "VG278H"
agent.page.forms[0].submit #Results of this is stored in Mechanize agent.page object
#Call agent.page with our results and assign them to a variable page
page = agent.page
agent.page.links_with(:text => 'VG278H')[2].click
doc = page.parser
puts doc
You should grab a copy of Charles (http://www.charlesproxy.com/) or something that allows you to watch what happens when you submit the form from your browser. Anyway, your problem is that this part:
agent.page.forms[0]["lookup_text"] = "VG278H"
agent.page.forms[0].submit
is returning an html fragment that looks like this:
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><script>self.location.href="http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H"</script>
So you need to either call this URL directly or scrape the self.location.href value out of the response and have your agent perform a get:
page = agent.get("http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H")
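If you would rather scrape the redirect target than hard-code it, a regular expression over the response body is one option. This is only a sketch: extract_js_redirect is a hypothetical helper, and it assumes the redirect always has the self.location.href="..." shape shown above.

```ruby
# Pulls the target of a `self.location.href="..."` JavaScript redirect
# out of an HTML fragment. Returns nil when no such redirect is present.
# `extract_js_redirect` is a hypothetical helper name.
def extract_js_redirect(html)
  match = html.match(/self\.location\.href\s*=\s*["']([^"']+)["']/)
  match && match[1]
end

# The fragment quoted above, shortened to the script element.
fragment = '<script>self.location.href="http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H"</script>'
puts extract_js_redirect(fragment)
```

In the script, the fragment would come from the body of the page returned by the form submit, and the extracted URL would then be passed to agent.get.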
If you were going to do that, this works:
require 'rubygems'
require 'mechanize'
require 'open-uri'
agent = Mechanize.new
agent.get("http://icecat.biz/en/")
page = agent.get("http://icecat.us/index.cgi?language=en&new_search=1&lookup_text=VG278H")
page = page.links_with(:text => 'VG278H')[2].click
doc = page.parser
puts doc
Happy scraping

How can I redirect pretty-print in IRB

I am trying to redirect pretty-print output in IRB, but pp page >> results.txt does not work.
How can I redirect pretty-print output to a file? I am using Windows.
My code:
require 'nokogiri'
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
page = agent.get('http://www.asus.com/Search/')
pp page
You can't redirect output to a file inside a Ruby script using >>. That trick only works at the command-line.
To write to a file use:
File.open('results.txt', 'a') { |fo| PP.pp(page, fo) }
See the documentation for pp for more information.
OK, I got it to work. For anyone curious, this is based on another pretty-print question I found:
require 'nokogiri'
require 'mechanize'
agent = Mechanize.new
agent.user_agent_alias = 'Windows Mozilla'
page = agent.get('http://www.asus.com/Search/')
pp page
File.open("results.txt","w") do |f|
  PP.pp(page,f)
end

Retrieve Google Checkout CSV (no API)

I'm trying to retrieve the Google Checkout report (Download data to spreadsheet (.csv)). Unfortunately, I can't use the API (it's reserved for UK and US accounts only).
I have a script written with Mechanize and Ruby, but I get an error: "Net::HTTPBadRequest 1.1 400 Bad Request".
Here is my code:
require 'rubygems'
require 'mechanize'
require 'logger'
agent = Mechanize.new { |a| a.log = Logger.new(STDERR) }
agent.user_agent_alias = 'Mac Safari'
page = agent.get 'https://checkout.google.com/sell/orders'
form = page.forms.first
form.Email = 'email#gmail.com'
form.Passwd = 'password'
page = agent.submit(form, form.buttons.first)
form = page.forms.last
p form
form['start-date'] = "2012-11-16"
form['end-date'] = "2012-11-17"
form['column-style'] = "EXPANDED"
#form['_type'] = "order-list-request"
#form['date-time-zone'] = "America/Los_Angeles"
#form['financial-state'] = ""
#form['query-type'] = ""
p form
begin
page = agent.submit(form, form.buttons.first)
rescue Mechanize::ResponseCodeError => ex
puts ex.page.body
end
Thanks to pguardiario and Charles proxy, I found my error... There was a superfluous field!
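The post does not say which field was superfluous. One way to hunt for such a field, sketched under the assumption that you have copied the parameter names your browser sends (for example from Charles) into a list, is to compare them with the names in the scripted form. The helper name superfluous_fields and the sample data are hypothetical.

```ruby
# Returns the field names present in the scripted form but absent from
# the request the browser sends. Both arguments are arrays of strings.
# `superfluous_fields` is a hypothetical helper name.
def superfluous_fields(form_field_names, browser_field_names)
  form_field_names - browser_field_names
end

# Illustrative data only: the post does not name the superfluous field.
browser_names = ['start-date', 'end-date', 'column-style']
form_names    = ['start-date', 'end-date', 'column-style', 'some-extra-field']

p superfluous_fields(form_names, browser_names)
```

With Mechanize, form.fields.map(&:name) gives the names in the form, and form.delete_field!(name) removes a field by name before the form is submitted.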
