I have this code:
#!/usr/bin/env ruby
# encoding: utf-8
require 'mechanize'
agent = Mechanize.new
agent.robots = false
agent.user_agent_alias = 'Mac Safari'
url = "http://www.paris.cl/tienda/es/paris/computacion/tablet/tablet-acer--iconia-b1-710-l688-7-342750-ppp-"
begin
  website = agent.get(url)
rescue StandardError => e
  puts "Error : " + e.message
end
This tries to fetch a page, but I get this error:
Error : 403 => Net::HTTPForbidden for http://www.paris.cl/tienda/es/paris/computacion/tablet/tablet-acer--iconia-b1-710-l688-7-342750-ppp- -- unhandled response
The web server blocks me before I can fetch the page. I tried changing my IP address, but nothing happened.
Is there any way to get around this block? (I also don't know what kind of block it is.)
Your code works for me. Apparently you hit their site a few times as an unprotected client (without proper HTTP headers) and they blocked your IP.
It happens to the best of us :)
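To make "proper HTTP headers" concrete, here is a minimal sketch that builds a request carrying browser-like headers. It uses Ruby's standard Net::HTTP instead of Mechanize so that it runs without any gems. The header values, the BROWSER_HEADERS constant, and both method names are illustrative assumptions, and nothing here guarantees that a site will lift a block it has already placed on an IP.

```ruby
require 'net/http'
require 'uri'

# Hypothetical header set: the values are illustrative, not something
# this particular site is known to require.
BROWSER_HEADERS = {
  'User-Agent' => 'Mozilla/5.0 (Macintosh; Intel Mac OS X 10_15_7) ' \
                  'AppleWebKit/605.1.15 (KHTML, like Gecko) ' \
                  'Version/17.0 Safari/605.1.15',
  'Accept' => 'text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8',
  'Accept-Language' => 'es-CL,es;q=0.9,en;q=0.8'
}.freeze

# Builds (but does not send) a GET request with the headers applied.
# Returns the parsed URI together with the request object.
def build_browser_like_request(url, headers = BROWSER_HEADERS)
  uri = URI.parse(url)
  request = Net::HTTP::Get.new(uri.request_uri)
  headers.each { |name, value| request[name] = value }
  [uri, request]
end

# Sends the request and returns the Net::HTTPResponse, whatever its status.
def fetch_with_headers(url)
  uri, request = build_browser_like_request(url)
  Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.request(request)
  end
end
```

Calling fetch_with_headers(url).code returns the status as a string, so a "403" can be inspected instead of raising.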
I'm writing a program that scans for vulnerable websites. I happen to know of a couple of sites that have vulnerabilities and return a SQL syntax error. However, when I run the program, it skips over these sites and doesn't report that they were found or that they were saved to a file. This program is used for pentesting, and all site owners are made aware of the vulnerability.
def get_urls
  info("Searching for possible SQL vulnerable sites.")
  @agent = Mechanize.new
  page = @agent.get('http://www.google.com/')
  google_form = page.form('f')
  google_form.q = "#{SEARCH}"
  url = @agent.submit(google_form, google_form.buttons.first)
  url.links.each do |link|
    if link.href.to_s =~ /url.q/
      str = link.href.to_s
      str_list = str.split(%r{=|&})
      urls = str_list[1]
      next if str_list[1].split('/')[2] == "webcache.googleusercontent.com"
      urls_to_log = urls.gsub("%3F", '?').gsub("%3D", '=')
      success("Site found: #{urls_to_log}")
      File.open("#{PATH}/temp/SQL_sites_to_check.txt", "a+") {|s| s.puts("#{urls_to_log}'")}
    end
  end
  info("Possible vulnerable sites dumped into #{PATH}/temp/SQL_sites_to_check.txt")
end
def check_if_vulnerable
  info("Checking if sites are vulnerable.")
  IO.read("#{PATH}/temp/SQL_sites_to_check.txt").each_line do |parse|
    begin
      Timeout::timeout(5) do
        parsing = Nokogiri::HTML(RestClient.get("#{parse.chomp}"))
      end
    rescue Timeout::Error, RestClient::ResourceNotFound, RestClient::SSLCertificateNotVerified, Errno::ECONNABORTED, Mechanize::ResponseCodeError, RestClient::InternalServerError => e
      if e
        warn("URL: #{parse.chomp} failed with error: [#{e}] dumped to non_exploitable.txt")
        File.open("#{PATH}/lib/non_exploitable.txt", "a+"){|s| s.puts(parse)}
      else
        success("SQL syntax error discovered in URL: #{parse.chomp} dumped to SQL_VULN.txt")
        File.open("#{PATH}/lib/SQL_VULN.txt", "a+"){|vuln| vuln.puts(parse)}
      end
    end
  end
end
Example of usage:
[22:49:29 INFO]Checking if sites are vulnerable.
[22:49:53 WARNING]URL: http://www.police.bd/content.php?id=275' failed with error: [execution expired] dumped to non_exploitable.txt
File containing the URLs:
As you can see, the program skips over 3 URLs and goes straight to the fourth one. Why?
Am I doing something wrong that causes this?
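I'm not sure that rescue block is where it should be. Everything after rescue runs only when an exception was raised, so the success message is never reached for a request that simply works. You also do nothing with the content you fetch in parsing = Nokogiri::HTML(RestClient.get("#{parse.chomp}")). For the first three URLs the request probably just succeeds, so there is no exception and no output at all. Add some output after that line to confirm they are being fetched.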
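One way to restructure this, as a rough sketch rather than a drop-in fix: keep the rescue clause for failed requests only, and inspect the body on the normal path. The names classify_urls, fetcher and SQL_ERROR_PATTERN are hypothetical, and the pattern is only an assumption about what the error text looks like. The fetcher is passed in as a callable so the logic can be tried without network access; in the real program it could wrap RestClient.get.

```ruby
require 'timeout'

# Hypothetical pattern; real database error messages vary by engine.
SQL_ERROR_PATTERN = /SQL syntax/i

# urls:    list of URL strings to check
# fetcher: any callable that takes a URL and returns the response body
# seconds: per-request timeout
def classify_urls(urls, fetcher, seconds = 5)
  results = { vulnerable: [], clean: [], failed: [] }
  urls.each do |url|
    begin
      body = Timeout.timeout(seconds) { fetcher.call(url) }
    rescue StandardError => e
      # Only failed requests end up here (Timeout::Error is a StandardError).
      results[:failed] << "#{url} (#{e.class}: #{e.message})"
      next
    end
    # Normal path: the request worked, so look at what came back.
    bucket = body.to_s.match?(SQL_ERROR_PATTERN) ? :vulnerable : :clean
    results[bucket] << url
    puts "fetched #{url} -> #{bucket}"
  end
  results
end
```

With this shape every URL produces some output, so a silently skipped site shows up as either clean or failed instead of disappearing.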
I want to test whether a URL exists before downloading it.
I usually do this:
but instead I want to check whether a page actually exists at that URL before downloading it.
The only way to see if a page exists (and that you can reach it via the internet) is to perform an actual request. You could first do an HTTP HEAD request, which only requests the headers, not the actual content:
url = "http://www.some_url.com/atributes"
agent = Mechanize.new
page_exists = true
begin
  agent.head(url) # HEAD request: headers only, no body
rescue SocketError, Mechanize::ResponseCodeError
  # SocketError: host not found; ResponseCodeError: 404 and similar
  page_exists = false
end
if page_exists
  page = agent.get(url)
  # do something with page ...
end
But then again, you can skip the extra request and rescue errors directly from the GET request:
url = "http://www.some_url.com/atributes"
agent = Mechanize.new
begin
  page = agent.get(url)
  # do something with page ...
rescue SocketError, Mechanize::ResponseCodeError
  # SocketError: host not found; ResponseCodeError: 404 and similar
  puts "There is no such page."
end
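For comparison, the same HEAD-first check can be sketched with Ruby's standard Net::HTTP, which avoids the Mechanize dependency. The method name page_exists? and the timeout defaults are illustrative choices, and treating any 2xx or 3xx status as "exists" is an assumption you may want to tighten.

```ruby
require 'net/http'
require 'uri'

# Returns true when a HEAD request to url gets a 2xx or 3xx response,
# false on other statuses, DNS failures, refused connections or timeouts.
def page_exists?(url, open_timeout: 5, read_timeout: 5)
  uri = URI.parse(url)
  options = { use_ssl: uri.scheme == 'https',
              open_timeout: open_timeout,
              read_timeout: read_timeout }
  Net::HTTP.start(uri.host, uri.port, options) do |http|
    # HEAD asks for the headers only, so no body is downloaded.
    response = http.head(uri.request_uri)
    response.is_a?(Net::HTTPSuccess) || response.is_a?(Net::HTTPRedirection)
  end
rescue SocketError, SystemCallError, Timeout::Error
  false
end
```

Keep in mind that some servers answer HEAD differently from GET, so a false here is a hint rather than proof that the page is missing.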
I have a URL and I need to retrieve the URL it redirects to (the number of redirections is arbitrary).
One real example I'm working on is:
which will eventually redirect to:
which is the URL I'm interested in.
I tried with open-uri as follows:
require 'open-uri'
privacy_url = "https://www.google.com/url?q=http://m.zynga.com/about/privacy-center/privacy-policy&sa=D&usg=AFQjCNESJyXBeZenALhKWb52N1vHouAd5Q"
final_url = nil
URI.open(privacy_url) do |h|
  puts "Redirecting to #{h.base_uri}"
  final_url = h.base_uri
end
but I keep getting the original URL back, meaning that final_url is equal to privacy_url.
Is there any way to follow this kind of redirection and programmatically access the resulting URL?
I finally made it work using the Mechanize gem. The key is to enable the follow_meta_refresh option, which is disabled by default.
Here's how:
require 'mechanize'
browser = Mechanize.new
browser.follow_meta_refresh = true
start_url = "https://www.google.com/url?q=http://m.zynga.com/about/privacy-center/privacy-policy&sa=D&usg=AFQjCNESJyXBeZenALhKWb52N1vHouAd5Q"
final_url = nil
browser.get(start_url) do |page|
  final_url = page.uri.to_s
end
puts final_url # => http://company.zynga.com/privacy/policy
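The fact that follow_meta_refresh made the difference suggests the redirect lives in the page's HTML (a meta refresh tag) rather than in an HTTP Location header, which would explain why open-uri kept reporting the original URL. As a rough illustration of what such a tag carries, the sketch below pulls the target out of an HTML string. The helper name meta_refresh_target is hypothetical, and the regular expressions are a simplification, not how Mechanize parses pages.

```ruby
# Returns the URL named in a <meta http-equiv="refresh"> tag,
# or nil when the HTML has no such tag.
def meta_refresh_target(html)
  # Find the whole meta tag, regardless of attribute order.
  tag = html[/<meta[^>]*http-equiv\s*=\s*["']?refresh["']?[^>]*>/i]
  return nil unless tag

  # The content attribute looks like "0;URL='http://example.com/'".
  content = tag[/content\s*=\s*(["'])(.*?)\1/i, 2]
  return nil unless content

  # Keep only the part after "url=", dropping optional inner quotes.
  content[/url\s*=\s*['"]?([^'"\s]+)/i, 1]
end
```

A plain HTTP client would need a step like this after each response, followed by another request to the extracted URL.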
I know some languages have a library that allows you to get the HTTP content for a 404 or 500 message.
Is there a library that allows that for Ruby?
I've tried open-uri, but it simply raises an HTTPError exception without the HTML content of the 404 response.
This doesn't seem to be stated clearly enough in the docs, but OpenURI::HTTPError has an io attribute, which you can treat as a read-only file as far as I know.
require 'open-uri'
begin
  response = URI.open('http://google.com/blahblah')
rescue OpenURI::HTTPError => e
  puts e              # Error message
  puts e.io.status    # HTTP status code and message
  puts e.io.readlines # HTTP response body
end
Net::HTTP supports what you need.
You can use the request_get method and it will return a response regardless of the status code.
From script/console:
> http = Net::HTTP.new('localhost', 3000)
=> #<Net::HTTP localhost:3000 open=false>
> resp = http.request_get('/foo') # a page that doesn't exist
=> #<Net::HTTPNotFound 404 Not Found readbody=true>
> resp.code
=> "404"
> resp.body
=> "<html>...</html>"
(If the library is not available to you by default, you can do a require 'net/http'.)
Works with HTTParty as well https://github.com/jnunemaker/httparty
require 'rubygems'
require 'httparty'
There are a number of HTTP clients available; choose one you like from https://www.ruby-toolbox.com/categories/http_clients
In Ruby, if you use Mechanize to follow 301/302 redirects like this:
require 'mechanize'
m = WWW::Mechanize.new
how do I get the list of pages Mechanize was redirected through? (Like http://google.com => http://www.google.com => http://google.com.ua)
OK, here is the code in mechanize responsible for redirection
elsif res_klass <= Net::HTTPRedirection
  return page unless follow_redirect?
  log.info("follow redirect to: #{ response['Location'] }") if log
  from_uri = page.uri
  raise RedirectLimitReachedError.new(page, redirects) if redirects + 1 > redirection_limit
  redirect_verb = options[:verb] == :head ? :head : :get
  page = fetch_page( :uri => response['Location'].to_s,
                     :referer => page,
                     :params => [],
                     :verb => redirect_verb,
                     :redirects => redirects + 1
                   )
  @history.push(page, from_uri)
  return page
but running m.history.map {|p| puts p.uri} prints the URI of the last page three times.
The key here is to take advantage of the logging built into Mechanize. Here's a full code sample using Ruby's standard Logger class.
require 'mechanize'
require 'logger'
mechanize_logger = Logger.new('log/mechanize.log')
mechanize_logger.level = Logger::INFO
url = 'http://google.com'
agent = Mechanize.new
agent.log = mechanize_logger
agent.get(url)
Then check the output of log/mechanize.log in your log directory and you'll see the whole Mechanize process, including the intermediate URLs.
I'm not certain, but here are a couple of things to try:
see what's in m.history[i].uri after the get()
You might need something like:
for m.redirection_limit in 0..99
  begin
    m.get(url)
    break
  rescue WWW::Mechanize::RedirectLimitReachedError
    # code here could get control at intermediate redirection levels
  end
end
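If neither the log nor the redirection_limit trick is convenient, the chain can also be collected by following Location headers by hand. The sketch below does that with Ruby's standard Net::HTTP rather than Mechanize. The method name redirect_chain and the default limit of 10 are illustrative assumptions, and it only sees HTTP-level redirects (not meta refresh or JavaScript).

```ruby
require 'net/http'
require 'uri'

# Follows HTTP redirects starting at start_url and returns every URL
# visited, in order. Stops at the first non-redirect response or after
# `limit` hops, whichever comes first.
def redirect_chain(start_url, limit = 10)
  chain = [start_url]
  uri = URI.parse(start_url)
  limit.times do
    response = Net::HTTP.start(uri.host, uri.port,
                               use_ssl: uri.scheme == 'https') do |http|
      # HEAD keeps the traffic small; some servers may treat it
      # differently from GET.
      http.head(uri.request_uri)
    end
    location = response['location']
    break unless response.is_a?(Net::HTTPRedirection) && location

    # Location may be relative, so resolve it against the current URL.
    uri = URI.join(uri.to_s, location)
    chain << uri.to_s
  end
  chain
end
```

Calling puts redirect_chain('http://google.com') would then print one URL per hop.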