How do I get the destination URL of a shortened URL using Ruby?

How do I take this URL http://t.co/yjgxz5Y and get the destination URL which is http://nickstraffictricks.com/4856_how-to-rank-1-in-google/

require 'net/http'
require 'uri'
Net::HTTP.get_response(URI.parse('http://t.co/yjgxz5Y'))['location']
# => "http://nickstraffictricks.com/4856_how-to-rank-1-in-google/"

I've used open-uri for this, because it's nice and simple. It will retrieve the page, but will also follow multiple redirects:
require 'open-uri'
final_uri = ''
open('http://t.co/yjgxz5Y') do |h|
  final_uri = h.base_uri
end
final_uri # => #<URI::HTTP:0x00000100851050 URL:http://nickstraffictricks.com/4856_how-to-rank-1-in-google/>
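Note that base_uri gives you back a URI object; call to_s on it if you just want the plain string:
final_uri.to_s # => "http://nickstraffictricks.com/4856_how-to-rank-1-in-google/"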
The docs show a nice example for using the lower-level Net::HTTP to handle redirects.
require 'net/http'
require 'uri'
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0

  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end
puts fetch('http://www.ruby-lang.org')
Of course this all breaks down if the page isn't using an HTTP redirect. A lot of sites use meta-redirects, which you have to handle by retrieving the URL from the meta tag, but that's a different question.
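As a rough sketch of that case (not part of the answer above): a meta refresh can be pulled out of the HTML with a regular expression and then fed back into fetch. The extract_meta_refresh helper and its regex are illustrative assumptions, not a library API.
require 'net/http'
require 'uri'

# Hypothetical helper: pull the target URL out of a <meta http-equiv="refresh"> tag.
# The regex is deliberately loose and only meant as a sketch.
def extract_meta_refresh(html)
  html[/<meta[^>]+http-equiv=["']?refresh["']?[^>]+content=["']?\d+;\s*url=([^"'>]+)/i, 1]
end

body = Net::HTTP.get(URI.parse('http://example.com/page-with-meta-redirect'))
puts extract_meta_refresh(body) # => the URL from the meta tag, or nil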

For resolving redirects you should use a HEAD request to avoid downloading the whole response body (imagine resolving a URL to an audio or video file).
Working example using the Faraday gem:
require 'faraday'
require 'faraday_middleware'
def resolve_redirects(url)
  response = fetch_response(url, method: :head)
  if response
    return response.to_hash[:url].to_s
  else
    return nil
  end
end

def fetch_response(url, method: :get)
  conn = Faraday.new do |b|
    b.use FaradayMiddleware::FollowRedirects
    b.adapter :net_http
  end
  return conn.send method, url
rescue Faraday::Error, Faraday::Error::ConnectionFailed => e
  return nil
end
puts resolve_redirects("http://cre.fm/feed/m4a") # http://feeds.feedburner.com/cre-podcast
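If you would rather not pull in Faraday, the same idea can be sketched with plain Net::HTTP, issuing HEAD requests and following Location headers by hand (the method name and the redirect limit of 10 are arbitrary choices for the sketch):
require 'net/http'
require 'uri'

# Follow redirects with HEAD requests only, so no response body is downloaded.
def resolve_with_head(url, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  uri = URI.parse(url)
  response = Net::HTTP.start(uri.host, uri.port, use_ssl: uri.scheme == 'https') do |http|
    http.head(uri.request_uri)
  end

  case response
  when Net::HTTPRedirection
    # Location may be relative, so resolve it against the current URL.
    resolve_with_head(URI.join(url, response['location']).to_s, limit - 1)
  else
    url
  end
end

puts resolve_with_head('http://t.co/yjgxz5Y')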

You would have to follow the redirect. I think this article would help:
http://shadow-file.blogspot.com/2009/03/handling-http-redirection-in-ruby.html

Related

Ruby - Getting page content even if it doesn't exist

I am trying to put together a series of custom 404 pages.
require 'net/http'
require 'uri'

def open(url)
  page_content = Net::HTTP.get(URI.parse(url))
  puts page_content.content
end

open('http://somesite.com/1ygjah1761')
The code above exits the program with an error. How can I get the page content from a website, regardless of whether it returns a 404 or not?
You need to rescue the error:
def open(url)
  require 'net/http'
  page_content = ""
  begin
    page_content = Net::HTTP.get(URI.parse(url))
    puts page_content
  rescue Net::HTTPNotFound
    puts "THIS IS 404" + page_content
  end
end
You can find more information on something like this here: http://tammersaleh.com/posts/rescuing-net-http-exceptions/
Net::HTTP.get returns the page content directly as a string, so there is no need to call .content on the result:
page_content = Net::HTTP.get(URI.parse(url))
puts page_content
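A related sketch (not part of the original answer): Net::HTTP.get_response returns a response object, so you can check the status code instead of rescuing, and still read the body of the 404 page:
require 'net/http'
require 'uri'

def fetch_page(url)
  response = Net::HTTP.get_response(URI.parse(url))
  if response.is_a?(Net::HTTPNotFound)
    puts "THIS IS 404: #{response.body}"
  else
    puts response.body
  end
end

fetch_page('http://somesite.com/1ygjah1761')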

How to download a binary file via Net::HTTP::Get?

I am trying to download a binary file via HTTP using the following Ruby script.
#!/usr/bin/env ruby
require 'net/http'
require 'uri'

def http_download(resource, filename, debug = false)
  uri = URI.parse(resource)
  puts "Starting HTTP download for: #{uri}"
  http_object = Net::HTTP.new(uri.host, uri.port)
  http_object.use_ssl = true if uri.scheme == 'https'
  begin
    http_object.start do |http|
      request = Net::HTTP::Get.new uri.request_uri
      Net::HTTP.get_print(uri) if debug
      http.read_timeout = 500
      http.request request do |response|
        open filename, 'w' do |io|
          response.read_body do |chunk|
            io.write chunk
          end
        end
      end
    end
  rescue Exception => e
    puts "=> Exception: '#{e}'. Skipping download."
    return
  end
  puts "Stored download as #{filename}."
end
However, it downloads the HTML source instead of the binary. When I enter the URL in the browser, the binary file is downloaded. Here is a URL with which the script fails:
http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML
I execute the script as follows
pry> require 'myscript'
pry> resource = "http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML"
pry> http_download(resource,"StreetTreePt.KML", true)
How can I download the binary?
Redirection experiments
I found this redirection check, which looks quite reasonable. When I integrate it in the response block, it fails with the following error:
Exception: 'undefined method `host' for "save_download.asp?filename=StreetTreePt.KML":String'. Skipping download.
The exception does not occur in the "original" function posted above.
The documentation for Net::HTTP shows how to handle redirects:
Following Redirection
Each Net::HTTPResponse object belongs to a class for its response code.
For example, all 2XX responses are instances of a Net::HTTPSuccess subclass, a 3XX response is an instance of a Net::HTTPRedirection subclass and a 200 response is an instance of the Net::HTTPOK class. For details of response classes, see the section “HTTP Response Classes” below.
Using a case statement you can handle various types of responses properly:
def fetch(uri_str, limit = 10)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))

  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    warn "redirected to #{location}"
    fetch(location, limit - 1)
  else
    response.value
  end
end
print fetch('http://www.ruby-lang.org')
Or, you can use Ruby's OpenURI, which handles it automatically. Or, the Curb gem will do it. Probably Typhoeus and HTTPClient too.
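For this particular download, an OpenURI version might look roughly like this (treat it as a sketch; OpenURI follows the redirect for you):
require 'open-uri'

# OpenURI handles the HTTP redirect; 'wb' keeps the downloaded bytes intact.
File.open('StreetTreePt.KML', 'wb') do |f|
  f.write open('http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML').read
end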
According to the code you show in your question, the exception you are getting can only come from:
http_object = Net::HTTP.new(uri.host, uri.port)
which is hardly likely since uri is a URI object. You need to show the complete code if you want help with that problem.
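The exception quoted in the question ("undefined method `host' for ...:String") suggests the server answered with a relative Location header, which URI.parse alone will not turn into an absolute URI. A minimal sketch, assuming that is the case, resolving the relative location against the URL that was requested:
require 'uri'

# Resolve a possibly relative Location header against the original request URL.
def absolute_location(request_url, location)
  URI.join(request_url, location).to_s
end

puts absolute_location(
  'http://dcatlas.dcgis.dc.gov/catalog/download.asp?downloadID=2175&downloadTYPE=KML',
  'save_download.asp?filename=StreetTreePt.KML'
)
# => "http://dcatlas.dcgis.dc.gov/catalog/save_download.asp?filename=StreetTreePt.KML"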

Em-synchrony sample code not working as expected

The em-synchrony documentation links to this article, which implies that this code using fibers:
require 'eventmachine'
require 'fiber'
require 'em-http-request'
def http_get(url)
  f = Fiber.current
  http = EventMachine::HttpRequest.new(url).get

  # resume fiber once http call is done
  http.callback { f.resume(http) }
  http.errback  { f.resume(http) }

  return Fiber.yield
end

EventMachine.run do
  Fiber.new {
    page = http_get('http://myurl')
    puts "Fetched page: #{page.response}"
    EventMachine.stop
  }.resume
end
...is equivalent to this much simpler code using em-synchrony:
require 'em-synchrony'
require 'em-http-request'
EventMachine.synchrony do
  page = EventMachine::HttpRequest.new("http://myurl").get
  p "No callbacks! Fetched page: #{page.response}"
  EventMachine.stop
end
However running the two produces different results. In the first the fiber yields until the HTML response comes back, while the second seems to print immediately without waiting for the response and as a result the printed response is empty. Am I misreading or mistyping, or is the article actually suggesting the wrong thing?
You need to use the extended version of EventMachine::HttpRequest that knows how to work with EventMachine.synchrony.
Change
require 'em-http-request'
to
require "em-synchrony/em-http"
This in turn will require "em-http-request" and will patch the #get, #head, #post, #delete, and #put methods of EventMachine::HttpRequest to work with Fibers.
Here is the link to the source code of em-synchrony/em-http.
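For reference, this is what the second snippet from the question looks like with only the require changed (everything else is taken verbatim from the question):
require 'em-synchrony'
require 'em-synchrony/em-http' # Fiber-aware, patched EventMachine::HttpRequest

EventMachine.synchrony do
  page = EventMachine::HttpRequest.new("http://myurl").get
  p "No callbacks! Fetched page: #{page.response}"
  EventMachine.stop
end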

Download a zip file through Net::HTTP

I am trying to download the latest.zip from WordPress.org using Net::HTTP. This is what I have got so far:
Net::HTTP.start("wordpress.org/") { |http|
resp = http.get("latest.zip")
open("a.zip", "wb") { |file|
file.write(resp.body)
}
puts "WordPress downloaded"
}
But this only gives me a 4-kilobyte 404 error HTML page (visible if I change the file name to a.txt). I am thinking this has something to do with the URL being redirected somehow, but I have no clue what I am doing. I am a newbie to Ruby.
My first question is: why use Net::HTTP, or write code at all, to download something that could be grabbed more easily using curl or wget, which are designed to make it easy to download files?
But, since you want to download things using code, I'd recommend looking at Open-URI if you want to follow redirects. It's a standard library for Ruby, and very useful for fast HTTP/FTP access to pages and files:
require 'open-uri'
open('latest.zip', 'wb') do |fo|
  fo.print open('http://wordpress.org/latest.zip').read
end
I just ran that, waited a few seconds for it to finish, ran unzip against the downloaded "latest.zip", and it expanded into the directory containing the WordPress content.
Beyond Open-URI, there's HTTPClient and Typhoeus, among others, that make it easy to open an HTTP connection and send queries/receive data. They're very powerful and worth getting to know.
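As a rough illustration of one of those alternatives (the exact options depend on the gem version you have installed), Typhoeus can follow the redirect and hand back the body in a couple of lines:
require 'typhoeus'

# followlocation tells Typhoeus to follow the 301/302 to the real file.
response = Typhoeus.get('http://wordpress.org/latest.zip', followlocation: true)
File.open('latest.zip', 'wb') { |f| f.write(response.body) }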
Net::HTTP doesn't provide a nice way of following redirects; here is a piece of code that I've been using for a while now:
require 'net/http'
require 'uri'

class RedirectFollower
  class TooManyRedirects < StandardError; end

  attr_accessor :url, :body, :redirect_limit, :response

  def initialize(url, limit = 5)
    @url, @redirect_limit = url, limit
  end

  def resolve
    raise TooManyRedirects if redirect_limit < 0

    self.response = Net::HTTP.get_response(URI.parse(url))

    if response.kind_of?(Net::HTTPRedirection)
      self.url = redirect_url
      self.redirect_limit -= 1
      resolve
    end

    self.body = response.body
    self
  end

  def redirect_url
    if response['location'].nil?
      response.body.match(/<a href=\"([^>]+)\">/i)[1]
    else
      response['location']
    end
  end
end

wordpress = RedirectFollower.new('http://wordpress.org/latest.zip').resolve
puts wordpress.url

File.open("latest.zip", "wb") do |file|
  file.write wordpress.body
end

Fetch URL (with params) using Ruby

Could someone tell me how I can fetch (GET) a URL (with params) using Ruby? I found a bunch of examples online but I couldn't find one that explained how I can also pass the parameters.
require 'net/http'
require 'uri'
uri = URI.parse("http://www.example.com/?test=1")
response = Net::HTTP.get_response uri
p response.body
There are also some other good HTTP clients or wrappers, such as HTTParty.
require 'rubygems'
require 'httparty'
response = HTTParty.get("http://www.example.com/?test=1")
p response.body
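HTTParty can also build the query string for you from a hash, so the parameters don't have to be embedded in the URL:
require 'httparty'

response = HTTParty.get("http://www.example.com/", query: { :test => 1 })
p response.body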
I use something like the following; it's pretty simple and doesn't make you build your own query string:
require 'net/http'
response = nil
Net::HTTP.start "example.com", 80 do |http|
request = Net::HTTP::Get.new "/endpoint"
request.form_data = {:q => "123"}
response = http.request(request)
end
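Note that form_data sets the request body rather than the URL's query string, so if the server expects the parameters as query parameters, a small sketch using URI.encode_www_form does the same job with the parameters in the URL:
require 'net/http'
require 'uri'

uri = URI.parse("http://example.com/endpoint")
uri.query = URI.encode_www_form(:q => "123") # => "q=123"

response = Net::HTTP.get_response(uri)
p response.body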
I missed this one earlier; the solutions are here:
Parametrized GET request in Ruby?
