I would like to take information from another website. To do that, I assume I need to send a request to that website (in my case an HTTP GET request) and read the response.
How can I do this in Ruby on Rails?
If it is possible, is it a correct approach to use in my controllers?
You can use Ruby's Net::HTTP class:
require 'net/http'
url = URI.parse('http://www.example.com/index.html')
req = Net::HTTP::Get.new(url.to_s)
res = Net::HTTP.start(url.host, url.port) {|http|
  http.request(req)
}
puts res.body
Net::HTTP is built into Ruby, but let's face it, it is often easier to skip its cumbersome 1980s style and use a higher-level alternative:
HTTP Gem
HTTParty
RestClient
Excon
Feedjira (RSS only)
OpenURI is the best; it's as simple as
require 'open-uri'
# Kernel#open no longer opens URLs in Ruby 3.0 and later; use URI.open
response = URI.open('http://example.com').read
require 'net/http'
result = Net::HTTP.get(URI.parse('http://www.example.com/about.html'))
# or
result = Net::HTTP.get('www.example.com', '/about.html') # host and path as separate strings
I prefer httpclient over Net::HTTP.
require 'httpclient'
client = HTTPClient.new
puts client.get_content('http://www.example.com/index.html')
HTTParty is a good choice if you're making a class that's a client for a service. It's a convenient mixin that gives you 90% of what you need. See how short the Google and Twitter clients are in the examples.
And to answer your second question: no, I wouldn't put this functionality in a controller--I'd use a model instead if possible to encapsulate the particulars (perhaps using HTTParty) and simply call it from the controller.
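As a rough sketch of that separation, the HTTP details can live in a small plain Ruby class that the controller calls. The class name `RemotePage`, the `fetcher` keyword and the base URL are made up for illustration; the default fetcher uses Net::HTTP, but HTTParty or any other client could be swapped in.

```ruby
require 'net/http'
require 'uri'

# Hypothetical wrapper: the class name, base URL and `fetcher` keyword are illustrative.
class RemotePage
  # fetcher is any callable that takes a URI and returns the body as a String.
  def initialize(base_url, fetcher: ->(uri) { Net::HTTP.get(uri) })
    @base_url = base_url
    @fetcher = fetcher
  end

  # Returns the response body for a path relative to the base URL.
  def body(path)
    @fetcher.call(URI.join(@base_url, path))
  end
end

# In a controller action you would then only write something like:
#   @html = RemotePage.new('http://www.example.com').body('/index.html')
```

Passing the fetcher in also makes the class easy to test without touching the network.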
Here is code that works if you are making a REST API call from behind a proxy:
require "uri"
require 'net/http'
proxy_host = '<proxy addr>'
proxy_port = '<proxy_port>'
proxy_user = '<username>'
proxy_pass = '<password>'
uri = URI.parse("https://saucelabs.com:80/rest/v1/users/<username>")
proxy = Net::HTTP::Proxy(proxy_host, proxy_port, proxy_user, proxy_pass)
req = Net::HTTP::Get.new(uri.path)
req.basic_auth('<sauce_username>', '<sauce_password>') # placeholders must be quoted strings
result = proxy.start(uri.host,uri.port) do |http|
  http.request(req)
end
puts result.body
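As far as I know, the same proxy settings can also be passed straight to Net::HTTP.new as extra arguments instead of going through Net::HTTP::Proxy. A minimal sketch, with made-up host names and credentials, that only builds the object and checks what was configured (no request is sent):

```ruby
require 'net/http'

# Placeholder values, assumed for illustration only.
proxy_host = 'proxy.example.com'
proxy_port = 8080
proxy_user = 'user'
proxy_pass = 'secret'

# Net::HTTP.new only builds the object; no connection is opened until a request is made.
http = Net::HTTP.new('www.example.com', 80, proxy_host, proxy_port, proxy_user, proxy_pass)

puts http.proxy?         # true when a proxy is configured
puts http.proxy_address  # proxy.example.com
puts http.proxy_port     # 8080
```

Checking `proxy_address` and `proxy_user` like this is a quick way to confirm the credentials actually reached the connection object before debugging a 407 response.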
My favorite two ways to grab the contents of URLs are either OpenURI or Typhoeus.
OpenURI because it's everywhere, and Typhoeus because it's very flexible and powerful.
I'm writing a plain Ruby script to scrape a website using Nokogiri. Whenever I pass proxy options to the open-uri open() method, it returns 407 Proxy Authentication Required, even though my options do include the authentication details. Here's my code:
proxy_url = URI.parse("http://12.34.567.89:PORT")
session = Nokogiri::HTML(open("http://google.com", :proxy_http_basic_authentication => [proxy_url, "username", "password"]))
Note: as my proxy is a premium one, I have replaced the real proxy credentials with fake ones.
I have a restrictive proxy at work, but the following works.
Try the code with your proxy credentials.
I used Nokogiri here for parsing, but you don't really need it for getting the HTML.
require 'net/http'
require 'uri'
require 'nokogiri'
url = 'http://stackoverflow.com/questions/32818853/ruby-open-uri-proxy-authentication-fails'
proxy_host, proxy_port, proxy_user, proxy_pass = '****', 8080, "*****", "*****"
uri = URI.parse(url)
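Net::HTTP::Proxy(proxy_host, proxy_port, proxy_user, proxy_pass).start(uri.host, uri.port) do |http|
  # The block form of get yields the body in chunks, so fetch the whole body before parsing it.
  response = http.get(uri.path)
  puts Nokogiri::HTML(response.body).text
end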
I am using Typhoeus with Hydra in order to make parallel requests. My end goal is to parse the Typhoeus response into a Mechanize object.
url = "http://example.com/"
hydra = Typhoeus::Hydra.new
agent = Mechanize.new
request = Typhoeus::Request.new(url, :method => :get, :proxy => "#{proxy_host}:#{proxy_port}")
request.on_complete do |response| # Typhoeus::Response object
  body = response.body
  uri = request.parsed_uri
  page = agent.parse(uri, response, body)
end
hydra.queue(request)
hydra.run
The agent.parse method raises an error because it cannot parse the Typhoeus response object:
/usr/local/rvm/gems/ruby-1.9.3-p194/gems/mechanize-2.5.1/lib/mechanize.rb:1165:in `parse': undefined method `[]' for #<Typhoeus::Response:0x00000012cd9da0> (NoMethodError)
Is there any way I can convert a Typhoeus response into a Net::HTTPResponse object?
Or is there any other way I can combine Mechanize and Typhoeus, so that I can make parallel requests with Typhoeus and scrape the data with the Mechanize library?
I tried to create a Net::HTTPResponse (https://github.com/ruby/ruby/blob/trunk/lib/net/http/response.rb) from a Typhoeus::Response, but it didn't work out. Calling the initializer is easy, but setting the response body or headers is not.
I looked into Mechanize to see if it can be changed to use Typhoeus for making requests, but I don't think that's possible right now. Net::HTTP is really hard-wired into Mechanize. I thought of a mechanize-typhoeus adapter, which would be nice.
I'm using Net::HTTP in my Ruby code to make HTTP requests. For example, to make a POST request I do
require 'net/http'
Net::HTTP.post_form(url,{'email' => email,'password' => password})
This works. But I'm unable to make a DELETE request, i.e.
require 'net/http'
Net::HTTP::Delete(url)
gives the following error
NoMethodError: undefined method `Delete' for Net::HTTP:Class
The documentation at http://ruby-doc.org/stdlib-1.9.3/libdoc/net/http/rdoc/Net/HTTP.html shows Delete is available. So why is it not working in my case?
Thank You
The documentation tells you that Net::HTTP::Delete is a class, not a method.
Try Net::HTTP.new('www.server.com').delete('/path') instead.
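To make the class-versus-method distinction concrete, here is a small sketch that builds a DELETE request object without sending it. The localhost URL is only a placeholder.

```ruby
require 'net/http'
require 'uri'

# Placeholder URL, used only to have a path to work with.
uri = URI('http://localhost:8080/customer/johndoe')

# Net::HTTP::Delete is a request class, so it is instantiated with .new
# rather than called like a method.
req = Net::HTTP::Delete.new(uri.path)

puts Net::HTTP::Delete.is_a?(Class)  # true
puts Net::HTTP.respond_to?(:Delete)  # false, which is why Net::HTTP::Delete(url) fails
puts req.method                      # DELETE
puts req.path                        # /customer/johndoe
```

The request object is then handed to a connection, for example with `Net::HTTP.new(uri.host, uri.port).request(req)`.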
uri = URI('http://localhost:8080/customer/johndoe')
http = Net::HTTP.new(uri.host, uri.port)
req = Net::HTTP::Delete.new(uri.path)
res = http.request(req)
puts "deleted #{res}"
Simple post and delete requests, see docs for more:
puts Net::HTTP.new("httpbin.org").post("/post", "a=1").body
puts Net::HTTP.new("httpbin.org").delete("/delete").body
This works for me:
uri = URI(YOUR_URL)
req = Net::HTTP::Delete.new(uri, {}) # the second argument is a hash of request headers
response = Net::HTTP.start(uri.host, uri.port, use_ssl: true) do |http|
  http.request req
end
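Note that `use_ssl: true` only suits https URLs. A hedged variation is to derive the setting from the URL scheme, so the same code handles both http and https. The URL below is hypothetical and nothing is sent in this sketch; it only prepares the connection and request objects.

```ruby
require 'net/http'
require 'uri'

# Hypothetical URL, used only for illustration.
uri = URI('https://api.example.com/customers/42')

http = Net::HTTP.new(uri.host, uri.port)
# Enable TLS only when the URL says so; nothing is sent until http.request is called.
http.use_ssl = (uri.scheme == 'https')

req = Net::HTTP::Delete.new(uri)

puts http.use_ssl?  # true
puts uri.port       # 443
puts req.method     # DELETE
```

Calling `http.request(req)` would then perform the actual DELETE.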
How do I take this URL http://t.co/yjgxz5Y and get the destination URL which is http://nickstraffictricks.com/4856_how-to-rank-1-in-google/
require 'net/http'
require 'uri'
Net::HTTP.get_response(URI.parse('http://t.co/yjgxz5Y'))['location']
# => "http://nickstraffictricks.com/4856_how-to-rank-1-in-google/"
I've used open-uri for this, because it's nice and simple. It will retrieve the page, but will also follow multiple redirects:
require 'open-uri'
final_uri = ''
URI.open('http://t.co/yjgxz5Y') do |h| # Kernel#open no longer opens URLs in Ruby 3.0 and later
  final_uri = h.base_uri
end
final_uri # => #<URI::HTTP:0x00000100851050 URL:http://nickstraffictricks.com/4856_how-to-rank-1-in-google/>
The docs show a nice example for using the lower-level Net::HTTP to handle redirects.
require 'net/http'
require 'uri'
def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0
  response = Net::HTTP.get_response(URI.parse(uri_str))
  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end
puts fetch('http://www.ruby-lang.org')
Of course this all breaks down if the page isn't using an HTTP redirect. A lot of sites use meta-redirects, which you have to handle by retrieving the URL from the meta tag, but that's a different question.
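One more detail when following redirects by hand: the Location header may hold a relative reference such as `/new-path`, which cannot be requested on its own. A minimal sketch of resolving it against the URL that was just requested; the helper name `resolve_location` and the example URLs are made up for illustration.

```ruby
require 'uri'

# Hypothetical helper: resolves a Location header value against the URL that was requested.
def resolve_location(current_url, location)
  URI.join(current_url, location).to_s
end

# A relative Location is resolved against the current URL.
puts resolve_location('http://example.com/a/b', '/c')                      # http://example.com/c
# An absolute Location is returned unchanged.
puts resolve_location('http://example.com/a/b', 'http://other.example/x')  # http://other.example/x
```

In a recursive redirect follower, the resolved value would be passed to the next call instead of the raw header.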
For resolving redirects you should use a HEAD request to avoid downloading the whole response body (imagine resolving a URL to an audio or video file).
Working example using the Faraday gem:
require 'faraday'
require 'faraday_middleware'
def resolve_redirects(url)
  response = fetch_response(url, method: :head)
  if response
    return response.to_hash[:url].to_s
  else
    return nil
  end
end
def fetch_response(url, method: :get)
  conn = Faraday.new do |b|
    b.use FaradayMiddleware::FollowRedirects
    b.adapter :net_http
  end
  return conn.send method, url
rescue Faraday::Error # Faraday::ConnectionFailed is a subclass, so this covers it
  return nil
end
puts resolve_redirects("http://cre.fm/feed/m4a") # http://feeds.feedburner.com/cre-podcast
You would have to follow the redirect. I think this would help:
http://shadow-file.blogspot.com/2009/03/handling-http-redirection-in-ruby.html
Could someone tell me how I can fetch (GET) a URL (with params) using Ruby? I found a bunch of examples online but I couldn't find one that explained how I can also pass the parameters.
require 'net/http'
require 'uri'
uri = URI.parse("http://www.example.com/?test=1")
response = Net::HTTP.get_response uri
p response.body
There are also some other good HTTP clients or wrappers, such as HTTParty.
require 'rubygems'
require 'httparty'
response = HTTParty.get("http://www.example.com/?test=1")
p response.body
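If the parameters come from a hash rather than a hard-coded string, the standard library can build and escape the query string. A small sketch, assuming some made-up parameter names and values:

```ruby
require 'uri'

# Hypothetical parameters; keys and values are escaped by encode_www_form.
params = { 'test' => 1, 'q' => 'ruby on rails', 'tag' => 'a&b' }

uri = URI('http://www.example.com/')
uri.query = URI.encode_www_form(params)

puts uri  # http://www.example.com/?test=1&q=ruby+on+rails&tag=a%26b
```

The resulting URI can be passed to `Net::HTTP.get_response(uri)`, or as a string to `HTTParty.get(uri.to_s)`.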
I use something like the following, it's pretty simple and doesn't make you build your own query string:
require 'net/http'
response = nil
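Net::HTTP.start "example.com", 80 do |http|
  # form_data= would put the parameters in the request body, so encode them into the query string instead
  request = Net::HTTP::Get.new("/endpoint?" + URI.encode_www_form(:q => "123"))
  response = http.request(request)
end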
I missed this one. The solutions are here.
Parametrized get request in Ruby?