Ruby recursive method needs return clause

I could be missing the obvious here, but this code fails unless I use an explicit return when calling the method recursively in the Net::HTTPRedirection case.
def fetch_headers(limit = REDIRECT_LIMIT)
  # You should choose a better exception.
  raise ArgumentError, 'too many HTTP redirects' if limit == 0
  http = Net::HTTP.new(@uri.host, @uri.port)
  http.use_ssl = true if @uri.scheme == 'https'
  request_uri = @uri.request_uri.nil? ? '/' : @uri.request_uri
  http.request_head(request_uri) do |response|
    case response
    when Net::HTTPSuccess then
      return response
    when Net::HTTPRedirection then
      location = response['location']
      parsed_location = URI.parse location
      @uri = parsed_location.absolute? ? parsed_location : @uri.merge(parsed_location)
      fetch_headers(limit - 1)
    else
      return response.value
    end
  end
end
The caller method:
def perform(link_id)
  link = Link.find(link_id)
  url = link.url =~ /^http/ ? link.url : "http://#{link.url}"
  @uri = URI.parse url
  headers = fetch_headers
  case headers.content_type
  when /application/
    filename = File.basename(@uri.path)
    link.update title: filename
  when /html/
    response = fetch_page
    page = Nokogiri::HTML(response)
    link.update title: page_title(page), description: page_description(page)
  else
    logger.warn "URL #{url} with unknown mime-type: #{headers.content_type}"
  end
end
Here is the spec I am running:
it 'follows the redirects using relative URL' do
  link = create(:link, url: url)
  path = '/welcome.html'
  stub_request(:head, url).to_return(status: 302, body: '',
                                     headers: { 'Location' => path })
  redirect_url = "#{url}#{path}"
  stub_request(:head, redirect_url).to_return(status: 200, body: '',
                                              headers: html_header)
  stub_request(:get, redirect_url).to_return(status: 200, body: title_html_raw,
                                             headers: html_header)
  UrlScrapperJob.perform_now link.id
  link.reload
  expect(link.title).to match(/page title/)
end
Here are the results of the fetch_headers method:
With the return clause: #<Net::HTTPOK 200 readbody=true>
Without the return clause: #<Net::HTTPFound 302 readbody=true>
I would expect the HTTPOK 200 result, because the method should follow redirects until it gets a 200 OK.

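The difference is in the value returned from the fetch_headers method.
An explicit return inside the block exits fetch_headers itself and makes its argument the result of the method call.
Without an explicit return, the method's return value is whatever http.request_head(request_uri, &block) returns. That is the response to the first request (the 302 here), because the value of the block, including the result of the recursive call, is discarded.
You might want to try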
http.request_head(request_uri) do |response|
  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    location = response['location']
    parsed_location = URI.parse location
    @uri = parsed_location.absolute? ? parsed_location : @uri.merge(parsed_location)
    fetch_headers(limit - 1)
  else
    response.value
  end
end.tap { |result| puts result } # ⇐ here
to examine what the actual result is without an explicit return.
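The difference between the two versions can be reproduced without any network access. In this minimal sketch, request_like is a hypothetical stand-in for http.request_head (it is not part of Net::HTTP). It is assumed to behave the same way in the one respect that matters here: it yields to the block but returns its own value.

```ruby
# Hypothetical stand-in for http.request_head: it yields a value to the
# block, ignores whatever the block evaluates to, and returns its own value.
def request_like
  yield :first_response
  :first_response
end

# The block's value (:final_response) is discarded, so this method
# returns whatever request_like returns.
def without_return
  request_like { |_response| :final_response }
end

# `return` inside a block exits the enclosing method immediately,
# so this method returns the block's value instead.
def with_return
  request_like { |_response| return :final_response }
end

p without_return # => :first_response
p with_return    # => :final_response
```

An alternative to return inside the block is to capture the value in a local variable inside the block and make that variable the last expression of the method.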

Related

Ruby Net::HTTP passing headers through the creation of request

Maybe I'm just blind, but many posts about passing headers in Net::HTTP follow the lines of
require 'net/http'
uri = URI("http://www.ruby-lang.org")
req = Net::HTTP::Get.new(uri)
req['some_header'] = "some_val"
res = Net::HTTP.start(uri.hostname, uri.port) {|http|
  http.request(req)
}
puts res.body
(From Ruby - Send GET request with headers metaphori's answer)
And from the Net::HTTP docs (https://docs.ruby-lang.org/en/2.0.0/Net/HTTP.html)
uri = URI('http://example.com/cached_response')
file = File.stat 'cached_response'
req = Net::HTTP::Get.new(uri)
req['If-Modified-Since'] = file.mtime.rfc2822
res = Net::HTTP.start(uri.hostname, uri.port) {|http|
  http.request(req)
}
open 'cached_response', 'w' do |io|
  io.write res.body
end if res.is_a?(Net::HTTPSuccess)
But what is the advantage of doing the above when you can pass the headers via the following way?
options = {
  'headers' => {
    'Content-Type' => 'application/json'
  }
}
request = Net::HTTP::Get.new('http://www.stackoverflow.com/', options['headers'])
This allows you to parameterize the headers and can allow for multiple headers very easily.
My main question is: what is the advantage of passing the headers when creating the Net::HTTP::Get object versus setting them afterwards?
Net::HTTPHeader already assigns the headers in this function:
def initialize_http_header(initheader)
  @header = {}
  return unless initheader
  initheader.each do |key, value|
    warn "net/http: duplicated HTTP header: #{key}", uplevel: 1 if key?(key) and $VERBOSE
    if value.nil?
      warn "net/http: nil HTTP header: #{key}", uplevel: 1 if $VERBOSE
    else
      value = value.strip # raise error for invalid byte sequences
      if value.count("\r\n") > 0
        raise ArgumentError, 'header field value cannot include CR/LF'
      end
      @header[key.downcase] = [value]
    end
  end
end
So doing
request['some_header'] = "some_val" almost seems like code duplication.
There is no advantage to setting headers one way or the other, at least none that I can think of. It comes down to your own preference. In fact, if you look at what happens when you supply headers while initializing a new Net::HTTP::Get, you will find that internally Ruby simply stores them in the @header instance variable:
https://github.com/ruby/ruby/blob/c5eb24349a4535948514fe765c3ddb0628d81004/lib/net/http/header.rb#L25
And if you set the headers using request[name] = value, you can see that Net::HTTP does the same thing, just in a different method:
https://github.com/ruby/ruby/blob/c5eb24349a4535948514fe765c3ddb0628d81004/lib/net/http/header.rb#L46
So the resulting object has the same configuration no matter which way you decide to pass the request headers.
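A quick way to check this claim is to build the same request both ways and compare the headers. The sketch below is only an illustration: the two helper method names, the example.com URL and the X-Example header are made up for the demo, and no request is actually sent.

```ruby
require 'net/http'
require 'uri'

# Builds a GET request with the headers passed to the constructor.
def request_with_init_headers(uri, headers)
  Net::HTTP::Get.new(uri, headers)
end

# Builds a GET request and assigns each header afterwards with []=.
def request_with_assigned_headers(uri, headers)
  request = Net::HTTP::Get.new(uri)
  headers.each { |name, value| request[name] = value }
  request
end

uri = URI('http://example.com/')
headers = { 'Content-Type' => 'application/json', 'X-Example' => 'demo' }

a = request_with_init_headers(uri, headers)
b = request_with_assigned_headers(uri, headers)

# Header lookup is case-insensitive in both cases.
puts a['content-type']
puts b['content-type']
puts a['x-example'] == b['x-example']
```

Passing a hash to the constructor is convenient when the headers are already parameterized; assigning afterwards is convenient when a header depends on something computed later, such as the file modification time in the docs example.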

Ruby Net/Http how to get body of page with status code 3xx

I use Ruby's net/http library to get the HTML response, but I can't get the body of a page that returns a 3xx status code.
Page Body:
<div class="flash-container">
<div class="flash flash-success">
Il tuo indirizzo email è stato modificato con successo.
×
</div>
</div>
Request:
require 'net/http'
require 'uri'
http = Net::HTTP.new(uri.host, uri.port)
http.use_ssl = true
request = Net::HTTP::Post.new(uri.request_uri)
request.set_form_data({
  'email' => email,
  'email-confirm' => email_confirm,
  'password' => password
})
request['Cookie'] = 'ACCOUNT_SESSID=' + token
response = http.request(request)
Response:
response.code # '302'
response.body # ''
You'll likely need to follow the redirect (the 302 code). The Ruby docs have a great example of doing this.
I've included it below, along with a check that returns the body if one exists. If you never want to follow the redirect, you could change the else branch to return response.code, an empty string, false, or whatever's appropriate. Here's the full example:
def fetch(uri_str, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit == 0
  response = Net::HTTP.get_response(URI(uri_str))
  case response
  when Net::HTTPSuccess then
    response
  when Net::HTTPRedirection then
    # Return the redirect response itself if it carries a body.
    if response.body && !response.body.empty?
      response
    else
      location = response['location']
      warn "redirected to #{location}"
      fetch(location, limit - 1)
    end
  else
    response.value
  end
end
The code is pretty straightforward: it calls itself recursively whenever Net::HTTP.get_response returns a redirect, passing the new location.
You can follow up to ten redirects with this approach, which should be ample, though you may want to adjust the limit to suit your circumstances.
Then, when you run fetch(your_url), it should follow the redirects until it lands on a page and can return the body. For example:
res = fetch(your_url)
res.body
Let me know how you get on with this, or if you've any questions!

Ruby + Net::HTTP: How do I send two XML documents in one POST request?

I have to send two XML documents in my request to the UPS API (here's my original question What is the root of this XML document? )
How would I do this?
def make_initial_request
  uri = URI.parse(UPS_API['confirm_url'])
  https = Net::HTTP.new(uri.host, uri.port)
  https.use_ssl = true
  headers = {'Content-Type' => 'text/xml'}
  request = Net::HTTP::Post.new(uri.path, headers)
  request.body = xml_for_initial_request #<-- how do i split this into two documents?
  #request.body = second_xml_document #<-- i want something like that. could i just use << ?
  begin
    response = https.request(request)
  rescue
    return nil
  end
  puts "response: #{response.code} #{response.message}: #{response.body}"
  return nil if response.body.include?("Error")
end
You should use MIME multipart messages if the API supports them (a Ruby gem is available for this).
Otherwise, just try concatenating the documents' contents: request.body = "#{xml_for_initial_request}\n#{second_xml_document}"

Parameters for a Ruby HTTP Put call

I'm having trouble getting parameters passed in an HTTP Put call, using ruby. Take a look at the "put_data" variable.
When I leave it as a hash, ruby says:
undefined method `bytesize' for #<Hash:0x007fbf41a109e8>
If I convert it to a string, I get:
can't convert Net::HTTPUnauthorized into String
I've also tried doing just - '?token=wBsB16NSrfVDpZPoEpM'
def process_activation
  uri = URI("http://localhost:3000/api/v1/activation/" + self.member_card_num)
  Net::HTTP.start(uri.host, uri.port) do |http|
    headers = {'Content-Type' => 'text/plain; charset=utf-8'}
    put_data = {:token => "wBsB16NSrfVDpZPoEpM"}
    response = http.send_request('PUT', uri.request_uri, put_data, headers)
    result = JSON.parse(response)
  end
  if result['card']['state']['state'] == "active"
    return true
  else
    return false
  end
end
I've searched all around, including rubydocs, but can't find an example of how to encode parameters. Any help would be appreciated.
Don't waste your time with Net::HTTP. I used the rest-client gem and had this thing done in minutes:
require 'rest-client'
require 'json'

def process_activation
  response = RestClient.put 'http://localhost:3000/api/v1/card_activation/' + self.member_card_num, :token => "wBsB1pjJNNfiK6NSrfVDpZPoEpM"
  result = JSON.parse(response)
  return result['card']['state']['state'] == "active"
end
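For anyone who needs to stay with Net::HTTP, here is a minimal sketch of how the parameters could be form-encoded for a PUT. It assumes the API accepts an application/x-www-form-urlencoded body, which the question does not confirm; the card number 12345 and the token value are placeholders, and build_put_request is a hypothetical helper name. The request is only built, not sent.

```ruby
require 'net/http'
require 'uri'

# Builds (but does not send) a PUT request whose body is the
# form-encoded version of `params`.
def build_put_request(url, params)
  uri = URI(url)
  request = Net::HTTP::Put.new(uri)
  # set_form_data encodes the hash and sets the Content-Type header.
  request.set_form_data(params)
  request
end

request = build_put_request('http://localhost:3000/api/v1/activation/12345',
                            token: 'example-token')
puts request.method
puts request.body
puts request['Content-Type']

# Sending it would look roughly like this (commented out so the sketch
# runs offline):
#   uri = request.uri
#   response = Net::HTTP.start(uri.host, uri.port) { |http| http.request(request) }
#   result = JSON.parse(response.body) if response.is_a?(Net::HTTPSuccess)
```

Note that JSON.parse needs response.body, not the response object itself; passing the response object is the likely source of the "can't convert Net::HTTPUnauthorized into String" error. The 401 status itself suggests the server rejected the credentials, which encoding alone may not fix.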

Ruby - net/http - following redirects

I've got a URL and I'm using HTTP GET to pass a query along to a page. What happens with the most recent flavor (in net/http) is that the script doesn't go beyond the 302 response. I've tried several different solutions: HTTPClient, net/http, Rest-Client, Patron...
I need a way to continue to the final page in order to validate an attribute tag in that page's HTML. The redirection happens because a mobile user agent hits a page that redirects to a mobile view, hence the mobile user agent in the header. Here is my code as it is today:
require 'uri'
require 'net/http'
class Check_Get_Page
  def more_http
    url = URI.parse('my_url')
    req, data = Net::HTTP::Get.new(url.path, {
      'User-Agent' => 'Mozilla/5.0 (iPhone; U; CPU iPhone OS 4_3_2 like Mac OS X; en-us) AppleWebKit/533.17.9 (KHTML, like Gecko) Version/5.0.2 Mobile/8H7 Safari/6533.18.5'
    })
    res = Net::HTTP.start(url.host, url.port) {|http|
      http.request(req)
    }
    cookie = res.response['set-cookie']
    puts 'Body = ' + res.body
    puts 'Message = ' + res.message
    puts 'Code = ' + res.code
    puts "Cookie \n" + cookie
  end
end
m = Check_Get_Page.new
m.more_http
Any suggestions would be greatly appreciated!
To follow redirects, you can do something like this (taken from ruby-doc)
Following Redirection
require 'net/http'
require 'uri'
def fetch(uri_str, limit = 10)
  # You should choose better exception.
  raise ArgumentError, 'HTTP redirect too deep' if limit == 0
  url = URI.parse(uri_str)
  req = Net::HTTP::Get.new(url.path, { 'User-Agent' => 'Mozilla/5.0 (etc...)' })
  # Only enable TLS when the URL is actually https.
  response = Net::HTTP.start(url.host, url.port, use_ssl: url.scheme == 'https') { |http| http.request(req) }
  case response
  when Net::HTTPSuccess then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.error!
  end
end
print fetch('http://www.ruby-lang.org/')
Given a URL that redirects
url = 'http://httpbin.org/redirect-to?url=http%3A%2F%2Fhttpbin.org%2Fredirect-to%3Furl%3Dhttp%3A%2F%2Fexample.org'
A. Net::HTTP
begin
  response = Net::HTTP.get_response(URI.parse(url))
  url = response['location']
end while response.is_a?(Net::HTTPRedirection)
Make sure that you handle the case when there are too many redirects.
B. OpenURI
require 'open-uri'
URI.open(url).read # on Ruby older than 2.5, use open(url).read
OpenURI::OpenRead#open follows redirects by default, but it doesn't limit the number of redirects.
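For option A, the "too many redirects" case can be handled with a bounded loop. The following is a sketch under a few assumptions: follow_redirects and TooManyRedirects are hypothetical names, the limit counts requests rather than redirects, and the fetcher parameter exists only so the logic can be exercised without a network (by default it calls Net::HTTP.get_response).

```ruby
require 'net/http'
require 'uri'

# Raised when the chain does not end within the allowed number of requests.
class TooManyRedirects < StandardError; end

# Follows redirects iteratively and returns [final_uri, response].
# `fetcher` is any callable that takes a URI and returns a Net::HTTPResponse.
def follow_redirects(url, limit: 10, fetcher: Net::HTTP.method(:get_response))
  uri = URI.parse(url)
  limit.times do
    response = fetcher.call(uri)
    return [uri, response] unless response.is_a?(Net::HTTPRedirection)
    # Location may be relative, so resolve it against the current URI.
    uri = URI.join(uri, response['location'])
  end
  raise TooManyRedirects, "gave up after #{limit} requests starting at #{url}"
end

# Offline demonstration with hand-built responses instead of real requests.
redirect = Net::HTTPFound.new('1.1', '302', 'Found')
redirect['location'] = '/welcome.html'
ok = Net::HTTPOK.new('1.1', '200', 'OK')
canned = {
  'http://example.com/' => redirect,
  'http://example.com/welcome.html' => ok
}

final_uri, response = follow_redirects('http://example.com/',
                                       fetcher: ->(uri) { canned.fetch(uri.to_s) })
puts final_uri
puts response.code
```

Unlike the begin/end while loop, this version keeps the last URL that was actually requested, so the caller still knows where the chain ended.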
I wrote another class for this based on examples given here, thank you very much everybody. I added cookies, parameters and exceptions and finally got what I need: https://gist.github.com/sekrett/7dd4177d6c87cf8265cd
require 'uri'
require 'net/http'
require 'openssl'
class UrlResolver
  def self.resolve(uri_str, agent = 'curl/7.43.0', max_attempts = 10, timeout = 10)
    attempts = 0
    cookie = nil
    until attempts >= max_attempts
      attempts += 1
      url = URI.parse(uri_str)
      http = Net::HTTP.new(url.host, url.port)
      http.open_timeout = timeout
      http.read_timeout = timeout
      path = url.path
      path = '/' if path == ''
      path += '?' + url.query unless url.query.nil?
      params = { 'User-Agent' => agent, 'Accept' => '*/*' }
      params['Cookie'] = cookie unless cookie.nil?
      request = Net::HTTP::Get.new(path, params)
      if url.instance_of?(URI::HTTPS)
        http.use_ssl = true
        http.verify_mode = OpenSSL::SSL::VERIFY_NONE
      end
      response = http.request(request)
      case response
      when Net::HTTPSuccess then
        break
      when Net::HTTPRedirection then
        location = response['Location']
        cookie = response['Set-Cookie']
        new_uri = URI.parse(location)
        # Keep uri_str a String so URI.parse accepts it on the next pass.
        uri_str = if new_uri.relative?
                    (url + location).to_s
                  else
                    new_uri.to_s
                  end
      else
        raise 'Unexpected response: ' + response.inspect
      end
    end
    raise 'Too many http redirects' if attempts == max_attempts
    uri_str
    # response.body
  end
end
puts UrlResolver.resolve('http://www.ruby-lang.org')
The reference that worked for me is here: http://shadow-file.blogspot.co.uk/2009/03/handling-http-redirection-in-ruby.html
Compared to most examples (including the accepted answer here), it's more robust as it handles URLs which are just a domain (http://example.com - needs to add a /), handles SSL specifically, and also relative URLs.
Of course you would be better off using a library like RESTClient in most cases, but sometimes the low-level detail is necessary.
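The two URL details mentioned above (a bare domain that needs a "/" and a relative Location) can both be handled with the standard uri library. This is a small sketch; request_target and resolve_location are hypothetical helper names used only for illustration.

```ruby
require 'uri'

# Path (plus query) to put on the request line. URI#request_uri already
# returns "/" when the URL has no path component.
def request_target(url)
  URI.parse(url).request_uri
end

# Resolves a Location header value against the URL that produced it.
# An absolute location replaces the base; a relative one is merged into it.
def resolve_location(current_url, location)
  URI.join(current_url, location).to_s
end

puts request_target('http://example.com')                # => /
puts request_target('http://example.com/search?q=ruby')  # => /search?q=ruby
puts resolve_location('http://example.com/a/b', '/welcome.html')
puts resolve_location('http://example.com/a/b', 'https://example.org/')
```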
Maybe you can use the curb-fu gem here (https://github.com/gdi/curb-fu). The only catch is that it needs some extra code to make it follow redirects. I've used the following before. Hope it helps.
require 'rubygems'
require 'curb-fu'
module CurbFu
  class Request
    module Base
      def new_meth(url_params, query_params = {})
        curb = old_meth url_params, query_params
        curb.follow_location = true
        curb
      end
      alias :old_meth :build
      alias :build :new_meth
    end
  end
end
#this should follow the redirect because we instruct
#Curb.follow_location = true
print CurbFu.get('http://<your path>/').body
If you do not need to care about the details of each redirection, you can use the Mechanize library:
require 'mechanize'
agent = Mechanize.new
begin
  response = agent.get(url)
rescue Mechanize::ResponseCodeError
  # response codes other than 200, 301, or 302
rescue Timeout::Error
rescue Mechanize::RedirectLimitReachedError
rescue StandardError
end
It will return the destination page.
Or you can turn off redirection like this:
agent.redirect_ok = false
Or you can optionally change some settings at the request
agent.user_agent = "Mozilla/5.0 (Linux; Android 6.0; Nexus 5 Build/MRA58N) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/80.0.3987.106 Mobile Safari/537.36"

Resources