Ruby: abort OpenURI based on content length

Ruby's OpenURI provides a content_length_proc option which allows determining* the content length before the actual transfer is started:
open(url, :content_length_proc => lambda { |content_length|
  puts "Content Length: #{content_length}"
}) { |data|
  # data.meta, data.read etc.
}
Is there a way for this proc to abort the actual, full retrieval?
* I'm aware this is not reliable - but it's sufficient for a simple heuristic in my case

This is the corresponding code from open-uri.rb:
if options[:content_length_proc] && Net::HTTPSuccess === resp
  if resp.key?('Content-Length')
    options[:content_length_proc].call(resp['Content-Length'].to_i)
  else
    options[:content_length_proc].call(nil)
  end
end
resp.read_body {|str|
  ...
}
As you can see, the return value of content_length_proc is ignored. What you can do to cancel the operation, though, is raise an error in the callback: open-uri calls the proc before resp.read_body, so raising there aborts the transfer before any of the body is downloaded. If you raise a dedicated error class, you can even rescue it and react to that specific situation:
class MyError < StandardError; end

begin
  open(url, :content_length_proc => lambda { |content_length|
    puts "Content Length: #{content_length}"
    # content_length is nil when the server sends no Content-Length header
    raise MyError if content_length && content_length > MAX_SIZE # your own threshold
  }) { |data|
    # data.meta, data.read etc.
  }
rescue MyError
  # react to it
end

Related

Workaround for Timeouts with Http.rb and Celluloid?

I know that timeouts are currently not supported with Http.rb and Celluloid[1], but is there an interim workaround?
Here's the code I'd like to run:
def fetch(url, options = {})
  puts "Request -> #{url}"
  begin
    options = options.merge({ socket_class: Celluloid::IO::TCPSocket,
                              timeout_class: HTTP::Timeout::Global,
                              timeout_options: {
                                connect_timeout: 1,
                                read_timeout: 1,
                                write_timeout: 1
                              } })
    HTTP.get(url, options)
  rescue HTTP::TimeoutError => e
    [do more stuff]
  end
end
Its goal is to test whether a server is live and healthy. I'd be open to alternatives (e.g. %x(ping <server>)), but these seem less efficient and less able to get at what I'm looking for.
[1] https://github.com/httprb/http.rb#celluloidio-support
You can set a timeout on the future calls you use to fetch the request. Here is how to use a timeout with Http.rb and Celluloid::IO:
require 'celluloid/io'
require 'http'

TIMEOUT = 10 # in sec

class HttpFetcher
  include Celluloid::IO

  def fetch(url)
    HTTP.get(url, socket_class: Celluloid::IO::TCPSocket)
  rescue Exception => e
    # error
  end
end

fetcher = HttpFetcher.new
urls = %w(http://www.ruby-lang.org/ http://www.rubygems.org/ http://celluloid.io/)

# Kick off a bunch of future calls to HttpFetcher to grab the URLs in parallel
futures = urls.map { |u| [u, fetcher.future.fetch(u)] }

# Consume the results as they come in
futures.each do |url, future|
  # Wait for HttpFetcher#fetch to complete for this request
  response = future.value(TIMEOUT)
  puts "*** Got #{url}: #{response.inspect}\n\n"
end
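For the original goal of checking that a server is live, a minimal sketch along the same lines could look like this. HealthChecker and alive? are illustrative names, not part of Http.rb or Celluloid, and since the exact error class a timed-out future.value raises depends on your Celluloid version, the rescue below is deliberately broad:

require 'celluloid/io'
require 'http'

class HealthChecker
  include Celluloid::IO

  def fetch(url)
    HTTP.get(url, socket_class: Celluloid::IO::TCPSocket)
  end
end

# true if the server answered within the deadline, false otherwise
def alive?(url, timeout = 2)
  future = HealthChecker.new.future.fetch(url)
  future.value(timeout) # raises if the call does not finish in time
  true
rescue StandardError # timeout, connection refused, DNS failure, ...
  false
end

puts alive?('http://www.ruby-lang.org/')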

Return from context above

This question is a little complicated to formulate, but I will do my best. Throughout our code we have snippets such as
response = do_something()
return response unless response.ok?
I was thinking of writing a wrapper method which would remove the need for this step; it would look something like this:
def rr(&block)
  response = yield
  unless response.ok?
    # somehow do return, but in the context above (magic needed here)
  end
  response
end
After that I would be able to reduce the code above to
response = rr { do_something() }
Seems impossible, but this is Ruby, so maybe there is a way?
The correct way to return across multiple layers of the stack when something goes wrong (which appears to be what you are trying to do) is to raise an exception:
class RequestFailedException < StandardError; end

def rr(&block)
  response = yield
  unless response.ok?
    raise RequestFailedException, "Response not okay: #{response.inspect}"
  end
  response
end
Usage:
def do_lots_of_things
  rr { do_something }
  rr { do_something_else }
  rr { another_thing }
end

begin
  do_lots_of_things
rescue RequestFailedException => e
  # Handle or ignore the error
end
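If raising an exception feels too heavy for ordinary control flow, Ruby's built-in throw/catch is the non-error way to unwind to an enclosing block; a sketch of the same idea:

def rr
  response = yield
  # throw unwinds the stack to the matching catch, which returns the response
  throw :bad_response, response unless response.ok?
  response
end

def do_lots_of_things
  catch(:bad_response) do
    rr { do_something }
    rr { do_something_else }
    rr { another_thing }
  end
end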
Wouldn't you want to just write a wrapper that does exactly that? Functionally it seems you're just ignoring non-ok? responses:
def rr
  response = yield
  response.ok? ? response : nil
end
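Callers would then guard against the nil themselves, e.g.:

response = rr { do_something }
return unless response # rr swallowed a non-ok? response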
Maybe I'm missing something here but I don't see why you'd need to force a return in another context, something that's not even possible anyway.

Why my eventmachine client code doesn't work asynchronously?

def index
  p "INDEX, #{Fiber.current.object_id}" # <- #1
  EventMachine.run {
    http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}
    http.errback { p "Uh oh, #{Fiber.current.object_id}"; EM.stop } # <- #2
    http.callback {
      p "#{http.response_header.status}, #{Fiber.current.object_id}" # <- #3
      p "#{http.response_header}"
      p "#{http.response}"
      EventMachine.stop
    }
  }
  render text: 'test1'
end
In this code I expected to get a different fiber id at lines #1, #2, and #3, but all the fiber ids were the same. I tried Thread.current.object_id too, with the same result.
What am I misunderstanding? Does that code even execute asynchronously?
P.S. I'm using Ruby 2.0 and the code is running on Rails 4.
http://ruby-doc.org/core-2.0/Fiber.html
Fibers are primitives for implementing light weight cooperative
concurrency in Ruby. Basically they are a means of creating code
blocks that can be paused and resumed, much like threads. The main
difference is that they are never preempted and that the scheduling
must be done by the programmer and not the VM.
Where in your code are you scheduling fibers, e.g. calling Fiber.yield or my_fiber.resume?
current() → fiber
Returns the current fiber.
You need to require 'fiber' before using this method. If you are not
running in the context of a fiber this method will return the root
fiber.
Where in your code have you created additional fibers, e.g. Fiber.new do ...?
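For contrast, explicit fiber scheduling looks like this; nothing of the sort happens in the posted code:

require 'fiber'

fiber = Fiber.new do
  puts "inside:  #{Fiber.current.object_id}"
  Fiber.yield # pause here until resumed again
  puts "inside again"
end

puts "outside: #{Fiber.current.object_id}" # the root fiber
fiber.resume # runs the fiber up to Fiber.yield
fiber.resume # runs the remainder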
Does that code even execute asynchronously?
require 'em-http-request'
require 'fiber'

puts Fiber.current.object_id

def index
  p "INDEX, #{Fiber.current.object_id}" # <- #1
  EventMachine.run {
    http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}
    http.errback { p "#{Uh oh}, #{Fiber.current.object_id}"; EM.stop } # <- #2
    http.callback {
      p "#{http.response_header.status}, #{Fiber.current.object_id}" # <- #3
      p "#{http.response_header}"
      p "#{http.response}"
      EventMachine.stop
    }
  }
  #render text: 'test1'
end

index()
--output:--
2157346420
"INDEX, 2157346420"
"301, 2157346420"
"{\"LOCATION\"=>\"http://www.google.com/?keyname=value\", \"CONTENT_TYPE\"=>\"text/html; charset=UTF-8\", \"DATE\"=>\"Mon, 22 Jul 2013 08:44:35 GMT\", \"EXPIRES\"=>\"Wed, 21 Aug 2013 08:44:35 GMT\", \"CACHE_CONTROL\"=>\"public, max-age=2592000\", \"SERVER\"=>\"gws\", \"CONTENT_LENGTH\"=>\"233\", \"X_XSS_PROTECTION\"=>\"1; mode=block\", \"X_FRAME_OPTIONS\"=>\"SAMEORIGIN\", \"CONNECTION\"=>\"close\"}"
"<HTML><HEAD><meta http-equiv=\"content-type\" content=\"text/html;charset=utf-8\">\n<TITLE>301 Moved</TITLE></HEAD><BODY>\n<H1>301 Moved</H1>\nThe document has moved\nhere.\r\n</BODY></HTML>\r\n"
Nope.
Also note that this line contains an error: Uh oh inside the interpolation is parsed as Ruby code, so it would blow up if the errback ever fired (which it never does here):
http.errback { p "#{Uh oh}" ...
As a search of the repository shows, em-http does not use fibers by default. The link, however, lists an example of how you can use fibers if you are so inclined.
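In rough outline, that fiber-based style looks like the following sketch (adapted from the style the linked README describes): start a fiber, register callbacks that resume it, and Fiber.yield until the response arrives:

require 'eventmachine'
require 'em-http-request'
require 'fiber'

EM.run do
  Fiber.new do
    f = Fiber.current
    http = EventMachine::HttpRequest.new('http://www.google.com/').get
    http.callback { f.resume(http) }
    http.errback  { f.resume(http) }
    response = Fiber.yield # this fiber pauses here until resumed
    puts "#{response.response_header.status}, #{Fiber.current.object_id}"
    EM.stop
  end.resume
end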

better ruby - possible to not repeat parts of a function?

I have a little Sinatra API that I'm trying to beautify. Most of my routes are simple DB operations, but a few involve calling an external service before doing the DB operations. In all cases most of the code is the same, except for how I respond to the service response. Is there any slick functional-programming approach?
Here's an example of one of these routes:
get '/update_x' do
  validateParams(params, :x)
  xid = params[:x]
  xName = getNameFromId(xid)
  if xName
    # Make request to proxy service
    rid = generateRandomHexNumber(16) # generate requestId
    params['m'] = 'set'
    params['rid'] = rid
    json = "{}"
    begin
      response = @resource["/"+"?rid=#{rid}&id=#{xid}&json=#{json}"].get
      status = response.code
      body = response.body
      parsed_json = JSON(body)
      if parsed_json['response'] and parsed_json['response']['success'] and parsed_json['response']['success'] == 'false'
        msg = {:success => "false", :response => "unknown error"}
        if parsed_json['response']['response']
          msg = {:success => "false", :response => parsed_json['response']['response']}
        end
        content_type :json
        msg.to_json
      else
        #### Here is stuff specific to this api call
        updateDBHelper(xid, buildUpdateOptions(params))
        params['ss_status'] = status
        content_type :json
        params.to_json
        #### End specific to api call
      end
    rescue Exception => e
      params['ss_status'] = status
      params['exception'] = e
      content_type :json
      params.to_json
    end
  else
    msg = {:success => "false", :response => "Not found"}
    content_type :json
    msg.to_json
  end
end
In general, if you have a common pattern where some arbitrary code changes each time, the simplest thing is to accept a block with those customizations:
def make_api_request(some, params)
  # do what you need to do
  yield(variables, that, your_custom_code, needs)
  # do some more, maybe cleanup
end

get '/some_route' do
  make_api_request do |variables, that, your_custom_code, needs|
    # do custom stuff here
  end
end
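Applied to the route above, that could look something like this condensed sketch (proxy_request is a hypothetical helper name, and the xName lookup is dropped for brevity); the wrapper owns the proxy call and the error handling, and yields the parsed JSON plus status to the route-specific block:

def proxy_request(params)
  rid = generateRandomHexNumber(16)
  params['m'] = 'set'
  params['rid'] = rid
  response = @resource["/"+"?rid=#{rid}&id=#{params[:x]}&json={}"].get
  parsed_json = JSON(response.body)
  if parsed_json['response'] and parsed_json['response']['success'] == 'false'
    {:success => "false",
     :response => parsed_json['response']['response'] || "unknown error"}
  else
    yield(parsed_json, response.code) # route-specific work happens here
  end
rescue Exception => e
  {:success => "false", :exception => e.to_s}
end

get '/update_x' do
  validateParams(params, :x)
  msg = proxy_request(params) do |parsed_json, status|
    updateDBHelper(params[:x], buildUpdateOptions(params))
    params.merge('ss_status' => status)
  end
  content_type :json
  msg.to_json
end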

Ruby EventMachine - how to return values from EM::Deferrable to main EM loop?

I've been playing with EventMachine for some days now, and it has a steep learning curve IMHO ;-) I try to return a hash from HttpHeaderCrawler.query(), which I need within the callback. But what I get in this case is not the hash {'http_status' => xxx, 'http_version' => xxx} but an EventMachine::HttpClient object itself.
I want to keep the EM.run block clean and do all the logic within my own classes/modules, so how do I return such a value to the main loop so that the callback can access it? Many thanks in advance ;-)
#!/usr/bin/env ruby
require 'eventmachine'
require 'em-http-request'

class HttpHeaderCrawler
  include EM::Deferrable

  def query(uri)
    http = EM::HttpRequest.new(uri).get
    http.callback do
      http_header = {
        "http_status"  => http.response_header.http_status,
        "http_version" => http.response_header.http_version
      }
      puts "Returns to EM main loop: #{http_header}"
      succeed(http_header)
    end
  end
end

EM.run do
  domains = ['http://www.google.com', 'http://www.facebook.com', 'http://www.twitter.com']
  domains.each do |domain|
    hdr = HttpHeaderCrawler.new.query(domain)
    hdr.callback do |header|
      puts "Received from HttpHeaderCrawler: #{header}"
    end
  end
end
This snippet produces the following output:
Returns to EM main loop: {"http_status"=>302, "http_version"=>"1.1"}
Received from HttpHeaderCrawler: #<EventMachine::HttpClient:0x00000100d57388>
Returns to EM main loop: {"http_status"=>301, "http_version"=>"1.1"}
Received from HttpHeaderCrawler: #<EventMachine::HttpClient:0x00000100d551a0>
Returns to EM main loop: {"http_status"=>200, "http_version"=>"1.1"}
Received from HttpHeaderCrawler: #<EventMachine::HttpClient:0x00000100d56280>
I think the problem is that #query returns the value of http.callback, which is the http object itself, whereas it should return self, i.e. the HttpHeaderCrawler. See if this works:
def query(uri)
  http = EM::HttpRequest.new(uri).get
  http.callback do
    http_header = {
      "http_status"  => http.response_header.http_status,
      "http_version" => http.response_header.http_version
    }
    puts "Returns to EM main loop: #{http_header}"
    succeed(http_header)
  end
  self
end
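That also explains the output above: without the trailing self, hdr was bound to the EM::HttpClient deferrable returned by http.callback, so hdr.callback fired with the client object instead of the header hash that succeed(http_header) passes to HttpHeaderCrawler's own callbacks.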
