Ruby library to make multiple HTTP requests simultaneously

I'm looking for an alternate Ruby HTTP library that makes multiple HTTP calls simultaneously and performs better than the core Net::HTTP library.

You are probably looking for Typhoeus.
Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic:
https://github.com/typhoeus/typhoeus
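For instance, here is a minimal sketch of Hydra, Typhoeus's parallel runner (the URLs are placeholders):

require 'typhoeus'

hydra = Typhoeus::Hydra.new(:max_concurrency => 20)
requests = ['http://example.com/a', 'http://example.com/b'].map do |url|
  request = Typhoeus::Request.new(url, :followlocation => true)
  hydra.queue(request)   # queued, not yet executed
  request
end
hydra.run                # blocks until every queued request has completed

requests.each { |request| puts request.response.code }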

Why do you need a networking library to handle parallelism? That is exactly what threads are for.
require "open-uri"
fetcher = lambda do |uri|
puts "Started fetching #{uri}"
puts open(uri).read
puts "Stopped fetching #{uri}"
end
thread1 = Thread.new("http://localhost:9292", &fetcher)
thread2 = Thread.new("http://localhost:9293", &fetcher)
thread1.join
thread2.join
Also, I don't understand what you mean by "performs better". Core libraries are usually good enough to be in the core. Do you have any problems with Net::HTTP?

You can use the Parallel gem; it should work with any Ruby HTTP library.
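For example, a sketch that fetches a list of URLs in a thread pool via Net::HTTP (the URLs and thread count are arbitrary):

require 'parallel'
require 'net/http'

urls = ['http://example.com/a', 'http://example.com/b']

# Run the block in a pool of threads; map preserves the input order
bodies = Parallel.map(urls, :in_threads => 4) do |url|
  Net::HTTP.get(URI(url))
end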

Not sure if it performs better than Typhoeus, but you could use EventMachine + em-http-request. There is an example of sending multiple requests:
require 'eventmachine'
require 'em-http'

EventMachine.run {
  multi = EventMachine::MultiRequest.new

  reqs = [
    'http://google.com/',
    'http://google.ca:81/'
  ]

  reqs.each_with_index do |url, idx|
    http = EventMachine::HttpRequest.new(url, :connect_timeout => 1)
    req = http.get
    multi.add idx, req
  end

  multi.callback do
    p multi.responses[:callback].size
    p multi.responses[:errback].size

    EventMachine.stop
  end
}
https://github.com/igrigorik/em-http-request

Related

How to pass data between tasks in Ruby Rake?

So how do you pass data between tasks in Ruby Rake?
Believe me, I've read a lot about this on the internet and none of it makes sense.
I found the ENV['some_var'] approach, but I can't share objects without conversions, unnecessary additional conversions that cost time for me and the processor :(. Additionally: "Come on, is this the best they came up with?"
Somebody said to use instance or class variables. That's hacky, isn't it? I mean, it's a semantic fiasco. Modules in Ruby are places to put methods and constants, I read somewhere, which makes sense to some extent, but class and instance variables in modules? Why have classes then?
So how can I share data between two Rake tasks without resorting to hacks?
For example: how can I pass the response object from task get to task ping_server?
require 'net/http'
require 'uri'

namespace :http_request do
  desc 'Request server to obtain status, and store the data in the memcache specified by the environment'
  # This task is made generic so it can serve as a low-level routine for other tasks,
  # thus avoiding repetitive code.
  task :get, [:url] => [:environment] do |t, args|
    # configuration
    WAIT_RESPONSE_TO_IN_SECONDS = 5

    uri = URI.parse(args[:url])
    http = Net::HTTP.new(uri.host, uri.port)
    # We cannot wait for a response forever, therefore provide a timeout
    http.open_timeout = WAIT_RESPONSE_TO_IN_SECONDS # in seconds
    request = Net::HTTP::Get.new(uri.path)

    # The response may take too long, or the URI may be bad (invalid)
    begin
      response = http.request(request)
      puts response.code
      ENV['req_response'] = {status: "ok", val: response.inspect}.to_s
      # Rails.cache.write(args[:name], response.code)
    rescue Exception => e
      puts "\nRequest failed: #{e}\n"
      ENV['req_response'] = {status: e.to_s, val: nil.to_s}.to_s
    end
  end
end

namespace :server_state do
  desc "write cache"
  task :ping_server, [:url] => "http_request:get" do
    response = eval(ENV['req_response'])
    puts "\n\nRESULT = #{response}"
    puts "\n\nRESULT = #{response[:val]}"
  end
end

How to get HTTP headers before downloading with Ruby's OpenUri

I am currently using OpenURI to download a file in Ruby. Unfortunately, it seems impossible to get the HTTP headers without downloading the full file:
pbar = nil # declared here so both procs share the same variable

open(base_url,
  :content_length_proc => lambda { |t|
    if t && 0 < t
      pbar = ProgressBar.create(:total => t)
    end
  },
  :progress_proc => lambda { |s|
    pbar.progress = s if pbar
  }) { |io|
  puts io.size
  puts io.meta['content-disposition']
}
Running the code above shows that it first downloads the full file and only then prints the header I need.
Is there a way to get the headers before the full file is downloaded, so I can cancel the download if the headers are not what I expect them to be?
You can use Net::HTTP for this; for example:
require 'net/http'
http = Net::HTTP.start('stackoverflow.com')
resp = http.head('/')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
Another example, this time getting the headers of the wonderful book, Object-Oriented Programming with ANSI-C:
require 'net/http'
http = Net::HTTP.start('www.planetpdf.com')
resp = http.head('/codecuts/pdfs/ooc.pdf')
resp.each { |k, v| puts "#{k}: #{v}" }
http.finish
It seems what I wanted is not possible to achieve using OpenURI; at least not, as I said, without loading the whole file first.
I was able to do what I wanted using Net::HTTP's request_get.
Here is an example:

http.request_get('/largefile.jpg') { |response|
  if response['content-length'].to_i < max_length # header values are strings
    response.read_body do |str| # read body now
      # save to file
    end
  end
}
Note that this only works when using a block. If you call it without one, like
response = http.request_get('/largefile.jpg')
the body will already have been read.
Rather than use Net::HTTP, which can be like digging a pool on the beach using a sand shovel, you can use one of the many HTTP clients for Ruby and clean up the code.
Here's a sample using HTTParty:
require 'httparty'
resp = HTTParty.head('http://example.org')
resp.headers
# => {"accept-ranges"=>["bytes"], "cache-control"=>["max-age=604800"], "content-type"=>["text/html"], "date"=>["Thu, 02 Mar 2017 18:52:42 GMT"], "etag"=>["\"359670651\""], "expires"=>["Thu, 09 Mar 2017 18:52:42 GMT"], "last-modified"=>["Fri, 09 Aug 2013 23:54:35 GMT"], "server"=>["ECS (oxr/83AB)"], "x-cache"=>["HIT"], "content-length"=>["1270"], "connection"=>["close"]}
At that point it's easy to check the size of the document:
resp.headers['content-length'] # => "1270"
Unfortunately, the HTTPd you're talking to might not know how big the content will be: in order to respond quickly, servers don't necessarily calculate the size of dynamically generated output, which would take almost as long and be almost as CPU-intensive as actually sending it. So relying on the "content-length" value can be unreliable.
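So guard against a missing or oversized value before trusting it. A sketch continuing the HTTParty example above (MAX_BYTES is an arbitrary threshold for illustration):

MAX_BYTES = 10_000_000 # arbitrary limit for this sketch

length = resp.headers['content-length']
if length && length.to_i <= MAX_BYTES
  # small enough, proceed with the full GET
end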
The issue with Net::HTTP is that it won't automatically handle redirects, so you have to add additional code. Granted, that code is supplied in the documentation, but it keeps growing as you need to do more things, until you've ended up writing yet another HTTP client (YAHC). So avoid that and use an existing wheel.
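For reference, the redirect-following helper from the Net::HTTP documentation looks roughly like this:

require 'net/http'

def fetch(uri_str, limit = 10)
  raise ArgumentError, 'too many HTTP redirects' if limit == 0

  response = Net::HTTP.get_response(URI(uri_str))
  case response
  when Net::HTTPSuccess     then response
  when Net::HTTPRedirection then fetch(response['location'], limit - 1)
  else
    response.value # raises an exception for error responses
  end
end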

Suggested Redis driver for use within Goliath?

There seem to be several options for establishing Redis connections for use within EventMachine, and I'm having a hard time understanding the core differences between them.
My goal is to implement Redis within Goliath.
The way I establish my connection now is through em-synchrony:
require 'em-synchrony'
require 'em-synchrony/em-redis'

config['redis'] = EventMachine::Synchrony::ConnectionPool.new(:size => 20) do
  EventMachine::Protocols::Redis.connect(:host => 'localhost', :port => 6379)
end
What is the difference between the above, and using something like em-hiredis?
If I'm using Redis for sets and basic key:value storage, is em-redis the best solution for my scenario?
We use em-hiredis very successfully inside Goliath. Here's a sample of how we coded publishing:
config/example_api.rb
# These give us direct access to the redis connection from within the API
config['redisUri'] = 'redis://localhost:6379/0'
config['redisPub'] ||= EM::Hiredis.connect(config['redisUri'])
example_api.rb
class ExampleApi < Goliath::API
  use Goliath::Rack::Params             # parse & merge query and body parameters
  use Goliath::Rack::Formatters::JSON   # JSON output formatter
  use Goliath::Rack::Render             # auto-negotiate response format

  def response(env)
    env.logger.debug "\n\n\nENV: #{env['PATH_INFO']}"
    env.logger.debug "REQUEST: Received"
    env.logger.debug "POST Action received: #{env.params} "

    # processing of requests from browser goes here
    resp =
      case env.params["action"]
      when 'SOME_ACTION'    then process_action(env)
      when 'ANOTHER_ACTION' then process_another_action(env)
      else
        # skip
      end

    env.logger.debug "REQUEST: About to respond with: #{resp}"
    [200, {'Content-Type' => 'application/json', 'Access-Control-Allow-Origin' => "*"}, resp]
  end

  # process an action
  def process_action(env)
    # extract message data
    data = Hash.new
    data["user_id"], data["object_id"] = env.params['user_id'], env.params['object_id']
    publishData = { "action" => 'SOME_ACTION_RECEIVED',
                    "data"   => data }
    # the connection created in config/example_api.rb is reachable via env.config
    env.config['redisPub'].publish("Channel_1", Yajl::Encoder.encode(publishData))
    return data
  end

  # process another action
  def process_another_action(env)
    # extract message data
    data = Hash.new
    data["user_id"], data["widget_id"] = env.params['user_id'], env.params['widget_id']
    publishData = { "action" => 'SOME_OTHER_ACTION_RECEIVED',
                    "data"   => data }
    env.config['redisPub'].publish("Channel_1", Yajl::Encoder.encode(publishData))
    return data
  end
end
Handling subscriptions is left as an exercise for the reader.
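For reference, a minimal subscription sketch using em-hiredis's pubsub interface (this assumes a reasonably recent em-hiredis; the channel name mirrors the publisher above):

require 'em-hiredis'

EM.run do
  # Use a dedicated connection for subscribing; a subscribed
  # connection can't issue regular commands
  pubsub = EM::Hiredis.connect.pubsub
  pubsub.subscribe('Channel_1') do |message|
    puts "Received: #{message}"
  end
end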
What em-synchrony does is patch the em-redis gem to allow using it with fibers, which effectively allows it to run in Goliath.
Here is a project using Goliath + Redis which can guide you on how to make all this work: https://github.com/igrigorik/mneme
Here is an example with em-hiredis. Goliath wraps your request in a fiber, so a way to test it is:
require 'rubygems'
require 'bundler/setup'
require 'em-hiredis'
require 'em-synchrony'

EM::run do
  Fiber.new do
    ## this is what you can use in goliath
    redis = EM::Hiredis.connect
    p EM::Synchrony.sync redis.keys('*')
    ## end of goliath block
  end.resume
end
and the Gemfile I used:
source :rubygems
gem 'em-hiredis'
gem 'em-synchrony'
If you run this example you will get the list of keys defined in your Redis database printed on screen.
Without the EM::Synchrony.sync call you would get a deferrable; here the fiber is suspended until the call returns and you get the result.

How to fetch multiple JSONs in parallel with Eventmachine in Ruby

I'm new to EM and am following this example:
EventMachine.run {
  http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}

  http.errback { p 'Uh oh'; EM.stop }
  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response

    EventMachine.stop
  }
}
I want to do something similar.
I want to fetch "JavaScript Object Notation" (JSON) files from several different web servers, in parallel.
I can't figure out how to store all these JSON responses in a common variable so that I can do some calculations on them afterwards, e.g. have every request append its JSON to a shared array.
You want the requests to run in parallel and to process them after all have completed?
You can use EventMachine::MultiRequest from em-http-request. The wiki has documentation on issuing parallel requests; see "Synchronizing with Multi interface".
You should add your code to multi.callback and you will receive an array of requests.
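For example, a sketch that fetches several JSON documents in parallel and collects the parsed results in a shared hash (the URLs are placeholders):

require 'eventmachine'
require 'em-http'
require 'json'

results = {}

EventMachine.run do
  multi = EventMachine::MultiRequest.new
  urls = ['http://example.com/a.json', 'http://example.com/b.json']

  urls.each do |url|
    multi.add url, EventMachine::HttpRequest.new(url).get
  end

  multi.callback do
    # :callback holds the successful requests, keyed by the name given to #add
    multi.responses[:callback].each do |url, request|
      results[url] = JSON.parse(request.response)
    end
    EventMachine.stop
  end
end

p results # each URL mapped to its parsed JSON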

Best practices for handling binary data in Ruby?

What are the best practices for reading and writing binary data in Ruby?
In the code sample below I needed to send a binary file over HTTP (as POST data):
class SimpleHandler < Mongrel::HttpHandler
  def process(request, response)
    response.start(200) do |head, out|
      head["Content-Type"] = "application/ocsp-responder"
      f = File.new("resp.der", "r")
      begin
        while true
          out.syswrite(f.sysread(1))
        end
      rescue EOFError => err
        puts "Sent response."
      end
    end
  end
end
While this code seems to do a good job, it probably isn't very idiomatic. How can I improve it?
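One incremental improvement, before reaching for a library, is to read in larger chunks instead of a byte at a time. A sketch of the same handler (the 4 KB buffer size is arbitrary):

class SimpleHandler < Mongrel::HttpHandler
  def process(request, response)
    response.start(200) do |head, out|
      head["Content-Type"] = "application/ocsp-responder"
      File.open("resp.der", "rb") do |f|  # "rb" = binary mode
        while chunk = f.read(4096)        # read returns nil at EOF, ending the loop
          out.syswrite(chunk)
        end
      end
      puts "Sent response."
    end
  end
end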
If you're copying from one IO stream to another, FileUtils.copy_stream might be of use:
require 'fileutils'

fin = File.new('svarttag.jpg', 'rb')    # open both files in binary mode
fout = File.new('blacktrain.jpg', 'wb')
FileUtils.copy_stream(fin, fout)
fin.close
fout.close
Maybe not exactly what you asked for, but if it's the whole HTTP-POSTing-files issue you want to solve, then HTTPClient can do it for you:
require 'httpclient'
HTTPClient.post 'http://nl.netlog.com/test', { :file => File.new('resp.der') }
Also, I've heard that Nick Sieger's multipart-post is good, but I haven't used it.
