How to fetch multiple JSONs in parallel with EventMachine in Ruby

I'm new to EM and am following this example:
EventMachine.run {
  http = EventMachine::HttpRequest.new('http://google.com/').get :query => {'keyname' => 'value'}

  http.errback { p 'Uh oh'; EM.stop }
  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response

    EventMachine.stop
  }
}
I want to do something similar: fetch JSON files from several different web servers, in parallel.
What I can't figure out is how to store all of these JSON responses in a common variable so that I can do some calculations on them afterwards, i.e. have every request append its JSON to a global array.

You want the requests to run in parallel, and to process the results after all of them have completed?
You can use EventMachine::MultiRequest from em-http-request. The wiki has documentation on issuing parallel requests; see "Synchronizing with Multi interface".
Add your code to multi.callback and you will receive all of the completed requests; for example:
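A minimal sketch of that approach, assuming the placeholder URLs below return JSON and that the em-http-request and json gems are available:

require 'eventmachine'
require 'em-http'
require 'json'

urls = ['http://example.com/a.json', 'http://example.com/b.json']  # placeholders

results = []  # the common variable every request appends to

EventMachine.run {
  multi = EventMachine::MultiRequest.new

  urls.each_with_index do |url, idx|
    multi.add idx, EventMachine::HttpRequest.new(url).get
  end

  multi.callback do
    # :callback holds the successful requests, :errback the failed ones
    multi.responses[:callback].each_value do |req|
      results << JSON.parse(req.response)
    end
    EventMachine.stop
  end
}

# results now holds one parsed JSON document per successful request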

Related

Parsing JSON with multiple pages in Ruby

I understand how to parse JSON, but I don’t understand how to parse it if it contains links to other pages.
I would be grateful for your help!
api.example.com/v0/accounts
On the first request for a JSON file, we get:
{
  "response": "OK",
  "nfts": [
    {
      "token_id": "35507806371588763669298464310896317145981867843055556101069010709538683224114"
    }
  ],
  "total": null,
  "continuation": "1634866413000"
}
There is a continuation field, which is the token for the next request, and this repeats many more times.
On the next request, the URL changes to api.example.com/v0/accounts?continuation=1634866413000
My code now looks like this:
require 'json'

class Source
  include Mongoid::Document
  include Mongoid::Timestamps

  after_save :add_items

  def add_items
    json = HTTParty.get("https://api.example.com/v0/accounts")
    json.dig('nfts').each do |item|
      Item.create!(
        :token_id => item['token_id'],
      )
    end
  end
end
Low-level HTTP clients like HTTParty typically don't handle pagination for you. You'll need to do it yourself, looping until there is no continuation field, e.g.:
continuation_id = nil
begin
  continuation_param = continuation_id ? "?continuation=#{continuation_id}" : ""
  json = HTTParty.get("https://api.example.com/v0/accounts#{continuation_param}")
  continuation_id = json.dig('continuation')
  # process the latest payload, append it to a running list, etc.
end while continuation_id
(And for production, best practice would be to keep a counter so you can bail after N iterations, to avoid an infinite loop.)
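One way to add that bail-out guard (a sketch; the cap of 100 pages is arbitrary):

max_pages = 100  # arbitrary safety cap; tune to the API's expected depth
pages_fetched = 0
continuation_id = nil

begin
  continuation_param = continuation_id ? "?continuation=#{continuation_id}" : ""
  json = HTTParty.get("https://api.example.com/v0/accounts#{continuation_param}")
  continuation_id = json.dig('continuation')
  pages_fetched += 1
  # process the payload here
end while continuation_id && pages_fetched < max_pages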

How to pass data between task in Ruby Rake?

So how do you pass data between tasks in Ruby Rake?
Believe me, I have read a lot about this on the internet and none of it makes sense.
I found the ENV['some_var'] approach, but I can't share objects without some conversions, unnecessary additional conversions that cost time for me and the processor :(. Additionally: "Come on, is this the best they came up with?"
Somebody said to use instance or class variables. That's hacky, isn't it? I mean, it's a semantic fiasco. Modules in Ruby are places to put methods and constants, I read somewhere, which makes sense to some extent, but class and instance variables in modules? Why have classes then?
So how can I share data between two rake tasks without resorting to hacks?
For example: How to pass the response object from task get to task ping_server:
require 'net/http'
require 'uri'

namespace :http_request do
  desc 'Request server to obtain status, and store the data in the memcache specified by the environment'
  # This task is made generic so it can serve as a low-level routine for other tasks,
  # thus avoiding repetitive code.
  task :get, [:url] => [:environment] do |t, args|
    # configuration
    WAIT_RESPONSE_TO_IN_SECONDS = 5

    uri = URI.parse(args[:url])
    http = Net::HTTP.new(uri.host, uri.port)
    # We cannot wait for a response forever, therefore provide a timeout
    http.open_timeout = WAIT_RESPONSE_TO_IN_SECONDS # in seconds
    request = Net::HTTP::Get.new(uri.path)

    # The response may take too long, or the URI may be bad (invalid)
    begin
      response = http.request(request)
      puts response.code
      ENV['req_response'] = {status: "ok", val: response.inspect}.to_s
      # Rails.cache.write(args[:name], response.code)
    rescue Exception => e
      puts "\nRequest failed: #{e}\n"
      ENV['req_response'] = {status: e.to_s, val: nil.to_s}.to_s
    end
  end
end

namespace :server_state do
  desc "write cache"
  task :ping_server, [:url] => "http_request:get" do
    response = eval(ENV['req_response'])
    puts "\n\nRESULT = #{response}"
    puts "\n\nRESULT = #{response[:val]}"
  end
end
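A safer variant of the ENV hand-off above, a sketch that swaps eval for the json stdlib so that arbitrary strings cannot be executed:

require 'json'

# Producing side (inside http_request:get):
ENV['req_response'] = JSON.generate({status: "ok", val: response.code})

# Consuming side (inside server_state:ping_server):
response = JSON.parse(ENV['req_response'], symbolize_names: true)
puts response[:val]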

Can I make asynchronous requests with Ruby's Typhoeus?

I am using Typhoeus and would like to make a single request without blocking for the response. Later on, I might check the response, or I might not. The point is I don't want the code execution to wait for the response.
Is there a way to do this built-in to Typhoeus?
Otherwise I guess I have to use threads and do it myself?
You could try using a thread:
response = nil
request_thread = Thread.new {
  # Set up and run the request object here, e.g. with Typhoeus:
  request = Typhoeus::Request.new("www.example.com")
  request.run
  response = request.response
}
From there you can check response == nil to see whether the request has completed yet, and you can call request_thread.join to block until the thread is done executing.
I would suggest looking into the 'unirest' gem for Ruby.
As far as I am aware, Typhoeus blocks on the hydra.run call.
With Unirest, it does not block on the get/post/put/etc. call, but continues to run. If you want, you can store the response object in a hash or an array with an identifier to retrieve later, like so:
identifier_requests['id'] = Unirest.post(url, headers: headers, parameters: param, auth: auth)
Then to block, or to retrieve a response, use one of the calls on the stored object:
response_code = identifier_requests['id'].code
response_body = identifier_requests['id'].body
http://unirest.io/ruby.html
Typhoeus has non-blocking calls built-in. From their docs:
request = Typhoeus::Request.new("www.example.com", followlocation: true)

request.on_complete do |response|
  if response.success?
    # hell yeah
  elsif response.timed_out?
    # aw hell no
    log("got a time out")
  elsif response.code == 0
    # Could not get an http response, something's wrong.
    log(response.return_message)
  else
    # Received a non-successful http response.
    log("HTTP request failed: " + response.code.to_s)
  end
end

request.run
This is from their docs at https://github.com/typhoeus/typhoeus
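If you need several requests in flight at once, Typhoeus also provides the Hydra interface; a minimal sketch (the URL is a placeholder):

require 'typhoeus'

hydra = Typhoeus::Hydra.new

10.times do |i|
  request = Typhoeus::Request.new("www.example.com?page=#{i}")  # placeholder URL
  request.on_complete do |response|
    # handle each response as it arrives
  end
  hydra.queue(request)
end

hydra.run  # blocks until every queued request has finished

Note that hydra.run itself blocks until the whole batch completes, which matches the earlier observation in this thread.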

Ruby library to make multiple HTTP requests simultaneously

I'm looking for an alternate Ruby HTTP library that makes multiple HTTP calls simultaneously and performs better than the core Net::HTTP library.
You are probably looking for Typhoeus.
Typhoeus runs HTTP requests in parallel while cleanly encapsulating handling logic
https://github.com/typhoeus/typhoeus
Why do you need a networking library to handle parallelism? That is exactly what threads are for.
require "open-uri"
fetcher = lambda do |uri|
puts "Started fetching #{uri}"
puts open(uri).read
puts "Stopped fetching #{uri}"
end
thread1 = Thread.new("http://localhost:9292", &fetcher)
thread2 = Thread.new("http://localhost:9293", &fetcher)
thread1.join
thread2.join
Also, I don't understand what you mean by "performs better". Core libraries are usually good enough to be in the core. Do you have any problems with Net::HTTP?
You can use the Parallel gem; it should work with any Ruby HTTP library, for example:
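A minimal sketch of that approach, pairing Parallel with Net::HTTP (the URLs are placeholders):

require 'parallel'
require 'net/http'
require 'uri'

urls = ['http://example.com/', 'http://example.org/']  # placeholder URLs

# One thread per URL; Parallel collects the return values in order
bodies = Parallel.map(urls, in_threads: urls.size) do |url|
  Net::HTTP.get(URI(url))
end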
Not sure if it performs better than Typhoeus, but you could use EventMachine + em-http-request. Here is an example of sending multiple requests:
require 'eventmachine'
require 'em-http'

EventMachine.run {
  multi = EventMachine::MultiRequest.new

  reqs = [
    'http://google.com/',
    'http://google.ca:81/'
  ]

  reqs.each_with_index do |url, idx|
    http = EventMachine::HttpRequest.new(url, :connect_timeout => 1)
    req = http.get
    multi.add idx, req
  end

  multi.callback do
    p multi.responses[:callback].size
    p multi.responses[:errback].size
    EventMachine.stop
  end
}
https://github.com/igrigorik/em-http-request

Ruby HttpClient async

Hello Stack Overflow people.
Does someone know of a code example for making Ruby's httpclient do an async post? It has a method for this, but it looks like it just gives you back a connection that you have to keep checking, which I assume would still be blocking. I did not see a way to "fire and forget", or to pass a method that it could call later in a separate thread while the rest of my code keeps running.
Thanks,
Craig
This sounds like you're programming in an evented style. Maybe you are even using EventMachine? You don't say so, but in case you do, this project: https://github.com/eventmachine/em-http-request will let you do something close:
EventMachine.run {
  http = EventMachine::HttpRequest.new('http://127.0.0.1/').get :query => {'keyname' => 'value'}

  http.callback {
    p http.response_header.status
    p http.response_header
    p http.response

    EventMachine.stop
  }
}
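If you'd rather stay with httpclient itself, the connection object its async methods return can be polled, or handed to a background thread for a rough fire-and-forget; a sketch (the URL and body are placeholders):

require 'httpclient'

client = HTTPClient.new
connection = client.post_async('http://127.0.0.1/endpoint', 'key=value')

# Hand the polling off to a thread so the main code keeps running
Thread.new do
  response = connection.pop       # blocks in this thread until the response arrives
  puts response.status
  puts response.content.read      # async responses expose the body as an IO
end

# ... the rest of the program continues immediately ...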
