In all EventMachine code that I've seen, the callbacks / errbacks were declared after the actual call of the method.
Here's a simple example:
about = EventMachine::HttpRequest.new('http://google.ca/search?q=eventmachine').get
about.callback {
  # callback nesting, ad infinitum
}
about.errback {
  # error-handling code
}
Why are the callbacks and errbacks declared AFTER? Isn't it possible that the EM::HttpRequest has already finished with some sort of success or error state? How does EM guarantee that the callbacks and errbacks are actually invoked?
The .get call only sets up the request (it invokes the get request method in the EM::HttpRequest module). EM::HttpRequest uses the EM::Deferrable module, which is sort of a switch.
Add these two together and you get the following behaviour: the request is first built and then waits until a response is received. During the first iteration of the EM.run do..end loop the connection is set up and the callbacks are registered; when the response is received, in a later iteration (whenever it arrives), the deferred status is set to :succeeded or :failed via set_deferred_status and the corresponding callback/errback is executed.
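To see the switch in isolation, here is a tiny self-contained demo (Task is a made-up class that just mixes in EM::Deferrable):
require 'eventmachine'

class Task
  include EM::Deferrable
end

EM.run do
  t = Task.new
  # register the callback first; nothing fires yet, because the
  # deferred status is still unknown
  t.callback { |msg| puts msg; EM.stop }
  # a later tick flips the switch; the stored callback fires now
  EM.next_tick { t.set_deferred_status :succeeded, "response received" }
end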
Take the following code....
http = EM::HttpRequest.new('http://google.com').get
http.callback { puts "it was a great call" }
http.errback { puts "it was a bad call" }
You might think that if the asynchronous request happens faster than we can set the callback, it's possible that the callback will never be called.
The request is happening asynchronously, so we might think that's a possibility. But it's not. What if we put some really long-running code between the request and the point where we actually set up the callback? I'll demonstrate and show that the callbacks still work.
http = EM::HttpRequest.new('http://google.com').get
# some really long running code
100000000000.times do
  # some really long running code
end
http.callback { puts "it was a great call" }
http.errback { puts "it was a bad call" }
The request in this case completes long before the really long-running code finishes, but the callback will still be called. Why?
The reason is that HttpRequest is a Deferrable: it mixes in the EM::Deferrable module. Even though a Deferrable is something that can run asynchronously, Deferrables have a status, success or fail, and we still have a reference to that Deferrable in a variable called http.
When we call http.callback { puts "it was a great call" }, we immediately check whether the status of the Deferrable, in this case http, is success. If it is, the callback is called immediately. Otherwise it is stored so that the Deferrable calls it whenever it finishes with a success status. It's that simple. As long as we have a reference to the Deferrable, we can set the callback at any time.
My guess was confirmed when I actually took a look at the source code for Deferrable.
http://eventmachine.rubyforge.org/EventMachine/Deferrable.html#callback-instance_method
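The relevant logic boils down to something like this (a simplified paraphrase of the linked source, not the verbatim implementation):
def callback(&block)
  return unless block
  @deferred_status ||= :unknown
  if @deferred_status == :succeeded
    # already succeeded: run the block immediately
    block.call(*@deferred_args)
  elsif @deferred_status != :failed
    # still pending: store the block to run on success
    (@callbacks ||= []) << block
  end
end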
Royce
Related
I have a situation where I am calling a method on a gem that does its processing asynchronously. I want to be able to wait for the callback to be called before continuing execution on the thread where the method was called.
gem.async_method(args) do |result|
  # the callback
end
# wait until callback is called and then continue execution
puts result # somehow have access to the result from the callback
PubNub Ruby SDK http_sync
I think you are just looking for the http_sync parameter when you publish, subscribe, etc. with PubNub: https://www.pubnub.com/docs/ruby/api-reference-sdk-v4#publish_args
pubnub.publish(
  channel: 'my_channel',
  message: { text: 'Hi!' },
  http_sync: true
) do |envelope|
  puts envelope.status
end
I managed to find the solution to this eventually. The gem I was using returned a Celluloid::Future. If you call future_object.value it will block execution until the value is received.
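In other words, something like this (a sketch; async_method and args are the hypothetical names from the question):
future = gem.async_method(args) # returns a Celluloid::Future
result = future.value           # blocks here until the async work completes
puts result                     # now we have the callback's result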
Currently, Typhoeus doesn't have automatic re-download in case of failure. What would be the best way of ensuring a retry if the download is not successful?
def request
  request ||= Typhoeus::Request.new("www.example.com")
  request.on_complete do |response|
    if response.success?
      xml = Nokogiri::XML(response.body)
    else
      # retry to download it
    end
  end
end
I think you need to refactor your code. You should be working with at least two queues and two threads.
The first is a queue of URLs that you pull from to read via Typhoeus::Request.
If the queue is empty, sleep that thread for a minute, then look for a URL to retrieve. If you successfully read the page, parse it and push the resulting XML doc into a second queue of DOMs to work on. Process that queue from a second thread, and if the second queue is empty, sleep that second thread until there is something to work on.
If reading a URL fails, automatically re-push it onto the first queue.
If both queues are empty you could exit the code, or let both threads sleep until something says to start processing URLs again and you repopulate the first queue.
You also need a retry counter associated with the URL; otherwise, if a site goes down, you could retry forever. You could push little sub-arrays onto the queue as:
["url", 0]
where 0 is the retry count, or get more complex by using an object or defining a class. Whatever you do, increment that counter until it hits a drop-dead value, then stop adding that URL to the queue and report it or remove it from your source of URLs (e.g. a database) somehow.
That's somewhat similar to code I've written a couple times to handle big spidering tasks.
See Ruby's Thread and Queue classes for examples of this.
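A minimal sketch of that layout (the URL, MAX_RETRIES, and the processing step are placeholders; Queue#pop blocks when the queue is empty, which stands in for the sleeping described above):
require 'typhoeus'
require 'nokogiri'

MAX_RETRIES = 3
url_queue = Queue.new # holds [url, retry_count] pairs
doc_queue = Queue.new # holds parsed Nokogiri documents

url_queue << ["http://www.example.com", 0]

fetcher = Thread.new do
  loop do
    url, retries = url_queue.pop # blocks while the queue is empty
    response = Typhoeus::Request.new(url).run
    if response.success?
      doc_queue << Nokogiri::XML(response.body)
    elsif retries < MAX_RETRIES
      url_queue << [url, retries + 1] # re-push with an incremented counter
    else
      warn "giving up on #{url}" # hit the drop-dead value; report it
    end
  end
end

worker = Thread.new do
  loop do
    doc = doc_queue.pop # blocks until a parsed document is available
    # work on the XML doc here
  end
end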
Also:
request ||= Typhoeus::Request.new("www.example.com")
makes no sense. request will always be nil when that code runs, so the ||= will always assign. Instead use:
request = Typhoeus::Request.new("www.example.com")
modified with the appropriate code to pull the next value from the first queue mentioned above.
I'm making a Ruby server using the em-websocket gem. When a client sends some message (e.g. "thread") the server creates two different threads and sends two answers to the client in parallel (I'm actually studying multithreading and WebSockets). Here's my code:
EM.run {
  EM::WebSocket.run(:host => "0.0.0.0", :port => 8080) do |ws|
    ws.onmessage { |msg|
      puts "Received message: #{msg}"
      if msg == "thread"
        threads = []
        threads << a = Thread.new {
          sleep(1)
          puts "1"
          ws.send("Message sent from thread 1")
        }
        threads << b = Thread.new {
          sleep(2)
          puts "2"
          ws.send("Message sent from thread 2")
        }
        threads.each { |aThread| aThread.join }
      end
    }
  end
}
How it executes:
I send the "thread" message to the server.
After one second I see the string "1" printed in my console. After another second I see "2".
Only after that are both messages sent to the client, simultaneously.
The problem is that I want each message to be sent at exactly the same time as its debug output ("1" or "2") is printed.
My Ruby version is 1.9.3p194.
I don't have experience with EM, so take this with a pinch of salt.
However, at first glance, it looks like "aThread.join" is actually blocking the "onmessage" method from completing and thus also preventing the "ws.send" from being processed.
Have you tried removing the "threads.each" block?
Edit:
After having tested this on Arch Linux with both Ruby 1.9.3 and 2.0.0 (using "test.html" from the examples of em-websocket), I am sure that even if removing the "threads.each" block doesn't fix the problem for you, you will still have to remove it, since Thread#join suspends the current thread until the "joined" threads are finished.
If you follow the function call of "ws.onmessage" through the source code, you will end up at the Connection#send_data method of the EventMachine module and find the following in the comments:
Call this method to send data to the remote end of the network connection. It takes a single String argument, which may contain binary data. Data is buffered to be sent at the end of this event loop tick (cycle).
As "onmessage" is blocked by the "join" until both "send" methods have run, the event loop tick cannot finish until both sets of data are buffered and thus, all the data cannot be sent until this time.
If it is still not working for you after removing the "threads.each" block, make sure that you have restarted your EventMachine, and try setting the second sleep to 5 seconds instead. I don't know how long a typical event loop tick takes in EventMachine (I can't imagine it being as long as a second), but the documentation basically says that if several "send" calls are made within the same tick, they will all be sent at the same time. Increasing the time difference will make sure that this is not happening.
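For reference, the handler with the "threads.each" block removed would look something like this (just a sketch of the change described above):
ws.onmessage { |msg|
  puts "Received message: #{msg}"
  if msg == "thread"
    Thread.new {
      sleep(1)
      puts "1"
      ws.send("Message sent from thread 1")
    }
    Thread.new {
      sleep(2)
      puts "2"
      ws.send("Message sent from thread 2")
    }
  end
}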
I think the problem is that you are calling the sleep method, passing 1 in the first thread and 2 in the second thread. Try removing the sleep call from both threads, or passing the same value in each call.
I have an HTTP client written in Ruby that can make synchronous requests to URLs. However, to execute multiple requests quickly, I decided to use EventMachine. The idea is to queue all the requests and execute them using EventMachine.
class EventMachineBackend
  ...
  ...
  def execute(request)
    $q ||= EM::Queue.new
    $q.push(request)
    $q.pop { |request| request.invoke }
    EM.run { EM.next_tick { EM.stop } }
  end
  ...
end
Forgive my use of a global queue variable; I will refactor it later. Is what I am doing in EventMachineBackend#execute the right way to use EventMachine queues?
One problem I see in my implementation is that it is essentially synchronous: I push a request, pop and execute it, and wait for it to complete.
Could anyone suggest a better implementation?
Your request logic has to be asynchronous for it to work with EventMachine; I suggest you use em-http-request. You can find an example of how to use it here, which shows how to run requests in parallel. An even better interface for running multiple connections in parallel is the MultiRequest class from the same gem.
If you want to queue requests and only run a fixed number of them in parallel you can do something like this:
EM.run do
  urls = [...] # regular array with URLs
  active_requests = 0
  # pre-declare launch_next so that when_done can reference it;
  # without this, the reference inside when_done raises a NameError
  launch_next = nil
  # this routine will be used as callback and will
  # be run when each request finishes
  when_done = proc do
    active_requests -= 1
    if urls.empty? && active_requests == 0
      # if there are no more urls, and there are no active
      # requests it means we're done, so shut down the reactor
      EM.stop
    elsif !urls.empty?
      # if there are more urls launch a new request
      launch_next.call
    end
  end
  # this routine launches a request
  launch_next = proc do
    # get the next url to fetch
    url = urls.pop
    # launch the request, and register the callback
    request = EM::HttpRequest.new(url).get
    request.callback(&when_done)
    request.errback(&when_done)
    # increment the number of active requests, this
    # is important since it will tell us when all requests
    # are done
    active_requests += 1
  end
  # launch three requests in parallel, each will launch
  # a new request when done, so there will always be
  # three requests active at any one time, unless there
  # are no more urls to fetch
  3.times do
    launch_next.call
  end
end
Caveat emptor, there may very well be some detail I've missed in the code above.
If you think it's hard to follow the logic in my example, welcome to the world of evented programming. It's really tricky to write readable evented code. It all goes backwards. Sometimes it helps to start reading from the end.
I've assumed that you don't want to add more requests after you've started downloading (it doesn't look like it from the code in your question), but should you want to, you can rewrite my code to use an EM::Queue instead of a regular array and remove the part that does EM.stop, since you will not be stopping. You can probably remove the code that keeps track of the number of active requests too, since that's no longer relevant. The important part would look something like this:
launch_next = proc do
  urls.pop do |url|
    request = EM::HttpRequest.new(url).get
    request.callback(&launch_next)
    request.errback(&launch_next)
  end
end
Also, bear in mind that my code doesn't actually do anything with the response. The response will be passed as an argument to the when_done routine (in the first example). I also do the same thing for success and error, which you may not want to do in a real application.
Okay, it's a simple task: after I render HTML to the client, I want to execute a DB call with information from the request.
I am using Sinatra because it's a lightweight microframework, but really I'm up for anything in Ruby if it's faster/easier (Rack?). I just want to get the URL and redirect the client somewhere else based on it.
So how does one go about implementing a real after_filter in Rack/Sinatra? And by after_filter I mean one that runs after the response is sent to the client. Or is that just not doable without threads?
I forked Sinatra and added after filters, but there is no way to flush the response; even send_data, which is supposed to stream files (and is obviously meant for binary data), waits for the after filters.
I've seen this question: Multipart-response-in-ruby, but the answer is for Rails, and I am not sure whether it really flushes the response to the client and then allows for processing afterwards.
Rack::Callbacks has some before and after callbacks, but even those look like they would run before the response is ever sent to the client. Here's the Rack::Callbacks implementation (comment added by me):
def call(env)
  @before.each { |c| c.call(env) }
  response = @app.call(env)
  @after.each { |c| c.call(env) }
  response
  # i am guessing when this method returns then the response is sent to the client
end
So I know I could call a background task through the shell with rake, but it would be nice not to have to... Also, there is NeverBlock, but is that good for executing a separate process without delaying the response, or would it still make the app wait as a whole (I think it would)?
I know this is a lot, but in short it's a simple after_filter that really runs after the response is sent, in Ruby/Sinatra/Rack.
Thanks for reading or answering my question! :-)
I modified a port of the Rails run_later plugin to do the trick; the file is available here:
http://github.com/pmamediagroup/sinatra_run_later/tree/master
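A hypothetical usage sketch, assuming the helper behaves like the Rails run_later plugin it ports (run_later's exact semantics, and the log_visit and destination_for methods, are assumptions for illustration):
get '/some/path' do
  run_later do
    # assumed to run after the response has been flushed to the client;
    # log_visit is a made-up DB call
    log_visit(request.url)
  end
  redirect destination_for(request.url) # destination_for is made up too
end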