Sending data from outside the EventMachine loop in Ruby

I'm using the em-ws-client gem, although I think my question is more general than that. I'm trying to send data from outside the EventMachine receive block, but it takes a very long time (~20s) for the data to be sent:
require "em-ws-client"
m = Mutex.new
c = ConditionVariable.new
Thread.new do
  EM.run do
    @ws = EM::WebSocketClient.new("ws://echo.websocket.org")
    @ws.onopen do
      puts "connected"
      m.synchronize { c.broadcast }
    end
    @ws.onmessage do |msg, binary|
      puts msg
    end
  end
end
m.synchronize { c.wait(m) }
@ws.send_message "test"
sleep 100
When I put the @ws.send_message "test" directly into the onopen block it works just fine. I don't understand why my version doesn't work. I found this issue in EventMachine, but I'm not sure whether it's related.
Why does it take so long, and how can I fix that?

EventMachine is strictly single-threaded, and sharing sockets between threads is not recommended. What you might be seeing here is the main EventMachine thread being unaware that you've submitted a send_message call from another thread, leaving the data buffered for an extended period of time.
I'd be very, very careful when using threads with EventMachine. I've seen it malfunction and crash if you hit thread timing or synchronization problems.
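One commonly suggested workaround (a sketch only, not tested against em-ws-client specifically) is to hand the write back to the reactor thread with EM.schedule or EM.next_tick, so send_message runs inside the event loop instead of from the main thread:
m.synchronize { c.wait(m) }
# Queue the block to run on the EventMachine reactor thread on its next iteration;
# that way the socket is only ever touched from inside the event loop.
EM.schedule do
  @ws.send_message "test"
end
sleep 100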

Related

Which of Ruby's concurrency devices would be best suited for this scenario?

The whole threads/fibers/processes thing is confusing me a little. I have a practical problem that can be solved with some concurrency, so I thought this was a good opportunity to ask professionals and people more knowledgeable than me about it.
I have a long array, let's say 3,000 items. I want to send an HTTP request for each item in the array.
Actually iterating over the array, generating requests, and sending them is very rapid. What takes time is waiting for each item to be received, processed, and acknowledged by the party I'm sending to. I'm essentially sending 100 bytes, waiting 2 seconds, sending 100 bytes, waiting 2 seconds.
What I would like to do instead is send these requests asynchronously. I want to send a request, specify what to do when I get the response, and in the meantime, send the next request.
From what I can see, there are four concurrency options I could use here.
1. Threads.
2. Fibers.
3. Processes; unsuitable as far as I know because multiple processes accessing the same array isn't feasible/safe.
4. Asynchronous functionality like JavaScript's XMLHttpRequest.
The simplest would seem to be the last one. But what is the best, simplest way to do that using Ruby?
Failing #4, which of the remaining three is the most sensible choice here?
Would any of these options also allow me to say "Have no more than 10 pending requests at any time"?
This is your classic producer/consumer problem, and it is nicely suited to threads in Ruby. Just create a SizedQueue:
require "thread"

urls = [...] # array with bunches of urls

queue = SizedQueue.new(10) # this will only allow 10 items on the queue at once

producer = Thread.new do
  urls.each do |url|
    response = do_http_request(url)
    queue << response
  end
  queue << "done" # sentinel so the consumer knows there is nothing left
end

consumer = Thread.new do
  loop do
    http_response = queue.pop # blocks until an item is available
    break if http_response == "done"
    process(http_response)
  end
end

# wait for the consumer to finish
consumer.join
EventMachine as an event loop, with em-synchrony as a Fiber wrapper that turns its callbacks into synchronous code.
Copied straight from the em-synchrony README:
require "em-synchrony"
require "em-synchrony/em-http"
require "em-synchrony/fiber_iterator"
EM.synchrony do
concurrency = 2
urls = ['http://url.1.com', 'http://url2.com']
results = []
EM::Synchrony::FiberIterator.new(urls, concurrency).each do |url|
resp = EventMachine::HttpRequest.new(url).get
results.push resp.response
end
p results # all completed requests
EventMachine.stop
end
This is an IO-bound case that fits both models:
Threading model: no problem with MRI Ruby here, because threads handle IO-bound work well; the GIL's effect is almost zero.
Asynchronous model, which proves (in practice and in theory) to be far superior to threads for IO-specific problems.
For this specific case, and to keep things simple, I would go with the Typhoeus HTTP client, whose parallel support works on the evented (asynchronous) concurrency model.
Example:
hydra = Typhoeus::Hydra.new

%w(url1 url2 url3).each do |url|
  request = Typhoeus::Request.new(url, followlocation: true)
  request.on_complete do |response|
    # do something with response
  end
  hydra.queue(request)
end

hydra.run # this is a blocking call that returns once all requests are complete

Send multiple messages over a websocket using threads

I'm making a Ruby server using the em-websocket gem. When a client sends a certain message (e.g. "thread"), the server creates two different threads and sends two answers to the client in parallel (I'm actually studying multithreading and websockets). Here's my code:
EM.run {
  EM::WebSocket.run(:host => "0.0.0.0", :port => 8080) do |ws|
    ws.onmessage { |msg|
      puts "Received message: #{msg}"
      if msg == "thread"
        threads = []
        threads << a = Thread.new {
          sleep(1)
          puts "1"
          ws.send("Message sent from thread 1")
        }
        threads << b = Thread.new {
          sleep(2)
          puts "2"
          ws.send("Message sent from thread 2")
        }
        threads.each { |aThread| aThread.join }
      end
    }
  end
}
How it executes:
I send the "thread" message to the server.
After one second I see the string "1" printed in my console; after another second I see "2".
Only after that are both messages sent to the client, simultaneously.
The problem is that I want each message to be sent to the client at the same moment its debug output ("1" or "2") is printed.
My Ruby version is 1.9.3p194.
I don't have experience with EM, so take this with a pinch of salt.
However, at first glance, it looks like "aThread.join" is actually blocking the "onmessage" method from completing and thus also preventing the "ws.send" from being processed.
Have you tried removing the "threads.each" block?
Edit:
After having tested this on Arch Linux with both Ruby 1.9.3 and 2.0.0 (using "test.html" from the em-websocket examples), I am sure that even if removing the "threads.each" block doesn't fix the problem for you, you will still have to remove it, as Thread#join suspends the current thread until the "joined" threads are finished.
If you follow the function call of "ws.onmessage" through the source code, you will end up at the Connection#send_data method of the Eventmachine module and find the following within the comments:
Call this method to send data to the remote end of the network connection. It takes a single String argument, which may contain binary data. Data is buffered to be sent at the end of this event loop tick (cycle).
As "onmessage" is blocked by the "join" until both "send" methods have run, the event loop tick cannot finish until both sets of data are buffered and thus, all the data cannot be sent until this time.
If it is still not working for you after removing the "threads.each" block, make sure that you have restarted your eventmachine and try setting the second sleep to 5 seconds instead. I don't know how long a typical event loop takes in eventmachine (and I can't imagine it to be as long as a second), however, the documentation basically says that if several "send" calls are made within the same tick, they will all be sent at the same time. So increasing the time difference will make sure that this is not happening.
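For reference, here is the question's handler with the threads.each join removed, which is all the suggestion above amounts to (this replaces the body of the same ws.onmessage block from the question):
ws.onmessage { |msg|
  puts "Received message: #{msg}"
  if msg == "thread"
    # No join here: onmessage returns immediately, so the reactor keeps ticking
    # and the two ws.send calls no longer land in the same tick.
    Thread.new do
      sleep(1)
      puts "1"
      ws.send("Message sent from thread 1")
    end
    Thread.new do
      sleep(2)
      puts "2"
      ws.send("Message sent from thread 2")
    end
  end
}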
I think the problem is that you are calling the sleep method with 1 in the first thread and 2 in the second thread.
Try removing the sleep call in both threads, or passing the same value to each.

How do I loop the restart of a daemon?

I am trying to use Ruby's daemons gem and loop the restart of a daemon that has its own loop. My code looks like this now:
require 'daemons'
while true
  listener = Daemons.call(:force => true) do
    users = accounts.get_updated_user_list
    TweetStream::Client.new.follow(users) do |status|
      puts "#{status.text}"
    end
  end
  sleep(60)
  listener.restart
end
Running this gives me the following error (after 60 seconds):
undefined method `restart' for #<Daemons::Application:0x007fc5b29f5658> (NoMethodError)
So obviously Daemons.call doesn't return a controllable daemon like I thought it did. What do I need to do to set this up correctly? Is a daemon even the right tool here?
I think this is what you're after, although I haven't tested it.
class RestartingUserTracker
  def initialize
    @client = TweetStream::Client.new
  end

  def handle_status(status)
    # do whatever it is you're going to do with the status
  end

  def fetch_users
    accounts.get_updated_user_list
  end

  def restart
    @client.stop_stream
    users = fetch_users
    @client.follow(users) do |status|
      handle_status(status)
    end
  end
end

EM.run do
  client = RestartingUserTracker.new
  client.restart

  EM::PeriodicTimer.new(60) do
    client.restart
  end
end
Here's how it works:
TweetStream uses EventMachine internally, as a way of polling the API forever and handling the responses. I can see why you might have felt stuck, because the normal TweetStream API blocks forever and doesn't give you a way to intervene at any point. However, TweetStream does allow you to set up other things in the same event loop. In your case, a timer. I found the documentation on how to do that here: https://github.com/intridea/tweetstream#removal-of-on_interval-callback
By starting up our own EventMachine reactor, we're able to inject our own code into the reactor as well as use TweetStream. In this case, we're using a simple timer that just restarts the client every 60 seconds.
EventMachine is an implementation of something called the Reactor Pattern. If you want to fully understand and maintain this code, it would serve you well to find some resources about it and gain a full understanding. The reactor pattern is very powerful, but can be difficult to grasp at first.
However, this code should get you started. Also, I'd consider renaming the RestartingUserTracker to something more appropriate.

What happens when you don't join your Threads?

I'm writing a ruby program that will be using threads to do some work. The work that is being done takes a non-deterministic amount of time to complete and can range anywhere from 5 to 45+ seconds. Below is a rough example of what the threading code looks like:
loop do # Program loop
  items = get_items
  threads = []
  for item in items
    threads << Thread.new(item) do |i|
      # do work on i
    end
  end
  threads.each { |t| t.join } # What happens if this isn't there?
end
My preference would be to skip joining the threads and not block the entire application. However I don't know what the long term implications of this are, especially because the code is run again almost immediately. Is this something that is safe to do? Or is there a better way to spawn a thread, have it do work, and clean up when it's finished, all within an infinite loop?
I think it really depends on the content of your threads' work. If, for example, your main thread needed to print "X work done", you would need to join to guarantee that you were showing the correct answer. If you have no such requirement, then you wouldn't necessarily need to join.
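As a rough illustration of that point (items and do_work are placeholders, not from the question's code):
require "thread"

results = Queue.new
threads = items.map do |item|
  Thread.new(item) { |i| results << do_work(i) }
end

# Without this join, the puts below could run before any of the work has finished.
threads.each(&:join)
puts "#{results.size} work done"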
After writing the question out, I realized that this is exactly what a web server does when serving pages. I googled and found an article describing a simple Ruby web server; its loop code looks pretty much like mine:
loop do
  session = server.accept
  request = session.gets
  # log stuff
  Thread.start(session, request) do |session, request|
    HttpServer.new(session, request, basePath).serve()
  end
end
Thread.start is effectively the same as Thread.new, so it appears that letting the threads finish and die off is OK to do.
If you split the workload across several threads and need to combine their results at the end, you definitely need a join; otherwise you can do without one.
If you removed the join, you could end up with new items getting started faster than the older ones get finished. If you're working on too many items at once, it may cause performance issues.
You should use a Queue instead (snippet from http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html):
require 'thread'

queue = Queue.new

producer = Thread.new do
  5.times do |i|
    sleep rand(i) # simulate expense
    queue << i
    puts "#{i} produced"
  end
end

consumer = Thread.new do
  5.times do |i|
    value = queue.pop
    sleep rand(i/2) # simulate expense
    puts "consumed #{value}"
  end
end

consumer.join

Deadlock in ruby code using SizedQueue

I think I'm running up against a fundamental misunderstanding on my part of how threading works in ruby and I'm hoping to get some insight.
I'd like to have a simple producer and consumer. First, a producer thread that pulls lines from a file and sticks them into a SizedQueue; when those run out, stick some tokens on the end to let the consumer(s) know things are done.
require 'thread'

numthreads = 2
filename = 'edition-2009-09-11.txt'

bq = SizedQueue.new(4)

producerthread = Thread.new(bq) do |queue|
  File.open(filename) do |f|
    f.each do |r|
      queue << r
    end
  end
  numthreads.times do
    queue << :end_of_producer
  end
end
Now a few consumers. For simplicity, let's have them do nothing.
consumerthreads = []

numthreads.times do
  consumerthreads << Thread.new(bq) do |queue|
    until (line = queue.pop) === :end_of_producer
      # do stuff in here
    end
  end
end

producerthread.join
consumerthreads.each { |t| t.join }
puts "All done"
My understanding is that (a) the producer thread will block once the SizedQueue is full and eventually get back to filling it up, and (b) the consumer threads will pull from the SizedQueue, blocking when it empties, and eventually finish.
But under Ruby 1.9 (ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin9]) I get a deadlock error on the joins. What's going on here? I just don't see where there's any interaction between the threads except via the SizedQueue, which is supposed to be thread-safe.
Any insight would be much-appreciated.
Your understanding is correct, and your code works on my machine on a slightly newer version of Ruby (ruby 1.9.2dev (2009-08-30 trunk 24705) [i386-darwin10.0.0]).
