Deadlock in ruby code using SizedQueue - ruby

I think I'm running up against a fundamental misunderstanding on my part of how threading works in ruby and I'm hoping to get some insight.
I'd like to have a simple producer and consumer. First, a producer thread that pulls lines from a file and sticks them into a SizedQueue; when those run out, stick some tokens on the end to let the consumer(s) know things are done.
require 'thread'
numthreads = 2
filename = 'edition-2009-09-11.txt'
bq = SizedQueue.new(4)
producerthread = Thread.new(bq) do |queue|
File.open(filename) do |f|
f.each do |r|
queue << r
end
end
numthreads.times do
queue << :end_of_producer
end
end
Now a few consumers. For simplicity, let's have them do nothing.
consumerthreads = []
numthreads.times do
consumerthreads << Thread.new(bq) do |queue|
until (line = queue.pop) === :end_of_producer
# do stuff in here
end
end
end
producerthread.join
consumerthreads.each {|t| t.join}
puts "All done"
My understanding is that (a) the producer thread will block once the SizedQueue is full and eventually get back to filling it up, and (b) the consumer threads will pull from the SizedQueue, blocking when it empties, and eventually finish.
But under ruby1.9 (ruby 1.9.1p243 (2009-07-16 revision 24175) [i386-darwin9]) I get a deadlock error on the joins. What's going on here? I just don't see where there's any interaction between the threads except via the SizedQueue, which is supposed to be thread-safe.
Any insight would be much appreciated.

Your understanding is correct, and your code works on my machine on a slightly newer version of Ruby (ruby 1.9.2dev (2009-08-30 trunk 24705) [i386-darwin10.0.0]).
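For reference, here is a minimal, self-contained sketch of the same pattern that runs without the input file. The in-memory `lines` array is a stand-in for the file contents (an assumption, not part of the question), and `processed` is a hypothetical collector added only so the result can be checked.

```ruby
require 'thread'

NUM_CONSUMERS = 2
# Stand-in for the lines of the input file (assumed data).
lines = %w[alpha beta gamma delta epsilon zeta]

queue = SizedQueue.new(4)  # producer blocks once 4 items are waiting
processed = Queue.new      # thread-safe collector, for illustration only

producer = Thread.new do
  lines.each { |line| queue << line }
  # One end marker per consumer, so every consumer sees exactly one.
  NUM_CONSUMERS.times { queue << :end_of_producer }
end

consumers = Array.new(NUM_CONSUMERS) do
  Thread.new do
    # pop blocks while the queue is empty
    while (line = queue.pop) != :end_of_producer
      processed << line
    end
  end
end

producer.join
consumers.each(&:join)
processed_count = processed.size
puts "All done: #{processed_count} lines consumed"
```

Pushing one `:end_of_producer` marker per consumer is what lets every consumer thread leave its loop, so the joins can return.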

Related

Why won't my simple Connection Pool execute these simple put statements?

I posted a question about how to manage threads effectively here: How do I properly use Threads to connect ping a url?
I got some great recommendations and tips regarding pools, thread safety, and some libraries and gems to use. I'm trying to follow one of those recommendations by using concurrent-ruby to create a thread/connection pool to execute some threads. In a simple ruby file I have the following code:
pool = Concurrent::FixedThreadPool.new(5)
pool.post do
puts 'hello'
end
As per the documentation in concurrent-ruby I've done the required steps but my code won't execute. No puts statement is being executed. Here is another example:
pool = Concurrent::FixedThreadPool.new(5)
array = []
pool.post do
array << 1
puts 'Why am I not working?'
end
puts array.size
The size of this array is 0. The code in the pool is not executing. I would have at least expected a size of 1. I've followed the example to a tee. Why is this code not executing?
Your code is correct and the block is successfully pushed to the pool. However, before it gets executed, the program terminates and kills the pool. That's why you don't see any output: the pool did not have enough time to execute the job.
You can either add a sleep statement at the end or, for a more elegant solution, tell the pool to finish all the work and shut down. This will look like this:
require 'concurrent-ruby'
pool = Concurrent::FixedThreadPool.new(5)
pool.post do
puts 'hello'
end
pool.shutdown
pool.wait_for_termination

Ruby Sinatra with consumer thread and job queue

I’m trying to create a very simple restful server. When it receives a request, I want to create a new job on a queue that can be handled by another thread while the current thread returns a response to the client.
I looked at Sinatra, but haven't got too far.
require 'sinatra'
require 'thread'
queue = Queue.new
set :port, 9090
get '/' do
queue << 'item'
length = queue.size
puts 'QUEUE LENGTH %d', length
'Message Received'
end
consumer = Thread.new do
5.times do |i|
value = queue.pop(true) rescue nil
puts "consumed #{value}"
end
end
consumer.join
In the above example, I know the consumer thread would only run a few times (as opposed to the life of the application), but even this isn't working for me.
Is there a better approach?
Your main problem is your call to Queue#pop. You’re passing true, which makes the call non-blocking: instead of suspending the thread when the queue is empty, it raises an exception, which you rescue with nil. Your consumer thread therefore loops five times before anything else can happen.
You need to change that line to
value = queue.pop
so that the thread waits for new data being pushed onto the queue.
You’ll also need to remove the consumer.join line from the end, since that will cause deadlock once you’ve changed the call to pop.
(Also, it’s not part of your main problem, but it looks like you want printf rather than puts when you print the queue length).
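The difference between the two forms of pop can be seen in isolation, without Sinatra. This is only a sketch using the standard library Queue; the item strings and the variable names are made up for illustration.

```ruby
require 'thread'

queue = Queue.new

# Non-blocking pop: on an empty queue it raises ThreadError immediately
# instead of waiting for data.
non_blocking_result =
  begin
    queue.pop(true)
  rescue ThreadError
    :empty
  end

# Blocking pop: the consumer thread sleeps until an item is pushed.
consumed = []
consumer = Thread.new do
  3.times { consumed << queue.pop }
end

3.times { |i| queue << "item #{i}" }

# Joining is safe here only because exactly three items are pushed,
# so the consumer is guaranteed to finish.
consumer.join
```

In the Sinatra case nothing is pushed until a request arrives, which is why joining the consumer at the end of the file is a problem there.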

Sending outside of EventMachine loop

I'm using the em-ws-client gem, although I think my question is more general than that. I'm trying to send data from outside the EventMachine receive block, but it takes a very long time (~20s) for the data to be sent:
require "em-ws-client"
m = Mutex.new
c = ConditionVariable.new
Thread.new do
EM.run do
@ws = EM::WebSocketClient.new("ws://echo.websocket.org")
@ws.onopen do
puts "connected"
m.synchronize { c.broadcast }
end
@ws.onmessage do |msg, binary|
puts msg
end
end
end
m.synchronize { c.wait(m) }
@ws.send_message "test"
sleep 100
When I put the @ws.send_message "test" directly into the onopen method it works just fine. I don't understand why my version doesn't work. I found this issue in EventMachine, but I'm not sure whether it's related.
Why does it take so long, and how can I fix that?
EventMachine is strictly single threaded and sharing of sockets between threads is not recommended. What you might be seeing here is an issue with the main EventMachine thread being unaware that you've submitted a send_message call and leaving it buffered for an extended period of time.
I'd be very, very careful when using threads with EventMachine. I've seen it malfunction and crash if you hit thread timing or synchronization problems.

Parallelism in Ruby

I've got a loop in my Ruby build script that iterates over each project and calls msbuild and does various other bits like minify CSS/JS.
Each loop iteration is independent of the others so I'd like to parallelise it.
How do I do this?
I've tried:
myarray.each{|item|
Thread.start {
# do stuff
}
}
puts "foo"
but Ruby just seems to exit straight away (prints "foo"). That is, it runs over the loop, starts a load of threads, but because there's nothing after the each, Ruby exits killing the other threads :(
I know I can do thread.join, but if I do this inside the loop then it's no longer parallel.
What am I missing?
I'm aware of http://peach.rubyforge.org/ but using that I get all kinds of weird behaviour that look like variable scoping issues that I don't know how to solve.
Edit
It would be useful if I could wait for all child-threads to execute before putting "foo", or at least the main ruby thread exiting. Is this possible?
Store all your threads in an array and loop through the array calling join:
threads = myarray.map do |item|
Thread.start do
# do stuff
end
end
threads.each { |thread| thread.join }
puts "foo"
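If you also need something back from each iteration, Thread#value joins the thread and returns the result of its block. A small sketch, where the project names and the "built" strings are placeholders for the real build step:

```ruby
# Placeholder project list; the real script would iterate over its own array.
projects = %w[web api docs]

threads = projects.map do |project|
  Thread.new(project) do |name|
    sleep 0.05          # stand-in for msbuild, minification, etc.
    "#{name} built"     # the block's last expression becomes the thread's value
  end
end

# value waits for each thread to finish, then returns its block's result.
results = threads.map(&:value)
puts results
puts "foo"
```

All threads are started before any of them is waited on, so the work still overlaps; only the collection step at the end is sequential.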
Use em-synchrony here :). Fibers are cute.
require "em-synchrony"
require "em-synchrony/fiber_iterator"
# if you really need to get a Fiber per each item
# in real life you could set concurrency to, for example, 10 and it could even improve performance
# it depends on amount of IO in your job
concurrency = myarray.size
EM.synchrony do
EM::Synchrony::FiberIterator.new(myarray, concurrency).each do |url|
# do some job here
end
EM.stop
end
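Keep in mind that threads in the standard Ruby interpreter do not give you true parallelism for Ruby code: Ruby 1.8 uses green threads, and Ruby 1.9 and later use native threads that are serialized by a global interpreter lock. If true parallelism is what you want, I would recommend taking a look at JRuby and Rubinius: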
http://www.engineyard.com/blog/2011/concurrency-in-jruby/

What happens when you don't join your Threads?

I'm writing a ruby program that will be using threads to do some work. The work that is being done takes a non-deterministic amount of time to complete and can range anywhere from 5 to 45+ seconds. Below is a rough example of what the threading code looks like:
loop do # Program loop
items = get_items
threads = []
for item in items
threads << Thread.new(item) do |i|
# do work on i
end
threads.each { |t| t.join } # What happens if this isn't there?
end
end
My preference would be to skip joining the threads and not block the entire application. However I don't know what the long term implications of this are, especially because the code is run again almost immediately. Is this something that is safe to do? Or is there a better way to spawn a thread, have it do work, and clean up when it's finished, all within an infinite loop?
I think it really depends on the content of your thread work. If, for example, your main thread needed to print "X work done", you would need to join to guarantee that you were showing the correct answer. If you have no such requirement, then you wouldn't necessarily need to join up.
After writing the question out, I realized that this is exactly what a web server does when serving pages. I googled and found the following article about a Ruby web server. The loop code looks pretty much like mine:
loop do
session = server.accept
request = session.gets
# log stuff
Thread.start(session, request) do |session, request|
HttpServer.new(session, request, basePath).serve()
end
end
Thread.start is effectively the same as Thread.new, so it appears that letting the threads finish and die off is OK to do.
If you split a workload across several threads and need to combine their results at the end, you definitely need a join; otherwise you can do without one.
If you removed the join, you could end up with new items getting started faster than the older ones get finished. If you're working on too many items at once, it may cause performance issues.
You should use a Queue instead (snippet from http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html):
require 'thread'
queue = Queue.new
producer = Thread.new do
5.times do |i|
sleep rand(i) # simulate expense
queue << i
puts "#{i} produced"
end
end
consumer = Thread.new do
5.times do |i|
value = queue.pop
sleep rand(i/2) # simulate expense
puts "consumed #{value}"
end
end
consumer.join
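To keep the number of items being worked on at once bounded, the same Queue idea can feed a fixed set of worker threads. This is a rough sketch rather than a drop-in solution: WORKER_COUNT, the :stop marker, and the doubling "work" are all assumptions made for illustration.

```ruby
require 'thread'

WORKER_COUNT = 3      # assumed upper bound on concurrent work
jobs = Queue.new
results = Queue.new

# A fixed set of workers: at most WORKER_COUNT items are processed at a time.
workers = Array.new(WORKER_COUNT) do
  Thread.new do
    while (item = jobs.pop) != :stop
      results << item * 2   # stand-in for the real work on each item
    end
  end
end

items = (1..10).to_a
items.each { |item| jobs << item }

# One :stop marker per worker so that each one exits its loop.
WORKER_COUNT.times { jobs << :stop }
workers.each(&:join)

doubled = []
doubled << results.pop until results.empty?
doubled.sort!
```

In a long-running program loop you would leave the workers running and skip the :stop markers and the join; the main loop then only pushes new items and never blocks on the work itself.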

Resources