Ruby work distribution fails if threads are generated too fast - ruby

I ran into a problem the other day and spent two hours looking for the answer in the wrong place.
In the process I stripped the code down to the version below. The threading works as long as I keep the sleep(0.1) in the loop that creates the threads.
If that line is omitted, all threads are created, but only thread 7 appears to consume data from the queue.
With this "hack" I have a working solution, but not one I'm happy with. I'm really curious why this happens.
I am using a fairly old version of Ruby on Windows (2.4.1p111). However, I was able to reproduce the same behavior with a newer Ruby 3.0.2p107 installation.
#!/usr/bin/env ruby
@q = Queue.new
# Get all projects (would be a list of directories)
projects = [*0..100]
projects.each do |project|
  @q.push project
end
def worker(num)
  while not @q.empty?
    puts "Thread: #{num} Project: #{@q.pop}"
    sleep(0.5)
  end
end
threads=[]
for i in 1..7 do
  threads << Thread.new { worker(i) }
  sleep(0.1) # Threading does not work without this line - but why?
end
threads.each {|thread| puts thread.join }
puts "done"

Fun bug! This is a race condition.
It's not that only thread 7 is doing work. All threads reference the same variable i (a for loop does not create a new scope, so there is only one copy!). Since the number 7 gets written last, presumably before any of the threads has actually started, they all read the same i == 7.
Try this worker function and see if it doesn't clear things up:
def worker(num)
  my_thread_id = Thread.current.object_id
  while not @q.empty?
    puts "Thread: #{num} NumObjId: #{num.object_id} ThreadId: #{my_thread_id} Project: #{@q.pop}"
    sleep(0.5)
  end
end
Notice that NumObjId is the same in all threads. They are all pointing to the same number. But the actual ThreadId we get IS different.
If you really do need the number in each thread, give every thread its own variable. A block parameter is local to each iteration, so something like this works:
ids = (1..7).to_a
ids.each do |i|
  threads << Thread.new { worker(i) }
end
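Another way to avoid the shared variable, sketched here under a few assumptions, is to pass the value to Thread.new, which hands it to the block as a block-local parameter. The helper name run_numbered_workers and the shape of its return value are made up for this illustration. The non-blocking pop is also an addition: it avoids the gap between empty? and pop in the original worker.

```ruby
# Illustrative sketch: run_numbered_workers is a hypothetical helper,
# not part of the original question or answer.
def run_numbered_workers(worker_count, projects)
  queue = Queue.new
  projects.each { |project| queue.push(project) }

  threads = []
  for i in 1..worker_count do
    # Thread.new(i) passes the current value of i to the block parameter num,
    # so each thread keeps its own number even though i keeps changing.
    threads << Thread.new(i) do |num|
      handled = []
      loop do
        begin
          # Non-blocking pop raises ThreadError once the queue is empty.
          handled << queue.pop(true)
        rescue ThreadError
          break
        end
        sleep(0.01) # stand-in for real work
      end
      [num, handled] # becomes the thread's value
    end
  end
  threads.map(&:value) # value joins the thread and returns the block result
end

run_numbered_workers(7, [*0..20]).each do |num, handled|
  puts "Thread: #{num} handled #{handled.size} projects"
end
```

Even with the for loop and no sleep between thread creations, every thread reports a different number here, because num is captured at creation time rather than read later from the shared i.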

Related

Ruby execution order in multithreaded program

In prep for Hurricane Irma I wrote a quick trash script to download a bunch of exercises from exercism.io. It works, but there's an error at the call to threads.each that I don't understand. All the code up until threads.each is synchronous, if I understand correctly, so I'm not sure what the best way to fix it is:
rb:14:in '<main>': undefined method 'each' for nil:NilClass (NoMethodError)
It's interesting to me because I get the error but the program still runs as expected, so I'm sure I'm not writing this properly.
language = ARGV[0]
exercises = `exercism list #{language}`.split("\n")
threads = exercises.map do |exercise|
  break if exercise == ''
  Thread.new do
    system("exercism fetch #{language} #{exercise}")
  end
end
threads.each(&:join)
Use next instead of break so that threads is still set if any exercises are blank. break will cancel the whole loop, but next will skip only the current iteration.
Then some threads could still be nil if their exercise is blank, because no thread has started for them. You can use threads.compact.each(&:join) to skip these nil values.
Or if you need the break, then add to threads inside the loop like:
threads = []
exercises.each do |exercise|
  break if exercise == ''
  threads << Thread.new do
    system("exercism fetch #{language} #{exercise}")
  end
end
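The difference between break and next inside map can be checked in isolation. The following is a minimal sketch, assuming a hypothetical helper start_fetches whose block stands in for the call to the exercism CLI, so it runs without the tool installed.

```ruby
# Hypothetical helper: the block passed in replaces the system("exercism fetch ...") call.
def start_fetches(exercises, &fetch)
  threads = exercises.map do |exercise|
    next if exercise.empty? # map records nil for this element and carries on
    Thread.new { fetch.call(exercise) }
  end
  threads.compact.map(&:value) # drop the nils, then wait for every thread
end

names = start_fetches(["hello-world", "", "leap"]) { |name| "fetched #{name}" }
puts names

# With break, map is abandoned and the whole expression evaluates to nil,
# which is where the NoMethodError on nil comes from.
broken = ["hello-world", "", "leap"].map do |exercise|
  break if exercise.empty?
  exercise
end
puts broken.inspect
```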

Can't run multithreading with Celluloid

I run this simple example on JRuby, but only one thread runs at a time:
require 'benchmark'
require 'celluloid/current'
TIMES = 10
def delay
  sleep 1
  # 40_000_000.times.each{|i| i*i}
end
p 'celluloid: true multithreading?'
class FileWorker
  include Celluloid
  def create_file(id)
    delay
    p "Done!"
    File.open("out_#{id}.txt", 'w') {|f| f.write(Time.now) }
  end
end
workers_pool = FileWorker.pool(size: 10)
TIMES.times do |i|
  # workers_pool.async.create_file(i) # this does not run in parallel either
  future = Celluloid::Future.new { FileWorker.new.create_file(i) }
  p future.value
end
The created files are written one second apart.
Please help me get Celluloid to run multithreaded, so that all files are created simultaneously.
Thanks!
FIXED:
Indeed, an array of futures helps!
futures = []
TIMES.times do |i|
  futures << Celluloid::Future.new { FileWorker.new.create_file(i) }
end
futures.each {|f| p f.value }
Thanks, jrochkind!
Ah, I think I see.
Inside your loop, you are waiting for each future to complete, at the end of the loop -- which means you are waiting for one future to complete, before creating the next one.
TIMES.times do |i|
  # workers_pool.async.create_file(i) # this does not run in parallel either
  future = Celluloid::Future.new { FileWorker.new.create_file(i) }
  p future.value
end
Try changing it to this:
futures = []
TIMES.times do |i|
  futures << Celluloid::Future.new { FileWorker.new.create_file(i) }
end
futures.each {|f| p f.value }
In your version, consider the first iteration of the loop: you create a future, then call future.value, which waits for the future to complete. The future.value statement won't return until the future completes, and the loop iteration won't finish and loop again to create another future until the statement returns. So you've effectively made it synchronous, by waiting on each future with value before creating the next.
Make sense?
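The same effect can be reproduced without Celluloid. This is only a sketch with plain threads, where Thread#value plays the role of a future's value and slow_task is a hypothetical stand-in for create_file; it does not use or describe Celluloid's own API.

```ruby
# Sketch with plain threads; slow_task is a hypothetical stand-in for create_file.
def slow_task(id)
  sleep 0.1
  "out_#{id}"
end

# Returns the wall-clock seconds the given block took.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Waiting inside the loop: each task finishes before the next one starts.
one_by_one = elapsed do
  5.times { |i| Thread.new { slow_task(i) }.value }
end

# Start everything first, wait afterwards: the tasks overlap.
results = nil
all_at_once = elapsed do
  pending = Array.new(5) { |i| Thread.new { slow_task(i) } }
  results = pending.map(&:value)
end

puts format("one by one: %.2fs, all at once: %.2fs", one_by_one, all_at_once)
```

The first measurement grows with the number of tasks, while the second stays close to the duration of a single task, because the waiting only starts once every task is already running.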
Also, for short code blocks like this, it's way easier on potential SO answerers if you put the code directly in the question, properly indented to format as code, instead of linking out.
In general, if you are using a fairly widely used library like Celluloid, and finding it doesn't seem to do the main thing it's supposed to do -- the first guess should probably be a bug in your code, not that the library fundamentally doesn't work at all (someone else would have noticed before now!). A question title reflecting that, even just "Why doesn't my Celluloid code appear to work multi-threaded" might have gotten more favorable attention than a title suggesting Celluloid fundamentally does not work -- without any code in the question itself demonstrating!

Ruby's speed of threads

I have the following code to write to a file in a thread-safe way:
threads = []
$lock_flag = 0
$write_flag = 0
def add_to_file
  old_i = 0
  File.open( "numbers.txt", "r" ) { |f| old_i = f.read.to_i }
  File.open( "numbers.txt", "w+") { |f| f.write(old_i+1) }
  #puts old_i
end
File.open( "numbers.txt", "w") { |f| f.write(0) } unless File.exist?("numbers.txt")
2000.times do
  threads << Thread.new {
    done_flag = 0
    while done_flag == 0 do
      print "." #### THIS LINE
      if $lock_flag == 0
        $lock_flag = 1
        if $write_flag == 0
          $write_flag = 1
          add_to_file
          $write_flag = 0
          done_flag = 1
        end
        $lock_flag = 0
      end
    end
  }
end
threads.each {|t| t.join}
If I run this code it takes about 1.5 seconds to write all 2000 numbers into the file. So far, so good.
But if I remove the line print "." marked with "THIS LINE", it takes ages! The code then needs about 12 seconds for only 20 threads to complete.
Now my question: why does the print speed up that code so much?
I'm not sure how you can call that thread safe at all when it's simply not. You can't use a simple variable to ensure safety because of race conditions. What happens between testing that a flag is zero and setting it to one? You simply don't know. Anything can and will eventually happen in that very brief interval if you're unlucky enough.
What might be happening is the print statement causes the thread to stall long enough that your broken locking mechanism ends up working. When testing that example using Ruby 1.9.2 it doesn't even finish, printing dots seemingly forever.
You might want to try re-writing it using Mutex:
write_mutex = Mutex.new
read_mutex = Mutex.new
2000.times do
  threads << Thread.new {
    done_flag = false
    while (!done_flag) do
      print "." #### THIS LINE
      write_mutex.synchronize do
        read_mutex.synchronize do
          add_to_file
          done_flag = true
        end
      end
    end
  }
end
This is the proper Ruby way to do thread synchronization. A Mutex will not yield the lock until it is sure you have exclusive control over it. There's also the try_lock method that will try to grab it and will fail if it is already taken.
Threads can be a real nuisance to get right, so be very careful when using them.
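As a smaller illustration of the same idea, here is a sketch that protects the read-increment-write sequence with a single Mutex and needs no retry loop or flags. The helper name count_with_mutex and the temporary directory are assumptions made for this example; they are not part of the original code.

```ruby
require 'tmpdir'

# Hypothetical helper: increments the number stored in the file at path
# once per thread and returns the final value.
def count_with_mutex(path, thread_count)
  File.write(path, "0")
  lock = Mutex.new
  threads = Array.new(thread_count) do
    Thread.new do
      # Only one thread at a time may read and rewrite the file.
      lock.synchronize do
        current = File.read(path).to_i
        File.write(path, (current + 1).to_s)
      end
    end
  end
  threads.each(&:join)
  File.read(path).to_i
end

Dir.mktmpdir do |dir|
  total = count_with_mutex(File.join(dir, "numbers.txt"), 200)
  puts "final count: #{total}"
end
```

Because synchronize blocks until the lock is free, a thread never has to spin and retry, so no print call is needed to slow anything down.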
First off, there are gems that can make this sort of thing easier. threach and jruby_threach ("threaded each") are ones that I wrote, and while I'm deeply unhappy with the implementation and will get around to making them cleaner at some point, they work fine when you have safe code.
(1..100).threach(2) {|i| do_something_with(i)} # run method in two threads
or
File.open('myfile.txt', 'r').threach(3, :each_line) {|line| process_line(line)}
You should also look at peach and parallel for other examples of easily working in parallel with multiple threads.
Above and beyond the problems already pointed out -- that your loop isn't thread-safe -- none of it matters because the code you're calling (add_to_file) isn't thread-safe. You're opening and closing the same file willy-nilly across threads, and that's gonna give you problems. I can't seem to understand what you're trying to do, but you need to keep in mind that you have absolutely no idea the order in which things in different threads are going to run.

How do I manage ruby threads so they finish all their work?

I have a computation that can be divided into independent units. The way I'm dealing with it now is to create a fixed number of threads and then hand off chunks of work to each thread. In pseudocode, here's what it looks like:
# main thread
work_units.take(10).each {|work_unit| spawn_thread_for work_unit}
def spawn_thread_for(work)
  Thread.new do
    do_some work
    more_work = work_units.pop
    spawn_thread_for more_work unless more_work.nil?
  end
end
Basically, once the initial number of threads is created, each one does some work and then keeps taking work from the stack until nothing is left. Everything works fine when I run things in irb, but when I execute the script using the interpreter things don't work out so well. I'm not sure how to make the main thread wait until all the work is finished. Is there a nice way of doing this, or am I stuck with executing sleep 10 until work_units.empty? in the main thread?
In ruby 1.9 (and 2.0), you can use ThreadsWait from the stdlib for this purpose:
require 'thread'
require 'thwait'
threads = []
threads << Thread.new { }
threads << Thread.new { }
ThreadsWait.all_waits(*threads)
If you modify spawn_thread_for to save a reference to your created Thread, then you can call Thread#join on the thread to wait for completion:
x = Thread.new { sleep 0.1; print "x"; print "y"; print "z" }
a = Thread.new { print "a"; print "b"; sleep 0.2; print "c" }
x.join # Let the threads finish before
a.join # main thread exits...
produces:
abxyzc
(Stolen from the ri Thread.new documentation. See the ri Thread.join documentation for some more details.)
So, if you amend spawn_thread_for to save the Thread references, you can join on them all:
(Untested, but ought to give the flavor)
# main thread
work_units = Queue.new # and fill the queue...
10.times { work_units << nil } # one nil per worker as a stop marker; otherwise pop blocks forever
threads = []
10.downto(1) do
  threads << Thread.new do
    loop do
      w = work_units.pop
      Thread.exit if w.nil?
      do_some_work(w)
    end
  end
end
# main thread continues while work threads devour work
threads.each(&:join)
Thread.list.each{ |t| t.join unless t == Thread.current }
It seems like you are replicating what the Parallel Each (Peach) library provides.
You can use Thread#join:
join(limit = nil)
The calling thread will suspend execution and run thr. Does not return until thr exits or until limit seconds have passed. If the time limit expires, nil will be returned, otherwise thr is returned.
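The two return values described above can be seen in a short sketch. The sleep durations and variable names are arbitrary choices for illustration.

```ruby
slow = Thread.new { sleep 0.5; :finished }
fast = Thread.new { :finished }

timed_out = slow.join(0.05) # nil: slow is still sleeping when the limit expires
completed = fast.join(1)    # the thread itself: fast exited within the limit

puts "slow thread: #{timed_out.inspect}"
puts "fast thread done: #{completed.equal?(fast)}"

slow.join # wait without a limit so the script exits cleanly
```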
Also, you can use Enumerable#each_slice to iterate over the work units in batches:
work_units.each_slice(10) do |batch|
  # handle each work unit in a thread
  threads = batch.map do |work_unit|
    spawn_thread_for work_unit
  end
  # wait until current batch work units finish before handling the next batch
  threads.each(&:join)
end

Process n items at a time (using threads)

I'm doing what a lot of people probably need to do, processing tasks that have a variable execution time. I have the following proof of concept code:
threads = []
(1...10000).each do |n|
  threads << Thread.new do
    run_for = rand(10)
    puts "Starting thread #{n}(#{run_for})"
    time=Time.new
    while 1 do
      if Time.new - time >= run_for then
        break
      else
        sleep 1
      end
    end
    puts "Ending thread #{n}(#{run_for})"
  end
  finished_threads = []
  while threads.size >= 10 do
    threads.each do |t|
      finished_threads << t unless t.alive?
    end
    finished_threads.each do |t|
      threads.delete(t)
    end
  end
end
It doesn't start a new thread until one of the previous threads has dropped off. Does anyone know a better, more elegant way of doing this?
I'd suggest creating a work pool. See http://snippets.dzone.com/posts/show/3276. Then submit all of your variable length work to the pool, and call join to wait for all the threads to complete.
The work_queue gem is the easiest way to perform tasks asynchronously and concurrently in your application.
wq = WorkQueue.new 2 # Limit the maximum number of simultaneous worker threads
(1..10_000).each do
  wq.enqueue_b do
    # Task
  end
end
wq.join # All tasks are complete after this
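If adding a gem is not an option, a fixed-size pool can be sketched with the built-in Queue. This is an illustration only, not how the work_queue gem is implemented; process_in_pool and the stop-marker approach are assumptions made for the example.

```ruby
# Hypothetical helper: runs the given block for every item, with at most
# pool_size items being processed at the same time.
def process_in_pool(items, pool_size, &task)
  queue = Queue.new
  items.each { |item| queue << item }

  stop = Object.new # unique marker that cannot collide with a real item
  pool_size.times { queue << stop } # one marker per worker

  results = Queue.new
  workers = Array.new(pool_size) do
    Thread.new do
      # Each worker keeps taking items until it draws a stop marker.
      until (item = queue.pop).equal?(stop)
        results << task.call(item)
      end
    end
  end
  workers.each(&:join)
  Array.new(results.size) { results.pop }
end

doubled = process_in_pool((1..20).to_a, 4) do |n|
  sleep(rand * 0.01) # variable execution time
  n * 2
end
puts doubled.sort.inspect
```

A worker picks up the next item as soon as it finishes the previous one, so a slow task does not hold up the rest. Results arrive in completion order, which is why the example sorts them before printing.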
