Ruby's speed of threads - ruby

I have the following code to thread-safe write into a file:
threads = []
##lock_flag = 0
##write_flag = 0
def add_to_file
old_i = 0
File.open( "numbers.txt", "r" ) { |f| old_i = f.read.to_i }
File.open( "numbers.txt", "w+") { |f| f.write(old_i+1) }
#puts old_i
end
File.open( "numbers.txt", "w") { |f| f.write(0) } unless File.exist? ("numbers.txt")
2000.times do
threads << Thread.new {
done_flag = 0
while done_flag == 0 do
print "." #### THIS LINE
if ##lock_flag == 0
##lock_flag = 1
if ##write_flag == 0
##write_flag = 1
add_to_file
##write_flag = 0
done_flag = 1
end
##lock_flag = 0
end
end
}
end
threads.each {|t| t.join}
If I run this code it take about 1.5 sec to write all 2000 numbers into the file. So, all is good.
But if I remove the line print "." marked with "THIS LINE" is takes ages! This code needs about 12sec for only 20 threads to complete.
Now my question: why does the print speed up that code so much?

I'm not sure how you can call that thread safe at all when it's simply not. You can't use a simple variable to ensure safety because of race conditions. What happens between testing that a flag is zero and setting it to one? You simply don't know. Anything can and will eventually happen in that very brief interval if you're unlucky enough.
What might be happening is the print statement causes the thread to stall long enough that your broken locking mechanism ends up working. When testing that example using Ruby 1.9.2 it doesn't even finish, printing dots seemingly forever.
You might want to try re-writing it using Mutex:
write_mutex = Mutex.new
read_mutex = Mutex.new
2000.times do
threads << Thread.new {
done_flag = false
while (!done_flag) do
print "." #### THIS LINE
write_mutex.synchronize do
read_mutex.synchronize do
add_to_file
done_flag = true
end
end
end
}
end
This is the proper Ruby way to do thread synchronization. A Mutex will not yield the lock until it is sure you have exclusive control over it. There's also the try_lock method that will try to grab it and will fail if it is already taken.
Threads can be a real nuisance to get right, so be very careful when using them.

First off, there are gems that can make this sort of thing easier. threach and jruby_threach ("threaded each") are ones that I wrote, and while I'm deeply unhappy with the implementation and will get around to making them cleaner at some point, they work fine when you have safe code.
(1..100).threach(2) {|i| do_something_with(i)} # run method in two threads
or
File.open('myfile.txt', 'r').threach(3, :each_line) {|line| process_line(line)}
You should also look at peach and parallel for other examples of easily working in parallel with multiple threads.
Above and beyond the problems already pointed out -- that your loop isn't thread-safe -- none of it matters because the code you're calling (add_to_file) isn't thread-safe. You're opening and closing the same file willy-nilly across threads, and that's gonna give you problems. I can't seem to understand what you're trying to do, but you need to keep in mind that you have absolutely no idea the order in which things in different threads are going to run.

Related

Ruby work distribution fails if threads are generated to fast

I ran into a problem the other day and I spent 2 hours looking for an answer at the wrong place.
In the process I stripped down the code to the version below. The Threading here will work as long as I have the sleep(0.1) in the loop creating the threads.
If the line is omitted, all threads are created - but only thread 7 will actually consume data from the queue.
With this "hack" I do have a working solution but not one I'm happy with. I'm really curious why this happens.
I am using a fairly old version of ruby under windows 2.4.1p111. However I was able to reproduce the same behavior with a new ruby 3.0.2p107 installation
#!/usr/bin/env ruby
#q = Queue.new
# Get all projects (would be a list of directories)
projects = [*0..100]
projects.each do |project|
#q.push project
end
def worker(num)
while not #q.empty?
puts "Thread: #{num} Project: #{#q.pop}"
sleep(0.5)
end
end
threads=[]
for i in 1..7 do
threads << Thread.new { worker(i) }
sleep(0.1) # Threading does not work without this line - but why?
end
threads.each {|thread| puts thread.join }
puts "done"
Fun bug! This is a race condition.
It's not that only thread 7 is doing work it's that all threads are referencing the same variable i in memory (there is only one copy!) so since
the number 7 gets written last (presumedly before any threads have started) they all read the same i==7.
Try this worker function and see if it doesn't clear things up
def worker(num)
my_thread_id = Thread.current.object_id
while not #q.empty?
puts "Thread: #{num} NumObjId: #{num.object_id} ThreadId: #{my_thread_id} Project: #{#q.pop}"
sleep(0.5)
end
end
Notice that NumObjId is the same in all threads. They are all pointing to the same number. But the actual ThreadId we get IS different.
If you really do need the number in each thread allocate as many numbers as threads. Something like
ids = (1..7).to_a
ids.each do |i|
threads << Thread.new { worker(i) }
end

Does race condition exist in `puts`?

I am reading a post about threading for ruby. And there is a snippet:
q = Queue.new
producer = Thread.new {
c = 0
while true do
q << c
c += 1
puts "#{q.size} in stock"
end
}
consumer1 = Thread.new {
while true
val = q.shift
puts "Consumer - 1: #{val}"
end
}
consumer2 = Thread.new {
while true
val = q.shift
puts "Consumer - 2: #{val}"
end
}
[producer, consumer1, consumer2].each(&:join)
The post says the output will be as:
Thread 2: 25
Thread 1: 22
Thread 2: 26Thread 1: 27
Thread 2: 29
Thread 1: 28
and the cause is:
 ... a pretty common race condition ...
But I couldn't reproduce that output. And as a java programmer, I don't think the output related to race condition here. I believe it's something related to puts, but I have no clue to that.
What's going on here?
UPDATE
Thanks for the help from #Damien MATHIEU which explains a lot to a ruby newbie. I found another answer in OS for STDOUT.sync = true that explains well why we need it and what problems it might cause.
Purpose:
This is done because IO operations are slow and usually it makes more sense to avoid writing every single character immediately to the console.
Possible issues as expected (and what happened in my question):
This behavior leads to problems in certain situations. Imagine you want to build a progress bar (run a loop that outputs single dots between extensive calculations). With buffering the result might be that there isn't any output for a while and then suddenly multiple dots are printed out at once.
That's because puts doesn't write to STDOUT right away, but buffers the string and writes in bigger chunks.
You can get ruby to write immediately with the following:
STDOUT.sync = true
which should resolve your ordering issue.

Can't run multithreading with Celluloid

This simple example I run on jruby, but it only one thread runs
require 'benchmark'
require 'celluloid/current'
TIMES = 10
def delay
sleep 1
# 40_000_000.times.each{|i| i*i}
end
p 'celluloid: true multithreading?'
class FileWorker
include Celluloid
def create_file(id)
delay
p "Done!"
File.open("out_#{id}.txt", 'w') {|f| f.write(Time.now) }
end
end
workers_pool = FileWorker.pool(size: 10)
TIMES.times do |i|
# workers_pool.async.create_file(i) # also not happens
future = Celluloid::Future.new { FileWorker.new.create_file(i) }
p future.value
end
All created files have interval 1 second.
Please help to turn Celluloid into multithreading mode, where all files are created simultaneously.
Thanks!
FIXED:
Indeed, array of "futures" helps!
futures = []
TIMES.times do |i|
futures << Celluloid::Future.new { FileWorker.new.create_file(i) }
end
futures.each {|f| p f.value }
Thanks jrochkind !
Ah, I think I see.
Inside your loop, you are waiting for each future to complete, at the end of the loop -- which means you are waiting for one future to complete, before creating the next one.
TIMES.times do |i|
# workers_pool.async.create_file(i) # also not happens
future = Celluloid::Future.new { FileWorker.new.create_file(i) }
p future.value
end
Try changing it to this:
futures = []
TIMES.times do |i|
futures << Celluloid::Future.new { FileWorker.new.create_file(i) }
end
futures.each {|f| p f.value }
In your version, consider the first iteration the loop -- you create a future, then call future.value which waits for the future to complete. The future.value statement won't return until the future completes, and the loop iteration won't finish and loop again to create another future until the statement returns. So you've effectively made it synchronous, by waiting on each future with value before creating the next.
Make sense?
Also, for short code blocks like this, it's way easier on potential SO answerers if you put the code directly in the question, properly indented to format as code, instead of linking out.
In general, if you are using a fairly widely used library like Celluloid, and finding it doesn't seem to do the main thing it's supposed to do -- the first guess should probably be a bug in your code, not that the library fundamentally doesn't work at all (someone else would have noticed before now!). A question title reflecting that, even just "Why doesn't my Celluloid code appear to work multi-threaded" might have gotten more favorable attention than a title suggesting Celluloid fundamentally does not work -- without any code in the question itself demonstrating!

Ruby script hangs forever

This little script is supposed to generate a user-specified amount of random numbers and print them. It's a multithreaded script and I think that's where my trouble lies. I'm not getting any errors, but when run the script just hangs.
num = []
while 0.upto ARGV[0].to_i do
num << rand{254}
end
current_index = 0
while current_index < num.size
chunk = num[current_index, 5]
threads = []
chunk.each do |n|
threads << Thread.new do
puts n
end
end
threads.each do |thread|
thread.join
end
current_index += chunk.size
end
You cannot use while loop with upto.
Change it to:
0.upto ARGV[0].to_i do
num << rand(254)
end
And it works properly (I've changed braces to curly one, because I believe you want 254 to be parameter here).
Sidenote:
Remember when writing threads program in Ruby, that CRuby has GIL - Global Interpreter Lock. Therefore only one thread will be operating at one time. If you want different behaviour - switch for example to jRuby. More information about GIL can be found f.e. here: http://www.jstorimer.com/blogs/workingwithcode/8085491-nobody-understands-the-gil
upto returns self, which is a number. Everything which isn't false or nil is trueish in Ruby, including numbers. So, you have a while loop whose condition is always trueish, ergo, will never stop.

Implementing a synchronization barrier in Ruby

I'm trying to "replicate" the behaviour of CUDA's __synchtreads() function in Ruby. Specifically, I have a set of N threads that need to execute some code, then all wait on each other at mid-point in execution before continuing with the rest of their business. For example:
x = 0
a = Thread.new do
x = 1
syncthreads()
end
b = Thread.new do
syncthreads()
# x should have been changed
raise if x == 0
end
[a,b].each { |t| t.join }
What tools do I need to use to accomplish this? I tried using a global hash, and then sleeping until all the threads have set a flag indicating they're done with the first part of the code. I couldn't get it to work properly; it resulted in hangs and deadlock. I think I need to use a combination of Mutex and ConditionVariable but I am unsure as to why/how.
Edit: 50 views and no answer! Looks like a candidate for a bounty...
Let's implement a synchronization barrier. It has to know the number of threads it will handle, n, up front. During first n - 1 calls to sync the barrier will cause a calling thread to wait. The call number n will wake all threads up.
class Barrier
def initialize(count)
#mutex = Mutex.new
#cond = ConditionVariable.new
#count = count
end
def sync
#mutex.synchronize do
#count -= 1
if #count > 0
#cond.wait #mutex
else
#cond.broadcast
end
end
end
end
Whole body of sync is a critical section, i.e. it cannot be executed by two threads concurrently. Hence the call to Mutex#synchronize.
When the decreased value of #count is positive the thread is frozen. Passing the mutex as an argument to the call to ConditionVariable#wait is critical to prevent deadlocks. It causes the mutex to be unlocked before freezing the thread.
A simple experiment starts 1k threads and makes them add elements to an array. Firstly they add zeros, then they synchronize and add ones. The expected result is a sorted array with 2k elements, of which 1k are zeros and 1k are ones.
mtx = Mutex.new
arr = []
num = 1000
barrier = Barrier.new num
num.times.map do
Thread.start do
mtx.synchronize { arr << 0 }
barrier.sync
mtx.synchronize { arr << 1 }
end
end .map &:join;
# Prints true. See it break by deleting `barrier.sync`.
puts [
arr.sort == arr,
arr.count == 2 * num,
arr.count(&:zero?) == num,
arr.uniq == [0, 1],
].all?
As a matter of fact, there's a gem named barrier which does exactly what I described above.
On a final note, don't use sleep for waiting in such circumstances. It's called busy waiting and is considered a bad practice.
There might be merits of having the threads wait for each other. But I think that it is cleaner to have the threads actually finish at "midpoint", because your question obviously impliest that the threads need each others' results at the "midpoint". Clean design solution would be to let them finish, deliver the result of their work, and start a brand new set of threads based on these.

Resources