Does race condition exist in `puts`? - ruby

I am reading a post about threading for ruby. And there is a snippet:
q = Queue.new
producer = Thread.new {
c = 0
while true do
q << c
c += 1
puts "#{q.size} in stock"
end
}
consumer1 = Thread.new {
while true
val = q.shift
puts "Consumer - 1: #{val}"
end
}
consumer2 = Thread.new {
while true
val = q.shift
puts "Consumer - 2: #{val}"
end
}
[producer, consumer1, consumer2].each(&:join)
The post says the output will be as:
Thread 2: 25
Thread 1: 22
Thread 2: 26Thread 1: 27
Thread 2: 29
Thread 1: 28
and the cause is:
 ... a pretty common race condition ...
But I couldn't reproduce that output. And as a java programmer, I don't think the output related to race condition here. I believe it's something related to puts, but I have no clue to that.
What's going on here?
UPDATE
Thanks for the help from #Damien MATHIEU which explains a lot to a ruby newbie. I found another answer in OS for STDOUT.sync = true that explains well why we need it and what problems it might cause.
Purpose:
This is done because IO operations are slow and usually it makes more sense to avoid writing every single character immediately to the console.
Possible issues as expected (and what happened in my question):
This behavior leads to problems in certain situations. Imagine you want to build a progress bar (run a loop that outputs single dots between extensive calculations). With buffering the result might be that there isn't any output for a while and then suddenly multiple dots are printed out at once.

That's because puts doesn't write to STDOUT right away, but buffers the string and writes in bigger chunks.
You can get ruby to write immediately with the following:
STDOUT.sync = true
which should resolve your ordering issue.

Related

Multi Threading in Ruby

I need to create 3 threads.
Each thread will print on the screen a collor and sleep for x seconds.
Thread A will print red; Thread B will print yellow; Thread C will print green;
All threads must wait until its their turn to print.
The first thread to print must be Red, after printing, Red will tell Yellow that's its turn to print and so on.
The threads must be able to print multiple times (user specific)
I'm stuck because calling #firstFlag.signal outside a Thread isn't working and the 3 threads aren't working on the right order
How do I make the red Thread go first?
my code so far:
#lock = Mutex.new
#firstFlag = ConditionVariable.new
#secondFlag = ConditionVariable.new
#thirdFlag = ConditionVariable.new
print "Tell me n's vallue:"
#n = gets.to_i
#threads = Array.new
#threads << Thread.new() {
t = Random.rand(1..3)
n = 0
#lock.synchronize {
for i in 0...#n do
#firstFlag.wait(#lock, t)
puts "red : #{t}s"
sleep(t)
#secondFlag.signal
end
}
}
#threads << Thread.new() {
t = Random.rand(1..3)
n = 0
#lock.synchronize {
for i in 0...#n do
#secondFlag.wait(#lock, t)
puts "yellow : #{t}s"
sleep(t)
#thirdFlag.signal
end
}
}
#threads << Thread.new() {
t = Random.rand(1..3)
n = 0
#lock.synchronize {
for i in 0...#n do
#thirdFlag.wait(#lock, t)
puts "green : #{t}s"
sleep(t)
#firstFlag.signal
end
}
}
#threads.each {|t| t.join}
#firstFlag.signal
There are three bugs in your code:
First bug
Your wait calls use a timeout. This means your threads will become de-synchronized from your intended sequence, because the timeout will let each thread slip past your intended wait point.
Solution: change all your wait calls to NOT use a timeout:
#xxxxFlag.wait(#lock)
Second bug
You put your sequence trigger AFTER your Thread.join call in the end. Your join call will never return, and hence the last statement in your code will never be executed, and your thread sequence will never start.
Solution: change the order to signal the sequence start first, and then join the threads:
#firstFlag.signal
#threads.each {|t| t.join}
Third bug
The problem with a wait/signal construction is that it does not buffer the signals.
Therefore you have to ensure all threads are in their wait state before calling signal, otherwise you may encounter a race condition where a thread calls signal before another thread has called wait.
Solution: This a bit harder to solve, although it is possible to solve with Queue. But I propose a complete rethinking of your code instead. See below for the full solution.
Better solution
I think you need to rethink the whole construction, and instead of condition variables just use Queue for everything. Now the code becomes much less brittle, and because Queue itself is thread safe, you do not need any critical sections any more.
The advantage of Queue is that you can use it like a wait/signal construction, but it buffers the signals, which makes everything much simpler in this case.
Now we can rewrite the code:
redq = Queue.new
yellowq = Queue.new
greenq = Queue.new
Then each thread becomes like this:
#threads << Thread.new() {
t = Random.rand(1..3)
n = 0
for i in 0...#n do
redq.pop
puts "red : #{t}s"
sleep(t)
yellowq.push(1)
end
}
And finally to kick off the whole sequence:
redq.push(1)
#threads.each { |t| t.join }
I'd redesign this slightly. Think of your ConditionVariables as flags that a thread uses to say it's done for now, and name them accordingly:
#lock = Mutex.new
#thread_a_done = ConditionVariable.new
#thread_b_done = ConditionVariable.new
#thread_c_done = ConditionVariable.new
Now, thread A signals it's done by doing #thread_a_done.signal, and thread B can wait for that signal, etc. Thread A of course needs to wait until thread C is done, so we get this kind of structure:
#threads << Thread.new() {
t = Random.rand(1..3)
#lock.synchronize {
for i in 0...#n do
#thread_c_done.wait(#lock)
puts "A: red : #{t}s"
sleep(t)
#thread_a_done.signal
end
}
}
A problem here is that you need to make sure that thread A in the first iteration doesn't wait for a flag signal. After all, it's to go first, so it shouldn't wait for anyone else. So modify it to:
#thread_c_done.wait(#lock) unless i == 0
Finally, once you have created your threads, kick them all off by invoking run, then join on each thread (so that your program doesn't exit before the last thread is done):
#threads.each(&:run)
#threads.each(&:join)
Oh btw I'd get rid of the timeouts in your wait as well. You have a hard requirement that they go in order. If you make the signal wait time out you screw that up - threads might still "jump the queue" so to speak.
EDIT as #casper remarked below, this still has a potential race condition: Thread A could call signal before thread B is waiting to receive it, in which case thread B will miss it and just wait indefinitely. A possible way to fix this is to use some form of a CountDownLatch - a shared object that all threads can wait on, which gets released as soon as all threads have signalled that they're ready. The ruby-concurrency gem has an implementation of this, and in fact might have other interesting things to use for more elegant multi-threaded programming.
Sticking with pure ruby though, you could possibly fix this by adding a second Mutex that guards shared access to a boolean flag to indicate the thread is ready.
Ok, thank you guys that answered. I've found a solution:
I've created a fourth thread. Because I found out that calling "#firstFlag.signal" outside a thread doesn't work, because ruby has a "main thread" that sleeps when you "run" other threads.
So, "#firstFlag.signal" calling must be inside a thread so it can be on the same level of the CV.wait
I solved the issue using this:
#threads << Thread.new {
sleep 1
#firstFlag.signal
}
This fourth thread will wait for 1 sec before sending the first signal to red. This only sec seems to be enough for the others thread reach the wait point.
And, I've removed the timeout, as you sugested.
//Edit//
I realized I don't need a fourth Thread, I could just make thread C do the first signal.
I made thread C sleep for 1 sec to wait the other two threads enter in wait state, then it signals red to start and goes to wait too
#threads << Thread.new() {
sleep 1
#redFlag.signal
t = Random.rand(1..3)
n = 0
#lock.synchronize {
for i in 0...#n do
#greenFlag.wait(#lock)
puts "verde : #{t}s"
sleep(t)
#redFlag.signal
n += 1
end
}
}

Can't run multithreading with Celluloid

This simple example I run on jruby, but it only one thread runs
require 'benchmark'
require 'celluloid/current'
TIMES = 10
def delay
sleep 1
# 40_000_000.times.each{|i| i*i}
end
p 'celluloid: true multithreading?'
class FileWorker
include Celluloid
def create_file(id)
delay
p "Done!"
File.open("out_#{id}.txt", 'w') {|f| f.write(Time.now) }
end
end
workers_pool = FileWorker.pool(size: 10)
TIMES.times do |i|
# workers_pool.async.create_file(i) # also not happens
future = Celluloid::Future.new { FileWorker.new.create_file(i) }
p future.value
end
All created files have interval 1 second.
Please help to turn Celluloid into multithreading mode, where all files are created simultaneously.
Thanks!
FIXED:
Indeed, array of "futures" helps!
futures = []
TIMES.times do |i|
futures << Celluloid::Future.new { FileWorker.new.create_file(i) }
end
futures.each {|f| p f.value }
Thanks jrochkind !
Ah, I think I see.
Inside your loop, you are waiting for each future to complete, at the end of the loop -- which means you are waiting for one future to complete, before creating the next one.
TIMES.times do |i|
# workers_pool.async.create_file(i) # also not happens
future = Celluloid::Future.new { FileWorker.new.create_file(i) }
p future.value
end
Try changing it to this:
futures = []
TIMES.times do |i|
futures << Celluloid::Future.new { FileWorker.new.create_file(i) }
end
futures.each {|f| p f.value }
In your version, consider the first iteration the loop -- you create a future, then call future.value which waits for the future to complete. The future.value statement won't return until the future completes, and the loop iteration won't finish and loop again to create another future until the statement returns. So you've effectively made it synchronous, by waiting on each future with value before creating the next.
Make sense?
Also, for short code blocks like this, it's way easier on potential SO answerers if you put the code directly in the question, properly indented to format as code, instead of linking out.
In general, if you are using a fairly widely used library like Celluloid, and finding it doesn't seem to do the main thing it's supposed to do -- the first guess should probably be a bug in your code, not that the library fundamentally doesn't work at all (someone else would have noticed before now!). A question title reflecting that, even just "Why doesn't my Celluloid code appear to work multi-threaded" might have gotten more favorable attention than a title suggesting Celluloid fundamentally does not work -- without any code in the question itself demonstrating!

JS-style async/non-blocking callback execution with Ruby, without heavy machinery like threads?

I'm a frontend developer, somewhat familiar with Ruby. I only know how to do Ruby in a synchronous/sequential manner, while in JS i'm used to async/non-blocking callbacks.
Here's sample Ruby code:
results = []
rounds = 5
callback = ->(item) {
# This imitates that the callback may take time to complete
sleep rand(1..5)
results.push item
if results.size == rounds
puts "All #{rounds} requests have completed! Here they are:", *results
end
}
1.upto(rounds) { |item| callback.call(item) }
puts "Hello"
The goal is to have the callbacks run without blocking main script execution. In other words, i want "Hello" line to appear in output above the "All 5 requests..." line. Also, the callbacks should run concurrently, so that the callback fastest to finish makes it into the resulting array first.
With JavaScript, i would simply wrap the callback call into a setTimeout with zero delay:
setTimeout( function() { callback(item); }, 0);
This JS approach does not implement true multithreading/concurrency/parallel execution. Under the hood, the callbacks would run all in one thread sequentially, or rather interlaced on the low level.
But on practical level it would appear as concurrent execution: the resulting array would be populated in an order corresponding to the amount of time spent by each callback, i. e. the resulting array would appear sorted by the time it took each callback to finish.
Note that i only want the asynchronous feature of setTimeout(). I don't need the sleep feature built into setTimeout() (not to be confused with a sleep used in the callback example to imitate a time-consuming operation).
I tried to inquire into how to do that JS-style async approach with Ruby and was given suggestions to use:
Multithreading. This is probably THE approach for Ruby, but it requires a substantial amount of scaffolding:
Manually define an array for threads.
Manually define a mutex.
Start a new thread for each callback, add it to the array.
Pass the mutex into each callback.
Use mutex in the callback for thread synchronization.
Ensure all threads are completed before program completion.
Compared to JavaScript's setTimeout(), this is just too much. As i don't need true parallel execution, i don't want to build that much scaffolding every time i want to execute a proc asynchronously.
A sophisticated Ruby library like Celluloid and Event Machine. They look like it will take weeks to learn them.
A custom solution like this one (the author, apeiros#freenode, claims it to be very close to what setTimeout does under the hood). It requires almost no scaffolding to build and it does not involve threads. But it seems to run callbacks synchronously, in the order they've been executed.
I have always considered Ruby to be a programming language most close to my ideal, and JS to be a poor man's programming language. And it kinda discourages me that Ruby is not able to do a thing which is trivial with JS, without involving heavy machinery.
So the question is: what is the simplest, most intuitive way to do do async/non-blocking callback with Ruby, without involving complicated machinery like threads or complex libraries?
PS If there will be no satisfying answer during the bounty period, i will dig into #3 by apeiros and probably make it the accepted answer.
Like people said, it's not possible to achieve what you want without using Threads or a library that abstracts their functionality. But, if it's just the setTimeout functionality you want, then the implementation is actually very small.
Here's my attempt at emulating Javascript's setTimeout in ruby:
require 'thread'
require 'set'
module Timeout
#timeouts = Set[]
#exiting = false
#exitm = Mutex.new
#mutex = Mutex.new
at_exit { wait_for_timeouts }
def self.set(delay, &blk)
thrd = Thread.start do
sleep delay
blk.call
#exitm.synchronize do
unless #exiting
#mutex.synchronize { #timeouts.delete thrd }
end
end
end
#mutex.synchronize { #timeouts << thrd }
end
def self.wait_for_timeouts
#exitm.synchronize { #exiting = true }
#timeouts.each(&:join)
#exitm.synchronize { #exiting = false }
end
end
Here's how to use it:
$results = []
$rounds = 5
mutex = Mutex.new
def callback(n, mutex)
-> {
sleep rand(1..5)
mutex.synchronize {
$results << n
puts "Fin: #{$results}" if $results.size == $rounds
}
}
end
1.upto($rounds) { |i| Timeout.set(0, &callback(i, mutex)) }
puts "Hello"
This outputs:
Hello
Fin: [1, 2, 3, 5, 4]
As you can see, the way you use it is essentially the same, the only thing I've changed is I've added a mutex to prevent race conditions on the results array.
Aside: Why we need the mutex in the usage example
Even if javascript is only running on a single core, that does not prevent race conditions due to atomicity of operations. Pushing to an array is not an atomic operation, so more than one instruction is executed.
Suppose it is two instructions, putting the element at the end, and incrementing the size. (SET, INC).
Consider all the ways two pushes can be interleaved (taking symmetry into account):
SET1 INC1 SET2 INC2
SET1 SET2 INC1 INC2
The first one is what we want, but the second one results in the second append overwriting the first.
Okay, after some fiddling with threads and studying contributions by apeiros and asQuirreL, i came up with a solution that suits me.
I'll show sample usage first, source code in the end.
Example 1: simple non-blocking execution
First, a JS example that i'm trying to mimic:
setTimeout( function() {
console.log("world");
}, 0);
console.log("hello");
// 'Will print "hello" first, then "world"'.
Here's how i can do it with my tiny Ruby library:
# You wrap all your code into this...
Branch.new do
# ...and you gain access to the `branch` method that accepts a block.
# This block runs non-blockingly, just like in JS `setTimeout(callback, 0)`.
branch { puts "world!" }
print "Hello, "
end
# Will print "Hello, world!"
Note how you don't have to take care of creating threads, waiting for them to finish. The only scaffolding required is the Branch.new { ... } wrapper.
Example 2: synchronizing threads with a mutex
Now we'll assume that we're working with some input and output shared among threads.
JS code i'm trying to reproduce with Ruby:
var
results = [],
rounds = 5;
for (var i = 1; i <= rounds; i++) {
console.log("Starting thread #" + i + ".");
// "Creating local scope"
(function(local_i) {
setTimeout( function() {
// "Assuming there's a time-consuming operation here."
results.push(local_i);
console.log("Thread #" + local_i + " has finished.");
if (results.length === rounds)
console.log("All " + rounds + " threads have completed! Bye!");
}, 0);
})(i);
}
console.log("All threads started!");
This code produces the following output:
Starting thread #1.
Starting thread #2.
Starting thread #3.
Starting thread #4.
Starting thread #5.
All threads started!
Thread #5 has finished.
Thread #4 has finished.
Thread #3 has finished.
Thread #2 has finished.
Thread #1 has finished.
All 5 threads have completed! Bye!
Notice that the callbacks finish in reverse order.
We're also gonna assume that working the results array may produce a race condition. In JS this is never an issue, but in multithreaded Ruby this has to be addressed with a mutex.
Ruby equivalent of the above:
Branch.new 1 do
# Setting up an array to be filled with that many values.
results = []
rounds = 5
# Running `branch` N times:
1.upto(rounds) do |item|
puts "Starting thread ##{item}."
# The block passed to `branch` accepts a hash with mutexes
# that you can use to synchronize threads.
branch do |mutexes|
# This imitates that the callback may take time to complete.
# Threads will finish in reverse order.
sleep (6.0 - item) / 10
# When you need a mutex, you simply request one from the hash.
# For each unique key, a new mutex will be created lazily.
mutexes[:array_and_output].synchronize do
puts "Thread ##{item} has finished!"
results.push item
if results.size == rounds
puts "All #{rounds} threads have completed! Bye!"
end
end
end
end
puts "All threads started."
end
puts "All threads finished!"
Note how you don't have to take care of creating threads, waiting for them to finish, creating mutexes and passing them into the block.
Example 3: delaying execution of the block
If you need the delay feature of setTimeout, you can do it like this.
JS:
setTimeout(function(){ console.log('Foo'); }, 2000);
Ruby:
branch(2) { puts 'Foo' }
Example 4: waiting for all threads to finish
With JS, there's no simple way to have the script wait for all threads to finish. You'll need an await/defer library for that.
But in Ruby it's possible, and Branch makes it even simpler. If you write code after the Branch.new{} wrapper, it will be executed after all branches within the wrapper have been completed. You don't need to manually ensure that all threads have finished, Branch does that for you.
Branch.new do
branch { sleep 10 }
branch { sleep 5 }
# This will be printed immediately
puts "All threads started!"
end
# This will be printed after 10 seconds (the duration of the slowest branch).
puts "All threads finished!"
Sequential Branch.new{} wrappers will be executed sequentially.
Source
# (c) lolmaus (Andrey Mikhaylov), 2014
# MIT license http://choosealicense.com/licenses/mit/
class Branch
def initialize(mutexes = 0, &block)
#threads = []
#mutexes = Hash.new { |hash, key| hash[key] = Mutex.new }
# Executing the passed block within the context
# of this class' instance.
instance_eval &block
# Waiting for all threads to finish
#threads.each { |thr| thr.join }
end
# This method will be available within a block
# passed to `Branch.new`.
def branch(delay = false, &block)
# Starting a new thread
#threads << Thread.new do
# Implementing the timeout functionality
sleep delay if delay.is_a? Numeric
# Executing the block passed to `branch`,
# providing mutexes into the block.
block.call #mutexes
end
end
end

Implementing a synchronization barrier in Ruby

I'm trying to "replicate" the behaviour of CUDA's __synchtreads() function in Ruby. Specifically, I have a set of N threads that need to execute some code, then all wait on each other at mid-point in execution before continuing with the rest of their business. For example:
x = 0
a = Thread.new do
x = 1
syncthreads()
end
b = Thread.new do
syncthreads()
# x should have been changed
raise if x == 0
end
[a,b].each { |t| t.join }
What tools do I need to use to accomplish this? I tried using a global hash, and then sleeping until all the threads have set a flag indicating they're done with the first part of the code. I couldn't get it to work properly; it resulted in hangs and deadlock. I think I need to use a combination of Mutex and ConditionVariable but I am unsure as to why/how.
Edit: 50 views and no answer! Looks like a candidate for a bounty...
Let's implement a synchronization barrier. It has to know the number of threads it will handle, n, up front. During first n - 1 calls to sync the barrier will cause a calling thread to wait. The call number n will wake all threads up.
class Barrier
def initialize(count)
#mutex = Mutex.new
#cond = ConditionVariable.new
#count = count
end
def sync
#mutex.synchronize do
#count -= 1
if #count > 0
#cond.wait #mutex
else
#cond.broadcast
end
end
end
end
Whole body of sync is a critical section, i.e. it cannot be executed by two threads concurrently. Hence the call to Mutex#synchronize.
When the decreased value of #count is positive the thread is frozen. Passing the mutex as an argument to the call to ConditionVariable#wait is critical to prevent deadlocks. It causes the mutex to be unlocked before freezing the thread.
A simple experiment starts 1k threads and makes them add elements to an array. Firstly they add zeros, then they synchronize and add ones. The expected result is a sorted array with 2k elements, of which 1k are zeros and 1k are ones.
mtx = Mutex.new
arr = []
num = 1000
barrier = Barrier.new num
num.times.map do
Thread.start do
mtx.synchronize { arr << 0 }
barrier.sync
mtx.synchronize { arr << 1 }
end
end .map &:join;
# Prints true. See it break by deleting `barrier.sync`.
puts [
arr.sort == arr,
arr.count == 2 * num,
arr.count(&:zero?) == num,
arr.uniq == [0, 1],
].all?
As a matter of fact, there's a gem named barrier which does exactly what I described above.
On a final note, don't use sleep for waiting in such circumstances. It's called busy waiting and is considered a bad practice.
There might be merits of having the threads wait for each other. But I think that it is cleaner to have the threads actually finish at "midpoint", because your question obviously impliest that the threads need each others' results at the "midpoint". Clean design solution would be to let them finish, deliver the result of their work, and start a brand new set of threads based on these.

Ruby's speed of threads

I have the following code to thread-safe write into a file:
threads = []
##lock_flag = 0
##write_flag = 0
def add_to_file
old_i = 0
File.open( "numbers.txt", "r" ) { |f| old_i = f.read.to_i }
File.open( "numbers.txt", "w+") { |f| f.write(old_i+1) }
#puts old_i
end
File.open( "numbers.txt", "w") { |f| f.write(0) } unless File.exist? ("numbers.txt")
2000.times do
threads << Thread.new {
done_flag = 0
while done_flag == 0 do
print "." #### THIS LINE
if ##lock_flag == 0
##lock_flag = 1
if ##write_flag == 0
##write_flag = 1
add_to_file
##write_flag = 0
done_flag = 1
end
##lock_flag = 0
end
end
}
end
threads.each {|t| t.join}
If I run this code it take about 1.5 sec to write all 2000 numbers into the file. So, all is good.
But if I remove the line print "." marked with "THIS LINE" is takes ages! This code needs about 12sec for only 20 threads to complete.
Now my question: why does the print speed up that code so much?
I'm not sure how you can call that thread safe at all when it's simply not. You can't use a simple variable to ensure safety because of race conditions. What happens between testing that a flag is zero and setting it to one? You simply don't know. Anything can and will eventually happen in that very brief interval if you're unlucky enough.
What might be happening is the print statement causes the thread to stall long enough that your broken locking mechanism ends up working. When testing that example using Ruby 1.9.2 it doesn't even finish, printing dots seemingly forever.
You might want to try re-writing it using Mutex:
write_mutex = Mutex.new
read_mutex = Mutex.new
2000.times do
threads << Thread.new {
done_flag = false
while (!done_flag) do
print "." #### THIS LINE
write_mutex.synchronize do
read_mutex.synchronize do
add_to_file
done_flag = true
end
end
end
}
end
This is the proper Ruby way to do thread synchronization. A Mutex will not yield the lock until it is sure you have exclusive control over it. There's also the try_lock method that will try to grab it and will fail if it is already taken.
Threads can be a real nuisance to get right, so be very careful when using them.
First off, there are gems that can make this sort of thing easier. threach and jruby_threach ("threaded each") are ones that I wrote, and while I'm deeply unhappy with the implementation and will get around to making them cleaner at some point, they work fine when you have safe code.
(1..100).threach(2) {|i| do_something_with(i)} # run method in two threads
or
File.open('myfile.txt', 'r').threach(3, :each_line) {|line| process_line(line)}
You should also look at peach and parallel for other examples of easily working in parallel with multiple threads.
Above and beyond the problems already pointed out -- that your loop isn't thread-safe -- none of it matters because the code you're calling (add_to_file) isn't thread-safe. You're opening and closing the same file willy-nilly across threads, and that's gonna give you problems. I can't seem to understand what you're trying to do, but you need to keep in mind that you have absolutely no idea the order in which things in different threads are going to run.

Resources