Ruby threads and mutex - ruby

Why does the following ruby code not work?
2 | require 'thread'
3 |
4 | $mutex = Mutex.new
5 | $mutex.lock
6 |
7 | t = Thread.new {
8 | sleep 10
9 | $mutex.unlock
10 | }
11 |
12 | $mutex.lock
13 | puts "Delayed hello"
When I'm running it, I get an error:
./test.rb:13:in `lock': thread 0x7f4557856378 tried to join itself (ThreadError)
from ./test.rb:13
What is the right way to synchronize two threads without joining them (both threads must continue running after synchronization)?

This is old but I'm contributing since it's a bit scary that none of the other answers (at time of writing) seem to be correct. The original code is clearly attempting to:
Create a mutex in the main thread and lock it.
Start a new thread, which may begin running at any time and after any delay subject to the whims of the Ruby runtime.
Have this thread unlock the mutex only once it's finished doing its work.
Have the main thread then deliberately re-lock the mutex, with the intention that it's spawned a thread which will unlock it. The main thread waits for that.
Then the main thread continues running.
@user2413915: Your solution omits the step of locking again in the main thread, so it won't wait for the spawned thread as intended.
@Paul Rubel: Your code assumes that the spawned thread gets as far as its lock of the mutex before the main thread does. This is a race condition. If the main thread continues to execute and locks first, the spawned thread will be blocked until after the main thread has printed "Delayed hello", which is the exact opposite of the desired outcome. You probably ran it by pasting it into the IRB prompt; if you modify your example so that the end and the mutex lock are on the same line (i.e. "end; $mutex.lock"), it'll fail, printing the message too early. Either way, it relies on behaviour of the Ruby runtime that only works by chance.
The original code might look fine in principle, albeit arguably lacking in elegance, but Ruby's Mutex does not work that way: a mutex is owned by the thread that locked it. The second lock in the main thread is a recursive lock by the current owner, which Ruby reports as a ThreadError (the exact message varies between versions), and even if it got past that, the spawned thread would not be allowed to unlock a mutex that it does not hold.
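Both ownership rules can be checked directly with a small script. This is a sketch; the exact error messages differ between Ruby versions, so they only appear as comments.

```ruby
require 'thread' # Mutex is built in on modern Rubies; the require is harmless

m = Mutex.new
m.lock # the main thread now owns m

# Locking again from the owning thread is a recursive lock, not a wait.
recursive_error = begin
  m.lock
  nil
rescue ThreadError => e
  e
end

# Unlocking from a thread that does not own the lock is refused as well.
foreign_unlock_error = Thread.new do
  begin
    m.unlock
    nil
  rescue ThreadError => e
    e
  end
end.value

puts recursive_error.message      # e.g. "deadlock; recursive locking"
puts foreign_unlock_error.message # e.g. "Attempt to unlock a mutex which is locked by another thread..."
m.unlock                          # only the owner may release it
```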
Instead, make cunning use of Ruby's Queue. When you try to pop something off a Queue, the call blocks until an item is available. So:
require 'thread' # provides Queue on Ruby < 2.1; Queue is built in from 2.1 on
queue = Queue.new
t = Thread.new {
  sleep 10
  queue.push( nil ) # Push any object you like - here, it's a NilClass instance
}
queue.pop() # Blocks until thread 't' pushes onto the queue
puts "Delayed hello"
If the spawned thread runs first and pushes onto the queue, then the main thread will just pop the item and keep going. If the main thread tries to pop before the spawned thread pushes, it'll wait for the spawned thread.
[Edit: Note that the object pushed onto the queue could be the result of the spawned thread's processing task, so the main thread gets to wait until processing is complete and receives the processing result in one go.]
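A minimal sketch of that idea, assuming the "processing task" is just a stand-in sum: the pushed value is both the completion signal and the result, and the worker thread is free to keep running after the handoff.

```ruby
require 'thread' # needed for Queue only on Ruby < 2.1

results = Queue.new

worker = Thread.new do
  sum = (1..100).reduce(:+) # stand-in for the real processing task
  results.push(sum)         # delivers the result and signals "done" in one step
  :kept_running             # the worker carries on after the handoff
end

total = results.pop # blocks until the worker has pushed
puts "Delayed hello, total = #{total}"
worker.join
```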
I've tested this on Ruby 1.8.7-p375 and Ruby 2.1.2 via rbenv with success, so it's reasonable to assume that the standard library Queue class is functional across all common major Ruby versions.

You do not need to lock the mutex again on line 12.
require 'thread'
$mutex = Mutex.new
$mutex.lock
t = Thread.new {
sleep 10
$mutex.unlock
}
puts "Delayed hello"
This will work.

Related

Why do Ruby fibers that run sequentially without a scheduler set run concurrently when a scheduler is set?

I have the following Gemfile:
source "https://rubygems.org"
ruby "3.1.2"
gem "libev_scheduler", "~> 0.2"
and the following Ruby code in a file called main.rb:
require 'libev_scheduler'
set_sched = ARGV[0] == "--set-sched"
if set_sched then
Fiber.set_scheduler Libev::Scheduler.new
end
N_FIBERS = 5
fibers = []
N_FIBERS.times do |i|
n = i + 1
fiber = Fiber.new do
puts "Beginning calculation ##{n}..."
sleep 1
end
fibers.push({fiber: fiber, n: n})
end
fibers.each do |fiber|
fiber[:fiber].resume
end
puts "Finished all calculations!"
I'm executing the code with Ruby 3.1.2 installed via RVM.
When I run the program with time bundle exec ruby main.rb, I get the following output:
Beginning calculation #1...
Beginning calculation #2...
Beginning calculation #3...
Beginning calculation #4...
Beginning calculation #5...
Finished all calculations!
real 0m5.179s
user 0m0.146s
sys 0m0.027s
When I run the program with time bundle exec ruby main.rb --set-sched, I get the following output:
Beginning calculation #1...
Beginning calculation #2...
Beginning calculation #3...
Beginning calculation #4...
Beginning calculation #5...
Finished all calculations!
real 0m1.173s
user 0m0.150s
sys 0m0.021s
Why do my fibers only run concurrently when I've set a scheduler? Some older Stack Overflow answers (like this one) state that fibers are a construct for flow control, not concurrency, and that it is impossible to use fibers to write concurrent code. My results seem to contradict this.
My understanding so far of fibers is that they are meant for cooperative concurrency, as opposed to preemptive concurrency. Therefore, in order to get concurrency out of them, you'd need to have them yield to some other code as early as they can (e.g. when I/O begins) so that the other code can be executed while the fiber waits for its next opportunity to execute.
Based on this understanding, I think I understand why my code without a scheduler isn't able to run concurrently. Each fiber sleeps, and because it has no yield statements around that code, there are no points in time where it could yield control to any other code I've written. But when I add a scheduler, it appears to somehow yield to something. Is sleep detecting the scheduler and yielding to it so that my code resuming the fibers is immediately yielded to, making it able to immediately resume all five fibers?
Great question!
As @stefan noted above, Ruby 3.0 introduced the concept of a "non-blocking fiber." The way the actual non-blocking behavior is accomplished is left up to the scheduler implementation. There is no default scheduler as far as I know; per the Ruby docs:
If Fiber.scheduler is not set in the current thread, blocking and non-blocking fibers’ behavior is identical.
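For contrast, this is what cooperative hand-off looks like with no scheduler at all: a fiber gives up control only at an explicit Fiber.yield. It is a plain-Ruby sketch with no gems; it reorders the work but does not make anything run concurrently, and a sleep inside these fibers would still block the whole thread.

```ruby
log = []

fibers = (1..3).map do |n|
  Fiber.new do
    log << "begin ##{n}"
    Fiber.yield # explicit hand-off point back to whoever called resume
    log << "finish ##{n}"
  end
end

fibers.each(&:resume) # runs each fiber up to its Fiber.yield
fibers.each(&:resume) # resumes each one from where it stopped
puts log
```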
Now, to answer your last question:
But when I add a scheduler, it appears to somehow yield to something ... Is sleep detecting the scheduler and yielding to it so that my code resuming the fibers is immediately yielded to, making it able to immediately resume all five fibers?
You're onto something! When you set a fiber scheduler, it's expected to conform to Fiber::SchedulerInterface, which defines several "hooks." One of those hooks is #kernel_sleep, which is invoked by Kernel#sleep (and Mutex#sleep)!
I can't say I've read much libev code, but you can find libev_scheduler's implementation of that hook here.
The idea is (emphasis my own):
The scheduler runs into a wait loop, checking all the blocked fibers (which it has registered on hook calls) and resuming them when the awaited resource is ready (e.g. I/O ready or sleep time elapsed).
So, in summary:
Your fiber calls Kernel#sleep with some duration.
Kernel#sleep calls the scheduler's #kernel_sleep hook with that same duration.
The scheduler "somehow registers what the current fiber is waiting on, and yields control to other fibers with Fiber.yield" (quote from the docs there)
"The scheduler runs into a wait loop, checking all the blocked fibers (which it has registered on hook calls) and resuming them when the awaited resource is ready (e.g. I/O ready or sleep time elapsed)."
Hope this helps!
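To make that hook flow concrete, here is a deliberately tiny scheduler that only understands sleep. It is a sketch, not libev_scheduler's code: the class name SleepOnlyScheduler is made up, it needs Ruby 3.0+, io_wait simply claims the IO is ready, and block raises, so real I/O and Mutex/Queue waits are out of scope.

```ruby
# Ruby 3.0+ only. Hypothetical sleep-only scheduler, for illustration.
class SleepOnlyScheduler
  def initialize
    @sleepers = {} # Fiber => monotonic time at which it may resume
  end

  # Hook called by Kernel#sleep from a non-blocking fiber.
  def kernel_sleep(duration = nil)
    raise NotImplementedError, "this sketch needs a sleep duration" if duration.nil?
    @sleepers[Fiber.current] = now + duration # register what the fiber waits on...
    Fiber.yield                               # ...and hand control back to the resumer
  end

  # Required by the interface; this sketch does not really handle them.
  def io_wait(_io, events, _timeout)
    events # pretend the IO is ready (there is no event loop here)
  end

  def block(_blocker, _timeout = nil)
    raise NotImplementedError, "Mutex/Queue waits are not supported in this sketch"
  end

  def unblock(_blocker, _fiber); end

  # The "wait loop": resume each sleeper once its time has come.
  def close
    until @sleepers.empty?
      fiber, wake_at = @sleepers.min_by { |_, time| time }
      delay = wake_at - now
      sleep(delay) if delay > 0 # runs in the blocking main fiber, so this is a real sleep
      @sleepers.delete(fiber)
      fiber.resume
    end
  end

  private

  def now
    Process.clock_gettime(Process::CLOCK_MONOTONIC)
  end
end

log = []
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
scheduler = SleepOnlyScheduler.new
Fiber.set_scheduler(scheduler)

3.times do |i|
  Fiber.new do # Fiber.new creates non-blocking fibers by default on Ruby 3.0+
    log << "begin #{i}"
    sleep 0.2 # routed to SleepOnlyScheduler#kernel_sleep
    log << "end #{i}"
  end.resume
end

scheduler.close          # run the wait loop until every sleeper has resumed
Fiber.set_scheduler(nil) # detach the scheduler
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
puts log
puts format("%.2fs", elapsed) # roughly 0.2s rather than 3 * 0.2s
```

Each sleep registers the fiber and yields straight back to the loop that resumed it, which is why all three fibers start before any of them finishes.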

Celluloid async inside ruby blocks does not work

Trying to implement Celluloid async on my working example seems to exhibit weird behavior.
Here is what my code looks like:
class Indefinite
include Celluloid
def run!
loop do
[1].each do |i|
async.on_background
end
end
end
def on_background
puts "Running in background"
end
end
Indefinite.new.run!
But when I run the above code, I never see the output "Running in background".
But if I put in a sleep, the code seems to work.
class Indefinite
include Celluloid
def run!
loop do
[1].each do |i|
async.on_background
end
sleep 0.5
end
end
def on_background
puts "Running in background"
end
end
Indefinite.new.run!
Any idea why there is such a difference between the two scenarios above?
Thanks.
Your main loop is dominating the actor/application's threads.
All your program is doing is spawning background processes, but never running them. You need that sleep in the loop purely to allow the background threads to get attention.
It is not usually a good idea to have an unconditional loop spawn infinite background processes like you have here. There ought to be either a delay, or a conditional statement put in there... otherwise you just have an infinite loop spawning things that never get invoked.
Think about it like this: if you put puts "looping" just inside your loop, while you do not see Running in the background ... you will see looping over and over and over.
Approach #1: Use every or after blocks.
The best way to fix this is not to use sleep inside a loop, but to use an after or every block, like this:
every(0.1) {
on_background
}
Or best of all, if you want to make sure the process runs completely before running again, use after instead:
def run_method
  @running ||= false
  unless @running
    @running = true
    on_background
    @running = false
  end
  after(0.1) { run_method }
end
Using a loop is not a good idea with async unless there is some kind of flow control done, or a blocking process such as with @server.accept... otherwise it will just pull 100% of the CPU core for no good reason.
By the way, you can also use now_and_every as well as now_and_after too... this would run the block right away, then run it again after the amount of time you want.
Using every is shown in this gist:
https://gist.github.com/digitalextremist/686f42e58a58b743142b
The ideal situation, in my opinion:
This is a rough but immediately usable example:
https://gist.github.com/digitalextremist/12fc824c6a4dbd94a9df
require 'celluloid/current'
class Indefinite
  include Celluloid
  INTERVAL = 0.5
  ONE_AT_A_TIME = true
  def self.run!
    puts "000a Instantiating."
    indefinite = new
    indefinite.run
    puts "000b Running forever:"
    sleep
  end
  def initialize
    puts "001a Initializing."
    @mutex = Mutex.new if ONE_AT_A_TIME
    @running = false
    puts "001b Interval: #{INTERVAL}"
  end
  def run
    puts "002a Running."
    unless ONE_AT_A_TIME && @running
      if ONE_AT_A_TIME
        @mutex.synchronize {
          puts "002b Inside lock."
          @running = true
          on_background
          @running = false
        }
      else
        puts "002b Without lock."
        on_background
      end
    end
    puts "002c Setting new timer."
    after(INTERVAL) { run }
  end
  def on_background
    if ONE_AT_A_TIME
      puts "003 Running background processor in foreground."
    else
      puts "003 Running in background"
    end
  end
end
Indefinite.run!
puts "004 End of application."
This will be its output, if ONE_AT_A_TIME is true:
000a Instantiating.
001a Initializing.
001b Interval: 0.5
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
000b Running forever:
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
002a Running.
002b Inside lock.
003 Running background processor in foreground.
002c Setting new timer.
And this will be its output if ONE_AT_A_TIME is false:
000a Instantiating.
001a Initializing.
001b Interval: 0.5
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
000b Running forever:
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
002a Running.
002b Without lock.
003 Running in background
002c Setting new timer.
You need to be more "evented" than "threaded" to properly issue tasks and preserve scope and state, rather than issue commands between threads/actors... which is what the every and after blocks provide. And besides that, it's good practice either way, even if you didn't have a Global Interpreter Lock to deal with, because in your example, it doesn't seem like you are dealing with a blocking process. If you had a blocking process, then by all means have an infinite loop. But since you're just going to end up spawning an infinite number of background tasks before even one is processed, you need to either use a sleep like your question started with, or use a different strategy altogether, and use every and after which is how Celluloid itself encourages you to operate when it comes to handling data on sockets of any kind.
Approach #2: Use a recursive method call.
This just came up in the Google Group. The below example code will actually allow execution of other tasks, even though it's an infinite loop.
https://groups.google.com/forum/#!topic/celluloid-ruby/xmkdrMQBGbY
This approach is less desirable because it will likely have more overhead, spawning a series of fibers.
def work
# ...
async.work
end
Question #2: Thread vs. Fiber behaviors.
The second question is why the following would work: loop { Thread.new { puts "Hello" } }
That spawns an unbounded number of threads. Since Ruby 1.9, MRI threads are native operating-system threads rather than green threads, so the operating system schedules each one; the Global Interpreter Lock only ensures that one of them executes Ruby code at a time. And in the case of the example, each Thread runs very quickly and then dies.
For an async task, by contrast, a Fiber is used. So what's happening is this, in the default case:
Process starts.
Actor instantiated.
Method call invokes loop.
Loop invokes async method.
async method adds task to mailbox.
Mailbox is not invoked, and loop continues.
Another async task is added to the mailbox.
This continues infinitely.
The above happens because the loop method itself runs in a Fiber that is never suspended (unless a sleep is called!), and therefore the tasks added to the mailbox are never invoked in a new Fiber. A Fiber behaves differently from a Thread. This is a good piece of reference material discussing the differences:
https://blog.engineyard.com/2010/concurrency-real-and-imagined-in-mri-threads
Question #3: Celluloid vs. Celluloid::ZMQ behavior.
The third question is why include Celluloid behaves differently than Celluloid::ZMQ ...
That's because Celluloid::ZMQ uses a reactor-based evented mailbox, versus Celluloid which uses a condition variable based mailbox.
Read more about pipelining and execution modes:
https://github.com/celluloid/celluloid/wiki/Pipelining-and-execution-modes
That is the difference between the two examples. If you have additional questions about how these mailboxes behave, feel free to post on the Google Group ... the main dynamic you are facing is the unique nature of the GIL interacting with the Fiber vs. Thread vs. Reactor behavior.
You can read more about the reactor-pattern here:
http://en.wikipedia.org/wiki/Reactor_pattern
Explanation of the "Reactor pattern"
What is the difference between event driven model and reactor pattern?
And see the specific reactor used by Celluloid::ZMQ here:
https://github.com/celluloid/celluloid-zmq/blob/master/lib/celluloid/zmq/reactor.rb
So what happens in the evented mailbox scenario is that sleep is a blocking call, which causes the reactor to move on to the next task in the mailbox.
But also, and this is unique to your situation, the specific reactor used by Celluloid::ZMQ relies on an external C library, specifically the 0MQ library. That reactor is external to your application, which behaves differently from Celluloid::IO or Celluloid itself, and that is also why the behavior differs from what you expected.
Multi-core Support Alternative
If maintaining state and scope is not important to you, and you use JRuby or Rubinius, which can run Ruby code on more than one operating-system thread at once (unlike MRI, which has the Global Interpreter Lock), you can instantiate more than one actor and issue async calls between actors concurrently.
But my humble opinion is that you would be much better served using a very high frequency timer, such as 0.001 or 0.1 in my example, which will seem instantaneous for all intents and purposes, but also allow the actor thread plenty of time to switch fibers and run other tasks in the mailbox.
Let's run an experiment by modifying your example a bit (this reproduces the same "weird" behaviour while making things clearer):
class Indefinite
include Celluloid
def run!
(1..100).each do |i|
async.on_background i
end
puts "100 requests sent from #{Actor.current.object_id}"
end
def on_background(num)
(1..100000000).each {}
puts "message #{num} on #{Actor.current.object_id}"
end
end
Indefinite.new.run!
sleep
# =>
# 100 requests sent from 2084
# message 1 on 2084
# message 2 on 2084
# message 3 on 2084
# ...
You can run it on any Ruby interpreter, using Celluloid or Celluloid::ZMQ, and the result will always be the same. Also note that the output from Actor.current.object_id is the same in both methods, which is a clue that we are dealing with a single actor in our experiment.
So there is not much difference between the Ruby and Celluloid implementations, as far as this experiment is concerned.
Let's first address why this code behaves this way.
It's not hard to understand why it's happening. Celluloid receives incoming requests and saves them in the task queue of the appropriate actor. Note that our original call to run! is at the top of the queue.
Celluloid then processes those tasks, one at a time. If there happens to be a blocking call or a sleep call, then, according to the documentation, the next task is invoked without waiting for the current task to complete.
Note that in our experiment there are no blocking calls. This means that the run! method is executed from beginning to end, and only after it's done are the on_background calls invoked, in order.
And it's how it's supposed to work.
If you add a sleep call to your code, it notifies Celluloid that it should start processing the next task in the queue. Hence the behavior you see in your second example.
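The ordering described above can be modeled in plain Ruby. This is a toy sketch, not Celluloid's API or implementation: MiniActor, async, pause and drain are hypothetical names. Tasks run one at a time on a single thread, and pause plays the role of sleep by suspending the current task so queued tasks get a turn.

```ruby
# Hypothetical toy actor: NOT Celluloid's API or implementation.
class MiniActor
  attr_reader :log

  def initialize
    @mailbox = [] # pending tasks, each wrapped in a Fiber, run in FIFO order
    @log = []
  end

  # Queue a task; it cannot start until the running task finishes or pauses.
  def async(&task)
    @mailbox << Fiber.new(&task)
  end

  # Stand-in for sleep: suspend the current task so queued ones get a turn.
  def pause
    Fiber.yield
  end

  # Run tasks one at a time; a paused task rejoins the back of the queue.
  def drain
    until @mailbox.empty?
      task = @mailbox.shift
      task.resume
      @mailbox << task if task.alive?
    end
  end
end

def demo(pausing)
  actor = MiniActor.new
  actor.async do # plays the role of run!
    2.times do |i|
      actor.async { actor.log << "background #{i}" }
      actor.pause if pausing
    end
    actor.log << "run! finished"
  end
  actor.drain
  actor.log
end

p demo(false) # => ["run! finished", "background 0", "background 1"]
p demo(true)  # => ["background 0", "background 1", "run! finished"]
```

Replace 2.times with an endless loop and no pause, and drain never gets past the first task, which is the symptom in the question.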
Let's now move on to how to design the system so that it does not depend on sleep calls, which are odd at best.
There is actually a good example on the Celluloid::ZMQ project page. Note this loop:
def run
  loop { async.handle_message @socket.read }
end
The first thing it does is @socket.read. Note that it's a blocking operation, so Celluloid will proceed to the next message in the queue (if there are any). As soon as @socket.read returns, a new task is generated. But this task won't be executed before @socket.read is called again, which blocks execution and tells Celluloid to proceed with the next item in the queue.
You probably see the difference from your example. You are not blocking anything, and thus not giving Celluloid a chance to process the queue.
How can we get the behavior shown in the Celluloid::ZMQ example?
The first (and, in my opinion, better) solution is to have an actual blocking call, like @socket.read.
If there are no blocking calls in your code and you still need to process things in the background, then you should consider other mechanisms provided by Celluloid.
There are several options with Celluloid.
One can use conditions, futures, notifications, or just call wait/signal at a low level, like in this example:
class Indefinite
include Celluloid
def run!
loop do
async.on_background
result = wait(:background) #=> 33
end
end
def on_background
puts "background"
# notifies waiters, that they can continue
signal(:background, 33)
end
end
Indefinite.new.run!
sleep
# ...
# background
# background
# background
# ...
Using sleep(0) with Celluloid::ZMQ
I also noticed the working.rb file you mentioned in your comment. It contains the following loop:
loop { [1].each { |i| async.handle_message 'hello' } ; sleep(0) }
It looks like it's doing the proper job. However, running it under JRuby revealed that it leaks memory. To make this even more apparent, try adding a sleep call to the handle_message body:
def handle_message(message)
sleep 0.5
puts "got message: #{message}"
end
The high memory usage is probably related to the fact that the queue is filled very fast and cannot be processed in the given time. It will be even more of a problem if handle_message is more work-intensive than it is now.
Solutions with sleep
I'm skeptical about solutions with sleep. They potentially require a lot of memory and can even cause memory leaks. And it's not clear what you should pass as a parameter to the sleep method, or why.
How threads work with Celluloid
Celluloid does not create a new thread for each asynchronous task. It has a pool of threads in which it runs every task, synchronous and asynchronous alike. The key point is that the library sees the run! function as a synchronous task, and performs it in the same context as an asynchronous task.
By default, Celluloid runs everything in a single thread, using a queue system to schedule asynchronous tasks for later. It creates new threads only when needed.
Besides that, Celluloid overrides the sleep function. This means that every time you call sleep in a class that includes the Celluloid module, the library will check whether there are non-sleeping threads in its pool.
In your case, the first time you call sleep 0.5, it will create a new Thread to perform the asynchronous tasks in the queue while the first thread is sleeping.
So in your first example, only one Celluloid thread is running, performing the loop. In your second example, two Celluloid threads are running, the first one performing the loop and sleeping at each iteration, the other one performing the background task.
You could for instance change your first example to perform a finite number of iterations:
def run!
(0..100).each do
[1].each do |i|
async.on_background
end
end
puts "Done!"
end
When using this run! function, you'll see that Done! is printed before all the Running in background, meaning that Celluloid finishes the execution of the run! function before starting the asynchronous tasks in the same thread.

ruby multithreading - stop and resume specific thread

I want to be able to stop and run specific thread in ruby in the following context:
thread_hash = Hash.new()
loop do
Thread.start(call.function) do |execute|
operation = execute.extract(some_value_from_incoming_message)
if thread_hash.has_key? operation
thread_hash[operation].run
elsif !thread_hash.has_key?(operation)
thread_hash[operation] = Thread.current
do_something_else_1
Thread.stop
do_something_else_2
Thread.stop
do_something_else_3
thread_hash.delete(operation)
else
exit
end
end
end
In plain words, the script above acts as a server that receives a message and extracts some parameter from it. If that parameter is already in thread_hash, the suspended thread should be resumed.
If the parameter is not present in thread_hash, the parameter is stored in thread_hash along with the current thread, some function is executed, and the current thread is suspended until a later loop iteration resumes it. This repeats until do_something_else_3 has run, after which the operation serviced by the current thread is removed from the hash.
Can a thread be resumed in Ruby based on its thread id, or should a new thread be given a name when it is started, like
thr = Thread.start
so that it can only be resumed by that name, like:
thr.run
Is the solution described above realistic? Could it cause some sort of leak/deadlock due to resuming an old thread from a new thread, or are redundant threads automatically taken care of by Ruby?
It sounds to me like you're trying to do everything in every thread: read input, run existing threads, store new threads, delete old threads. Why not break up the problem?
hash = {}
loop do
operation = get_value_from message
if hash[operation] and hash[operation].alive?
hash[operation].wakeup
else
hash[operation] = Thread.new do
do_something1
Thread.stop
do_something2
Thread.stop
do_something3
end
end
end
Instead of wrapping the whole contents of the loop in a thread, only thread the message processing code. That lets it run in the background while the loop goes back to waiting for a message. This solves any sort of race/deadlock problem since all of the thread management occurs in the main thread.
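A runnable variant of the same design, under some assumptions: the messages are a hard-coded list of hypothetical operation keys, and Thread.stop/wakeup is swapped for one Queue per operation (a different technique from the answer's). A Queue buffers the "resume" message, so it does not matter whether the worker has already reached its wait point. As in the answer, only the main loop touches the hashes.

```ruby
require 'thread'

# Hypothetical incoming messages; each one is just an operation key here.
messages = %w[alpha beta alpha beta alpha beta]

steps = Queue.new # thread-safe record of which step ran for which operation
inboxes = {}      # operation => Queue used to resume that operation's worker
workers = {}      # operation => worker Thread

messages.each do |operation|
  if workers[operation]&.alive?
    inboxes[operation] << :resume # plays the role of hash[operation].wakeup
  else
    inbox = Queue.new
    inboxes[operation] = inbox
    workers[operation] = Thread.new do
      steps << "#{operation}: step 1"
      inbox.pop # plays the role of Thread.stop: wait for the next message
      steps << "#{operation}: step 2"
      inbox.pop
      steps << "#{operation}: step 3"
    end
  end
end

workers.each_value(&:join)
log = []
log << steps.pop until steps.empty?
puts log
```

Steps of different operations may interleave in any order, but each operation's own steps always run in sequence.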

Mutexes not working, using queues works. Why?

In this example I'm looking to sync two puts calls so that the output is ababab..., without any doubled a's or b's in the output.
I have three examples for that: using a queue, using mutexes in memory, and using a mutex with files. The queue example works just fine, but the mutexes don't.
I'm not looking for working code. I'm looking to understand why it works using a queue but not using mutexes. By my understanding, they are supposed to be equivalent.
Queue example: works.
def a
Thread.new do
$queue.pop
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$queue << true
end
end
$queue = Queue.new
$queue << true
loop{a; sleep(rand)}
Mutex file example: doesn't work.
def a
Thread.new do
$mutex.flock(File::LOCK_EX)
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$mutex.flock(File::LOCK_UN)
end
end
MUTEX_FILE_PATH = '/tmp/mutex'
File.open(MUTEX_FILE_PATH, "w") unless File.exist?(MUTEX_FILE_PATH)
$mutex = File.new(MUTEX_FILE_PATH,"r+")
loop{a; sleep(rand)}
Mutex variable example: doesn't work.
def a
Thread.new do
$mutex.lock
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$mutex.unlock
end
end
$mutex = Mutex.new
loop{a; sleep(rand)}
Short answer
Your use of the mutex is incorrect. With a Queue, you can push onto it from one thread and pop it from another, but you cannot lock a Mutex in one thread and then unlock it from another.
As @matt explained, there are several subtle things happening, like the mutex getting unlocked automatically and the silent exceptions you don't see.
How Mutexes Are Commonly Used
Mutexes are used to access a particular shared resource, like a variable or a file. The synchronization of variables and files consequently allow multiple threads to be synchronized. Mutexes don't really synchronize threads by themselves.
For example:
thread_a and thread_b could be synchronized via a shared boolean variable such as true_a_false_b.
You'd have to access, test, and toggle that boolean variable every time you use it - a multistep process.
It's necessary to ensure that this multistep process occurs atomically, i.e. is not interrupted. This is when you would use a mutex. A trivialized example follows:
require 'thread'
Thread.abort_on_exception = true
true_a_false_b = true
mutex = Mutex.new
thread_a = Thread.new do
  loop do
    mutex.lock
    if true_a_false_b
      puts "a"
      true_a_false_b = false
    end
    mutex.unlock
  end
end
thread_b = Thread.new do
  loop do
    mutex.lock
    if !true_a_false_b
      puts "b"
      true_a_false_b = true
    end
    mutex.unlock
  end
end
sleep(1) # keep the main thread alive so thread_a and thread_b can run (also yields to them in irb/console)
In your mutex example, the thread created in method b sleeps for a while, prints b, then tries to unlock the mutex. This isn’t legal: a thread cannot unlock a mutex unless it already holds that lock, and Ruby raises a ThreadError if you try:
m = Mutex.new
m.unlock
results in:
release.rb:2:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
from release.rb:2:in `<main>'
You won’t see this in your example because by default Ruby silently ignores exceptions raised in threads other than the main thread. You can change this using Thread::abort_on_exception= – if you add
Thread.abort_on_exception = true
to the top of your file you’ll see something like:
a
b
with-mutex.rb:15:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
from with-mutex.rb:15:in `block in b'
(you might see more than one a, but there’ll only be one b).
In the a method you create threads that acquire a lock, print a, call another method (that creates a new thread and returns straight away) and then terminate. It doesn’t seem to be well documented but when a thread terminates it releases any locks it has, so in this case the lock is released almost immediately allowing other a threads to run.
Overall the lock doesn’t have much effect. It doesn’t prevent the b threads from running at all, and whilst it does prevent two a threads running at the same time, it is released as soon as the thread holding it exits.
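Both effects can be observed with a short script. It is a sketch that relies on MRI behaviour as described above; Thread.report_on_exception= (Ruby 2.4+) only silences the stderr report that newer Rubies print for exceptions in non-main threads.

```ruby
require 'thread'
Thread.report_on_exception = false # Ruby 2.4+: keep this demo's stderr quiet

m = Mutex.new

# A thread that locks and then simply finishes: the lock does not outlive it.
Thread.new { m.lock }.join
released_on_exit = !m.locked?

# A thread that unlocks a mutex it does not hold fails with ThreadError,
# but only inside that thread; Thread#join re-raises it in the caller.
error = begin
  Thread.new { m.unlock }.join
  nil
rescue ThreadError => e
  e
end

puts released_on_exit # expected: true
puts error.message    # e.g. "Attempt to unlock a mutex which is not locked"
```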
I think you might be thinking of semaphores, and whilst the Ruby docs say “Mutex implements a simple semaphore” they are not quite the same.
Ruby doesn’t provide semaphores in the standard library, but it does provide condition variables. (That link goes to the older 2.0.0 docs. The thread standard library is required by default in Ruby 2.1+, and the move seems to have resulted in the current docs not being available. Also be aware that Ruby also has a separate monitor library which (I think) adds the same features (mutexes and condition variables) in a more object-orientated fashion.)
Using condition variables and mutexes you can control the coordination between threads. Uri Agassi’s answer shows one possible way to do that (although I think there’s a race condition with how his solution gets started).
If you look at the source for Queue (again this is a link to 2.0.0 – the thread library has been converted to C in recent versions and the Ruby version is easier to follow) you can see that it is implemented with Mutexes and ConditionVariables. When you call $queue.pop in the a thread in your queue example you end up calling wait on the mutex in the same way as Uri Agassi’s answer calls $cv.wait($mutex) in his method a. Similarly when you call $queue << true in your b thread you end up calling signal on the condition variable in the same way as Uri Agassi’s calls $cv.signal in his b thread.
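Along the lines described above, strict ababab alternation can be built directly from a Mutex plus a ConditionVariable. This is an independent sketch, not Uri Agassi's answer (which isn't reproduced here); the names turn and turn_changed are illustrative.

```ruby
require 'thread'

mutex = Mutex.new
turn_changed = ConditionVariable.new
turn = :a   # whose turn it is; only read or written while holding mutex
output = []

worker = lambda do |me, other, rounds|
  Thread.new do
    rounds.times do
      mutex.synchronize do
        turn_changed.wait(mutex) until turn == me # releases mutex while waiting
        output << me.to_s
        turn = other
        turn_changed.broadcast # wake the other thread to re-check turn
      end
    end
  end
end

threads = [worker.call(:a, :b, 5), worker.call(:b, :a, 5)]
threads.each(&:join)
puts output.join # ababababab
```

The until loop re-checks the condition after every wakeup, so it doesn't matter which thread starts first or whether a wakeup is spurious.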
The main reason your file locking example doesn’t work is that file locking provides a way for multiple processes to coordinate with each other (usually so only one tries to write to a file at the same time) and doesn’t help with coordinating threads within a process. Your file locking code is structured in a similar way to the mutex example so it’s likely it would suffer the same problems.
The problem with the file-based version has not been properly explained yet.
The reason it does not work is that f.flock(File::LOCK_EX) does not block if called multiple times on the same File object f.
This can be checked with this simple sequential program:
require 'thread'
MUTEX_FILE_PATH = '/tmp/mutex'
$fone= File.new( MUTEX_FILE_PATH, "w")
$ftwo= File.open( MUTEX_FILE_PATH)
puts "start"
$fone.flock( File::LOCK_EX)
puts "locked"
$fone.flock( File::LOCK_EX)
puts "so what"
$ftwo.flock( File::LOCK_EX)
puts "dontcare"
which prints everything except dontcare: the last flock, on a second File object for the same path, blocks forever.
So the file-based program does not work because
$mutex.flock(File::LOCK_EX)
never blocks.
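The same check can be made without hanging by adding File::LOCK_NB, which makes flock return false instead of waiting. This is a sketch assuming BSD-style flock semantics as on Linux and macOS (Windows may behave differently); the scratch file name is hypothetical.

```ruby
require 'tmpdir'

path = File.join(Dir.tmpdir, "flock-demo.lock") # hypothetical scratch file
one = File.new(path, "w")
two = File.open(path) # a second, independent open of the same file

first = one.flock(File::LOCK_EX)                 # 0: lock acquired
again = one.flock(File::LOCK_EX | File::LOCK_NB) # 0: same File object, no wait
other = two.flock(File::LOCK_EX | File::LOCK_NB) # false: a separate open would block

p [first, again, other]

one.flock(File::LOCK_UN)
[one, two].each(&:close)
File.delete(path)
```

Since every thread in the question shares the single $mutex File object, each of them takes the "same File object" path and none of them ever waits.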

ruby thread block?

I read somewhere that Ruby threads/fibers block on I/O even with 1.9. Is this true, and what does it really mean? If I do some net/http stuff on multiple threads, is only 1 thread running at a given time for that request?
thanks
Assuming you are using CRuby, only one thread will be running Ruby code at a time. However, the requests will still be made in parallel, because a thread that is blocked waiting on its I/O releases the interpreter lock so the other threads can run. So if you do something like this:
require 'open-uri'
threads = 10.times.map do
  Thread.new do
    URI.open('http://example.com').read.length # URI.open: Ruby 2.5+; older versions used Kernel#open
  end
end
threads.each(&:join)
puts threads.map(&:value)
it will be much faster than doing it sequentially.
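The same effect can be seen without network access by using sleep as a stand-in for a slow request (an assumption for illustration; like a blocking read, sleep releases the interpreter lock while it waits). Ten 0.2-second waits finish in about 0.2 seconds in total.

```ruby
started = Process.clock_gettime(Process::CLOCK_MONOTONIC)

threads = 10.times.map do |i|
  Thread.new do
    sleep 0.2 # stand-in for waiting on a network response
    i * i
  end
end

results = threads.map(&:value) # Thread#value joins and returns the block's result
elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - started

p results
puts format("%.2fs", elapsed) # close to 0.2s, not 10 * 0.2s
```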
Also, you can check whether a thread has finished without blocking on its completion.
For example:
require 'open-uri'
thread = Thread.new do
  sleep 10
  URI.open('http://example.com').read.length
end
puts 'still running' until thread.join(5)
puts thread.value
With CRuby, the threads cannot run at the same time, but they are still useful. Some of the other implementations, like JRuby, have real threads and can run multiple threads in parallel.
Some good references:
http://yehudakatz.com/2010/08/14/threads-in-ruby-enough-already/
http://www.engineyard.com/blog/2011/ruby-concurrency-and-you/
All threads run simultaneously but IO will be blocked until they all finish.
In other words, threading doesn't give you the ability to "background" a process. The interpreter will wait for all of the threads to complete before sending further messages.
This is good if you think about it because you don't have to wonder about whether they are complete if your next process uses data that the thread is modifying/working with.
If you want to run processes in the background, check out delayed_job.

Resources