What does Thread.exclusive actually do? - ruby

The information from the API is very sparse - also considering that Thread.critical appears not to be documented.
Wraps a block in Thread.critical, restoring the original value upon exit from the critical section, and returns the value of the block.

tl;dr: It makes the given block execute, so that no other ruby-thread can interrupt it.
The documentation of Thread.exclusive is misleading, as it mentions Thread.critical= which was removed in ruby version 1.9. However, we can deduce what it does by looking at the ruby source code.
In MRI Thread.exclusive is defined in prelude.rb. As it is a pretty short file, I'll cite it's content here:
class Thread
MUTEX_FOR_THREAD_EXCLUSIVE = Mutex.new # :nodoc:
def self.exclusive
MUTEX_FOR_THREAD_EXCLUSIVE.synchronize{
yield
}
end
end
Here the Thread class is extended by a static constant MUTEX_FOR_THREAD_EXCLUSIVE, which holds a Mutex. When exclusive is called, we ask the mutex to synchronize execution of the block.
As stated in Mutex' documentation, synchronize obtains a lock, runs the block, and releases the lock when the block completes.
Because the mutex is thread-global state, only one single Thread can hold it at the same time.
So this works:
# no other Thread can do something between this puts statements
Thread.exclusive do
puts 1
puts 2
end
But this won't:
Thread.exclusive do
puts 1
Thread.exclusive do
puts 2
end
puts 3
end
because of this error: ThreadError: deadlock; recursive locking.

Related

Why won't my simple Connection Pool execute these simple put statements?

I posted a question regarding how to effectively manage threads here How do I properly use Threads to connect ping a url?
I got some great recommendations and tips regarding pools, thread safety, and some libraries and gems to use. I'm trying to execute one of the recommendations listed by using concurrent-ruby to create a thread/connction pool to execute some threads. In a simple ruby file I have the following code:
pool = Concurrent::FixedThreadPool.new(5)
pool.post do
puts 'hello'
end
As per the documentation in concurrent-ruby I've done the required steps but my code won't execute. No puts statement is being executed. Here is another example:
pool = Concurrent::FixedThreadPool.new(5)
array = []
pool.post do
array << 1
puts 'Why am I not working?'
end
puts array.size
The size of this array is 0. the code in the pool is not executing. I would have at least expected a size of 1. I've followed the example to a tee. Why is this code not executing?
Your code is correct and the block is successfully pushed to the pool. However, before it gets executed, the program terminates and kills the pool. That's why you don't see any output - it did not have enough time to execute the job.
You can either add sleep statement at the end or, for more elegant solution, tell the pool to finish all the work and shut down. This will look like this:
require 'concurrent-ruby'
pool = Concurrent::FixedThreadPool.new(5)
pool.post do
puts 'hello'
end
pool.shutdown
pool.wait_for_termination

Ruby thread safe thread creation

I came across this bit of code today:
#thread ||= Thread.new do
# this thread should only spin up once
end
It is being called by multiple threads. I was worried that if multiple threads are calling this code that you could have multiple threads created since #thread access is not synchronized. However, I was told that this could not happen because of the Global Interpreter Lock. I did a little bit of reading about threads in Ruby and it seems like individual threads that are running Ruby code can get preempted by other threads. If this is the case, couldn't you have an interleaving like this:
Thread A Thread B
======== ========
Read from #thread .
Thread.New .
[Thread A preempted] .
. Read from #thread
. Thread.New
. Write to #thread
Write to #thread
Additionally, since access to #thread is not synchronized are writes to #thread guaranteed to be visible to all other threads? The memory models of other languages I've used in the past do not guarantee visibility of writes to memory unless you synchronize access to that memory using atomics, mutexes, etc.
I'm still learning Ruby and realize I have a long way to go to understanding concurrency in Ruby. Any help on this would be super appreciated!
You need a mutex. Essentially the only thing the GIL protects you from is accessing uninitialized memory. If something in Ruby could be well-defined without being atomic, you should not assume it is atomic.
A simple example to show that your example ordering is possible. I get the "double set" message every time I run it:
$global = nil
$thread = nil
threads = []
threads = Array.new(1000) do
Thread.new do
sleep 1
$thread ||= Thread.new do
if $global
warn "double set!"
else
$global = true
end
end
end
end
threads.each(&:join)

How to pass a block to a yielding thread in Ruby

I am trying to wrap my head around Threads and Yielding in Ruby, and I have a question about how to pass a block to a yielding thread.
Specifically, I have a thread that is sleeping, and waiting to be told to do something, and I would like that Thread to execute a different block if told to (ie, it is sleeping, and if a user presses a button, do something besides sleep).
Say I have code like this:
window = Thread.new do
#thread1 = Thread.new do
# Do some cool stuff
# Decide it is time to sleep
until #told_to_wakeup
if block_given?
yield
end
sleep(1)
end
# At some point after #thread1 starts sleeping,
# a user might do something, so I want to execute
# some code in ##thread1 (unfortunately spawning a new thread
# won't work correctly in my case)
end
Is it possible to do that?
I tried using ##thread1.send(), but send was looking for a method name.
Thanks for taking the time to look at this!
Here's a simple worker thread:
queue = Queue.new
worker = Thread.new do
# Fetch an item from the work queue, or wait until one is available
while (work = queue.pop)
# ... Do something with work
end
end
queue.push(thing: 'to do')
The pop method will block until something is pushed into the queue.
When you're done you can push in a deliberately empty job:
queue.push(nil)
That will make the worker thread exit.
You can always expand on that functionality to do more things, or to handle more conditions.

Mutexes not working, using queues works. Why?

In this example I'm looking to sync two puts, in a way that the output will be ababab..., without any double as or bs on the output.
I have three examples for that: Using a queue, using mutexes in memory and using mutex with files. The queue example work just fine, but the mutexes don't.
I'm not looking for a working code. I'm looking to understand why using a queue it works, and using mutexes don't. By my understanding, they are supposed to be equivalent.
Queue example: Work.
def a
Thread.new do
$queue.pop
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$queue << true
end
end
$queue = Queue.new
$queue << true
loop{a; sleep(rand)}
Mutex file example: Don't work.
def a
Thread.new do
$mutex.flock(File::LOCK_EX)
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$mutex.flock(File::LOCK_UN)
end
end
MUTEX_FILE_PATH = '/tmp/mutex'
File.open(MUTEX_FILE_PATH, "w") unless File.exists?(MUTEX_FILE_PATH)
$mutex = File.new(MUTEX_FILE_PATH,"r+")
loop{a; sleep(rand)}
Mutex variable example: Don't work.
def a
Thread.new do
$mutex.lock
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$mutex.unlock
end
end
$mutex = Mutex.new
loop{a; sleep(rand)}
Short answer
Your use of the mutex is incorrect. With Queue, you can populate with one thread and then pop it with another, but you cannot lock a Mutex with one one thread and then unlock with another.
As #matt explained, there are several subtle things happening like the mutex getting unlocked automatically and the silent exceptions you don't see.
How Mutexes Are Commonly Used
Mutexes are used to access a particular shared resource, like a variable or a file. The synchronization of variables and files consequently allow multiple threads to be synchronized. Mutexes don't really synchronize threads by themselves.
For example:
thread_a and thread_b could be synchronized via a shared boolean variable such as true_a_false_b.
You'd have to access, test, and toggle that boolean variable every time you use it - a multistep process.
It's necessary to ensure that this multistep process occurs atomically, i.e. is not interrupted. This is when you would use a mutex. A trivialized example follows:
require 'thread'
Thread.abort_on_exception = true
true_a_false_b = true
mutex = Mutex.new
thread_a = Thread.new do
loop do
mutex.lock
if true_a_false_b
puts "a"
true_a_false_b = false
end
mutex.unlock
end
end
thread_b = Thread.new do
loop do
mutex.lock
if !true_a_false_b
puts "b"
true_a_false_b = true
end
mutex.unlock
end
sleep(1) # if in irb/console, yield the "current" thread to thread_a and thread_b
In your mutex example, the thread created in method b sleeps for a while, prints b then tries to unlock the mutex. This isn’t legal, a thread cannot unlock a mutex unless it already holds that lock, and raises an ThreadError if you try:
m = Mutex.new
m.unlock
results in:
release.rb:2:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
from release.rb:2:in `<main>'
You won’t see this in your example because by default Ruby silently ignores exceptions raised in threads other than the main thread. You can change this using Thread::abort_on_exception= – if you add
Thread.abort_on_exception = true
to the top of your file you’ll see something like:
a
b
with-mutex.rb:15:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
from with-mutex.rb:15:in `block in b'
(you might see more than one a, but there’ll only be one b).
In the a method you create threads that acquire a lock, print a, call another method (that creates a new thread and returns straight away) and then terminate. It doesn’t seem to be well documented but when a thread terminates it releases any locks it has, so in this case the lock is released almost immediately allowing other a threads to run.
Overall the lock doesn’t have much effect. It doesn’t prevent the b threads from running at all, and whilst it does prevent two a threads running at the same time, it is released as soon as the thread holding it exits.
I think you might be thinking of semaphores, and whilst the Ruby docs say “Mutex implements a simple semaphore” they are not quite the same.
Ruby doesn’t provide semaphores in the standard library, but it does provide condition variables. (That link goes to the older 2.0.0 docs. The thread standard library is required by default in Ruby 2.1+, and the move seems to have resulted in the current docs not being available. Also be aware that Ruby also has a separate monitor library which (I think) adds the same features (mutexes and condition variables) in a more object-orientated fashion.)
Using condition variables and mutexes you can control the coordination between threads. Uri Agassi’s answer shows one possible way to do that (although I think there’s a race condition with how his solution gets started).
If you look at the source for Queue (again this is a link to 2.0.0 – the thread library has been converted to C in recent versions and the Ruby version is easier to follow) you can see that it is implemented with Mutexes and ConditionVariables. When you call $queue.pop in the a thread in your queue example you end up calling wait on the mutex in the same way as Uri Agassi’s answer calls $cv.wait($mutex) in his method a. Similarly when you call $queue << true in your b thread you end up calling signal on the condition variable in the same way as Uri Agassi’s calls $cv.signal in his b thread.
The main reason your file locking example doesn’t work is that file locking provides a way for multiple processes to coordinate with each other (usually so only one tries to write to a file at the same time) and doesn’t help with coordinating threads within a process. Your file locking code is structured in a similar way to the mutex example so it’s likely it would suffer the same problems.
The problem with file-based version has not been sorted out properly.
The reason why it does not work is that f.flock(File::LOCK_EX) does not block if called on the same file f multiple times.
This can be checked with this simple sequential program:
require 'thread'
MUTEX_FILE_PATH = '/tmp/mutex'
$fone= File.new( MUTEX_FILE_PATH, "w")
$ftwo= File.open( MUTEX_FILE_PATH)
puts "start"
$fone.flock( File::LOCK_EX)
puts "locked"
$fone.flock( File::LOCK_EX)
puts "so what"
$ftwo.flock( File::LOCK_EX)
puts "dontcare"
which prints everything except dontcare.
So the file-based program does not work because
$mutex.flock(File::LOCK_EX)
never blocks.

Parallelism in Ruby

I've got a loop in my Ruby build script that iterates over each project and calls msbuild and does various other bits like minify CSS/JS.
Each loop iteration is independent of the others so I'd like to parallelise it.
How do I do this?
I've tried:
myarray.each{|item|
Thread.start {
# do stuff
}
}
puts "foo"
but Ruby just seems to exit straight away (prints "foo"). That is, it runs over the loop, starts a load of threads, but because there's nothing after the each, Ruby exits killing the other threads :(
I know I can do thread.join, but if I do this inside the loop then it's no longer parallel.
What am I missing?
I'm aware of http://peach.rubyforge.org/ but using that I get all kinds of weird behaviour that look like variable scoping issues that I don't know how to solve.
Edit
It would be useful if I could wait for all child-threads to execute before putting "foo", or at least the main ruby thread exiting. Is this possible?
Store all your threads in an array and loop through the array calling join:
threads = myarray.map do |item|
Thread.start do
# do stuff
end
end
threads.each { |thread| thread.join }
puts "foo"
Use em-synchrony here :). Fibers are cute.
require "em-synchrony"
require "em-synchrony/fiber_iterator"
# if you realy need to get a Fiber per each item
# in real life you could set concurrency to, for example, 10 and it could even improve performance
# it depends on amount of IO in your job
concurrency = myarray.size
EM.synchrony do
EM::Synchrony::FiberIterator.new(myarray, concurrency).each do |url|
# do some job here
end
EM.stop
end
Take into account that ruby threads are green threads, so you dont have natively true parallelism. I f this is what you want I would recommend you to take a look to JRuby and Rubinius:
http://www.engineyard.com/blog/2011/concurrency-in-jruby/

Resources