Mutexes not working, using queues works. Why? - ruby

In this example I'm looking to sync two puts, in a way that the output will be ababab..., without any double as or bs on the output.
I have three examples for that: Using a queue, using mutexes in memory and using mutex with files. The queue example work just fine, but the mutexes don't.
I'm not looking for a working code. I'm looking to understand why using a queue it works, and using mutexes don't. By my understanding, they are supposed to be equivalent.
Queue example: Work.
def a
Thread.new do
$queue.pop
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$queue << true
end
end
$queue = Queue.new
$queue << true
loop{a; sleep(rand)}
Mutex file example: Don't work.
def a
Thread.new do
$mutex.flock(File::LOCK_EX)
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$mutex.flock(File::LOCK_UN)
end
end
MUTEX_FILE_PATH = '/tmp/mutex'
File.open(MUTEX_FILE_PATH, "w") unless File.exists?(MUTEX_FILE_PATH)
$mutex = File.new(MUTEX_FILE_PATH,"r+")
loop{a; sleep(rand)}
Mutex variable example: Don't work.
def a
Thread.new do
$mutex.lock
puts "a"
b
end
end
def b
Thread.new do
sleep(rand)
puts "b"
$mutex.unlock
end
end
$mutex = Mutex.new
loop{a; sleep(rand)}

Short answer
Your use of the mutex is incorrect. With Queue, you can populate with one thread and then pop it with another, but you cannot lock a Mutex with one one thread and then unlock with another.
As #matt explained, there are several subtle things happening like the mutex getting unlocked automatically and the silent exceptions you don't see.
How Mutexes Are Commonly Used
Mutexes are used to access a particular shared resource, like a variable or a file. The synchronization of variables and files consequently allow multiple threads to be synchronized. Mutexes don't really synchronize threads by themselves.
For example:
thread_a and thread_b could be synchronized via a shared boolean variable such as true_a_false_b.
You'd have to access, test, and toggle that boolean variable every time you use it - a multistep process.
It's necessary to ensure that this multistep process occurs atomically, i.e. is not interrupted. This is when you would use a mutex. A trivialized example follows:
require 'thread'
Thread.abort_on_exception = true
true_a_false_b = true
mutex = Mutex.new
thread_a = Thread.new do
loop do
mutex.lock
if true_a_false_b
puts "a"
true_a_false_b = false
end
mutex.unlock
end
end
thread_b = Thread.new do
loop do
mutex.lock
if !true_a_false_b
puts "b"
true_a_false_b = true
end
mutex.unlock
end
sleep(1) # if in irb/console, yield the "current" thread to thread_a and thread_b

In your mutex example, the thread created in method b sleeps for a while, prints b then tries to unlock the mutex. This isn’t legal, a thread cannot unlock a mutex unless it already holds that lock, and raises an ThreadError if you try:
m = Mutex.new
m.unlock
results in:
release.rb:2:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
from release.rb:2:in `<main>'
You won’t see this in your example because by default Ruby silently ignores exceptions raised in threads other than the main thread. You can change this using Thread::abort_on_exception= – if you add
Thread.abort_on_exception = true
to the top of your file you’ll see something like:
a
b
with-mutex.rb:15:in `unlock': Attempt to unlock a mutex which is not locked (ThreadError)
from with-mutex.rb:15:in `block in b'
(you might see more than one a, but there’ll only be one b).
In the a method you create threads that acquire a lock, print a, call another method (that creates a new thread and returns straight away) and then terminate. It doesn’t seem to be well documented but when a thread terminates it releases any locks it has, so in this case the lock is released almost immediately allowing other a threads to run.
Overall the lock doesn’t have much effect. It doesn’t prevent the b threads from running at all, and whilst it does prevent two a threads running at the same time, it is released as soon as the thread holding it exits.
I think you might be thinking of semaphores, and whilst the Ruby docs say “Mutex implements a simple semaphore” they are not quite the same.
Ruby doesn’t provide semaphores in the standard library, but it does provide condition variables. (That link goes to the older 2.0.0 docs. The thread standard library is required by default in Ruby 2.1+, and the move seems to have resulted in the current docs not being available. Also be aware that Ruby also has a separate monitor library which (I think) adds the same features (mutexes and condition variables) in a more object-orientated fashion.)
Using condition variables and mutexes you can control the coordination between threads. Uri Agassi’s answer shows one possible way to do that (although I think there’s a race condition with how his solution gets started).
If you look at the source for Queue (again this is a link to 2.0.0 – the thread library has been converted to C in recent versions and the Ruby version is easier to follow) you can see that it is implemented with Mutexes and ConditionVariables. When you call $queue.pop in the a thread in your queue example you end up calling wait on the mutex in the same way as Uri Agassi’s answer calls $cv.wait($mutex) in his method a. Similarly when you call $queue << true in your b thread you end up calling signal on the condition variable in the same way as Uri Agassi’s calls $cv.signal in his b thread.
The main reason your file locking example doesn’t work is that file locking provides a way for multiple processes to coordinate with each other (usually so only one tries to write to a file at the same time) and doesn’t help with coordinating threads within a process. Your file locking code is structured in a similar way to the mutex example so it’s likely it would suffer the same problems.

The problem with file-based version has not been sorted out properly.
The reason why it does not work is that f.flock(File::LOCK_EX) does not block if called on the same file f multiple times.
This can be checked with this simple sequential program:
require 'thread'
MUTEX_FILE_PATH = '/tmp/mutex'
$fone= File.new( MUTEX_FILE_PATH, "w")
$ftwo= File.open( MUTEX_FILE_PATH)
puts "start"
$fone.flock( File::LOCK_EX)
puts "locked"
$fone.flock( File::LOCK_EX)
puts "so what"
$ftwo.flock( File::LOCK_EX)
puts "dontcare"
which prints everything except dontcare.
So the file-based program does not work because
$mutex.flock(File::LOCK_EX)
never blocks.

Related

Ruby thread safe thread creation

I came across this bit of code today:
#thread ||= Thread.new do
# this thread should only spin up once
end
It is being called by multiple threads. I was worried that if multiple threads are calling this code that you could have multiple threads created since #thread access is not synchronized. However, I was told that this could not happen because of the Global Interpreter Lock. I did a little bit of reading about threads in Ruby and it seems like individual threads that are running Ruby code can get preempted by other threads. If this is the case, couldn't you have an interleaving like this:
Thread A Thread B
======== ========
Read from #thread .
Thread.New .
[Thread A preempted] .
. Read from #thread
. Thread.New
. Write to #thread
Write to #thread
Additionally, since access to #thread is not synchronized are writes to #thread guaranteed to be visible to all other threads? The memory models of other languages I've used in the past do not guarantee visibility of writes to memory unless you synchronize access to that memory using atomics, mutexes, etc.
I'm still learning Ruby and realize I have a long way to go to understanding concurrency in Ruby. Any help on this would be super appreciated!
You need a mutex. Essentially the only thing the GIL protects you from is accessing uninitialized memory. If something in Ruby could be well-defined without being atomic, you should not assume it is atomic.
A simple example to show that your example ordering is possible. I get the "double set" message every time I run it:
$global = nil
$thread = nil
threads = []
threads = Array.new(1000) do
Thread.new do
sleep 1
$thread ||= Thread.new do
if $global
warn "double set!"
else
$global = true
end
end
end
end
threads.each(&:join)

How to pass a block to a yielding thread in Ruby

I am trying to wrap my head around Threads and Yielding in Ruby, and I have a question about how to pass a block to a yielding thread.
Specifically, I have a thread that is sleeping, and waiting to be told to do something, and I would like that Thread to execute a different block if told to (ie, it is sleeping, and if a user presses a button, do something besides sleep).
Say I have code like this:
window = Thread.new do
#thread1 = Thread.new do
# Do some cool stuff
# Decide it is time to sleep
until #told_to_wakeup
if block_given?
yield
end
sleep(1)
end
# At some point after #thread1 starts sleeping,
# a user might do something, so I want to execute
# some code in ##thread1 (unfortunately spawning a new thread
# won't work correctly in my case)
end
Is it possible to do that?
I tried using ##thread1.send(), but send was looking for a method name.
Thanks for taking the time to look at this!
Here's a simple worker thread:
queue = Queue.new
worker = Thread.new do
# Fetch an item from the work queue, or wait until one is available
while (work = queue.pop)
# ... Do something with work
end
end
queue.push(thing: 'to do')
The pop method will block until something is pushed into the queue.
When you're done you can push in a deliberately empty job:
queue.push(nil)
That will make the worker thread exit.
You can always expand on that functionality to do more things, or to handle more conditions.

Does this code require a Mutex to access the #clients variable thread safely?

For fun I wrote this Ruby socket server which actually works quite nicely. I'm plannin on using it for the backend of an iOS App. My question for now is, when in the thread do I need a Mutex? Will I need one when accessing a shared variable such as #clients?
require 'rubygems'
require 'socket'
module Server
#server = Object.new
#clients = []
#sessions
def self.run(port=3000)
#server = TCPServer.new port
while (socket=#server.accept)
#clients << socket
Thread.start(socket) do |socket|
begin
loop do
begin
msg = String.new
while(data=socket.read_nonblock(1024))
msg << data
break if data.to_s.length < 1024
end
#clients.each do |client| client.write "#{socket} says: #{msg}" unless client == socket end
rescue
end
end
rescue => e
#clients.delete socket
puts e
puts "Killed client #{socket}"
Thread.kill self
end
end
end
end
end
Server.run
--Edit--
According to the answer from John Bollinger I need to synchronize the thread any time that a thread needs to access a shared resource. Does this apply to database queries? Can I read/write from a postgres database with ActiveRecord ORM inside multiple threads at once?
Any data that may be modified by one thread and read by a different one must be protected by a Mutex or a similar synchronization construct. Inasmuch as multiple threads may safely read the same data at the same time, a synchronization construct a bit more sophisticated than a single Mutex might yield better performance.
In your code, it looks like not only does #clients need to be properly synchronized, but so also do all its elements because writing to a socket is a modification.
Don't use a mutex unless you really have to.
It's pity the literature on Ruby multi-threading is so scarce, the only good book written on the topic is Working With Ruby Threads from Jesse Storimer. I've learned a lot of useful principles from there, one of which is: Don't use a mutex if there are better alternatives. In your case, there are. If you use Ruby without any gems, the only thread-safe data structure is a Queue. An array is not safe. However, with the thread_safe gem you can create one:
require 'thread_safe'
sa = ThreadSafe::Array.new # supports standard Array.new forms
sh = ThreadSafe::Hash.new # supports standard Hash.new forms
Regarding your question, it's only if any thread MODIFIES a shared data structure that you'll need to protect it with a mutex (assuming all the threads just read from that data structure, none writes to it, see John's comment for explanation on a case where you might need a mutex if one thread is reading, while another is writing to a thread etc). You don't need one for accessing unchanging data. If you're using Active Record + Postgres, yes Active Records IS thread safe, as for Postgres, you might want to follow these instructions (Behavior in Threaded Programs) to check that.
Also, be aware of race conditions (see How to Make ActiveRecord ThreadSafe which is one inherent problem which you should be aware of when coding multi-threaded apps).
Avdi Grimm had one very sound advice for multi-threaded apps: When testing them, make them fail loud and fast. So don't forget to add at the top:
Thread.abort_on_exception = true
so your threads don't silently fail if something wrong happens.

What does Thread.exclusive actually do?

The information from the API is very sparse - also considering that Thread.critical appears not to be documented.
Wraps a block in Thread.critical, restoring the original value upon exit from the critical section, and returns the value of the block.
tl;dr: It makes the given block execute, so that no other ruby-thread can interrupt it.
The documentation of Thread.exclusive is misleading, as it mentions Thread.critical= which was removed in ruby version 1.9. However, we can deduce what it does by looking at the ruby source code.
In MRI Thread.exclusive is defined in prelude.rb. As it is a pretty short file, I'll cite it's content here:
class Thread
MUTEX_FOR_THREAD_EXCLUSIVE = Mutex.new # :nodoc:
def self.exclusive
MUTEX_FOR_THREAD_EXCLUSIVE.synchronize{
yield
}
end
end
Here the Thread class is extended by a static constant MUTEX_FOR_THREAD_EXCLUSIVE, which holds a Mutex. When exclusive is called, we ask the mutex to synchronize execution of the block.
As stated in Mutex' documentation, synchronize obtains a lock, runs the block, and releases the lock when the block completes.
Because the mutex is thread-global state, only one single Thread can hold it at the same time.
So this works:
# no other Thread can do something between this puts statements
Thread.exclusive do
puts 1
puts 2
end
But this won't:
Thread.exclusive do
puts 1
Thread.exclusive do
puts 2
end
puts 3
end
because of this error: ThreadError: deadlock; recursive locking.

What happens when you don't join your Threads?

I'm writing a ruby program that will be using threads to do some work. The work that is being done takes a non-deterministic amount of time to complete and can range anywhere from 5 to 45+ seconds. Below is a rough example of what the threading code looks like:
loop do # Program loop
items = get_items
threads = []
for item in items
threads << Thread.new(item) do |i|
# do work on i
end
threads.each { |t| t.join } # What happens if this isn't there?
end
end
My preference would be to skip joining the threads and not block the entire application. However I don't know what the long term implications of this are, especially because the code is run again almost immediately. Is this something that is safe to do? Or is there a better way to spawn a thread, have it do work, and clean up when it's finished, all within an infinite loop?
I think it really depends on the content of your thread work. If, for example, your main thread needed to print "X work done", you would need to join to guarantee that you were showing the correct answer. If you have no such requirement, then you wouldn't necessarily need to join up.
After writing the question out, I realized that this is the exact thing that a web server does when serving pages. I googled and found the following article of a Ruby web server. The loop code looks pretty much like mine:
loop do
session = server.accept
request = session.gets
# log stuff
Thread.start(session, request) do |session, request|
HttpServer.new(session, request, basePath).serve()
end
end
Thread.start is effectively the same as Thread.new, so it appears that letting the threads finish and die off is OK to do.
If you split up a workload to several different threads and you need to combine at the end the solutions from the different threads you definately need a join otherwise you could do it without a join..
If you removed the join, you could end up with new items getting started faster than the older ones get finished. If you're working on too many items at once, it may cause performance issues.
You should use a Queue instead (snippet from http://ruby-doc.org/stdlib/libdoc/thread/rdoc/classes/Queue.html):
require 'thread'
queue = Queue.new
producer = Thread.new do
5.times do |i|
sleep rand(i) # simulate expense
queue << i
puts "#{i} produced"
end
end
consumer = Thread.new do
5.times do |i|
value = queue.pop
sleep rand(i/2) # simulate expense
puts "consumed #{value}"
end
end
consumer.join

Resources