Ruby Semaphores? - ruby

I'm working on an implementation of the "Fair Barbershop" problem in Ruby. This is for a class assignment, but I'm not looking for any handouts. I've been searching like crazy, but I cannot seem to find a Ruby implementation of Semaphores that mirror those found in C.
I know there is Mutex, and that's great. Single implementation, does exactly what that kind of semaphore should do.
Then there's Condition Variables. I thought that this was going to work out great, but looking at these, they require a Mutex for every wait call, which looks to me like I can't put numerical values to the semaphore (as in, I have seven barbershops, 3 barbers, etc.).
I think I need a Counting Semaphore, but I think it's a little bizarre that Ruby doesn't (from what I can find) contain such a class in its core. Can anyone help point me in the right direction?

If you are using JRuby, you can import semaphores from Java as shown in this article.
require 'java'
java_import 'java.util.concurrent.Semaphore'
SEM = Semaphore.new(limit_of_simultaneous_threads)
SEM.acquire #To decrement the number available
SEM.release #To increment the number available

There's http://sysvipc.rubyforge.org/SysVIPC.html which gives you SysV semaphores. Ruby is perfect for eliminating the API blemishes of SysV semaphores and SysV semaphores are the best around -- they are interprocess semaphores, you can use SEM_UNDO so that even SIGKILLs won't mess up your global state (POSIX interprocess semaphores don't have this), and you with SysV semaphores you can perform atomic operations on several semaphores at once as long as they're in the same semaphore set.
As for inter-thread semaphores, those should be perfectly emulatable with Condition Variables and Mutexes. (See Bernanrdo Martinez's link for how it can be done).

I also found this code:
https://gist.github.com/pettyjamesm/3746457
probably someone might like this other option.

since concurrent-ruby is stable (beyond 1.0) and is being widely used thus the best (and portable across Ruby impls) solution is to use its Concurrent::Semaphore class

Thanks to #x3ro for his link. That pointed me in the right direction. However, with the implementation that Fukumoto gave (at least for rb1.9.2) Thread.critical isn't available. Furthermore, my attempts to replace the Thread.critical calls with Thread.exclusive{} simply resulted in deadlocks. It turns out that there is a proposed Semaphore patch for Ruby (which I've linked below) that has solved the problem by replacing Thread.exclusive{} with a Mutex::synchronize{}, among a few other tweaks. Thanks to #x3ro for pushing me in the right direction.
http://redmine.ruby-lang.org/attachments/1109/final-semaphore.patch

Since the other links here aren't working for me, I decided to quickly hack something together. I have not tested this, so input and corrections are welcome. It's based simply on the idea that a Mutex is a binary Semaphore, thus a Semaphore is a set of Mutexes.
https://gist.github.com/3439373

I think it might be useful to mention the Thread::Queue in this context for others arriving at this question.
The Queue is a thread-safe tool (implemented with some behind-the-scenes synchronization primitives) that can be used like a traditional multi-processing semaphore with just a hint of imagination. And it comes preloaded by default, at least in ruby v3:
#!/usr/bin/ruby
# hold_your_horses.rb
q = Queue.new
wait_thread = Thread.new{
puts "Wait for it ..."
q.pop
puts "... BOOM!"
}
sleep 1
puts "... click, click ..."
q.push nil
wait_thread.join
And can be demonstrated simply enough:
user#host:~/scripts$ ruby hold_your_horses.rb
Wait for it ...
... click, click ...
... BOOM!
The docs for ruby v3.1 say a Queue can be initialized with an enumerable object to set up initial contents but that wasn't available in my v3.0. But if you want a semaphore with, say, 7 permits, it's easy to stuff the box with something like:
q = Queue.new
7.times{ q.push nil }
I used the Queue to implement baton-passing between some worker-threads:
class WaitForBaton
def initialize
#q = Queue.new
end
def pass_baton
#q.push nil
sleep 0.0
end
def wait_for_baton
#q.pop
end
end
So that thread task_master could perform steps one and three with thread little_helper stepping in at the appropriate time to handle step two:
baton = WaitForBaton.new
task_master = Thread.new{
step_one(ARGV[0])
baton.pass_baton
baton.wait_for_baton
step_three(logfile)
}
little_helper = Thread.new{
baton.wait_for_baton
step_two(ARGV[1])
baton.pass_baton
}
task_master.join
little_helper.join
Note that the sleep 0.0 in the .pass_baton method of my WaitForBaton class is necessary to prevent task_master from passing the baton to itself: unless thread scheduling happens to jump away from task_master right after baton.pass_baton, the very next thing that happens is task_master's baton.wait_for_baton - which takes the baton right back again. sleep 0.0 explicitly cedes execution to any other threads that might be waiting to run (and, in this case, blocking on the underlying Queue).
Ceding execution is not the default behavior because this is a somewhat unusual usage of semaphore technology - imagine that task_master could be generating many tasks for little_helpers to do and task_master can efficiently get right back to generating tasks right after passing a task off through a Thread::Queue's .push([object]) method.

Related

Why a+=1 is a thread-safe operation in ruby?

Code Snippet:
a = 0
Array.new(50){
Thread.new {
500_000.times { a += 1 }
}
}.each(&:join)
p "a: #{a}"
Result: a = 25_000_000.
In my understanding, (MRI) Ruby use GIL, so there is only one ruby thread can get the CPU, but when thread-switch happend, some data of ruby thread will be stored for restoring thread later. So, in theory, a += 1 may not be thread-safe.
But the result above turns out I'm wrong. Does Ruby makes a+=1 atomic? If true, which operations can be considered thread-safe?
It's Neither Atomic Nor Thread-Safe
In your example, the apparent consistency is largely due to the global interpreter lock, but is also partly due to the way your Ruby engine and your code sequences (theoretically) asynchronous threads. You are getting consistent results because each loop in each thread is simply incrementing the current value of a, which is not a block-local or thread-local variable. With threads on the YARV virtual machine, only one thread at a time is inspecting or setting the current value of a, but I wouldn't really say that it's an atomic operation. It's just a byproduct of the engine’s lack of real-time concurrency between threads, and the underlying implementation of the Ruby virtual machine.
If you're concerned about preserving thread-safety in Ruby without relying on idiosyncratic behaviors that just happen to appear consistent, consider using a thread-safe library like concurrent-ruby. Otherwise, you may be relying on behaviors that aren't guaranteed across Ruby engines or Ruby versions.
For example, three consecutive runs of your code in JRuby (which does have concurrent threads) will generally yield different results on each run. For example:
#=> "a: 3353241"
#=> "a: 3088145"
#=> "a: 2642263"
Ruby doesn't have a well-defined Memory Model, so in some philosophical sense, the question is non-sensical, since without a Memory Model, the term "thread-safe" isn't even defined. For example, the ISO Ruby Language Specification doesn't even document the Thread class.
The way that people write concurrent code in Ruby without a well-defined Memory Model is essentially "guess-and-test". You guess what the implementations will do, then you test as many versions of as many implementations on as many platforms and as many operating systems on as many CPU architectures and as many different system sizes as possible.
As you can see in Todd's answer, even just testing one other implementation already reveals that your conclusion was wrong. (Pro tip: never make a generalization based on a sample size of 1!)
The alternative is to use a library that has already done the above, such as the concurrent-ruby library mentioned in Todd's answer. They do all the testing I mentioned above. They also work closely with the maintainers of the various implementations. E.g. Chris Seaton, the lead developer of TruffleRuby is also one of the maintainers of concurrent-ruby, and Charlie Nutter, the lead developer of JRuby, is one of the contributors.
But the result above turns out I'm wrong.
The results are misleading. In Ruby, a += 1 is a shorthand for:
a = a + 1
With a + 1 being a method call that occurs before the assignment. Since integers are objects in Ruby, we can override that method:
module ThreadTest
def +(other)
super
end
end
Integer.prepend(ThreadTest)
The above code doesn't do anything useful, it just calls super. But merely adding a Ruby implementation on top of the built-in C implementation is enough to break (or fix) your test:
Integer.prepend(ThreadTest)
a = 0
Array.new(50){
Thread.new {
500_000.times { a += 1 }
}
}.each(&:join)
p "a: #{a}"
#=> "a: 11916339"

What is the use-case for TryEnterCriticalSection?

I've been using Windows CRITICAL_SECTION since the 1990s and I've been aware of the TryEnterCriticalSection function since it first appeared. I understand that it's supposed to help me avoid a context switch and all that.
But it just occurred to me that I have never used it. Not once.
Nor have I ever felt I needed to use it. In fact, I can't think of a situation in which I would.
Generally when I need to get an exclusive lock on something, I need that lock and I need it now. I can't put it off until later. I certainly can't just say, "oh well, I won't update that data after all". So I need EnterCriticalSection, not TryEnterCriticalSection
So what exactly is the use case for TryEnterCriticalSection?
I've Googled this, of course. I've found plenty of quick descriptions on how to use it but almost no real-world examples of why. I did find this example from Intel that, frankly doesn't help much:
CRITICAL_SECTION cs;
void threadfoo()
{
while(TryEnterCriticalSection(&cs) == FALSE)
{
// some useful work
}
// Critical Section of Code
LeaveCriticalSection (&cs);
}
// other work
}
What exactly is a scenario in which I can do "some useful work" while I'm waiting for my lock? I'd love to avoid thread-contention but in my code, by the time I need the critical section, I've already been forced to do all that "useful work" in order to get the values that I'm updating in shared data (for which I need the critical section in the first place).
Does anyone have a real-world example?
As an example you might have multiple threads that each produce a high volume of messages (events of some sort) that all need to go on a shared queue.
Since there's going to be frequent contention on the lock on the shared queue, each thread can have a local queue and then, whenever the TryEnterCriticalSection call succeeds for the current thread, it copies everything it has in its local queue to the shared one and releases the CS again.
In C++11 therestd::lock which employs deadlock-avoidance algorithm.
In C++17 this has been elaborated to std::scoped_lock class.
This algorithm tries to lock on mutexes in one order, and then in another, until succeeds. It takes try_lock to implement this approach.
Having try_lock method in C++ is called Lockable named requirement, whereas mutexes with only lock and unlock are BasicLockable.
So if you build C++ mutex on top of CTRITICAL_SECTION, and you want to implement Lockable, or you'll want to implement lock avoidance directly on CRITICAL_SECTION, you'll need TryEnterCriticalSection
Additionally you can implement timed mutex on TryEnterCriticalSection. You can do few iterations of TryEnterCriticalSection, then call Sleep with increasing delay time, until TryEnterCriticalSection succeeds or deadline has expired. It is not a very good idea though. Really timed mutexes based on user-space WIndows synchronization objects are implemented on SleepConditionVariableSRW, SleepConditionVariableCS or WaitOnAddress.
Because windows CS are recursive TryEnterCriticalSection allows a thread to check whether it already owns a CS without risk of stalling.
Another case would be if you have a thread that occasionally needs to perform some locked work but usually does something else, you could use TryEnterCriticalSection and only perform the locked work if you actually got the lock.

how to know what is NOT thread-safe in ruby?

starting from Rails 4, everything would have to run in threaded environment by default. What this means is all of the code we write AND ALL the gems we use are required to be threadsafe
so, I have few questions on this:
what is NOT thread-safe in ruby/rails? Vs What is thread-safe in ruby/rails?
Is there a list of gems that is known to be threadsafe or vice-versa?
is there List of common patterns of code which are NOT threadsafe example #result ||= some_method?
Are the data structures in ruby lang core such as Hash etc threadsafe?
On MRI, where there a GVL/GIL which means only 1 ruby thread can run at a time except for IO, does the threadsafe change effect us?
None of the core data structures are thread safe. The only one I know of that ships with Ruby is the queue implementation in the standard library (require 'thread'; q = Queue.new).
MRI's GIL does not save us from thread safety issues. It only makes sure that two threads cannot run Ruby code at the same time, i.e. on two different CPUs at the exact same time. Threads can still be paused and resumed at any point in your code. If you write code like #n = 0; 3.times { Thread.start { 100.times { #n += 1 } } } e.g. mutating a shared variable from multiple threads, the value of the shared variable afterwards is not deterministic. The GIL is more or less a simulation of a single core system, it does not change the fundamental issues of writing correct concurrent programs.
Even if MRI had been single-threaded like Node.js you would still have to think about concurrency. The example with the incremented variable would work fine, but you can still get race conditions where things happen in non-deterministic order and one callback clobbers the result of another. Single threaded asynchronous systems are easier to reason about, but they are not free from concurrency issues. Just think of an application with multiple users: if two users hit edit on a Stack Overflow post at more or less the same time, spend some time editing the post and then hit save, whose changes will be seen by a third user later when they read that same post?
In Ruby, as in most other concurrent runtimes, anything that is more than one operation is not thread safe. #n += 1 is not thread safe, because it is multiple operations. #n = 1 is thread safe because it is one operation (it's lots of operations under the hood, and I would probably get into trouble if I tried to describe why it's "thread safe" in detail, but in the end you will not get inconsistent results from assignments). #n ||= 1, is not and no other shorthand operation + assignment is either. One mistake I've made many times is writing return unless #started; #started = true, which is not thread safe at all.
I don't know of any authoritative list of thread safe and non-thread safe statements for Ruby, but there is a simple rule of thumb: if an expression only does one (side-effect free) operation it is probably thread safe. For example: a + b is ok, a = b is also ok, and a.foo(b) is ok, if the method foo is side-effect free (since just about anything in Ruby is a method call, even assignment in many cases, this goes for the other examples too). Side-effects in this context means things that change state. def foo(x); #x = x; end is not side-effect free.
One of the hardest things about writing thread safe code in Ruby is that all core data structures, including array, hash and string, are mutable. It's very easy to accidentally leak a piece of your state, and when that piece is mutable things can get really screwed up. Consider the following code:
class Thing
attr_reader :stuff
def initialize(initial_stuff)
#stuff = initial_stuff
#state_lock = Mutex.new
end
def add(item)
#state_lock.synchronize do
#stuff << item
end
end
end
A instance of this class can be shared between threads and they can safely add things to it, but there's a concurrency bug (it's not the only one): the internal state of the object leaks through the stuff accessor. Besides being problematic from the encapsulation perspective, it also opens up a can of concurrency worms. Maybe someone takes that array and passes it on to somewhere else, and that code in turn thinks it now owns that array and can do whatever it wants with it.
Another classic Ruby example is this:
STANDARD_OPTIONS = {:color => 'red', :count => 10}
def find_stuff
#some_service.load_things('stuff', STANDARD_OPTIONS)
end
find_stuff works fine the first time it's used, but returns something else the second time. Why? The load_things method happens to think it owns the options hash passed to it, and does color = options.delete(:color). Now the STANDARD_OPTIONS constant doesn't have the same value anymore. Constants are only constant in what they reference, they do not guarantee the constancy of the data structures they refer to. Just think what would happen if this code was run concurrently.
If you avoid shared mutable state (e.g. instance variables in objects accessed by multiple threads, data structures like hashes and arrays accessed by multiple threads) thread safety isn't so hard. Try to minimize the parts of your application that are accessed concurrently, and focus your efforts there. IIRC, in a Rails application, a new controller object is created for every request, so it is only going to get used by a single thread, and the same goes for any model objects you create from that controller. However, Rails also encourages the use of global variables (User.find(...) uses the global variable User, you may think of it as only a class, and it is a class, but it is also a namespace for global variables), some of these are safe because they are read only, but sometimes you save things in these global variables because it is convenient. Be very careful when you use anything that is globally accessible.
It's been possible to run Rails in threaded environments for quite a while now, so without being a Rails expert I would still go so far as to say that you don't have to worry about thread safety when it comes to Rails itself. You can still create Rails applications that aren't thread safe by doing some of the things I mention above. When it comes other gems assume that they are not thread safe unless they say that they are, and if they say that they are assume that they are not, and look through their code (but just because you see that they go things like #n ||= 1 does not mean that they are not thread safe, that's a perfectly legitimate thing to do in the right context -- you should instead look for things like mutable state in global variables, how it handles mutable objects passed to its methods, and especially how it handles options hashes).
Finally, being thread unsafe is a transitive property. Anything that uses something that is not thread safe is itself not thread safe.
In addition to Theo's answer, I'd add a couple problem areas to lookout for in Rails specifically, if you're switching to config.threadsafe!
Class variables:
##i_exist_across_threads
ENV:
ENV['DONT_CHANGE_ME']
Threads:
Thread.start
starting from Rails 4, everything would have to run in threaded environment by default
This is not 100% correct. Thread-safe Rails is just on by default. If you deploy on a multi-process app server like Passenger (community) or Unicorn there will be no difference at all. This change only concerns you, if you deploy on a multi-threaded environment like Puma or Passenger Enterprise > 4.0
In the past if you wanted to deploy on a multi-threaded app server you had to turn on config.threadsafe, which is default now, because all it did had either no effects or also applied to a Rails app running in a single process (Prooflink).
But if you do want all the Rails 4 streaming benefits and other real time stuff of the multi-threaded deployment
then maybe you will find this article interesting. As #Theo sad, for a Rails app, you actually just have to omit mutating static state during a request. While this a simple practice to follow, unfortunately you cannot be sure about this for every gem you find. As far as i remember Charles Oliver Nutter from the JRuby project had some tips about it in this podcast.
And if you want to write a pure concurrent Ruby programming, where you would need some data structures which are accessed by more than one thread you maybe will find the thread_safe gem useful.

Testing concurrency features

How would you test Ruby code that has some concurrency features? For instance, let's assume I have a synchronization mechanism that is expected to prevent deadlocks. Is there a viable way to test what it really does? Could controlled execution in fibers be the way forward?
I had the exact same problem and have implemented a simple gem for synchronizing subprocesses using breakpoints: http://github.com/remen/fork_break
I've also documented an advanced usage scenario for rails3 at http://www.hairoftheyak.com/testing-concurrency-in-rails/
I needed to make sure a gem (redis-native_hash) I authored could handle concurrent writes to the same Redis hash, detect the race condition, and elegantly recover. I found that to test this I didn't need to use threads at all.
it "should respect changes made since last read from redis" do
concurrent_edit = Redis::NativeHash.find :test => #hash.key
concurrent_edit["foo"] = "race value"
concurrent_edit.save
#hash["yin"] = "yang"
#hash["foo"] = "bad value"
#hash.save
hash = Redis::NativeHash.find :test => #hash.key
hash["foo"].should == "race value"
hash["yin"].should == "yang"
end
In this test case I just instantiated another object which represents the concurrent edit of the Redis hash, had it make a change, then make sure saving the already-existing object pointing to the same hash respected those changes.
Not all problems involving concurrency can be tested without actually USING concurrency, but in this case it was possible. You may want to try looking for something similar to test your concurrency solutions. If its possible its definitely the easier route to go.
It's definitely a difficult problem. I started writing my test using threads, and realized that they way the code I was testing was implemented, I needed the Process IDs (PID) to actually be different. Threads run using the same PID as the process that kicked off the Thread. Lesson learned.
It was at that point I started exploring forks, and came across this Stack Overflow thread, and played with fork_break. Pretty cool, and easy to set up. Though I didn't need the breakpoints for what I was doing, I just wanted processes to run through concurrently, using breakpoints could be very useful in the future. The problem I ran into was that I kept getting an EOFError and I didn't know why. So I started implementing forking myself, instead of going through fork_break, and found out it was that an exception was happening in the code under test. Sad that the stack trace was hidden from me by the EOFError, though I understand that the child process ended abruptly and that's kinda how it goes.
The next problem I came across was with the DatabaseCleaner. No matter which strategy it used (truncation, or transaction), the child process's data was truncated/rolled back when the child process finished, so the data that was inserted by child processes was gone and the parent process couldn't select and verify that it was correct.
After banging my head on that and trying many other unsuccessful things, I came across this post http://makandracards.com/makandra/556-test-concurrent-ruby-code which was almost exactly what I was already doing, with one little addition. Calling "Process.exit!" at the end of the fork. My best guess (based on my fairly limited understanding of forking) is that this causes the process to end abruptly enough that it completely bypasses any type of database cleanup when the child process ends. So my parent process, the actual test, can continue and verify the data it needs to verify. Then during the normal after hooks of the test (in this case cucumber, but could easily be rspec too), the database cleaner kicks in and cleans up data as it normally would for a test.
So, just thought I'd share some of my own lessons learned in this discusson of how to test concurrent features.

Redis multi/exec with an evented driver

How do you use MULTI/EXEC (and WATCH) in an evented Redis driver like the em-hiredis (a Ruby driver that use EventMachine)? If I run:
redis.multi do
redis.sadd("foo", "bar") do
redis.inc("baz", "qux") do
redis.exec do
puts 'yay!'
end
end
end
end
there's a chance that some other part of the application manages to sneak in an operation before the EXEC, if there is a lot going on (imagine, for example, that I have a timer that increments some key every second, and that the code above takes more than one second to run, then some of the increment commands will be sent as part of the MULTI/EXEC -- what if I want to abort the transaction? Then any increments that happened to become part of it will disappear. It's easy to come up with even worse scenarios).
I guess I could implement some kind of locking so that no other actions can be done while a MULTI/EXEC is in progress, but that doesn't feel like a great solution, has anyone else found a better way?
As #balu stated in the comments to the question it cannot be done without multiple connections.

Resources