Multi-threading in Ruby (MRI)

According to my understanding of the GIL implementation in Ruby (MRI), the code below should fail by printing the message more than once. But it doesn't; it always prints it exactly once:
class Sheep
  def initialize
    @shorn = false
  end
  def shorn?
    @shorn
  end
  def shorn!
    puts "shearing..."
    @shorn = true
  end
end
s = Sheep.new
55.times.map do
  Thread.new { s.shorn! unless s.shorn? }
end.each(&:join)
How come?
$ ruby --version
ruby 2.1.2p95 (2014-05-08 revision 45877) [x86_64-darwin13.0]

It depends a bit on which exact Ruby version you use (versions differ in the way they schedule threads). On my system it also depends on the overall system load and how responsive the terminal is, but on Ruby 2.0.0p481 I get between 1 and 55 lines of output, while on Ruby 1.8.7 I consistently get only one line.
It should be noted that Ruby 1.9 and later (including 2.0) use actual OS threads (albeit still with a GIL), while Ruby 1.8 uses internal green threads with its own scheduler. It is quite possible that older Ruby versions schedule threads in a more granular way.
In any case, you should not rely on any incidental thread-scheduling behavior. It is not part of any documented behavior, and it will change across systems and as Ruby matures. Always make sure you access shared data structures safely when using threads.

I use Ruby 2.1.5p273, and I suppose your slightly different Ruby version should yield similar results.
I have different results every time I run the program.
I tried with one core enabled and with four cores enabled, and I don't see a difference. As you expected, the code is not thread-safe.
Otherwise, the only explanation I can come up with is that your program is too fast/lightweight: each thread finishes its work before the interpreter gets around to switching threads.
I have only one suggestion in this case: a trick that gives the interpreter a hint that it could switch threads. You can use the sleep function.
In your example I would put it just before the race condition:
def shorn!
  sleep 0.0001 # give up the CPU between the shorn? check and the update
  puts "shearing..."
  @shorn = true
end
If you'd like more info about the GIL, I can recommend Jesse Storimer's "Nobody understands the GIL".
If you'd like to read more about Ruby and concurrency, I can recommend Dotan Nahum's "Pragmatic Concurrency with Ruby".
The trick I suggested was mentioned in this answer

As others have mentioned, the GIL's behavior is not documented and is totally implementation-dependent. You shouldn't rely on any expectations about its scheduling behavior.
A more detailed (and also more general) answer, however, is that the scheduler switches execution between threads to make sure that no single thread blocks the process. This switch is called a context switch or more specifically a thread switch.
When the context switch occurs, the current thread's execution is paused and another thread's execution is resumed. If it's a brand new thread that's being "resumed," then it means that the new thread's execution starts from the beginning.
In the case of your program, each new thread begins with
s.shorn?
as it evaluates unless s.shorn?. At this point, @shorn == false and s.shorn? evaluates to false. So then the thread runs:
s.shorn!
The first command in #shorn! that gets run is:
puts "shearing..."
What happens next depends on the thread scheduler:
If the scheduler decides to let the current thread continue executing, then the next command that gets executed is @shorn = true. Then the thread ends and the scheduler starts the next thread, where unless s.shorn? now sees true, so that thread stops without shearing. This behavior repeats in a loop until there are no more threads left.
If the scheduler decides to switch to another thread, then it will pause execution right before @shorn = true and start running the same code from the beginning in the new thread. That means that @shorn == false when the new thread starts, and so puts "shearing..." will execute again.
As you can see, it all depends on when the scheduler decides to perform a context switch.
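To make the interleaving described above reproducible, the sketch below forces it instead of waiting for the scheduler to produce it. Two hypothetical Queues (checked and gate) stand in for an unlucky context switch between unless s.shorn? and @shorn = true: both threads pass the check before either one is allowed to shear, so the message is printed twice every time. This is a test harness for the explanation, not a claim about how MRI actually schedules threads:

```ruby
class Sheep
  def initialize
    @shorn = false
  end

  def shorn?
    @shorn
  end

  def shorn!
    puts "shearing..."
    @shorn = true
  end
end

s = Sheep.new
checked = Queue.new # each thread reports that it has passed the check
gate    = Queue.new # the main thread releases both threads afterwards

threads = 2.times.map do
  Thread.new do
    if s.shorn?
      false
    else
      checked << true
      gate.pop  # simulated thread switch between the check and the update
      s.shorn!
      true      # this thread sheared the sheep
    end
  end
end

2.times { checked.pop } # wait until both threads have seen @shorn == false
2.times { gate << :go } # now let both of them shear
shear_count = threads.map(&:value).count(true)
puts shear_count # => 2
```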
But what about the GIL?
The GIL is a horribly misunderstood part of MRI Ruby. There are plenty of resources out there to explain how the GIL works, but in this case the most important thing that you should know is that the GIL doesn't guarantee that each thread will run sequentially.
Instead, the GIL merely guarantees that most core Ruby methods that are implemented in C (for example, Array#<<) won't be interrupted by a context switch until they are finished. In the case of puts "shearing...", I haven't looked at the code for puts, but probably the GIL guarantees that no other thread will run until the currently running thread finishes executing puts.
As for why the code displayed shearing... only once when run under MRI 1.8.7, that doesn't necessarily have anything to do with green vs. native threads. The better answer is that it was a coincidence. The more precise answer is that, in that case, the scheduler for some reason decided to interrupt the first thread only after it ran @shorn = true. This may have been due to green threads, in the sense that your native scheduler might interrupt more frequently than Ruby's scheduler (hence the "more granular" suggestion in another answer), but that's not necessarily true. It could also have been a fluke.
Multithreading in Ruby is really easy to mess up. That is why Matz recommends sticking to forking processes, which is memory-inefficient but removes the burden of managing threads. Another approach for larger projects would be to use a library like Celluloid, which abstracts away Ruby's thread-safety mechanisms. For a small example like this, however, a simple mutex will do:
semaphore = Mutex.new
s = Sheep.new
55.times.map {
  Thread.new {
    # only one thread at a time may run the check-and-shear
    semaphore.synchronize do
      s.shorn! unless s.shorn?
    end
  }
}.each(&:join)
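A variation on the mutex fix, sketched under the assumption that you control the Sheep class: the lock lives inside the object, so every caller gets the atomic check-and-set without having to remember an external semaphore. The method name shear_once! is hypothetical, not part of the original code; it returns true only for the one thread that actually sheared:

```ruby
class Sheep
  def initialize
    @shorn = false
    @lock = Mutex.new
  end

  def shorn?
    @lock.synchronize { @shorn }
  end

  # Check and update under one lock, so no thread can slip in between.
  # Mutex#synchronize releases the lock even on the early return.
  def shear_once!
    @lock.synchronize do
      return false if @shorn
      puts "shearing..."
      @shorn = true
    end
  end
end

s = Sheep.new
results = 55.times.map { Thread.new { s.shear_once! } }.map(&:value)
puts results.count(true) # => 1
```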

Related

Introductory Ruby Threading Issue

I've been learning Ruby for the past few days, and I've run into a few issues concerning the implementation of threads. I've programmed in other languages before (mainly Java and C), and I still couldn't figure out what the issue is. I'm running ruby 2.1.2p95 on Ubuntu Server 14.10. The code in question is from Mr. Neighborly's Humble Little Ruby Book:
mate = Thread.new do
  puts "Ahoy! Can I be dropping the anchor sir?"
  Thread.stop
  puts "Aye sir, dropping anchor!"
end
Thread.pass
puts "CAPTAIN: Aye, laddy!"
mate.run
mate.join
The output should be:
Ahoy! Can I be dropping the anchor sir?
CAPTAIN: Aye, laddy!
Aye sir, dropping anchor!
But instead, I'm receiving the following join and deadlock error:
CAPTAIN: Aye, laddy!
Ahoy! Can I be dropping the anchor sir?
ex2.rb:12:in `join': No live threads left. Deadlock? (fatal)
from ex2.rb:12:in `<main>'
I've run into errors with other threading examples from other resources as well, and have tried running the examples on other Ubuntu machines as well as trying Ruby 2.2. Is there a blatant concept that I'm missing out on? Has something changed in recent revisions of Ruby that would deem the examples out-of-date? Thank you for your help!
Has something changed in recent revisions of Ruby that would deem the examples out-of-date?
Yes. It looks like this book was written for Ruby 1.8, which used green threads. Ruby 1.9 onwards uses native threads (where the threads are scheduled by the OS).
Compare the documentation for the Thread.pass method in Ruby 1.8.7:
Invokes the thread scheduler to pass execution to another thread.
In Ruby 2.1.2 (the version you are using), this method's documentation looks like this:
Give the thread scheduler a hint to pass execution to another thread. A running thread may or may not switch, it depends on OS and processor.
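So in current versions scheduling is not deterministic the way it was in Ruby 1.8.7: the OS is free to ignore the call to Thread.pass and run the main thread first. When that happens, the main thread prints its line and calls mate.run before the mate has reached Thread.stop, so that call has nothing to wake up. The mate then stops itself with no other thread left to wake it, and mate.join fails with the deadlock error.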
Running this script on my machine (Mac OS 10.9, Ruby 2.2.0) I get both results, sometimes it works and I see:
Ahoy! Can I be dropping the anchor sir?
CAPTAIN: Aye, laddy!
Aye sir, dropping anchor!
Other times it fails with:
CAPTAIN: Aye, laddy!
Ahoy! Can I be dropping the anchor sir?
capt-thread.rb:12:in `join': No live threads left. Deadlock? (fatal)
from capt-thread.rb:12:in `<main>'
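One way to make the ordering deterministic on native-thread Rubies is to swap Thread.stop/Thread#run for an explicit handshake over two Queues. This is a different technique from the book's, and the names asked, granted and say are hypothetical. Each side blocks until the other has spoken, so the output order no longer depends on whether the OS honours Thread.pass. A minimal sketch:

```ruby
log = []
say = lambda do |line|
  log << line # record the order for checking; writes are ordered by the queues
  puts line
end

asked   = Queue.new # mate -> captain: "I have asked"
granted = Queue.new # captain -> mate: "permission given"

mate = Thread.new do
  say.call("Ahoy! Can I be dropping the anchor sir?")
  asked << true
  granted.pop # blocks until the captain answers
  say.call("Aye sir, dropping anchor!")
end

asked.pop # blocks until the mate has asked
say.call("CAPTAIN: Aye, laddy!")
granted << true
mate.join
```

Unlike Thread#run, pushing onto a Queue is never lost: if the mate has not reached granted.pop yet, the value simply waits there for it.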

Ruby poor performance in thread

I have a function that does IO/computation. I made a demo function which copies ~300MB from here to there. If I run it in a thread which I immediately join, it is much slower than if I run it without a thread. I checked with:
require "fileutils"
def cp
  start = Time.now
  FileUtils.cp_r("C:/tmp", "C:/tmp1")
  fin = Time.now - start
  p fin
end
Comparing these:
cp
Thread.new{cp}.join
the first cp call is always two to four times faster than the threaded call. The same happens if I do
cp
Thread.new{cp}
sleep 200
I've heard about the GIL, etc., but here only one thread runs at a time, so there is no competition for running time. Any ideas on how I can make it faster, or why this is happening?
Threading isn't a guarantee that things will run faster, or even the same speed, as non-threaded code, at least currently with MRI. JRuby might be better. Your cp isn't getting the full attention of the CPU, which is why doing it without threading, and allowing it to block until done, is faster.
Consider using fork instead.
"A dozen (or so) ways to start sub-processes in Ruby: Part 1" looks useful. Also "How do you spawn a child process in Ruby?".
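A minimal sketch of the fork suggestion, assuming a POSIX system: Kernel#fork is not implemented on Windows (which the C:/ paths in the question suggest), so there you would need something like Process.spawn of a separate Ruby command instead. The cp_in_child name and the temporary directories are hypothetical. The child process does the copy, and the parent only blocks in Process.wait2 until it finishes; whether this beats the plain call on your machine is something to measure, which is why the elapsed time is returned:

```ruby
require "fileutils"
require "tmpdir"

# Copy a directory tree in a child process; returns [success?, seconds].
def cp_in_child(src, dst)
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  pid = fork do
    begin
      FileUtils.cp_r(src, dst)
      exit!(0) # exit! skips at_exit handlers and does not re-flush the parent's buffers
    rescue StandardError
      exit!(1)
    end
  end
  _, status = Process.wait2(pid) # the parent blocks until the copy is done
  elapsed = Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
  [status.success?, elapsed]
end

Dir.mktmpdir do |tmp|
  src = File.join(tmp, "src")
  FileUtils.mkdir_p(src)
  File.write(File.join(src, "data.txt"), "x" * 1024)

  ok, secs = cp_in_child(src, File.join(tmp, "dst"))
  puts "copied=#{ok} in #{secs.round(3)}s"
end
```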

How does asynchronous Ruby work in Vim?

If you compile a recent version of Vim with +ruby, you can use the :ruby command inside Vim.
What's happening 'under the hood' when I run some asynchronous Ruby code?
For example:
:ruby <<EOS
print 'hello'
Thread.new do
sleep 1
print 'world'
end
EOS
# hello
:ruby print 'foo'
# world
# foo
This immediately prints 'hello', as expected. However, 'world' doesn't print until I run another :ruby command. Does Vim only support one thread, and push new threads onto some sort of queue to be run on the next :ruby command?
I've tried looking through Vim's source for this in src/if_ruby.c, but my Ruby C-Extension reading skills aren't the greatest.
I'm asking, because I'd like to write some Ruby that polls every few seconds and updates a Vim window.
Vim itself is single-threaded. But there are some exceptions or workarounds:
Python threads work, though not on ARM for some reason. I can't predict what would happen if you call a vim.* method from a non-main thread; I have seen threads used in some plugins, but without vim.* calls inside them.
The Python multiprocessing module works perfectly (though you will want to disable all Vim signal handlers). I personally use this solution in my aurum plugin. I guess the Ruby equivalent will work too, but AFAIR it is just a fork() call with a simple byte pipe as the only means of communication, nothing as elaborate as multiprocessing.Pipe (a pipe that passes a limited set of Python objects), multiprocessing.Queue (a wrapper around a pipe that implements an object queue), multiprocessing.Value (shared memory storing fixed-size values behind an object interface) or multiprocessing.Lock (I haven't used it, but the name speaks for itself). At least, not in the standard library or core.
AFAIK some older Ruby versions used green threads and were therefore single-threaded from the OS's point of view, while newer Ruby versions use POSIX threads. You could try updating; maybe that will help. But you'd better choose something else as the test (like modifying some variable in a separate thread) rather than something that calls Vim. Every current Python version you will find on users' systems uses POSIX threads, which may be the root cause of Ruby threads not working while Python ones do.
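The fork-plus-byte-pipe approach mentioned above looks roughly like this in plain Ruby, outside Vim. This is a minimal sketch assuming a POSIX system; the sleep stands in for hypothetical polling work, and how the parent would read the pipe without blocking Vim (for example on a timer) is not covered here:

```ruby
reader, writer = IO.pipe

pid = fork do
  reader.close # the child only writes
  3.times do |i|
    sleep 0.1  # stand-in for slow polling work
    writer.puts "update #{i}"
  end
  writer.close
  exit!(0)     # skip the parent's at_exit handlers in the child
end

writer.close # the parent only reads; EOF arrives once the child closes its end
updates = reader.each_line.map(&:chomp)
reader.close
Process.wait(pid)
p updates # => ["update 0", "update 1", "update 2"]
```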

Are Process::detach and Process::wait mutually exclusive (Ruby)?

I'm refactoring a bit of concurrent processing in my Ruby on Rails server (running on Linux) to use Spawn. Spawn::fork_it documentation claims that forked processes can still be waited on after being detached: https://github.com/tra/spawn/blob/master/lib/spawn.rb (line 186):
# detach from child process (parent may still wait for detached process if they wish)
Process.detach(child)
However, the Ruby Process::detach documentation says you should not do this: http://www.ruby-doc.org/core/classes/Process.html
Some operating systems retain the status of terminated child processes until the parent collects that status (normally using some variant of wait()). If the parent never collects this status, the child stays around as a zombie process. Process::detach prevents this by setting up a separate Ruby thread whose sole job is to reap the status of the process pid when it terminates. Use detach only when you do not intend to explicitly wait for the child to terminate.
Yet Spawn::wait effectively allows you to do just that by wrapping Process::wait. On a side note, I specifically want to use the Process::waitpid2 method to wait on the child processes, instead of using the Spawn::wait method.
Will detach-and-wait not work correctly on Linux? I'm concerned that this may cause a race condition between the detached reaper thread and the waiting parent process, as to who collects the child status first.
The answer to this question is there in the documentation. Are you writing code for your own use in a controlled environment? Or to be used widely by third parties? Ruby is written to be widely used by third parties, so their recommendation is to not do something that could fail on "some operating systems". Perhaps the Spawn library is designed primarily for use on Linux machines and tested only on a small subset thereof where this tactic works.
If you're distributing the code you're writing to be used by anyone and everyone, I would take Ruby's approach.
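If you want both the zombie-reaping thread and the child's exit status, one option that stays within what the Ruby documentation describes is to wait on the thread that Process.detach returns instead of calling Process.waitpid2 yourself: that thread's value is the child's Process::Status, so nothing races to collect it. A minimal sketch, assuming a POSIX system; the child that exits with status 3 is hypothetical:

```ruby
pid = fork { exit!(3) }       # hypothetical child that exits with status 3
watcher = Process.detach(pid) # background thread that reaps the child
status = watcher.value        # joins the thread; its value is a Process::Status
puts status.exitstatus # => 3
```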
If you control the environment where this code will be run, I would write two tests:
A test that spawns a process, detaches it and then waits for it.
A test that spawns a process and then just waits for it.
Count the failure rate for both and if they are equal (within a margin that you feel is acceptable), go for it!
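The two tests could be sketched like this, assuming a POSIX system. The failures helper is hypothetical; a trial counts as failed when Process.waitpid2 raises Errno::ECHILD, meaning the detach reaper thread collected the status first. The numbers you get are specific to your OS and Ruby version, which is the point of running it in your own environment:

```ruby
# Fork a short-lived child, optionally detach it, then try to collect
# its status ourselves. Returns how many trials lost the race.
def failures(trials, detach:)
  failed = 0
  trials.times do
    pid = fork { exit!(0) }
    Process.detach(pid) if detach
    begin
      Process.waitpid2(pid)
    rescue Errno::ECHILD
      failed += 1 # the reaper thread got there first
    end
  end
  failed
end

trials = 50
puts "detach + wait: #{failures(trials, detach: true)}/#{trials} failed"
puts "wait only:     #{failures(trials, detach: false)}/#{trials} failed"
```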

Ruby: How do I get the number of running subprocesses (fork)?

I want to limit the subprocess count to 3. Once it hits 3, I wait until one of the processes stops and then execute a new one. I'm using Kernel.fork to start the processes.
How do I get the number of running subprocesses? Or is there a better way to do this?
A good question, but I don't think there's such a method in Ruby, at least not in the standard library. There are lots of gems out there, though.
This problem, though, sounds like a job for the Mutex class. Look up the Condition Variables section here on how to use Ruby's mutexes.
I usually have a Queue of tasks to be done, and then have a couple of threads consuming tasks until they receive an item indicating the end of work. There's an example in "Programming Ruby" under the Thread library. (I'm not sure if I should copy and paste the example to Stack Overflow - sorry)
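The Queue-of-tasks pattern can be sketched as follows. This is not the "Programming Ruby" example, just a minimal version of the same idea: a fixed number of worker threads pull tasks until each receives a hypothetical :done sentinel. With three workers, at most three tasks run at once; if each task forked a child and waited for it, that would also cap the number of subprocesses at three. Squaring numbers is placeholder work:

```ruby
WORKERS = 3
DONE = :done # end-of-work marker, one per worker

tasks = Queue.new
results = Queue.new

workers = WORKERS.times.map do
  Thread.new do
    while (task = tasks.pop) != DONE
      results << task * task # placeholder work; a real task could fork and wait
    end
  end
end

(1..10).each { |n| tasks << n }
WORKERS.times { tasks << DONE }
workers.each(&:join)

squares = Array.new(results.size) { results.pop }.sort
p squares # => [1, 4, 9, 16, 25, 36, 49, 64, 81, 100]
```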
My solution was to use trap("CLD") to catch SIGCLD whenever a child process ended and decrement a counter (a global variable) of running processes.
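A simpler alternative to counting with a signal handler (a swapped-in technique, not the answerer's method) is to keep the PIDs of running children in a Set and, once three are running, block in Process.wait, which reaps whichever child exits first and returns its PID. The set's size is then the number of running subprocesses. A minimal sketch assuming a POSIX system; the sleeping children are placeholder work:

```ruby
require "set"

MAX_CHILDREN = 3
running = Set.new
peak = 0

8.times do |i|
  if running.size >= MAX_CHILDREN
    finished = Process.wait # blocks until any child exits; returns its PID
    running.delete(finished)
  end
  pid = fork do
    sleep 0.1 * (i % 3 + 1) # placeholder work
    exit!(0)
  end
  running << pid
  peak = [peak, running.size].max
end

# Reap the remaining children.
Process.waitall.each { |pid, _status| running.delete(pid) }
puts "peak concurrent children: #{peak}" # => peak concurrent children: 3
```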
