Ruby GIL of MRI 1.9


In my understanding, here's how MRI 1.9 GIL works:
The interpreter spawns a new thread by calling the corresponding underlying C function, and the new thread asks to acquire the GIL.
If the GIL is free, we are happy. If not, the new thread waits and wakes a separate timer thread that sets up the timeslice.
When the currently executing thread hits certain boundaries, such as a method return or a backward branch, the interpreter checks the timer to decide whether a context switch should happen.
However, as pointed out by this article, atomicity is only guaranteed for methods implemented purely in C. That is, if some part of our thread runs Ruby code, we are still at risk of a race condition.
My question is: if a thread needs to acquire the GIL before executing, why do only methods implemented in C guarantee atomicity?
Thank you in advance!

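The GVL guarantees that only one thread can execute Ruby code at the same time. But of course different Ruby threads can execute Ruby code at different times. The lock can change hands between any two steps of a method written in Ruby, whereas a method implemented in C keeps the lock for its whole duration unless it releases it explicitly.
Besides, the majority of Ruby implementations don't have a GVL anyway.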
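To make that concrete, here is a minimal sketch, assuming a shared counter that several threads increment. The method name locked_increment is made up for illustration. counter += 1 is a read, an add and a write, so the GVL alone does not make it atomic; wrapping it in a Mutex does.

```ruby
# Hypothetical helper: increment a shared counter from several threads.
# The Mutex, not the GVL, is what makes each read-modify-write atomic.
def locked_increment(threads: 4, per_thread: 1_000)
  counter = 0
  lock = Mutex.new

  workers = Array.new(threads) do
    Thread.new do
      per_thread.times do
        # Only one thread at a time may run this block.
        lock.synchronize { counter += 1 }
      end
    end
  end

  workers.each(&:join)
  counter
end
```

With the lock in place the result is always threads * per_thread, on any Ruby implementation.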

Related

When to use fibers and when to use co-routines in Tarantool?

In Tarantool, are fibers used when the Lua code author wants Tarantool to schedule the execution? Are co-routines (in the Tarantool/LuaJIT process) used when the Lua code author wants to be in control of the execution?

In Tarantool, fibers are synonymous with coroutines, but they are more tightly integrated with Tarantool itself. We suggest you always use our fibers rather than Lua coroutines, since they are more powerful. Our entire I/O stack is integrated with them: sockets, files, net.box, mysql, postgresql, etc.
Link to docs: http://tarantool.org/doc/reference/fiber.html
There are some tasks that coroutines could be used for, like iterators. It is perfectly valid to use both coroutines and fibers at the same time, but that may cause confusion. A coroutine yield may fail with the infamous "attempt to yield across C-call boundary" error, while fibers work in this situation.

A fiber's stack is larger than a coroutine's. It is mmap'ed at 64KB, and is at least one OS page (usually 4KB). Fiber context switching incurs extra overhead, since it saves/restores registers in addition to saving/restoring the coroutine. Fiber context switches break JIT in LuaJIT, since LuaJIT is not able to save/restore traced execution. Unlike coroutines, fibers work well with all the non-blocking IO that is built into the application server: whenever a fiber yields implicitly on an IO call, another fiber kicks in. But not another coroutine, of course; you'll have to take care of that yourself if you're using them.

how come ruby's single os thread doesn't block while copying a file?

My assumptions:
MRI ruby 1.8.X doesn't have native threads but green threads.
The OS is not aware of these green threads.
Issuing an IO-heavy operation should suspend the whole process until the corresponding IO interrupt comes back.
With these I've created a simple ruby program that does the following:
Starts a thread that prints "working!" every second.
Issues an IO request to copy a large (1gb) file on the "main" thread.
Now one would guess that, since the green threads are invisible to the OS, it would put the whole process on the "blocked" queue and the "working!" green thread would not execute. Surprisingly, it works :S
Does anyone know what's going on there? Thanks.

There is no atomic kernel file copy operation. It's a lot of fairly short reads and writes that are entering and exiting the kernel.
As a result, the process is constantly getting control back. Signals are delivered.
Green threads work by hooking the Ruby-level thread dispatcher into low-level I/O and signal reception. As long as these hooks catch control periodically the green threads will act quite a bit like more concurrent threads would.
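To illustrate the "lots of fairly short reads and writes" point, here is a rough sketch of a userland copy loop. The method name chunked_copy and the 16KB chunk size are assumptions for illustration, not what Ruby's own copy routine is guaranteed to do. Every iteration enters and leaves the kernel, which gives the thread scheduler a chance to run something else.

```ruby
# Hypothetical helper: copy a file in small chunks and count the
# read/write round trips. Control returns to the process after every
# chunk, so other threads can be scheduled in between.
def chunked_copy(src_path, dst_path, chunk_size = 16 * 1024)
  round_trips = 0
  File.open(src_path, 'rb') do |src|
    File.open(dst_path, 'wb') do |dst|
      # IO#read(n) returns nil at end of file, which ends the loop.
      while (chunk = src.read(chunk_size))
        dst.write(chunk)
        round_trips += 1
      end
    end
  end
  round_trips
end
```

A 1gb file copied this way is tens of thousands of short calls rather than one long blocking one.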
Unix originally had a quite thread-unaware but beautifully simple abstract machine model for the user process environment.
As the years went by, support for concurrency in general and threads in particular was added bit by bit in two different ways.
Lots of little kludges were added to check if I/O would block, to fail (with later retry) if I/O would block, to interrupt slow tty I/O for signals but then transparently return to it, etc. When the Unix APIs were merged, each kludge existed in more than one form. Lots of choices. [1]
Direct support for threads in the form of multiple kernel-visible processes sharing an address space was also added. These threads are dangerous and untestable but widely supported and used. Mostly, programs don't crash. As time goes on, latent bugs become visible as the hardware supports more true concurrency. I'm not the least bit worried that Ruby doesn't fully support that nightmare.
[1] The good thing about standards is that there are so many of them.

When MRI 1.9 starts up, it spawns two native threads. One thread is for the VM, the other is used to handle signals. Rubinius uses this strategy, as does the JVM. Pipes can be used to communicate any info from other processes.
As for the FileUtils module, the cd, pwd, mkdir, rm, ln, cp, mv, chmod, chown, and touch methods are all, to some degree, outsourced to OS native utilities using the internal API of the StreamUtils submodule, while the second thread is left to wait for a signal from an outside process. Since these methods are quite thread-safe, there is no need to lock the interpreter and thus the methods don't block each other.
Edit:
MRI 1.8.7 is quite smart, and knows that when a Thread is waiting for some external event (such as a browser to send an HTTP request), the Thread can be put to sleep and be woken up when data is detected. - Evan Phoenix from Engine Yard in Ruby, Concurrency, and You
From looking at the source, the basic implementation of FileUtils has not changed much since 1.8.7. 1.8.7 also uses a sleepy timer thread to wait for an IO response. The main difference in 1.9 is the use of native threads rather than green threads. Also, the thread source code is much more refined.
By thread-safe I mean that since there is nothing shared between the processes, there is no reason to lock the global interpreter. There is a misconception that Ruby "blocks" when doing certain tasks. Whenever a thread has to block, i.e. wait without using any cpu, Ruby simply schedules another thread. However in certain situations, like a rack-server using 20% of the CPU waiting for a response, it can be appropriate to unlock the interpreter and allow concurrent threads to handle other requests during the wait. These threads are, in a sense, working in parallel. The GIL is unlocked with the rb_thread_blocking_region API. Here is a good post on this subject.
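A small sketch of the "Ruby simply schedules another thread" point, using sleep as a stand-in for a blocking wait. The method name elapsed_for_blocking_threads is made up for illustration. If blocked threads held the interpreter, four 0.2 second waits would take about 0.8 seconds; because a waiting thread gives up the lock, they overlap.

```ruby
# Hypothetical helper: start `count` threads that each block for
# `seconds`, and return how long it takes for all of them to finish.
def elapsed_for_blocking_threads(count, seconds)
  started = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  # sleep stands in for any wait that uses no CPU (socket read, etc.).
  Array.new(count) { Thread.new { sleep seconds } }.each(&:join)
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - started
end
```

The total comes out close to a single wait, not the sum of all of them.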

Why doesn't Ruby have a ThreadPool built-in?

I have a program that creates 10000 threads at once, and runs 8 at the same time.
But Ruby doesn't have a built-in ThreadPool like Java does. Is there a good reason?

Probably because it's easy to roll your own using the standard library "Queue" class.
q = Queue.new
3.times { Thread.new { while (something = q.pop(true) rescue nil); ...; end } }
It's a good question though--I might suggest bringing it up with Ruby Core.
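For what it's worth, a slightly fuller sketch of the roll-your-own idea might look like this. The method name run_in_pool and the :stop sentinel are assumptions for illustration, not a standard API. It uses a blocking pop plus one sentinel per worker instead of the non-blocking pop with rescue above.

```ruby
# Hypothetical helper: run each job (anything that responds to #call)
# on a fixed number of worker threads and return the results in
# submission order.
def run_in_pool(jobs, workers: 3)
  queue = Queue.new
  results = Queue.new

  jobs.each_with_index { |job, index| queue << [index, job] }
  # One sentinel per worker tells it to exit once the jobs run out.
  workers.times { queue << :stop }

  threads = Array.new(workers) do
    Thread.new do
      # Queue#pop blocks without burning CPU while the queue is empty.
      while (item = queue.pop) != :stop
        index, job = item
        results << [index, job.call]
      end
    end
  end
  threads.each(&:join)

  # All workers are done, so draining the result queue cannot block.
  collected = []
  collected << results.pop until results.empty?
  collected.sort_by(&:first).map(&:last)
end
```

Usage would be something like run_in_pool(urls.map { |u| -> { fetch(u) } }, workers: 8), where fetch is whatever work you need done.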

My suspicion would be it's because a ThreadPool wouldn't be that useful in C-based implementations of Ruby. You can use only one processor at a time with Matz's Ruby Interpreter or Yet Another Ruby VM.
If you want multiple threads to be run on multiple processors, you need to use JRuby instead.

Most likely the reason is because ruby doesn't have "real" threads. It has what are called Green threads. The ruby interpreter takes care of scheduling execution threads without using any underlying OS threads. This effectively makes Ruby single threaded.

How do I run two threads in Ruby at the same time?

Is there some way to run 2 threads at the same time?
I want to have my app run its current function and then bring up another thread running another function, that can change variables in the first thread.

If you want to run two threads at the same time, the entire execution stack has to be capable of doing that. Let's start at the top:
Ruby itself is capable of running two threads at the same time, no problem there. However, Ruby is just a programming language, i.e. just a bunch of rules. In order to run your program, you need a Ruby implementation. Unfortunately, many popular Ruby implementations are not capable of running multiple threads at the same time, including MRI, YARV and Rubinius. In fact, the only production-ready Ruby implementation which can run threads simultaneously is JRuby. (IronRuby too, but that is technically not yet production-ready although the final 1.0 release is probably only days away.)
But JRuby (and IronRuby) don't actually implement threads themselves, they just use the underlying platform's threads. I.e. JRuby maps Ruby threads to JVM threads and IronRuby maps them to CLI threads. So, the underlying platform has to be able to run threads in parallel, too.
Again: both the JVM and the CLI are in principle capable of running threads in parallel, but the JVM and the CLI are just specifications, they are just pieces of paper. In order to run your code, you need an implementation of those specifications, and not all of them do support truly concurrent threads.
Even if your platform implementation supports truly concurrent threads, they might themselves delegate their threading implementation to the underlying OS, just like JRuby delegates to the JVM. .NET, Mono, HotSpot and JRockit for example (which are the most popular implementations of the CLI and the JVM respectively) use native OS threads for their platform threads. So, obviously, the OS has to be able to run threads in parallel. And again: not all of them are.
And, of course, all the parallelism in the OS doesn't help if you only have one CPU. If you want two threads to run at the same time, you need either two CPUs, two cores or two simultaneous hardware threads.

http://ruby-doc.org/core/classes/Thread.html
x = 2
Thread.new do
x = 3
end
x = 4
For true concurrency you need at least 2 cores or 2 processors, but even then it may not work if the implementation runs only one thread at a time (such as MRI).
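Note that the snippet above never waits for the thread, so which assignment wins depends on scheduling. A minimal sketch, assuming you want to see the thread's change to the shared variable, joins the thread before reading it. The method name shared_variable_demo is made up for illustration.

```ruby
# Hypothetical demo: a thread's block closes over local variables of
# the code that created it, so it can change them.
def shared_variable_demo
  x = 2
  worker = Thread.new { x = 3 }
  worker.join # wait until the thread has written to x
  x
end
```

After the join, x is 3.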

First, I'm gonna answer your question:
thread_1 = Thread.new do
#do something here
end
thread_2 = Thread.new do
#do something here
end
thread_1.join
thread_2.join(timeout_in_seconds)
Thread#join makes the calling thread wait until the joined thread finishes. If you specify a timeout in seconds, join stops waiting once the timeout is reached and returns nil; the joined thread itself is not killed and keeps running.
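Here is a small sketch of how join behaves with a timeout. The method name join_with_timeout and the sleep durations are made up for illustration. Thread#join returns nil when the timeout elapses first, and the thread is still alive afterwards.

```ruby
# Hypothetical demo of Thread#join with a timeout.
def join_with_timeout
  slow = Thread.new do
    sleep 0.5
    :done
  end

  timed_out = slow.join(0.05).nil? # nil means the timeout elapsed first
  still_alive = slow.alive?        # the thread was not stopped by join
  value = slow.value               # waits for the thread and returns its result

  [timed_out, still_alive, value]
end
```

If you really want the thread gone after the timeout, you have to stop it yourself, for example with Thread#kill.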
Now, the truth: there's no real concurrency in Ruby 1.8 with the Matz Ruby Interpreter (MRI), and there's no real concurrency with only one processor either. According to this page:
However, as part of this runtime, the interpreter also instantiates an instance of a Global Interpreter Lock (or more affectionately known as GIL), which is the culprit of our lack of concurrency
Read the article itself for more information.
The MRI tries to cheat you using what are called green threads, which means that the Ruby interpreter, not the OS, takes care of everything to do with threads. The other kind of threads, the ones that can really run concurrently, are called native threads, and Ruby 1.9 supports them through YARV. That doesn't mean that every Ruby thread runs in parallel, though, because YARV has a global VM lock (global interpreter lock, or GIL). So concurrency is a myth in Ruby, and it will be for a long time.

http://ruby-doc.org/core/classes/Thread.html
Remember that only in JRuby threads are truly parallel (other interpreters implement GIL). From here:
# mutexsyncex.rb
require 'thread' # For Mutex class in Ruby 1.8
# A BankAccount has a name, a checking amount, and a savings amount
class BankAccount
  def initialize(name, checking, savings)
    @name,@checking,@savings = name,checking,savings
    @lock = Mutex.new # For thread safety
  end
  # Lock account and transfer money from savings to checking
  def transfer_from_savings(x)
    @lock.synchronize {
      @savings -= x
      @checking += x
    }
  end
  # Lock account and report current balances
  def report
    @lock.synchronize {
      "#@name\nChecking: #@checking\nSavings: #@savings"
    }
  end
end
ba = BankAccount.new('me', 1, 400)
ba.transfer_from_savings(10);
puts ba.report

Is putting thread on hold optimal?

The application has an auxiliary thread. This thread is not meant to run all the time, but the main process can call on it very often.
So, my question is, what is more optimal in terms of CPU performance: suspend thread when it is not being used or keep it alive and use WaitForSingleObject function in order to wait for a signal from main process?

In terms of CPU resources used, both solutions are the same - the thread which is suspended and thread which is waiting in WaitForSingleObject for an object which is not signalled both get no CPU cycles at all.
That said, WaitForSingleObject is almost always the preferred solution because the code using it will be much more "natural": easier to read, and easier to get right. Suspending/resuming threads can be dangerous, because you need to take a lot of care to make sure you are suspending a thread in a state where suspending it will do no harm (imagine suspending a thread which is currently holding a mutex).

I would assume Andrei doesn't use Delphi to write .NET and therefore Suspend doesn't translate to System.Threading.Thread.Suspend but to SuspendThread Win32 API.
I would strongly suggest against it. If you suspend the thread at an arbitrary moment, then you don't know what's going to happen (for example, you may suspend the thread in a state where some shared resource is locked). If, however, you already know that the thread is in a suspendable state, then simply use WaitForSingleObject (or any other WaitFor call). It will be just as effective as suspending the thread, i.e. the thread will use zero CPU time until it is awakened.

What do you mean by "suspend"? WaitForSingleObject will suspend the thread, i.e., it will not consume any CPU, until the signal arrives.

If it's a worker thread that has units of work given to it externally, it should definitely be using signalling objects as that will ensure it doesn't use CPU needlessly.
If it has any of its own work to do as well, that's another matter. I wouldn't suspend the thread from another thread (what happens if there's two threads delivering work to it?) - my basic rule is that threads should control their own lifetime with suggestions from other threads. This localizes all control in the thread itself.
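As a rough illustration of that rule, here is a sketch in Ruby rather than Delphi, with Mutex and ConditionVariable standing in for a Win32 signalling object. The class name SignalledWorker and the doubling "work" are made up for illustration. Other threads only make suggestions (submit, stop); the worker itself decides when to sleep and when to exit, and it uses no CPU while waiting.

```ruby
# Hypothetical worker that sleeps on a condition variable until work arrives.
class SignalledWorker
  def initialize
    @lock = Mutex.new
    @wakeup = ConditionVariable.new
    @pending = []
    @done = []
    @stopping = false
    @thread = Thread.new { run }
  end

  # Called from any thread: hand over a unit of work and wake the worker.
  def submit(item)
    @lock.synchronize do
      @pending << item
      @wakeup.signal
    end
  end

  # Ask the worker to finish what is queued, wait for it, return results.
  def stop
    @lock.synchronize do
      @stopping = true
      @wakeup.signal
    end
    @thread.join
    @done
  end

  private

  def run
    loop do
      item = @lock.synchronize do
        # wait releases the lock and blocks until another thread signals.
        @wakeup.wait(@lock) while @pending.empty? && !@stopping
        @pending.shift # nil once we are stopping and nothing is queued
      end
      break if item.nil?
      @done << item * 2 # stand-in for the real unit of work
    end
  end
end
```

It works fine with two or more threads delivering work, since they all go through the same lock.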

See the excellent tutorial on multi-threading in Delphi:
Multi Threading Tutorial

Another option would be the TMonitor introduced in Delphi 2009, which has functions like Wait, Pulse and PulseAll to keep threads inactive when there is nothing for them to do, and notify them as soon as they should continue with their work. It is loosely modeled after the object locks in Java. As in Java, Delphi objects now have a "lock" field which can be used for thread synchronization.
A blog which gives an example for a threaded queue class can be found at http://delphihaven.wordpress.com/2011/05/04/using-tmonitor-1/
Unfortunately there was a bug in the TMonitor implementation, which seems to be fixed in XE2.
