Ruby threads not good enough? - ruby

How is that JRuby's support for multithreading is any better than regular Ruby's support for it? What's wrong with threads in plain old Ruby?

"Normal" ruby (or mri) has a great big lock that prevents more than one thread from running ruby code at a time (known as the GIL or GVL).
Rubinius and jruby don't have this lock. In ruby 1.8.x the threads were green threads too, but as of ruby 1.9 ruby threads are mapped to native threads. The GVL stops you from gaining much benefit though.
Native extensions can run code outside of the lock so that, for example, multiple MySQL queries can run simultaneously from different threads but they can't call into the regular ruby api when they don't hold the lock

Related

Ruby - how to thread across cores / processors

Im (re)writing a socket server in ruby in hopes of simplifying it. Reading about ruby sockets I ran across a site that says multithreaded ruby apps only use one core / processor in a machine.
Questions:
Is this accurate?
Do I care? Each thread in this server will potentially run for several minutes and there will be lots of them. Is the OS (CentOS 6.5) smart enough to share the load?
Is this any different from threading in C++? (language of the current socket server) IE do pthreads use multiple cores automatically?
What if I fork instead of thread?
CRuby has a global interpreter lock, so it cannot run threads in parallel. Jruby and some other implementations can do it, but CRuby will never run any kind of code in parallel. This means that, no matter how smart your OS is, it can never share the load.
This is different in threading in C++. pthreads create real OS threads, and the kernal's scheduler will run them on multiple cores at the same time. Technically Ruby uses pthreads as well, but the GIL prevents them from running in parallel.
Fork creates a new process, and your OS's scheduler will almost certainly be smart enough to run it on a separate core. If you need parallelism in Ruby, either use an implementation without a GIL, or use fork.
There is a very nice gem called parallel which allows data processing with parallel threads or multiple processes by forking (work around GIL of current CRuby implementation).
Due to GIL in YARV, ruby is not thread friendly. If you want to write multithreaded ruby use jruby or rubinius. It would be even better to use a functional language with actor model such as Erlang or Elixir and let the Virtual Machine handle the threads and you only manage the Erlang processes.
Threading
If you're going to want multi-core threading, you need to use an interpreter that actively uses multiple cores. MRI Ruby as of 2.1.3 is still only single-core; JRuby and Rubinius allow access to multiple cores.
Threading Alternatives
Alternatives to changing your interpreter include:
DRb with multiple Ruby processes.
A queuing system with multiple workers.
Socket programming with multiple interpreters.
Forking processes, if the underlying platform supports the fork(2) system call.

jRuby and Rubinius support parallel computing, but what about gems that don't support this?

What I'm trying to understand is, practically speaking, how much benefit do I get from the parallel computing support in jRuby / Rubinius? A lot of ruby libraries keep track of global internal state. Is there any way to deal with these libraries, or do they just become unusable if I decide to parallelize my Ruby script? Maybe Rubinius automatically puts a mutex in front of all usage of unsafe libraries?
Whenever you want to use jruby or rubinius, you'll be forced to use gems that are compatible.
Many gems has been implemented using C extensions and you can not use them along jruby e.g.
Regarding others, mostly they clearly states their thread safety status.
So choosing jruby or rubinius will narrow down your options regarding the gems.
but a huge opportunity will be exposed, you'd be able to use many mature Java Libraries(in jruby case).

Does Ruby Enterprise use Green threads?

I was wondering this and couldn't find anything except this
"Thread scheduler bug fixes and performance improvements. Threading on Ruby Enterprise Edition can be more than 10 times faster than official Ruby 1.8"
REE is derived from MRI 1.8.7. As such, it only used green threads. REE changes some parts of 1.8.7 (esp. in the areas memory management and garbage collection). But it still widely follows the design of the upstream MRI (the original Matz's Ruby Interpreter)
While YARV (1.9) switched to OS native threads, they still have a global interpreter lock making sure that only exactly one of these threads runs at a time.
There are a couple of Ruby implementations with OS native threads and without a GIL. The most prominent are JRuby (based on the JVM) and Rubinius (with its own VM). These implementations offer "real" concurrent threads.
Besides JRuby and Rubinius, who have got rid of an interpreter lock entirely, the state of affairs in CRuby/MRI has also made some progress with regard to concurrency.
One notable feature is that with the Bitmap Marking GC by Narihiro Nakamura, as of Ruby 2.0, another advantage of REE over CRuby will be gone: REE has a copy on write-friendly GC algorithm which made it attractive for achieving concurrency through processes (forking) rather than through threading. The new Bitmap Marking GC will have the same advantage of saving unnecessary copying of memory around when forking a new process.
The GIL (or GVL as it is officially called) is also not quite as bad as it sounds at first. For example, Ruby releases the interpreter lock when doing IO. Another feature that we see much more often lately is that C extension developers have the ability to manually release the lock by calling rb_thread_blocking_region, which will execute a C-level function with the GIL released. This can have huge effects if some operation in C is to be performed where we can rest assured that it will have no side effects. A nice example is RSA key generation - this runs completely in C with memory allocated by OpenSSL, so we can run it safely with the GIL released.
Fibers introduced in 1.9 or recent projects like Celluloid also cast a much more friendly light on the state of Ruby concurrency today as when compared to a few years ago.
Last not least, Koichi Sasada, the author of CRuby's VM, is actively working on the MVM technology, which will allow to run multiple VMs in a single Ruby process, and therefore achieving concurrency in yet another way.
Taking all the other performance improvements into account, there are less and less arguments for using REE, it's safe to switch to 1.9.3 or 2.0.0 once it's out, especially since the 1.8 series will no longer be actively developed and many popular projects have announced to quit their support for 1.8 sometime soon.
Edit:
As Holger pointed out, REE has also been EOLed, and there will be no port to 1.9 or further. So it's not only safe to switch, but also the right thing to do :)

Disabling Ruby 1.9.x's YARV compiler

There is a very noticeable difference in application initiation time between running my specs from the command line with ruby 1.9.x vs. 1.8.7. My application initiates much faster with ruby 1.8.7 than with ruby 1.9.1 or 1.9.2. The application initiation difference is approximately 18 seconds. It takes about 5 seconds for my app to initialize with 1.8.7 and 23 seconds with 1.9.1 and 1.9.2.
Application initialization time is not a big deal for production, but it is a very big deal for BDD development. Every time I change my code and run my specs, I have to wait an additional 18 seconds per iteration.
I assume this application initialization time is attributed to YARV compiling bytecode as my application initializes.
Am I right about my YARV slowing down my application initialization, and is there a way to disable YARV on the command line. It would be very nice to be able to disable YARV only when I am running my specs.
YARV is a pure Ruby compiler. If you disable it, there's nothing left.
More precisely: YARV is a multi-phase implementation, where each of the phases is single-mode. It consists of a Ruby-to-YARV compiler and a YARV interpreter. If you remove the compiler, the only thing you are left with is the YARV bytecode interpreter. Unless you want to start writing your app in YARV bytecode, that interpreter is not going to be of much use to you.
This is in contrast to mixed-mode implementations such as JRuby and IronRuby which implement multiple execution modes (in particular, both a compiler and an interpreter) within a single phase. If you turn off the compiler in JRuby or IronRuby, you are still left with a usable execution engine, because they both also contain an interpreter. In fact, JRuby actually started out as a pure interpreter and added the compiler later and IronRuby started out as pure compiler and they added an interpreter exactly because of the same problem that you are seeing: compiling unit tests is simply a waste of time.
The only interpreted implementation of Ruby 1.9 right now is JRuby. Of course, there you have the whole JVM overhead to deal with. The best thing you can do is try how fast you can get JRuby to start up (use the nightly 1.6.0.dev builds from http://CI.JRuby.Org/snapshots/ since both 1.9 support and startup time are heavily worked on right this very moment) using either some very fast starting desktop-oriented JVM like IBM J9 or try JRuby's Nailgun support, which keeps a JVM running in the background.
You could also try to get rid of RubyGems, which generally eats up quite a lot of startup time, especially on YARV. (Use the --disable-gem commandline option to truly get rid of it.)
There's currently no way to disable YARV, simply because MRI 1.9 only includes the virtual machine, and not an interpreter. Maintaining both would be way too much job for the core team.
In the future there will probably be ways to cache the bytecode YARV generates (like Rubinius does). At the moment there is no way to load such bytecode through Ruby (see #971), but you could easily write a C extension which accomplishes it.
However, I would say that 18 seconds is way too much and it's probably a bug. I know there are some threads at ruby-core which discusses the slowness of require; maybe you find something interesting there!
the next RC of 1.9.2 out might be faster as it doesn't preload $: with all your gem lib dirs.

Differences between Ruby VMs

What are the advantages/disadvantages of the major Ruby VMs (things like features, compatibility, performance, and quirks?) I know there are also some bonus features like being able to use Java interfaces through JRuby, too. Those would also be helpful to note. Does any VM have a clear advantage at this point, and in what contexts?
I've used both Matz's Ruby and JRuby, and they solve different tasks. If you are developing a straight Ruby or Rails app, then that will probably suffice, but if there are some powerful Java libraries that would help a lot, then JRuby might be worthwhile.
I haven't done anything overly complicated, but JRuby seemed to match up pretty well, at least as far as implementing the core language features (I haven't run into any differences yet, but they may exist).
One little anecdote I wish to share... I was writing a script to interact with a DB2 database. The DB2 support in Ruby is abysmal... you have to install the whole DB2 express version just to be able to compile the Ruby drivers, which didn't even work for me. I got fed up and switched to JRuby, using JDBC and a few small DB2 JDBC jars. It resolved my problem perfectly. The point? Well, if gaining access to some Java libraries will simplify the problem at hand, by all means go for it!
I hope this was helpful! Sorry I don't have any experience with other VMs....
One more caveat I have read about, but I don't know the details too well... JRuby I think supports threading via Java threads, instead of the "green" threads supported in Matz's implementation... so if you want multithreading on multicore systems, JRuby will probably serve you better... unless you want to do the threading in C.
Here's a bit of info I scrounged up on the main VMs: Ruby MRI, Ruby 1.9 (YARV), JRuby, XRuby, Rubinius, and IronRuby
There was a performance benchmark last year that compared the major VMs, but with how quickly VM development has been it probably is not as relevant today. Ruby 1.9 was generally the fastest, and still has the edge over JRuby for now, I believe.
Four VMs are currently capable of running Ruby on Rails: Ruby MRI, Ruby 1.9, JRuby, and Rubinius.
XRuby runs on the JVM, as does JRuby, and compiles the Ruby source files to a Java .class.
IronRuby runs on .NET, making use of their DLR, and allows you to integrate Ruby with the .NET libraries and infrastructure. It cannot yet run Ruby on Rails.
There is also a VM called HotRuby that lets you run Ruby source code in the browser or in Flash.

Resources