Disabling Ruby 1.9.x's YARV compiler - ruby

There is a very noticeable difference in application initiation time between running my specs from the command line with ruby 1.9.x vs. 1.8.7. My application initiates much faster with ruby 1.8.7 than with ruby 1.9.1 or 1.9.2. The application initiation difference is approximately 18 seconds. It takes about 5 seconds for my app to initialize with 1.8.7 and 23 seconds with 1.9.1 and 1.9.2.
Application initialization time is not a big deal for production, but it is a very big deal for BDD development. Every time I change my code and run my specs, I have to wait an additional 18 seconds per iteration.
I assume this application initialization time is attributed to YARV compiling bytecode as my application initializes.
Am I right about my YARV slowing down my application initialization, and is there a way to disable YARV on the command line. It would be very nice to be able to disable YARV only when I am running my specs.

YARV is a pure Ruby compiler. If you disable it, there's nothing left.
More precisely: YARV is a multi-phase implementation, where each of the phases is single-mode. It consists of a Ruby-to-YARV compiler and a YARV interpreter. If you remove the compiler, the only thing you are left with is the YARV bytecode interpreter. Unless you want to start writing your app in YARV bytecode, that interpreter is not going to be of much use to you.
This is in contrast to mixed-mode implementations such as JRuby and IronRuby which implement multiple execution modes (in particular, both a compiler and an interpreter) within a single phase. If you turn off the compiler in JRuby or IronRuby, you are still left with a usable execution engine, because they both also contain an interpreter. In fact, JRuby actually started out as a pure interpreter and added the compiler later and IronRuby started out as pure compiler and they added an interpreter exactly because of the same problem that you are seeing: compiling unit tests is simply a waste of time.
The only interpreted implementation of Ruby 1.9 right now is JRuby. Of course, there you have the whole JVM overhead to deal with. The best thing you can do is try how fast you can get JRuby to start up (use the nightly 1.6.0.dev builds from http://CI.JRuby.Org/snapshots/ since both 1.9 support and startup time are heavily worked on right this very moment) using either some very fast starting desktop-oriented JVM like IBM J9 or try JRuby's Nailgun support, which keeps a JVM running in the background.
You could also try to get rid of RubyGems, which generally eats up quite a lot of startup time, especially on YARV. (Use the --disable-gem commandline option to truly get rid of it.)

There's currently no way to disable YARV, simply because MRI 1.9 only includes the virtual machine, and not an interpreter. Maintaining both would be way too much job for the core team.
In the future there will probably be ways to cache the bytecode YARV generates (like Rubinius does). At the moment there is no way to load such bytecode through Ruby (see #971), but you could easily write a C extension which accomplishes it.
However, I would say that 18 seconds is way too much and it's probably a bug. I know there are some threads at ruby-core which discusses the slowness of require; maybe you find something interesting there!

the next RC of 1.9.2 out might be faster as it doesn't preload $: with all your gem lib dirs.

Related

Does the Ruby VM define the file format that runs on it?

Like the ".class" file that runs on JVM, does the Ruby VM(MRI, or YARV) define the file format that rns on it?
I have read in some articles that YARV bytecode was considered a internal format, which means that there are little documentations or specifications about it, is it true?
Thanks!
tl;dr No, ruby does not define a runable format other than the language itself.
It is true that ruby doesn't have a standard bytecode format like java, etherium, and others. Every ruby implementation JRuby, MRI Ruby, TruffleRuby, etc. gets to decide how exactly they run ruby code.
Ruby implementations can choose to implement their own bytecode format. Those formats would likely be private to that ruby implementation and unlikely to be seen outside of that eco-system. (however as we've seen with languages popping up that run on the JVM it's possible for other people to piggy back off your language's VM if they'd like)
It's completely valid for a ruby implementation to run inside the JVM run on java byte code JVM jruby, to introduce a compilation step to produce native machine code ruby-llvm, or even to sometimes produce native machine code at runtime with a JIT and sometimes use a more traditional interpreter MRI Ruby.

What kind of interpreter is the Ruby MRI?

Is it a language interpreter? Or a bytecode interpreter / JIT compiler? Where can I learn more about the implementation (other than by browsing the source code)?
Note: the term "MRI" is confusing. It means "Matz's Ruby/Reference Implementation/Interpreter". However, MRI has been retired and isn't developed or maintained anymore.
MRI was a pure AST-walking interpreter, with no compilation involved anywhere.
The confusing thing is: Matz has written a new implementation, but that's called MRuby, not MRI. And the implementation that is now called MRI wasn't written by Matz. So, really, it is best to simply not use that term at all, and be specific about which implementation you are talking about.
The name of the implementation that people now call MRI is actually YARV (for Yet Another Ruby VM), and it was written by Koichi Sasada. It consists of an Ahead-Of-Time compiler which compiles Ruby source code to YARV byte code and an interpreter which interprets said byte code. Thus, it is a completely typical byte code VM, exactly like CPython for Python, Zend Engine for PHP, the Lua VM, older versions of Rubinius, older versions of SpiderMonkey for ECMAScript, and so on.
There is talk about adding a JIT compiler from YARV bytecode to native machine code to the VM for YARV 3, which would then make the VM a mixed-mode execution engine.
Matz's current implementation, MRuby, is also a bytecoded VM.
For completeness' sake, here are a couple of other Ruby implementations, first the currently production-ready ones, and then a couple of historically interesting ones:
Rubinius: compiles Ruby source code to Rubinius byte code ahead-of-time, then hands that bytecode off to a mixed-mode execution engine consisting of a bytecode interpreter and an LLVM-based JIT compiler; they have recently introduced or are currently in the process of introducing a separate Intermediate Representation (IR) for the JIT compiler, so the interpreter works off Rubinius bytecode, but the JIT compiler works off Compiler IR. Rubinius also belongs into the "historically interesting" category, because it was the first successful Ruby implementation a significant part of which was implemented in Ruby; there had been other projects before, but Rubinius was the first to be production-ready.
JRuby: the main mode is a mixed-mode execution engine consisting of an AST-walking interpreter, and a JIT compiler that first translates the AST into IR, which it then further compiles to JVM bytecode. The other mode is an AOT compiler which compiles Ruby sourcecode to JVM bytecode ahead-of-time.
Opal: an Ahead-Of-Time compiler that compiles Ruby sourcecode to ECMAScript sourcecode.
MagLev: an implementation based on the GemStone/S Smalltalk VM. Unfortunately, I don't know much about it, I believe it compiles Ruby sourcecode to GemStone/S bytecode, the GemStone/S VM then is a standard mixed-mode VM with a bytecode interpreter and a JIT compiler.
Some no longer maintained but historically interesting implementations:
Topaz: an implementation using the RPython/PyPy VM framework; the PyPy framework is interesting because it includes a tracing JIT compiler that unlike other JIT compilers doesn't work besides the interpreter and compiles the user program, instead it compiles the interpreter while it is interpreting the user programs. What this basically means is that the JIT has to be written only once by the PyPy developers, and every language implementor using the PyPy framework only has to write a simple bytecode interpreter gets an optimizing native JIT compiler for free.
XRuby: the first static AOT compiler for Ruby, implemented for the JVM.
IronRuby: it started out as a pure JIT compiler without an interpreter, but an interpreter was later added, because it turned out that this actually improved performance (which is contrary to the popular myth that interpreters are slow).
unholy: a proof-of-concept AOT compiler that compiles YARV bytecode to CPython bytecode; this was hacked up by _why the lucky stiff when the Google App Engine first came out and only supported Python, the idea was that you could compile your Ruby sourcecode to YARV bytecode using YARV, compile the YARV bytecode to CPython bytecode using unholy, compile the CPython bytecode to Python sourcecode using decompyle, and then upload the Python sourcecode to GAE to run your shiny new Ruby app.
Honorable mentions go to: tinyrb, metaruby, Ruby.NET, Red Sun, HotRuby, BlueRuby, SmallRuby
A couple of interesting current research projects are:
JRuby+Truffle: this project is re-implementing JRuby's internals using the Truffle AST interpreter framework from Oracle Labs; this version, when run on a Graal-enable JVM (another Oracle Labs research project) is able to attain performance similar to Java and sometimes even reaching (and overtaking) C.
Ruby+OMR: IBM has broken up its J9 JVM into independently re-usable, language-independent building blocks for VM implementors and released it under an open source license under the Eclipse umbrella as the Eclipse Open Managed Runtime. It's not an academic project: the Java 8 version of IBM J9 is actually implemented using OMR. The Ruby+OMR project is a proof-of-concept by the OMR developers, replacing YARV's garbage collector with OMR's, and adding OMR's JIT compiler, profiler, and debugger to YARV. It is fairly impressive just how language-independent all the stuff really is, the entire patch is less than 10000 lines, and that is not just the glue code, it actually includes all the required OMR components as well. (There's also an equivalent Python+OMR project, but that's still non-public.)
Last but not least, you may sometimes hear about "Rite". Rite was used as a codename for a complete re-write of MRI for over a decade. Matz said that when he wrote MRI he didn't actually know anything about language implementation, so he wanted to do it "right" (get it?) a second time. At the same time, there was also a lot of talk about Ruby 2.0, wanting to fix some long-standing design deficiencies in the language. The two were lumped together, so Rite was talked about as the new implementation of Ruby 2.0. However, YARV came along and was so good that Matz decided he didn't need to write his own VM after all, and he basically decided that "YARV is Rite".
But now, he did write his own VM nonetheless, which is why you will sometimes hear MRuby (or its VM component) referred to as "Rite".
It's a bytecode interpreter called YARV, written by Sasada Koichi.
Here's one example of how it looks:
puts RubyVM::InstructionSequence.compile("1+1").disasm
== disasm: #<ISeq:<compiled>#<compiled>>================================
0000 trace 1 ( 1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 putobject_OP_INT2FIX_O_1_C_
0004 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0007 leave
Further reading:
YARV instruction set
While MRI doesn't have a JIT yet, there's the Ruby+OMR project, that's trying to add a JIT compiler based on Eclipse OMR:
Ruby+OMR JIT Compiler: What’s next?
Ruby now has a VM-generated JIT compiler!
Since 2.6.0-preview1 branch was merged, Ruby now has a basic JIT compiler called "MJIT" (YARV MRI Jit). It is inspired by the works of Vladimir Makarov who proposed a RTL (Register Transfer Language) based instruction set instead of a stack based one. The speedups are not yet apparent, because not all instruction paths are handled by MJIT, but the branch contains a basis for future work.

Does Ruby Enterprise use Green threads?

I was wondering this and couldn't find anything except this
"Thread scheduler bug fixes and performance improvements. Threading on Ruby Enterprise Edition can be more than 10 times faster than official Ruby 1.8"
REE is derived from MRI 1.8.7. As such, it only used green threads. REE changes some parts of 1.8.7 (esp. in the areas memory management and garbage collection). But it still widely follows the design of the upstream MRI (the original Matz's Ruby Interpreter)
While YARV (1.9) switched to OS native threads, they still have a global interpreter lock making sure that only exactly one of these threads runs at a time.
There are a couple of Ruby implementations with OS native threads and without a GIL. The most prominent are JRuby (based on the JVM) and Rubinius (with its own VM). These implementations offer "real" concurrent threads.
Besides JRuby and Rubinius, who have got rid of an interpreter lock entirely, the state of affairs in CRuby/MRI has also made some progress with regard to concurrency.
One notable feature is that with the Bitmap Marking GC by Narihiro Nakamura, as of Ruby 2.0, another advantage of REE over CRuby will be gone: REE has a copy on write-friendly GC algorithm which made it attractive for achieving concurrency through processes (forking) rather than through threading. The new Bitmap Marking GC will have the same advantage of saving unnecessary copying of memory around when forking a new process.
The GIL (or GVL as it is officially called) is also not quite as bad as it sounds at first. For example, Ruby releases the interpreter lock when doing IO. Another feature that we see much more often lately is that C extension developers have the ability to manually release the lock by calling rb_thread_blocking_region, which will execute a C-level function with the GIL released. This can have huge effects if some operation in C is to be performed where we can rest assured that it will have no side effects. A nice example is RSA key generation - this runs completely in C with memory allocated by OpenSSL, so we can run it safely with the GIL released.
Fibers introduced in 1.9 or recent projects like Celluloid also cast a much more friendly light on the state of Ruby concurrency today as when compared to a few years ago.
Last not least, Koichi Sasada, the author of CRuby's VM, is actively working on the MVM technology, which will allow to run multiple VMs in a single Ruby process, and therefore achieving concurrency in yet another way.
Taking all the other performance improvements into account, there are less and less arguments for using REE, it's safe to switch to 1.9.3 or 2.0.0 once it's out, especially since the 1.8 series will no longer be actively developed and many popular projects have announced to quit their support for 1.8 sometime soon.
Edit:
As Holger pointed out, REE has also been EOLed, and there will be no port to 1.9 or further. So it's not only safe to switch, but also the right thing to do :)

Use JRuby for Ruby web applications? Is it worth it?

Background: I'm writing a 'standard' (nothing special) web application in Ruby (not Rails) and I need to start thinking about deployment.
So I've been hearing a lot of recommendations to use JRuby to deploy Ruby web applications, regardless of whether you actually need Java libraries or not. How true is this? Is it worth using the Java implementation just for speed? Would I gain anything else by doing so? Would I run into any issues?
PS: I don't know Java that well, so "you can write parts of it in Java" isn't very helpful.
JRuby is one of the most complete ruby implementations (there are a lot other ones out there such as IronRuby, Maglev, Rubinius, XRuby, YARV, MacRuby). It is very comprehensive, therefore, unless you use gems that use native C code, you will very very likely be just fine compatibility-wise.
JRuby is a bit faster than the actual C implementation, but it supports actual threads, whereas the official implementation is struggling a bit into getting it (it still uses Green Threads). Using Java threads from JRuby is quite trivial, even though it will require you to couple your code with Java (with a little DI, this coupling will only happen once, though).
Another benefit: runtime tools. Java, as a platform, instead of a language, has a lot of runtime tools to help you diagnose problems and check the state of the application (profilers, JConsole, and so on).
Twitter engineers also mentioned that the Ruby VM kinda has trouble being an environment for long lived processes, while the JVM is very good at that, because it’s been optimized for that over the last ten years.
Ruby also had a little security issue recently, which did not affect JRuby's implementation.
On the other hand, your project requires more artifacts (JVM, JRuby jars, etc). If you are using an application that will stay alive for long, and you want better runtime support, JRuby can be a great way to go. Otherwise, you can safely wait until you need these things to actually make the move (it is likely to go smoothly).
I use and love JRuby on daily basis, but I suggest you use MRI (a.k.a. C-Ruby) unless you have a actual need for JRuby.
Reasons for using JRuby:
Java integration
Restricted environment (your machine has Java installed by not ruby and you don't have root)
Restricted environment (you have ruby installed but don't have root so can't install gems you need)
You reached the limits of Ruby 1.8 performance and cannot use 1.9
From what you've described, you don't have any of the above reasons.
C-Ruby 1.9 has significant performance improvements over C-Ruby 1.8. I've yet to read (or find out for myself) how C-Ruby 1.9 compares with JRuby 1.8 or JRuby 1.9. In anycase, you don't have a performance problem (yet) so don't worry about it.
The good news is, you can start with either and convert later if needs be. It's all Ruby, and the Webrick and Mongrel gems work with both.
As mentioned above, ruby gems that have C extensions cannot be installed under JRuby. Hopefully this will change in the future if ruby C extensions utilize FFI.
http://kenai.com/projects/ruby-ffi/pages/Home
http://isitjruby.com/

Differences between Ruby VMs

What are the advantages/disadvantages of the major Ruby VMs (things like features, compatibility, performance, and quirks?) I know there are also some bonus features like being able to use Java interfaces through JRuby, too. Those would also be helpful to note. Does any VM have a clear advantage at this point, and in what contexts?
I've used both Matz's Ruby and JRuby, and they solve different tasks. If you are developing a straight Ruby or Rails app, then that will probably suffice, but if there are some powerful Java libraries that would help a lot, then JRuby might be worthwhile.
I haven't done anything overly complicated, but JRuby seemed to match up pretty well, at least as far as implementing the core language features (I haven't run into any differences yet, but they may exist).
One little anecdote I wish to share... I was writing a script to interact with a DB2 database. The DB2 support in Ruby is abysmal... you have to install the whole DB2 express version just to be able to compile the Ruby drivers, which didn't even work for me. I got fed up and switched to JRuby, using JDBC and a few small DB2 JDBC jars. It resolved my problem perfectly. The point? Well, if gaining access to some Java libraries will simplify the problem at hand, by all means go for it!
I hope this was helpful! Sorry I don't have any experience with other VMs....
One more caveat I have read about, but I don't know the details too well... JRuby I think supports threading via Java threads, instead of the "green" threads supported in Matz's implementation... so if you want multithreading on multicore systems, JRuby will probably serve you better... unless you want to do the threading in C.
Here's a bit of info I scrounged up on the main VMs: Ruby MRI, Ruby 1.9 (YARV), JRuby, XRuby, Rubinius, and IronRuby
There was a performance benchmark last year that compared the major VMs, but with how quickly VM development has been it probably is not as relevant today. Ruby 1.9 was generally the fastest, and still has the edge over JRuby for now, I believe.
Four VMs are currently capable of running Ruby on Rails: Ruby MRI, Ruby 1.9, JRuby, and Rubinius.
XRuby runs on the JVM, as does JRuby, and compiles the Ruby source files to a Java .class.
IronRuby runs on .NET, making use of their DLR, and allows you to integrate Ruby with the .NET libraries and infrastructure. It cannot yet run Ruby on Rails.
There is also a VM called HotRuby that lets you run Ruby source code in the browser or in Flash.

Resources