Are any of the Ruby VMs done using the LLVM toolchain?

I like the LLVM idea. To be honest, I do not much care for Ruby; I'd rather use Perl, or Python, or ... (it's a long list).
Nothing personal, it's a great language, but I just prefer others.
However, Ruby has so many good ideas that I might need to STFU and just learn it, if nothing else to debug the tools.
Before I do so, I am wondering if there is a practical and usable implementation of Ruby done using the LLVM toolchain?

Well, you have llvmruby, RubyComp and, more importantly, Rubinius. MacRuby also uses LLVM for "optimization passes, JIT and AOT compilation of Ruby expressions".


What kind of interpreter is the Ruby MRI?

Is it a language interpreter? Or a bytecode interpreter / JIT compiler? Where can I learn more about the implementation (other than by browsing the source code)?
Note: the term "MRI" is confusing. It means "Matz's Ruby/Reference Implementation/Interpreter". However, MRI has been retired and isn't developed or maintained anymore.
MRI was a pure AST-walking interpreter, with no compilation involved anywhere.
The confusing thing is: Matz has written a new implementation, but that's called MRuby, not MRI. And the implementation that is now called MRI wasn't written by Matz. So, really, it is best to simply not use that term at all, and be specific about which implementation you are talking about.
The name of the implementation that people now call MRI is actually YARV (for Yet Another Ruby VM), and it was written by Koichi Sasada. It consists of an Ahead-Of-Time compiler which compiles Ruby source code to YARV byte code and an interpreter which interprets said byte code. Thus, it is a completely typical byte code VM, exactly like CPython for Python, Zend Engine for PHP, the Lua VM, older versions of Rubinius, older versions of SpiderMonkey for ECMAScript, and so on.
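To make the difference between the two styles concrete, here is a minimal sketch of my own (not code from any of the real implementations). It evaluates the same toy expression once by walking the tree directly, and once by compiling it ahead of time to a flat list of stack instructions and then interpreting those. The node layout and the names walk, compile and run are made up for illustration.

```ruby
# Toy expression language: a node is either an Integer or [op, left, right].
ast = [:+, 1, [:*, 2, 3]]

# Style 1: AST-walking interpreter. Evaluates the tree directly by recursion.
def walk(node)
  return node if node.is_a?(Integer)

  op, left, right = node
  walk(left).public_send(op, walk(right))
end

# Style 2, step 1: compile the tree to a flat list of stack instructions.
def compile(node, code = [])
  if node.is_a?(Integer)
    code << [:push, node]
  else
    op, left, right = node
    compile(left, code)
    compile(right, code)
    code << [:send, op]
  end
  code
end

# Style 2, step 2: bytecode interpreter, a loop over instructions with an operand stack.
def run(code)
  stack = []
  code.each do |insn, arg|
    case insn
    when :push
      stack.push(arg)
    when :send
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(arg, right))
    end
  end
  stack.pop
end

bytecode = compile(ast)
walked = walk(ast)
executed = run(bytecode)
```

Both paths produce the same value; the bytecode version just separates "understand the program" from "execute the program", which is roughly the split YARV makes.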
There is talk about adding a JIT compiler from YARV bytecode to native machine code to the VM for YARV 3, which would then make the VM a mixed-mode execution engine.
Matz's current implementation, MRuby, is also a bytecoded VM.
For completeness' sake, here are a couple of other Ruby implementations, first the currently production-ready ones, and then a couple of historically interesting ones:
Rubinius: compiles Ruby source code to Rubinius bytecode ahead-of-time, then hands that bytecode off to a mixed-mode execution engine consisting of a bytecode interpreter and an LLVM-based JIT compiler; they have recently introduced or are currently in the process of introducing a separate Intermediate Representation (IR) for the JIT compiler, so the interpreter works off Rubinius bytecode, but the JIT compiler works off Compiler IR. Rubinius also belongs in the "historically interesting" category, because it was the first successful Ruby implementation a significant part of which was implemented in Ruby; there had been other projects before, but Rubinius was the first to be production-ready.
JRuby: the main mode is a mixed-mode execution engine consisting of an AST-walking interpreter, and a JIT compiler that first translates the AST into IR, which it then further compiles to JVM bytecode. The other mode is an AOT compiler which compiles Ruby sourcecode to JVM bytecode ahead-of-time.
Opal: an Ahead-Of-Time compiler that compiles Ruby sourcecode to ECMAScript sourcecode.
MagLev: an implementation based on the GemStone/S Smalltalk VM. Unfortunately, I don't know much about it. I believe it compiles Ruby sourcecode to GemStone/S bytecode; the GemStone/S VM is then a standard mixed-mode VM with a bytecode interpreter and a JIT compiler.
Some no longer maintained but historically interesting implementations:
Topaz: an implementation using the RPython/PyPy VM framework; the PyPy framework is interesting because it includes a tracing JIT compiler that, unlike other JIT compilers, doesn't work alongside the interpreter and compile the user program; instead, it compiles the interpreter while it is interpreting the user program. What this basically means is that the JIT has to be written only once by the PyPy developers, and every language implementor using the PyPy framework only has to write a simple bytecode interpreter and gets an optimizing native JIT compiler for free.
XRuby: the first static AOT compiler for Ruby, implemented for the JVM.
IronRuby: it started out as a pure JIT compiler without an interpreter, but an interpreter was later added, because it turned out that this actually improved performance (which is contrary to the popular myth that interpreters are slow).
unholy: a proof-of-concept AOT compiler that compiles YARV bytecode to CPython bytecode; this was hacked up by _why the lucky stiff when the Google App Engine first came out and only supported Python, the idea was that you could compile your Ruby sourcecode to YARV bytecode using YARV, compile the YARV bytecode to CPython bytecode using unholy, compile the CPython bytecode to Python sourcecode using decompyle, and then upload the Python sourcecode to GAE to run your shiny new Ruby app.
Honorable mentions go to: tinyrb, metaruby, Ruby.NET, Red Sun, HotRuby, BlueRuby, SmallRuby
A couple of interesting current research projects are:
JRuby+Truffle: this project is re-implementing JRuby's internals using the Truffle AST interpreter framework from Oracle Labs; this version, when run on a Graal-enabled JVM (another Oracle Labs research project), is able to attain performance similar to Java, sometimes even reaching (and overtaking) C.
Ruby+OMR: IBM has broken up its J9 JVM into independently re-usable, language-independent building blocks for VM implementors and released it under an open source license under the Eclipse umbrella as the Eclipse Open Managed Runtime. It's not an academic project: the Java 8 version of IBM J9 is actually implemented using OMR. The Ruby+OMR project is a proof-of-concept by the OMR developers, replacing YARV's garbage collector with OMR's, and adding OMR's JIT compiler, profiler, and debugger to YARV. It is fairly impressive just how language-independent all the stuff really is, the entire patch is less than 10000 lines, and that is not just the glue code, it actually includes all the required OMR components as well. (There's also an equivalent Python+OMR project, but that's still non-public.)
Last but not least, you may sometimes hear about "Rite". Rite was used as a codename for a complete re-write of MRI for over a decade. Matz said that when he wrote MRI he didn't actually know anything about language implementation, so he wanted to do it "right" (get it?) a second time. At the same time, there was also a lot of talk about Ruby 2.0, wanting to fix some long-standing design deficiencies in the language. The two were lumped together, so Rite was talked about as the new implementation of Ruby 2.0. However, YARV came along and was so good that Matz decided he didn't need to write his own VM after all, and he basically decided that "YARV is Rite".
But now, he did write his own VM nonetheless, which is why you will sometimes hear MRuby (or its VM component) referred to as "Rite".
It's a bytecode interpreter called YARV, written by Sasada Koichi.
Here's one example of how it looks:
puts RubyVM::InstructionSequence.compile("1+1").disasm
== disasm: #<ISeq:<compiled>#<compiled>>================================
0000 trace 1 ( 1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 putobject_OP_INT2FIX_O_1_C_
0004 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0007 leave
Further reading:
YARV instruction set
While MRI doesn't have a JIT yet, there's the Ruby+OMR project, that's trying to add a JIT compiler based on Eclipse OMR:
Ruby+OMR JIT Compiler: What’s next?
Ruby now has a VM-generated JIT compiler!
Since the 2.6.0-preview1 branch was merged, Ruby now has a basic JIT compiler called "MJIT" (YARV MRI Jit). It is inspired by the work of Vladimir Makarov, who proposed an RTL (Register Transfer Language) based instruction set instead of a stack-based one. The speedups are not yet apparent, because not all instruction paths are handled by MJIT, but the branch contains a basis for future work.

rubysdl vs. ruby-sdl-ffi

Could anyone here tell me the difference between the Ruby gems rubysdl and ruby-sdl-ffi, like speed variances? If so, which would you prefer? I'm wondering for the sake of my gem that I'm writing, Rubydraw (located here).
Thanks in advance!
I am the author of ruby-sdl-ffi. This question was brought to my attention today, so I am answering for the benefit of anyone who is still curious.
The main difference is that ruby-sdl-ffi is pure Ruby code that accesses SDL (and related libraries) via FFI (foreign function interface), whereas rubysdl is an extension written in C that links to SDL (and related libraries). There are pros and cons to each approach. (Obviously, I feel that FFI is the better approach, or I would not have bothered to write ruby-sdl-ffi.)
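For anyone unfamiliar with the "pure Ruby code calling into a C library" approach, here is a rough sketch of the idea. Note that it is not ruby-sdl-ffi code and it does not use the ffi gem: to stay dependency-free it uses Fiddle from Ruby's standard library and binds strlen from the C library instead of anything from SDL. It also assumes a Unix-like system where Fiddle.dlopen(nil) exposes the C library's symbols.

```ruby
require "fiddle"

# Open the symbols already loaded into this process, which include the C library
# (assumption: Linux or macOS; this may not work the same way on Windows).
libc = Fiddle.dlopen(nil)

# Describe the C signature in Ruby: size_t strlen(const char *s)
strlen = Fiddle::Function.new(
  libc["strlen"],
  [Fiddle::TYPE_VOIDP],
  Fiddle::TYPE_SIZE_T
)

# Call the C function directly from Ruby; nothing had to be compiled.
length = strlen.call("hello SDL")
```

A C extension such as rubysdl instead ships C source that is compiled against the library at gem install time, which is why it needs a compiler on the user's machine.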
Both libraries offer similar feature sets, although there are some differences (noted below). They can both do 2D games with images, sounds/music, and user input from keyboard, mouse, and/or joystick/gamepad. They can also both be used with OpenGL (via ruby-opengl or ffi-opengl) to create hardware-accelerated 3D games. They can both be used on Windows, MacOS X, and Linux (and perhaps other platforms), although rubysdl only works on MacOS X if you use a special Ruby interpreter wrapper called "rsdl".
I have not run any serious benchmarks, so I can't provide any definitive data about raw performance. My general impression is that rubysdl might have a slight performance advantage, but they are close enough that performance isn't the main factor when deciding between the two libraries.
Here is how I would summarize the pros and cons of the two libraries:
ruby-sdl-ffi:
Easier for users to install the gem. It does not need to be compiled, so users don't need to install a C compiler or toolchain.
Works with MRI (the "usual" Ruby interpreter), JRuby, and probably Rubinius.
No special interpreter is required on MacOS X. However, the MacOS X support may need to be updated to get it working totally right on the latest versions of MacOS X. (Apple keeps changing things.)
Lower-level API, more closely mirrors the C libraries. This may be good or bad depending on your perspective.
Currently has bindings for SDL, SDL_gfx, SDL_image, SDL_mixer, and SDL_ttf libraries. (Compared to rubysdl, it adds SDL_gfx but lacks SGE and SMPEG.) Adding bindings for other libraries is quite easy.
Not actively developed or maintained anymore. I don't have the time or interest anymore, but someone is welcome to take over, and I can provide guidance.
Somewhat experimental, and has some rough edges.
rubysdl:
More mature and polished, has withstood the test of time.
Better support for Japanese text input and rendering.
Higher-level, more abstract API.
Binds SDL, SGE, SMPEG, SDL_image, SDL_mixer, and SDL_ttf libraries.
Requires users to have a C compiler to install the gem. This can be quite a headache on Windows and MacOS X.
Requires MacOS X users to run your game using the special "rsdl" Ruby interpreter. Thus, to my knowledge, it will not work with JRuby or Rubinius on MacOS X.
Does not seem to be actively developed or maintained anymore either.

How do you write a compiler for a language in that language? [duplicate]

Possible Duplicates:
How can a language's compiler be written in that language?
implementing a compiler in “itself”
I was looking at Rubinius, a Ruby implementation that compiles to bytecode using a compiler written in Ruby. I cannot get my head around this. How do you write a compiler for a language in the language itself? It seems like it would be just text without anything to compile it into an executable that could then compile the future code written in Ruby. I get confused just typing that sentence. Can anyone help explain this?
To simplify: you first write a compiler for the compiler, in a different language. Then, you compile the compiler, and voila!
So, you need some sort of language which already has a compiler - but since there are many such, you can write the Ruby compiler compiler (!) e.g. in C, which will then compile the Ruby compiler, which can then compile Ruby programs, even further versions of itself.
Of course, the original compilers were written in machine code, compiled compilers for assembly, which in turn compiled compilers for e.g. C or Fortran, which compiled compilers for...pretty much everything. Iterative development in action.
The process is called bootstrapping - possibly named after Baron Munchhausen's story in which he pulled himself out of a swamp by his own bootstraps :)
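The stages described above can be acted out in a few lines. This is only a toy model of my own: "compiling" here just means turning source text into a callable Proc via eval, and stage0 stands in for the bootstrap compiler that would really be written in C. All the names are invented for the illustration.

```ruby
# Stage 0: the bootstrap compiler. Pretend this one is written in another language.
# In this toy model, "compiling" turns source text into a callable Proc.
stage0 = ->(source) { eval("-> { #{source} }") }

# The real compiler, written in the language it compiles (here: plain Ruby text).
# The single-quoted heredoc keeps #{source} literal until the code actually runs.
compiler_source = <<~'RUBY'
  ->(source) { eval("-> { #{source} }") }
RUBY

# Stage 1: compile the compiler's source with the bootstrap compiler,
# then run the resulting "executable" to obtain a working compiler.
stage1 = stage0.call(compiler_source).call

# Stage 2: the compiler compiles its own source. Stage 0 is no longer needed.
stage2 = stage1.call(compiler_source).call

# The self-compiled compiler can now compile ordinary programs.
answer = stage2.call("6 * 7").call
```

Once stage2 exists, the bootstrap compiler can be thrown away, and every later version of the compiler can be built with the previous one.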
Regarding the bootstrapping of a compiler it's worth reading about this devilishly clever hack.
I get confused just reading that sentence.
It may help to think of the compiler as a translator, which compilers are often called. Its purpose is to take source code that humans can read and translate it into binary code that computers can read. In the case of Rubinius, the code that it reads happens to be Ruby code, and the code that it converts it into is machine code (actually LLVM machine code which is itself further compiled into Intel machine code, but that's just a background detail). Rubinius itself could have been written in just about any programming language. It just happened to have been written in the same language that it compiles.
Of course, you need something to run Rubinius in the first place, and this is most likely a regular Ruby interpreter. Note, however, that once you are able to run Rubinius on an interpreter, you can pass it its own source code, and it will create and run a compiled version of itself. This is called bootstrapping, from the old phrase, "pulling yourself up by the bootstraps".
One final note: Ruby programs can't invoke arbitrary machine code. That part of Rubinius is actually written in C++.
Well, it is possible to do it in the following order:
1. Write a compiler for your Ruby code in any language, say C.
2. Now that you can compile Ruby code, you can write a compiler in Ruby that compiles Ruby code, and compile this compiler with the compiler you wrote in step 1. wahh this sentence is strange!
3. From now on you can compile all your Ruby code with the compiler written in step 2. :)
Have fun! :)
A compiler is just something that transforms source code into an executable. So it doesn't matter what it is written in - it can be the same language it is compiling or any other language of sufficient power.
The fun comes when you are writing a compiler for a language for a platform, written in the same language, that doesn't yet have a compiler for your implementation language. Your choices here are to compile on another platform for which you do have a compiler, or write a compiler in another language, and use that to compile the "real" compiler.
It's a 2 step process:
1. write a Ruby compiler in some other language like C, assuming a Ruby compiler doesn't yet exist
2. since you now have a Ruby compiler, you can write a Ruby program that is a (new) Ruby compiler
Since somebody already wrote a Ruby compiler (Matz), you "only" have to do the second part. Easier said than done.
All of the answers so far have explained how to bootstrap the compiler by using a different compiler. However, there is an alternative: compiling the compiler by hand. There's no reason why the compiler has to be executed by a machine, it can just as well be executed by a human.

What is the state of Ruby as a compiled language?

Ruby has been around for a while now so I was wondering if there was any work being done on a compiler for it? I know that compiler design is hindered by things like Eval() so I would not expect implementations to be 100 percent accurate? My own searches have turned up sparse results.
MacRuby offers Ahead-of-Time Compilation as of v0.5. It uses LLVM to compile binaries that will run on the Objective-C runtime.
Rubinius is a JIT compiler for Ruby. A pure compiler will never exist for Ruby because the language is far too dynamic for a static compiler to work. Whatever it did internally would be incredibly ugly and would evolve towards a JIT as they tried to optimize it anyway.
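A small illustration of the "too dynamic" point, as a sketch only: the Greeter class and the strings are made up, and the strings stand in for data that would only be known at run time. A static compiler looking at the file ahead of time has no way of knowing what greet will end up doing.

```ruby
class Greeter
  def greet
    "hello"
  end
end

greeter = Greeter.new
before = greeter.greet

# Pretend these strings arrive at run time (user input, a config file, ...).
method_name = "greet"
method_body = '"goodbye"'

# Redefine the method from text; no ahead-of-time compiler could have seen this code.
Greeter.class_eval("def #{method_name}; #{method_body}; end")

after = greeter.greet
```

The same object answers the same message differently before and after the class_eval, so any code compiled under the first definition has to be invalidated, which is exactly the kind of bookkeeping a JIT does.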
There's Mirah, which compiles Ruby-like code (it borrows Ruby's syntax but is not Ruby itself) into Java bytecode.
I believe you could obfuscate your code this way.

Is it possible to compile Ruby to byte code as with Python?

In Python, if I want to give out an application without sources I can compile it into bytecode .pyc, is there a way to do something like it in Ruby?
I wrote a much more detailed answer to this question in the question "Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?"
The answer is: it depends. The Ruby Language has no provisions for compiling to bytecode and/or running bytecode. It also has no specification of a bytecode format. The reason for this is simple: it would be much too restricting for language implementors if they were forced to use a specific bytecode format, or even bytecodes at all. For example, XRuby and JRuby compile to JVM bytecode, Ruby.NET and IronRuby compile to CIL bytecode, Cardinal compiles to PAST, SmallRuby compiles to Smalltalk/X bytecode, MagLev compiles to GemStone/S bytecode. For all of these implementations it would be plain stupid to use any other bytecode format than the one they currently use, since their whole point is interoperating with other language implementations that use the same bytecode format.
Similar for MacRuby: it compiles to native code, not bytecode. Again, using bytecode would be stupid, since one of the goals is to run Ruby on the iPhone, which pretty much requires native code.
And of course there is MRI, which is a pure AST-walking script interpreter and thus doesn't have a bytecode format.
That being said, there are some Ruby Implementations which allow compiling to and loading from bytecode. Rubinius allows that, for example. (Indeed, it has to have that functionality since its Ruby compiler is written in Ruby, and thus the compiler must be compiled to Rubinius bytecode first, in order to solve the Catch-22.)
YARV also can save and load bytecode, although the loading functionality is currently disabled until a bytecode verifier is implemented that prevents users from loading manipulated bytecode that could crash or otherwise subvert the interpreter.
But, of course, both of these have their own bytecode formats and don't understand each other's (nor tinyrb's or RubyGoLightly's or ...) Also, neither of those formats is understood by a JVM or a CLR and vice versa.
However, the whole point is irrelevant because, as Mark points out, you can always reverse engineer the byte code anyway, especially in cases like CPython, PyPy, Rubinius, YARV, tinyrb, RubyGoLightly, where the bytecode format was specifically designed to be very close to the source language.
In general it is simply impossible to protect code that way. The reason is simple: you want the machine to be able to execute the code. (Otherwise what's the point in writing it in the first place?) However, in order to execute the code, the machine must understand the code. Since machines are much dumber than humans, it follows that any code that can be understood by a machine can just as well be understood by a human, no matter whether that code happens to be in source form, bytecode, assembly, native code or a deck of punch cards.
There is only one workable technical solution: if you control the entire execution pipeline, i.e. build your own CPU, your own computer, your own operating system, your own compiler, your own interpreter, and so forth and use strong cryptography to protect all of those, then and only then might you be able to protect your code. However, as e.g. Microsoft found out the hard way with the XBox 360, even doing all of that and hiring some of the smartest cryptographers and mathematicians on the planet, doesn't guarantee success.
The only real solution is not a technical but a social one: as soon as you have written your code, it is automatically fully protected by copyright law, without you having to do one single thing. That's it. Your code is protected.
The short answer is "YES",
It will solve your problem.
Here is how to compile ruby code:
Although Ruby's 1.9 YARV VM is a byte-code compiler I don't believe it can dump the byte-code to disk. You might want to look at the alternative compiler, Rubinius, I believe it has this ability. You should note though that byte-code pyc files (and I imagine the ruby equivalent) can be pretty easily "decompiled".
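For what it's worth, later CRuby versions (2.3 and up, as far as I know) do expose this on RubyVM::InstructionSequence. The following is a minimal sketch assuming such a version; it keeps the bytes in memory, but they could just as well be written out with File.binwrite and read back with File.binread. The dumped format is specific to the Ruby build that produced it and is not verified on load, so treat it as a cache format rather than a way to ship or protect code.

```ruby
# Compile Ruby source text to a YARV instruction sequence.
iseq = RubyVM::InstructionSequence.compile("6 * 7")

# Serialize the compiled bytecode to a binary String.
binary = iseq.to_binary

# Load the bytecode back (same Ruby build only) and execute it without the source.
loaded = RubyVM::InstructionSequence.load_from_binary(binary)
result = loaded.eval
```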
Not with the MRI interpreter, no.
Some newer VMs are being worked on where this is on the table, but these aren't widely used (or even ready to be used) at this point.
If you use JRuby, you can compile your Ruby code into Java .class files (including your Rails stuff) to execute them with (Open)JDK out of the box!
You can even compile your complete stuff into a .war file to deploy it on Apache Tomcat or JBoss with a tool called "warbler".
Depends on your ruby.
JRuby -
MRuby -
MRI (C)Ruby -
