What is the state of Ruby as a compiled language? - ruby

Ruby has been around for a while now, so I was wondering whether any work is being done on a compiler for it. I know that compiler design is hindered by things like eval, so I would not expect implementations to be 100 percent accurate. My own searches have turned up sparse results.

MacRuby offers Ahead-of-Time Compilation as of v0.5. It uses LLVM to compile binaries that will run on the Objective-C runtime.

Rubinius is a JIT compiler for Ruby. A pure compiler will never exist for Ruby because the language is far too dynamic for a static compiler to work. Whatever it did internally would be incredibly ugly and would evolve towards a JIT as they tried to optimize it anyway.
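To make "far too dynamic" concrete, here is a minimal sketch of the kind of code a static compiler would struggle with. The Report class and its field names are made up for illustration; the point is only that the set of methods is not known until the program runs.

```ruby
# Hypothetical class, purely for illustration.
class Report
  # Method names come from data that only exists at run time.
  def self.add_fields(names)
    names.each do |name|
      define_method("#{name}_label") { name.to_s.capitalize }
    end
  end

  # Calls to otherwise unknown methods are resolved dynamically as well.
  def method_missing(name, *args)
    name.to_s.start_with?("try_") ? "handled #{name}" : super
  end

  def respond_to_missing?(name, include_private = false)
    name.to_s.start_with?("try_") || super
  end
end

# This list could just as well come from a file or from user input.
field_names = "title,author".split(",")
Report.add_fields(field_names)

report = Report.new
label = report.title_label                   # defined by define_method above
dynamic = report.try_anything                # handled by method_missing
computed = eval("1 + #{field_names.size}")   # code assembled as a string at run time

puts label, dynamic, computed
```

None of the three calls at the bottom can be resolved by looking at the source text alone, which is why implementations tend to end up with a JIT rather than a purely static compiler.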

There's Mirah, for compiling Ruby code into Java bytecode:
http://www.mirah.org/
I believe you could obfuscate your code this way.

Related

Ruby precompiled libraries

For example, I installed the devise gem in my Ruby project and I can see all its source code. Is it possible to have a library without source code, in the form of a precompiled binary, like an assembly in .NET? And how do I add it to the project manually, without the gem package manager?
No, this is not possible in Ruby. The closest you'll come to in Ruby is extensions that wrap precompiled libraries. For example Nokogiri or bcrypt-ruby.
The short answer is no.
Ruby is not a compiled language. Although YARV compiles the source code on the fly, it does not generate byte code. The only compiled implementation of Ruby, Rubinius, does not promise byte code compatibility among different versions (even among minor versions).
Ruby does not have a portable format for code other than the Ruby Language itself. The only other portable format it has, is the Marshal format, but that is only for data, it cannot serialize code, i.e. all methods, Procs, lambdas, and blocks will be left out and/or cause an error.
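A small sketch of the Marshal limitation, using made-up sample data: plain data round-trips, while a Proc is rejected.

```ruby
# Plain data survives a dump/load round trip.
data = { "name" => "devise", "versions" => [1, 2, 3] }
restored = Marshal.load(Marshal.dump(data))

# A Proc carries code, not just data, so Marshal refuses to dump it.
code_error = begin
  Marshal.dump(proc { |x| x * 2 })
  nil
rescue TypeError => e
  e
end

puts restored == data
puts code_error.class
```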
Note that this is actually no different from other languages. E.g. the Java Language and the JVM bytecode language are two distinct languages defined in two distinct specifications. There is no guarantee that an implementation of Java also includes an implementation of JVML and vice versa. For example, Avian only implements the JVML, it does not implement Java. And GWT only implements Java, but not the JVML.
For example, Java applications that rely on being able to execute JVM bytecode on the fly (e.g. JRuby with its JIT compiler or the Kilim concurrency framework) won't work on Android. JRuby solves this by disabling the JIT on Android and running purely interpreted.
JRuby and IronRuby both have Ahead-Of-Time compilers that compile Ruby to JVML bytecode and CLI CIL bytecode, respectively. Opal has an Ahead-Of-Time compiler that compiles Ruby to ECMAScript.
YARV has an Ahead-Of-Time compiler that compiles Ruby to YARV bytecode, however, that bytecode is usually fed directly to the YARV bytecode VM and never persisted or exposed anywhere. And there is a good reason for that: YARV bytecode is unsafe, the YARV VM implicitly trusts that the compiler will only generate code which doesn't corrupt the VM. That's a reasonable assumption to make if the compiler is part of the VM, but if you allow bytecode to be read in from external sources, then you don't know what compiler produced it, and you can get the VM in an inconsistent state.
In order to prevent this, either the bytecode has to be changed to be safe, or the VM needs a bytecode verifier.
You can actually access the bytecode, and, with some work, it is possible to read it from a file and execute, but for the reasons I outlined, that is unsafe.
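For the curious, a minimal sketch of getting at the bytecode, assuming you are on MRI/YARV (RubyVM::InstructionSequence does not exist on other implementations). The source string is just an example.

```ruby
# RubyVM::InstructionSequence is specific to MRI/YARV.
source = "a = 20; b = 22; a + b"
iseq = RubyVM::InstructionSequence.compile(source)

listing = iseq.disasm   # human-readable instruction listing; format varies by Ruby version
value = iseq.eval       # run the compiled instructions on the VM

puts listing
puts value
```

The listing is only meant for inspection; its exact shape changes between Ruby versions, so nothing should depend on it.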
Rubinius supports writing and reading bytecode to and from files, but that is not really intended for distributing bytecode archives. Rubinius uses it for caching the compiled bytecode as a latency optimization (similar to how CPython does). There used to be a feature in Rubinius similar to JVM .class and .jar files (.rbc and .rba), where you could load code from an .rba archive, but I'm not sure it still exists.
So, several Ruby implementations have several degrees of support of some form of bytecode compilation, but none that work robustly, and none that are portable across Ruby implementations.

What are some compiled programming languages that compile fast?

I think I finally know what I want in a compiled programming language, a fast compiler. I get the feeling that this is a really superficial thing to care about but some time after switching from Java to Scala for a while I realized that being able to make a small change in code and immediately run the program is actually quite important to me. Besides Java and Go I don't know of any languages that really value compile speed.
Delphi/Object Pascal. Make a change, press F9 and it runs - you don't even notice the compile time. A full rebuild of a fairly substantial project that we run takes on the order of 10-20 seconds, even on a fairly wimpy machine.
There's an open source variant available at www.freepascal.org. I've not messed with it but it reportedly is just as fast - it's the design of the Pascal language that allows this.
Java isn't fast at compiling. The feature you are looking for is probably hot replacement/redeployment while coding. Eclipse recompiles just the files you changed.
You could try some interpreted languages. They usually don't require compiling at all.
I wouldn't choose a language based on compilation speed...
Java is not the fastest compiler out there.
Pascal (and its close relatives) is designed to be fast - it can be compiled in a single pass. Objective Caml is known for its compilation speed (and there is a REPL too).
On the other hand, what you really need is REPL, not a fast recompilation and re-linking of everything. So you may want to try a language which supports an incremental compilation. Clojure fits well (and it is built on top of the same JVM you're used to). Common Lisp is another option.
I'd like to add that there are official compilers for languages and unofficial ones made by different people. Because of this, performance varies from compiler to compiler.
If you were to talk just about the official compiler, I'd say it's probably Fortran. It's very old, but it's still used in most science and engineering projects because it is one of the fastest languages. C and C++ probably come tied in second because they're also used in science and engineering.

Are any of the Ruby VMs done using the LLVM toolchain?

I like the LLVM idea. To be honest, I do not much care for Ruby, I'd rather use Perl, or Python, or .... ( it's a long list ).
Nothing personal, it's a great language, but I just prefer others.
However, Ruby has so many good ideas that I might need to STFU and just learn it, if nothing else to debug the tools.
Before I do so, I am wondering if there is a practical and usable implementation of Ruby done using the LLVM toolchain?
Well, you have llvmruby, RubyComp and more important, Rubinius, but MacRuby also uses LLVM for "optimization passes, JIT and AOT compilation of Ruby expressions".

How do you write a compiler for a language in that language? [duplicate]

Possible Duplicates:
How can a language's compiler be written in that language?
implementing a compiler in “itself”
I was looking at Rubinius, a Ruby implementation that compiles to bytecode using a compiler written in Ruby. I cannot get my head around this. How do you write a compiler for a language in the language itself? It seems like it would be just text without anything to compile it into an executable that could then compile the future code written in Ruby. I get confused just typing that sentence. Can anyone help explain this?
To simplify: you first write a compiler for the compiler, in a different language. Then, you compile the compiler, and voila!
So, you need some sort of language which already has a compiler - but since there are many such, you can write the Ruby compiler compiler (!) e.g. in C, which will then compile the Ruby compiler, which can then compile Ruby programs, even further versions of itself.
Of course, the original compilers were written in machine code, compiled compilers for assembly, which in turn compiled compilers for e.g. C or Fortran, which compiled compilers for...pretty much everything. Iterative development in action.
The process is called bootstrapping - possibly named after Baron Munchhausen's story in which he pulled himself out of a swamp by his own bootstraps :)
Regarding the bootstrapping of a compiler it's worth reading about this devilishly clever hack.
http://catb.org/jargon/html/B/back-door.html
I get confused just reading that sentence.
It may help to think of the compiler as a translator, which compilers are often called. Its purpose is to take source code that humans can read and translate it into binary code that computers can read. In the case of Rubinius, the code that it reads happens to be Ruby code, and the code that it converts it into is machine code (actually LLVM machine code which is itself further compiled into Intel machine code, but that's just a background detail). Rubinius itself could have been written in just about any programming language. It just happened to have been written in the same language that it compiles.
Of course, you need something to run Rubinius in the first place, and this is most likely a regular Ruby interpreter. Note, however, that once you are able to run Rubinius on an interpreter, you can pass it its own source code, and it will create and run a compiled version of itself. This is called bootstrapping, from the old phrase, "pulling yourself up by the bootstraps".
One final note: Ruby programs can't invoke arbitrary machine code. That part of Rubinius is actually written in C++.
Well, it is possible to do it in the following order:
1. Write a compiler for your Ruby code in any language, say C.
2. Now that you can compile Ruby code, you can write a compiler in Ruby that compiles Ruby code, and compile this compiler with the compiler you wrote in C in step 1. wahh this sentence is strange!
3. From now on you can compile all your Ruby code with the compiler written in step 2. :)
Have fun! :)
A compiler is just something that transforms source code into an executable. So it doesn't matter what it is written in - it can be the same language it is compiling or any other language of sufficient power.
The fun comes when you are writing a compiler for a language for a platform, written in the same language, that doesn't yet have a compiler for your implementation language. Your choices here are to compile on another platform for which you do have a compiler, or write a compiler in another language, and use that to compile the "real" compiler.
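To illustrate that a compiler is just an ordinary program that transforms text, here is a toy sketch written in Ruby. It compiles infix arithmetic (not Ruby) into instructions for a tiny stack machine and then runs them. ToyCompiler and run_stack_machine are hypothetical names, the input is assumed to be well-formed, and this is in no way how Rubinius works internally.

```ruby
# Toy "compiler": translate an infix arithmetic expression into
# instructions for a small stack machine.
module ToyCompiler
  PRECEDENCE = { "+" => 1, "-" => 1, "*" => 2, "/" => 2 }.freeze

  # Split source text into integer, operator and parenthesis tokens.
  def self.tokenize(source)
    source.scan(/\d+|[-+*\/()]/)
  end

  # Shunting-yard: emit [:push, n] and [:op, sym] instructions in postfix order.
  def self.compile(source)
    output = []
    operators = []
    tokenize(source).each do |token|
      case token
      when /\A\d+\z/
        output << [:push, token.to_i]
      when "("
        operators << token
      when ")"
        output << [:op, operators.pop.to_sym] while operators.last != "("
        operators.pop # discard the "("
      else
        # Pop operators of equal or higher precedence first (left-associative).
        while operators.any? && operators.last != "(" &&
              PRECEDENCE[operators.last] >= PRECEDENCE[token]
          output << [:op, operators.pop.to_sym]
        end
        operators << token
      end
    end
    output << [:op, operators.pop.to_sym] while operators.any?
    output
  end
end

# Toy "machine": execute the instructions produced above.
def run_stack_machine(instructions)
  stack = []
  instructions.each do |kind, value|
    if kind == :push
      stack.push(value)
    else
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(value, right))
    end
  end
  stack.pop
end

program = ToyCompiler.compile("1 + 2 * (3 + 4)")
result = run_stack_machine(program)

p program
puts result   # 15
```

Nothing in ToyCompiler cares which language it is written in; the same translation could be written in C or in the toy language itself, which is all that bootstrapping relies on.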
It's a two-step process:
1. Write a Ruby compiler in some other language like C, assuming a Ruby compiler doesn't yet exist.
2. Since you now have a Ruby compiler, you can write a Ruby program that is a (new) Ruby compiler.
Since somebody already wrote a Ruby compiler (Matz), you "only" have to do the second part. Easier said than done.
All of the answers so far have explained how to bootstrap the compiler by using a different compiler. However, there is an alternative: compiling the compiler by hand. There's no reason why the compiler has to be executed by a machine, it can just as well be executed by a human.

Is it possible to compile Ruby to byte code as with Python?

In Python, if I want to give out an application without sources I can compile it into bytecode .pyc, is there a way to do something like it in Ruby?
I wrote a much more detailed answer to this question in the question "Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?"
The answer is: it depends. The Ruby Language has no provisions for compiling to bytecode and/or running bytecode. It also has no specification of a bytecode format. The reason for this is simple: it would be much too restricting for language implementors if they were forced to use a specific bytecode format, or even bytecodes at all. For example, XRuby and JRuby compile to JVM bytecode, Ruby.NET and IronRuby compile to CIL bytecode, Cardinal compiles to PAST, SmallRuby compiles to Smalltalk/X bytecode, MagLev compiles to GemStone/S bytecode. For all of these implementations it would be plain stupid to use any other bytecode format than the one they currently use, since their whole point is interoperating with other language implementations that use the same bytecode format.
Similar for MacRuby: it compiles to native code, not bytecode. Again, using bytecode would be stupid, since one of the goals is to run Ruby on the iPhone, which pretty much requires native code.
And of course there is MRI, which is a pure AST-walking script interpreter and thus doesn't have a bytecode format.
That being said, there are some Ruby Implementations which allow compiling to and loading from bytecode. Rubinius allows that, for example. (Indeed, it has to have that functionality since its Ruby compiler is written in Ruby, and thus the compiler must be compiled to Rubinius bytecode first, in order to solve the Catch-22.)
YARV also can save and load bytecode, although the loading functionality is currently disabled until a bytecode verifier is implemented that prevents users from loading manipulated bytecode that could crash or otherwise subvert the interpreter.
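In later MRI releases (2.3 onwards, to my knowledge) saving and loading is exposed through RubyVM::InstructionSequence#to_binary and .load_from_binary. A minimal sketch, assuming such an MRI; the file name is made up, and the feature is documented as experimental, with no verifier and no compatibility between Ruby versions:

```ruby
require "tmpdir"

# Compile a snippet and serialize the resulting instruction sequence.
iseq = RubyVM::InstructionSequence.compile("[1, 2, 3].sum * 2")
binary = iseq.to_binary   # version-specific bytes, not a portable format

# Hypothetical file name, written to a temporary directory.
path = File.join(Dir.mktmpdir, "example.yarb")
File.binwrite(path, binary)

# Load the bytes back and run them. Only do this with files you produced yourself.
loaded = RubyVM::InstructionSequence.load_from_binary(File.binread(path))
round_trip_value = loaded.eval

puts round_trip_value
```

This is the mechanism that caching tools build on; it is a latency optimization rather than a way to ship an application without sources.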
But, of course, both of these have their own bytecode formats and don't understand each other's (nor tinyrb's or RubyGoLightly's or ...) Also, neither of those formats is understood by a JVM or a CLR and vice versa.
However, the whole point is irrelevant because, as Mark points out, you can always reverse engineer the byte code anyway, especially in cases like CPython, PyPy, Rubinius, YARV, tinyrb, RubyGoLightly, where the bytecode format was specifically designed to be very close to the source language.
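A quick way to see how close the bytecode stays to the source, assuming MRI/YARV; the variable name and the string literal are made up:

```ruby
# Compile a snippet and look at what survives in the instruction listing.
source = 'secret_key = "hunter2"; puts secret_key'
listing = RubyVM::InstructionSequence.compile(source).disasm

leaks_literal = listing.include?("hunter2")     # the string literal is stored verbatim
leaks_name = listing.include?("secret_key")     # so is the local variable name

puts listing
puts leaks_literal, leaks_name
```

Method names, local variable names and literals are all still there, so reading the logic back out is mostly a matter of patience.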
In general it is simply impossible to protect code that way. The reason is simple: you want the machine to be able to execute the code. (Otherwise what's the point in writing it in the first place?) However, in order to execute the code, the machine must understand the code. Since machines are much dumber than humans, it follows that any code that can be understood by a machine can just as well be understood by a human, no matter whether that code happens to be in source form, bytecode, assembly, native code or a deck of punch cards.
There is only one workable technical solution: if you control the entire execution pipeline, i.e. build your own CPU, your own computer, your own operating system, your own compiler, your own interpreter, and so forth and use strong cryptography to protect all of those, then and only then might you be able to protect your code. However, as e.g. Microsoft found out the hard way with the XBox 360, even doing all of that and hiring some of the smartest cryptographers and mathematicians on the planet, doesn't guarantee success.
The only real solution is not a technical but a social one: as soon as you have written your code, it is automatically fully protected by copyright law, without you having to do one single thing. That's it. Your code is protected.
The short answer is "YES",
check rubini.us
It will solve your problem.
Here is how to compile ruby code:
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Although Ruby's 1.9 YARV VM is a byte-code compiler I don't believe it can dump the byte-code to disk. You might want to look at the alternative compiler, Rubinius, I believe it has this ability. You should note though that byte-code pyc files (and I imagine the ruby equivalent) can be pretty easily "decompiled".
Not with the MRI interpreter, no.
Some newer VMs are being worked on where this is on the table, but these aren't widely used (or even ready to be used) at this point.
If you use JRuby, you can compile your Ruby code into Java .class files (including your Rails stuff) to execute them with (open)jdk out of the box!
You can even compile your complete stuff into a .war file to deploy it on Apache Tomcat or JBoss with a tool called "warbler"
https://rubygems.org/gems/warbler/
Depends on your ruby.
JRuby - https://github.com/jruby/jruby/wiki/JRubyCompiler
MRuby - http://mruby.org/docs/articles/executing-ruby-code-with-mruby.html
MRI (C)Ruby - https://devtechnica.com/ruby-language/compile-ruby-code-to-binary-and-execute-it

Resources