Is it possible to compile Ruby to byte code as with Python? - ruby

In Python, if I want to give out an application without sources I can compile it into bytecode .pyc, is there a way to do something like it in Ruby?

I wrote a much more detailed answer to this question in the question "Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?"
The answer is: it depends. The Ruby Language has no provisions for compiling to bytecode and/or running bytecode. It also has no specfication of a bytecode format. The reason for this is simple: it would be much too restricting for language implementors if they were forced to use a specific bytecode format, or even bytecodes at all. For example, XRuby and JRuby compile to JVM bytecode, Ruby.NET and IronRuby compile to CIL bytecode, Cardinal compiles to PAST, SmallRuby compiles to Smalltalk/X bytecode, MagLev compiles to GemStone/S bytecode. For all of these implementations it would be plain stupid to use any other bytecode format than the one they currently use, since their whole point is interoperating with other language implementations that use the same bytecode format.
Simlar for MacRuby: it compiles to native code, not bytecode. Again, using bytecode would be stupid, since one of the goals is to run Ruby on the iPhone, which pretty much requires native code.
And of course there is MRI, which is a pure AST-walking script interpreter and thus doesn't have a bytecode format.
That being said, there are some Ruby Implementations which allow compiling to and loading from bytecode. Rubinius allows that, for example. (Indeed, it has to have that functionality since its Ruby compiler is written in Ruby, and thus the compiler must be compiled to Rubinius bytecode first, in order to solve the Catch-22.)
YARV also can save and load bytecode, although the loading functionality is currently disabled until a bytecode verifier is implemented that prevents users from loading manipulated bytecode that could crash or otherwise subvert the interpreter.
But, of course, both of these have their own bytecode formats and don't understand each other's (nor tinyrb's or RubyGoLightly's or ...) Also, neither of those formats is understood by a JVM or a CLR and vice versa.
However, the whole point is irrelevant because, as Mark points out, you can always reverse engineer the byte code anyway, especially in cases like CPython, PyPy, Rubinius, YARV, tinyrb, RubyGoLightly, where the bytecode format was specifically designed to be very close to the source language.
In general it is simply impossible to protect code that way. The reason is simple: you want the machine to be able to execute the code. (Otherwise what's the point in writing it in the first place?) However, in order to execute the code, the machine must understand the code. Since machines are much dumber than humans, it follows that any code that can be understood by a machine can just as well be understood by a human, no matter whether that code happens to be in source form, bytecode, assembly, native code or a deck of punch cards.
There is only one workable technical solution: if you control the entire execution pipeline, i.e. build your own CPU, your own computer, your own operating system, your own compiler, your own interpreter, and so forth and use strong cryptography to protect all of those, then and only then might you be able to protect your code. However, as e.g. Microsoft found out the hard way with the XBox 360, even doing all of that and hiring some of the smartest cryptographers and mathematicians on the planet, doesn't guarantee success.
The only real solution is not a technical but a social one: as soon as you have written your code, it is automatically fully protected by copyright law, without you having to do one single thing. That's it. Your code is protected.

The short answer is "YES",
check rubini.us
It will solve your problem.
Here is how to compile ruby code:
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/

Although Ruby's 1.9 YARV VM is a byte-code compiler I don't believe it can dump the byte-code to disk. You might want to look at the alternative compiler, Rubinius, I believe it has this ability. You should note though that byte-code pyc files (and I imagine the ruby equivalent) can be pretty easily "decompiled".

Not with the MRI interpretter, no.
Some newer VM's are being worked on where this is on the table, but these aren't widely used (or even ready to be used) at this point.

If you use Jruby, you can compile your Ruby code into Java .class files (including your Rails stuff) to execute them with (open)jdk out of the box!
You can even compile your complete stuff into a .war file to deploy it on Apache Tomcat or Jboss with a tool called "warbler"
https://rubygems.org/gems/warbler/

Depends on your ruby.
JRuby - https://github.com/jruby/jruby/wiki/JRubyCompiler
MRuby - http://mruby.org/docs/articles/executing-ruby-code-with-mruby.html
MRI (C)Ruby - https://devtechnica.com/ruby-language/compile-ruby-code-to-binary-and-execute-it

Related

Does the Ruby VM define the file format that runs on it?

Like the ".class" file that runs on JVM, does the Ruby VM(MRI, or YARV) define the file format that rns on it?
I have read in some articles that YARV bytecode was considered a internal format, which means that there are little documentations or specifications about it, is it true?
Thanks!
tl;dr No, ruby does not define a runable format other than the language itself.
It is true that ruby doesn't have a standard bytecode format like java, etherium, and others. Every ruby implementation JRuby, MRI Ruby, TruffleRuby, etc. gets to decide how exactly they run ruby code.
Ruby implementations can choose to implement their own bytecode format. Those formats would likely be private to that ruby implementation and unlikely to be seen outside of that eco-system. (however as we've seen with languages popping up that run on the JVM it's possible for other people to piggy back off your language's VM if they'd like)
It's completely valid for a ruby implementation to run inside the JVM run on java byte code JVM jruby, to introduce a compilation step to produce native machine code ruby-llvm, or even to sometimes produce native machine code at runtime with a JIT and sometimes use a more traditional interpreter MRI Ruby.

What kind of interpreter is the Ruby MRI?

Is it a language interpreter? Or a bytecode interpreter / JIT compiler? Where can I learn more about the implementation (other than by browsing the source code)?
Note: the term "MRI" is confusing. It means "Matz's Ruby/Reference Implementation/Interpreter". However, MRI has been retired and isn't developed or maintained anymore.
MRI was a pure AST-walking interpreter, with no compilation involved anywhere.
The confusing thing is: Matz has written a new implementation, but that's called MRuby, not MRI. And the implementation that is now called MRI wasn't written by Matz. So, really, it is best to simply not use that term at all, and be specific about which implementation you are talking about.
The name of the implementation that people now call MRI is actually YARV (for Yet Another Ruby VM), and it was written by Koichi Sasada. It consists of an Ahead-Of-Time compiler which compiles Ruby source code to YARV byte code and an interpreter which interprets said byte code. Thus, it is a completely typical byte code VM, exactly like CPython for Python, Zend Engine for PHP, the Lua VM, older versions of Rubinius, older versions of SpiderMonkey for ECMAScript, and so on.
There is talk about adding a JIT compiler from YARV bytecode to native machine code to the VM for YARV 3, which would then make the VM a mixed-mode execution engine.
Matz's current implementation, MRuby, is also a bytecoded VM.
For completeness' sake, here are a couple of other Ruby implementations, first the currently production-ready ones, and then a couple of historically interesting ones:
Rubinius: compiles Ruby source code to Rubinius byte code ahead-of-time, then hands that bytecode off to a mixed-mode execution engine consisting of a bytecode interpreter and an LLVM-based JIT compiler; they have recently introduced or are currently in the process of introducing a separate Intermediate Representation (IR) for the JIT compiler, so the interpreter works off Rubinius bytecode, but the JIT compiler works off Compiler IR. Rubinius also belongs into the "historically interesting" category, because it was the first successful Ruby implementation a significant part of which was implemented in Ruby; there had been other projects before, but Rubinius was the first to be production-ready.
JRuby: the main mode is a mixed-mode execution engine consisting of an AST-walking interpreter, and a JIT compiler that first translates the AST into IR, which it then further compiles to JVM bytecode. The other mode is an AOT compiler which compiles Ruby sourcecode to JVM bytecode ahead-of-time.
Opal: an Ahead-Of-Time compiler that compiles Ruby sourcecode to ECMAScript sourcecode.
MagLev: an implementation based on the GemStone/S Smalltalk VM. Unfortunately, I don't know much about it, I believe it compiles Ruby sourcecode to GemStone/S bytecode, the GemStone/S VM then is a standard mixed-mode VM with a bytecode interpreter and a JIT compiler.
Some no longer maintained but historically interesting implementations:
Topaz: an implementation using the RPython/PyPy VM framework; the PyPy framework is interesting because it includes a tracing JIT compiler that unlike other JIT compilers doesn't work besides the interpreter and compiles the user program, instead it compiles the interpreter while it is interpreting the user programs. What this basically means is that the JIT has to be written only once by the PyPy developers, and every language implementor using the PyPy framework only has to write a simple bytecode interpreter gets an optimizing native JIT compiler for free.
XRuby: the first static AOT compiler for Ruby, implemented for the JVM.
IronRuby: it started out as a pure JIT compiler without an interpreter, but an interpreter was later added, because it turned out that this actually improved performance (which is contrary to the popular myth that interpreters are slow).
unholy: a proof-of-concept AOT compiler that compiles YARV bytecode to CPython bytecode; this was hacked up by _why the lucky stiff when the Google App Engine first came out and only supported Python, the idea was that you could compile your Ruby sourcecode to YARV bytecode using YARV, compile the YARV bytecode to CPython bytecode using unholy, compile the CPython bytecode to Python sourcecode using decompyle, and then upload the Python sourcecode to GAE to run your shiny new Ruby app.
Honorable mentions go to: tinyrb, metaruby, Ruby.NET, Red Sun, HotRuby, BlueRuby, SmallRuby
A couple of interesting current research projects are:
JRuby+Truffle: this project is re-implementing JRuby's internals using the Truffle AST interpreter framework from Oracle Labs; this version, when run on a Graal-enable JVM (another Oracle Labs research project) is able to attain performance similar to Java and sometimes even reaching (and overtaking) C.
Ruby+OMR: IBM has broken up its J9 JVM into independently re-usable, language-independent building blocks for VM implementors and released it under an open source license under the Eclipse umbrella as the Eclipse Open Managed Runtime. It's not an academic project: the Java 8 version of IBM J9 is actually implemented using OMR. The Ruby+OMR project is a proof-of-concept by the OMR developers, replacing YARV's garbage collector with OMR's, and adding OMR's JIT compiler, profiler, and debugger to YARV. It is fairly impressive just how language-independent all the stuff really is, the entire patch is less than 10000 lines, and that is not just the glue code, it actually includes all the required OMR components as well. (There's also an equivalent Python+OMR project, but that's still non-public.)
Last but not least, you may sometimes hear about "Rite". Rite was used as a codename for a complete re-write of MRI for over a decade. Matz said that when he wrote MRI he didn't actually know anything about language implementation, so he wanted to do it "right" (get it?) a second time. At the same time, there was also a lot of talk about Ruby 2.0, wanting to fix some long-standing design deficiencies in the language. The two were lumped together, so Rite was talked about as the new implementation of Ruby 2.0. However, YARV came along and was so good that Matz decided he didn't need to write his own VM after all, and he basically decided that "YARV is Rite".
But now, he did write his own VM nonetheless, which is why you will sometimes hear MRuby (or its VM component) referred to as "Rite".
It's a bytecode interpreter called YARV, written by Sasada Koichi.
Here's one example of how it looks:
puts RubyVM::InstructionSequence.compile("1+1").disasm
== disasm: #<ISeq:<compiled>#<compiled>>================================
0000 trace 1 ( 1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 putobject_OP_INT2FIX_O_1_C_
0004 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0007 leave
Further reading:
YARV instruction set
While MRI doesn't have a JIT yet, there's the Ruby+OMR project, that's trying to add a JIT compiler based on Eclipse OMR:
Ruby+OMR JIT Compiler: What’s next?
Ruby now has a VM-generated JIT compiler!
Since 2.6.0-preview1 branch was merged, Ruby now has a basic JIT compiler called "MJIT" (YARV MRI Jit). It is inspired by the works of Vladimir Makarov who proposed a RTL (Register Transfer Language) based instruction set instead of a stack based one. The speedups are not yet apparent, because not all instruction paths are handled by MJIT, but the branch contains a basis for future work.

Why does truffleruby need C extensions?

Current status of truffleruby says:
TruffleRuby is progressing fast but is currently probably not ready for you to try running your full Ruby application on. Support for critical C extensions such as OpenSSL and Nokogiri is missing.
Why does truffleruby need C extensions? It's built on GraalVM which is built on top of the JVM, it is in fact a fork of JRuby:
TruffleRuby is a fork of JRuby, combining it with code from the Rubinius project, and also containing code from the standard implementation of Ruby, MRI.
Can't they use JRuby world gems instead of depending on their C variants?
EDIT link to the issue on github
Running C extensions is hard because the C extension API is just the entire internals of MRI exposed as a header file. You aren't programming against a clean API - you're programming against all the implementation details and internal design decisions of MRI.
JRuby's Java extensions have exactly the same problem! The JRuby Java extension API is just the entire internals of JRuby, and you aren't programming against an API, instead you're programming against all the implementations details and design decisions of JRuby.
We plan to eventually tackle both problems in the same way - which is to add another level of abstraction over the C or Java code using an interpreter which we can intercept and redirect when required, so that it believes it is running against MRI or JRuby internals, but really we redirect these to our internals.
We think C extensions are more important, so we're tackling those first. We haven't really started on Java extensions yet, but we have started the underlying interpreter for Java that we'll use.
This video explains all
https://youtu.be/YLtjkP9bD_U?t=1562
You have already gotten a good answer by the project lead himself, but I want to offer a different point of view:
Why does truffleruby need C extensions?
It doesn't need them. But they do exist and there is code out there which uses them, and it would sure be nice to be able to run that code.

Ruby precompiled libraries

For example I installed devise gem in my ruby project and I can see all it's source code. Is it possible to have a library without source code in a form of precompiled binary? Like assembly in .Net? And how to add it to the project without gem package manager, manually?
No, this is not possible in Ruby. The closest you'll come to in Ruby is extensions that wrap precompiled libraries. For example Nokogiri or bcrypt-ruby.
The short answer is no.
Ruby is not a compiled language. Although YARV compiles the source code on the fly, it does not generate byte code. The only compiled implementation of Ruby, Rubinius, does not promise byte code compatibility among different versions (even among minor versions).
Ruby does not have a portable format for code other than the Ruby Language itself. The only other portable format it has, is the Marshal format, but that is only for data, it cannot serialize code, i.e. all methods, Procs, lambdas, and blocks will be left out and/or cause an error.
Note that this is actually no different from other languages. E.g. the Java Language and the JVM bytecode language are two distinct languages defined in two distinct specifications. There is no guarantee that an implementation of Java also includes an implementation of JVML and vice versa. For example, Avian only implements the JVML, it does not implement Java. And GWT only implements Java, but not the JVML.
For example, Java applications that rely on being able to execute JVM bytecode on the fly (e.g. JRuby with its JIT compiler or the Kilim concurrency framework) won't work on Android. JRuby solves this by disabling the JIT on Android and running purely interpreted.
JRuby and IronRuby both have Ahead-Of-Time compilers that compile Ruby to JVML bytecode and CLI CIL bytecode, respectively. Opal has an Ahead-Of-Time compiler that compiles Ruby to ECMAScript.
YARV has an Ahead-Of-Time compiler that compiles Ruby to YARV bytecode, however, that bytecode is usually fed directly to the YARV bytecode VM and never persisted or exposed anywhere. And there is a good reason for that: YARV bytecode is unsafe, the YARV VM implicitly trusts that the compiler will only generate code which doesn't corrupt the VM. That's a reasonable assumption to make if the compiler is part of the VM, but if you allow bytecode to be read in from external sources, then you don't know what compiler produced it, and you can get the VM in an inconsistent state.
In order to prevent this, either the bytecode has to be changed to be safe, or the VM needs a bytecode verifier.
You can actually access the bytecode, and, with some work, it is possible to read it from a file and execute, but for the reasons I outlined, that is unsafe.
Rubinius supports writing and reading bytecode to and from files, but that is not really intended for distributing bytecode archives. Rubinius uses it for caching the compiled bytecode as a latency optimization (similar to how CPython does). There used to be a feature in Rubinius similar to JVM .class and .jar files (.rbc and .rba), where you could load code from an .rba archive, but I'm not sure it still exists.
So, several Ruby implementations have several degrees of support of some form of bytecode compilation, but none that work robustly, and none that are portable across Ruby implementations.

How do you write a compiler for a language in that language? [duplicate]

This question already has answers here:
Closed 12 years ago.
Possible Duplicates:
How can a language's compiler be written in that language?
implementing a compiler in “itself”
I was looking at Rubinius, a Ruby implementation that compiles to bytecode using a compiler written in Ruby. I cannot get my head around this. How do you write a compiler for a language in the language itself? It seems like it would be just text without anything to compile it into an executable that could then compile the future code written in Ruby. I get confused just typing that sentence. Can anyone help explain this?
To simplify: you first write a compiler for the compiler, in a different language. Then, you compile the compiler, and voila!
So, you need some sort of language which already has a compiler - but since there are many such, you can write the Ruby compiler compiler (!) e.g. in C, which will then compile the Ruby compiler, which can then compile Ruby programs, even further versions of itself.
Of course, the original compilers were written in machine code, compiled compilers for assembly, which in turn compiled compilers for e.g. C or Fortran, which compiled compilers for...pretty much everything. Iterative development in action.
The process is called bootstrapping - possibly named after Baron Munchhausen's story in which he pulled himself out of a swamp by his own bootstraps :)
Regarding the bootstrapping of a compiler it's worth reading about this devilishly clever hack.
http://catb.org/jargon/html/B/back-door.html
I get confused just reading that sentence.
It may help to think of the compiler as a translator, which compilers are often called. Its purpose is to take source code that humans can read and translate it into binary code that computers can read. In the case of Rubinius, the code that it reads happens to be Ruby code, and the code that it converts it into is machine code (actually LLVM machine code which is itself further compiled into Intel machine code, but that's just a background detail). Rubinius itself could have been written in just about any programming language. It just happened to have been written in the same language that it compiles.
Of course, you need something to run Rubinius in the first place, and this most likely a regular Ruby interpreter. Note, however, that once you are able to run Rubinius on an interpreter, you can pass it its own source code, and it will create and run a compiled version of itself. This is called bootstrapping, from the old phrase, "pulling yourself up by the bootstraps".
One final note: Ruby programs can't invoke arbitrary machine code. That part of Rubinius is actually written in C++.
Well it is possible to do it in the following order:
Write a compiler in any language, say C for your Ruby code.
Now that you can compile Ruby code, you can write a compiler that compiles ruby code and compile this compiler with the C compiler you wrote in step 1. wahh this sentence is strange!
From now on you can compile all your ruby code with the compiler written in 2. :)
Have fun! :)
A compiler is just something that transforms source code into an executable. So it doen't matter what it is written in - it can be the same language it is compiling or any other language of sufficient power.
The fun comes when you are writing a compiler for a language for a platform, written in the same language, that doesn't yet have a compiler for your implementation language. Your choices here are to compile on another platform for which you do have a compiler, or write a compiler in another language, and use that to compile the "real" compiler.
It's a 2 step process:
write a Ruby compiler in some other lanaguage like C, assuming a Ruby compiler doesn't yet exist
since you now have a Ruby compiler, you can write a Ruby program that is a (new) Ruby compiler
Since somebody already wrote a Ruby compiler (Matz), you "only" have to do the second part. Easier said than done.
All of the answers so far have explained how to bootstrap the compiler by using a different compiler. However, there is an alternative: compiling the compiler by hand. There's no reason why the compiler has to be executed by a machine, it can just as well be executed by a human.

Resources