Ruby precompiled libraries - ruby

For example, I installed the devise gem in my Ruby project and I can see all its source code. Is it possible to have a library without source code, in the form of a precompiled binary, like an assembly in .NET? And how would I add it to the project manually, without the gem package manager?

No, this is not possible in Ruby. The closest you can get in Ruby is an extension that wraps a precompiled library, for example Nokogiri or bcrypt-ruby.
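As for adding a library manually without RubyGems: a minimal sketch, assuming a hypothetical vendored directory and a hypothetical library named my_lib, is to put the directory on $LOAD_PATH and require the library from there. require resolves both .rb files and compiled extensions (whose file suffix is RbConfig::CONFIG['DLEXT']), but a prebuilt extension generally only works with the Ruby version and platform it was compiled for.

```ruby
require "rbconfig"
require "tmpdir"

# Add a (hypothetical) vendored directory to the load path and require a library
# from it, bypassing RubyGems entirely. `require name` finds either name.rb or a
# compiled extension name.<DLEXT> in that directory.
def require_vendored(vendor_dir, name)
  $LOAD_PATH.unshift(vendor_dir) unless $LOAD_PATH.include?(vendor_dir)
  require name
end

# Shared-library suffix require looks for here ("so" on Linux, "bundle" on macOS).
puts "Native extension suffix: .#{RbConfig::CONFIG['DLEXT']}"

# Demo: a throwaway pure-Ruby file stands in for the vendored library.
Dir.mktmpdir do |dir|
  File.write(File.join(dir, "my_lib.rb"), "module MyLib\n  VERSION = \"0.1\"\nend\n")
  require_vendored(dir, "my_lib")
  puts MyLib::VERSION
end
```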

The short answer is no.
Ruby is not a compiled language in that sense. Although YARV compiles the source code to bytecode on the fly, it does not write that bytecode out in a distributable form. Rubinius, which does compile to bytecode files, does not promise bytecode compatibility between different versions (not even between minor versions).

Ruby does not have a portable format for code other than the Ruby language itself. The only other portable format it has is the Marshal format, but that is only for data: it cannot serialize code, i.e. all methods, Procs, lambdas, and blocks will be left out and/or cause an error.
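A small sketch of that limitation: plain data round-trips through Marshal, while a lambda or a Method object raises TypeError. The helper name marshalable? is made up for this example.

```ruby
# Plain data survives a Marshal round trip...
data = { gem: "devise", versions: [1, 2, 3] }
copy = Marshal.load(Marshal.dump(data))
p copy == data

# ...but code objects are rejected. Returns true if obj could be dumped.
def marshalable?(obj)
  Marshal.dump(obj)
  true
rescue TypeError => e
  puts "#{obj.class}: #{e.message}"
  false
end

p marshalable?(data)             # plain data
p marshalable?(->(x) { x * 2 })  # a lambda
p marshalable?(method(:puts))    # a Method object
```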
Note that this is actually no different from other languages. E.g. the Java Language and the JVM bytecode language are two distinct languages defined in two distinct specifications. There is no guarantee that an implementation of Java also includes an implementation of JVML and vice versa. For example, Avian only implements the JVML; it does not implement Java. And GWT only implements Java, not the JVML.
For example, Android implements the Java language but runs its own DEX bytecode rather than JVM bytecode, so Java applications that rely on being able to execute JVM bytecode on the fly (e.g. JRuby with its JIT compiler, or the Kilim concurrency framework) won't work on Android. JRuby solves this by disabling the JIT on Android and running purely interpreted.
JRuby and IronRuby both have Ahead-Of-Time compilers that compile Ruby to JVML bytecode and CLI CIL bytecode, respectively. Opal has an Ahead-Of-Time compiler that compiles Ruby to ECMAScript.
YARV has an Ahead-Of-Time compiler that compiles Ruby to YARV bytecode; however, that bytecode is usually fed directly to the YARV bytecode VM and never persisted or exposed anywhere. And there is a good reason for that: YARV bytecode is unsafe. The YARV VM implicitly trusts that the compiler will only generate code that doesn't corrupt the VM. That's a reasonable assumption to make if the compiler is part of the VM, but if you allow bytecode to be read in from external sources, then you don't know what compiler produced it, and you can get the VM into an inconsistent state.
In order to prevent this, either the bytecode has to be changed to be safe, or the VM needs a bytecode verifier.
You can actually access the bytecode, and, with some work, it is possible to write it to a file, read it back, and execute it, but for the reasons I outlined, that is unsafe.
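A minimal CRuby-only sketch (Ruby 2.3 or later), using RubyVM::InstructionSequence#to_binary and RubyVM::InstructionSequence.load_from_binary. The helper name and the snippet.yarb file name are arbitrary choices for this example. The binary is only loadable by the exact same Ruby build, and the loader does no verification, which is precisely the unsafety described above.

```ruby
require "tmpdir"

# CRuby-only: compile source to a YARV instruction sequence, write its binary
# form to a file, read it back, and execute it.
def run_via_bytecode_file(source, dir)
  iseq = RubyVM::InstructionSequence.compile(source)
  path = File.join(dir, "snippet.yarb") # file name/extension are arbitrary
  File.binwrite(path, iseq.to_binary)

  # load_from_binary has no verifier: only load binaries you produced yourself,
  # with the same Ruby version and platform.
  loaded = RubyVM::InstructionSequence.load_from_binary(File.binread(path))
  loaded.eval
end

Dir.mktmpdir do |dir|
  puts run_via_bytecode_file(<<~'RUBY', dir)
    def greet(name)
      "Hello, #{name}"
    end
    greet("YARV")
  RUBY
end
```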
Rubinius supports writing and reading bytecode to and from files, but that is not really intended for distributing bytecode archives. Rubinius uses it for caching the compiled bytecode as a latency optimization (similar to what CPython does with .pyc files). There used to be a feature in Rubinius similar to JVM .class and .jar files (.rbc and .rba), where you could load code from an .rba archive, but I'm not sure it still exists.
So, several Ruby implementations have varying degrees of support for some form of bytecode compilation, but none that works robustly, and none that is portable across Ruby implementations.

Related

Does the Ruby VM define the file format that runs on it?

Like the ".class" files that run on the JVM, does the Ruby VM (MRI, or YARV) define a file format that runs on it?
I have read in some articles that YARV bytecode is considered an internal format, which means that there is little documentation or specification for it. Is that true?
Thanks!
tl;dr No, Ruby does not define a runnable format other than the language itself.
It is true that Ruby doesn't have a standard bytecode format like Java, Ethereum, and others do. Every Ruby implementation (JRuby, MRI Ruby, TruffleRuby, etc.) gets to decide how exactly it runs Ruby code.
Ruby implementations can choose to implement their own bytecode format. Those formats would likely be private to that Ruby implementation and unlikely to be seen outside of that ecosystem. (However, as we've seen with languages popping up that run on the JVM, it's possible for other people to piggyback off your language's VM if they'd like.)
It's completely valid for a Ruby implementation to run inside the JVM on Java bytecode (JRuby), to introduce a compilation step that produces native machine code (ruby-llvm), or even to sometimes produce native machine code at runtime with a JIT and sometimes use a more traditional interpreter (MRI Ruby).
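Because the execution strategy is up to each implementation, portable code can at most ask which implementation it is running on. A small sketch using the built-in RUBY_ENGINE, RUBY_VERSION and RUBY_PLATFORM constants; the method name and hash keys are made up for this example, and the presence check for RubyVM::InstructionSequence reflects that this bytecode API is CRuby-specific rather than part of the language.

```ruby
# Inspect which Ruby implementation is executing this code.
def ruby_implementation_info
  {
    engine: RUBY_ENGINE,            # e.g. "ruby" for CRuby/YARV, "jruby", "truffleruby"
    language_version: RUBY_VERSION, # Ruby language version the implementation targets
    platform: RUBY_PLATFORM,        # e.g. "x86_64-linux", or "java" on JRuby
    # RubyVM::InstructionSequence is CRuby's own bytecode API, not part of the
    # Ruby language, so other implementations need not provide it.
    yarv_iseq_api: defined?(RubyVM::InstructionSequence) ? true : false
  }
end

p ruby_implementation_info
```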

What kind of interpreter is the Ruby MRI?

Is it a language interpreter? Or a bytecode interpreter / JIT compiler? Where can I learn more about the implementation (other than by browsing the source code)?
Note: the term "MRI" is confusing. It means "Matz's Ruby/Reference Implementation/Interpreter". However, MRI has been retired and isn't developed or maintained anymore.
MRI was a pure AST-walking interpreter, with no compilation involved anywhere.
The confusing thing is: Matz has written a new implementation, but that's called MRuby, not MRI. And the implementation that is now called MRI wasn't written by Matz. So, really, it is best to simply not use that term at all, and be specific about which implementation you are talking about.
The name of the implementation that people now call MRI is actually YARV (for Yet Another Ruby VM), and it was written by Koichi Sasada. It consists of an Ahead-Of-Time compiler which compiles Ruby source code to YARV byte code and an interpreter which interprets said byte code. Thus, it is a completely typical byte code VM, exactly like CPython for Python, Zend Engine for PHP, the Lua VM, older versions of Rubinius, older versions of SpiderMonkey for ECMAScript, and so on.
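To make the difference between an AST-walking interpreter (old MRI) and a bytecode VM (YARV) concrete, here is a toy sketch for a tiny arithmetic language. The node types, the :push/:send instructions and all method names are made up for illustration; they are not YARV's real instruction set, only the same general shape: compile the tree once to a flat list of stack instructions, then interpret that list.

```ruby
# Toy expression language: integer literals and binary operators.
Lit = Struct.new(:value)
BinOp = Struct.new(:op, :left, :right)

# 1) AST-walking interpreter: evaluate the tree directly, recursively.
def walk(node)
  case node
  when Lit then node.value
  when BinOp then walk(node.left).public_send(node.op, walk(node.right))
  end
end

# 2) Bytecode VM: compile the tree ahead of time into stack instructions...
def compile(node, code = [])
  case node
  when Lit
    code << [:push, node.value]
  when BinOp
    compile(node.left, code)
    compile(node.right, code)
    code << [:send, node.op]
  end
  code
end

# ...then run those instructions in a simple loop over an operand stack.
def run(code)
  stack = []
  code.each do |insn, arg|
    case insn
    when :push then stack.push(arg)
    when :send
      right = stack.pop
      left = stack.pop
      stack.push(left.public_send(arg, right))
    end
  end
  stack.pop
end

ast = BinOp.new(:+, Lit.new(1), BinOp.new(:*, Lit.new(2), Lit.new(3)))
p walk(ast)          # => 7
p compile(ast)       # => [[:push, 1], [:push, 2], [:push, 3], [:send, :*], [:send, :+]]
p run(compile(ast))  # => 7
```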
There is talk about adding a JIT compiler from YARV bytecode to native machine code to the VM for YARV 3, which would then make the VM a mixed-mode execution engine.
Matz's current implementation, MRuby, is also a bytecoded VM.
For completeness' sake, here are a couple of other Ruby implementations, first the currently production-ready ones, and then a couple of historically interesting ones:
Rubinius: compiles Ruby source code to Rubinius byte code ahead-of-time, then hands that bytecode off to a mixed-mode execution engine consisting of a bytecode interpreter and an LLVM-based JIT compiler; they have recently introduced or are currently in the process of introducing a separate Intermediate Representation (IR) for the JIT compiler, so the interpreter works off Rubinius bytecode, but the JIT compiler works off Compiler IR. Rubinius also belongs in the "historically interesting" category, because it was the first successful Ruby implementation a significant part of which was implemented in Ruby; there had been other projects before, but Rubinius was the first to be production-ready.
JRuby: the main mode is a mixed-mode execution engine consisting of an AST-walking interpreter, and a JIT compiler that first translates the AST into IR, which it then further compiles to JVM bytecode. The other mode is an AOT compiler which compiles Ruby sourcecode to JVM bytecode ahead-of-time.
Opal: an Ahead-Of-Time compiler that compiles Ruby sourcecode to ECMAScript sourcecode.
MagLev: an implementation based on the GemStone/S Smalltalk VM. Unfortunately, I don't know much about it; I believe it compiles Ruby sourcecode to GemStone/S bytecode, and the GemStone/S VM is then a standard mixed-mode VM with a bytecode interpreter and a JIT compiler.
Some no longer maintained but historically interesting implementations:
Topaz: an implementation using the RPython/PyPy VM framework. The PyPy framework is interesting because it includes a tracing JIT compiler that, unlike other JIT compilers, doesn't work alongside the interpreter compiling the user program; instead, it compiles the interpreter while it is interpreting the user program. What this basically means is that the JIT has to be written only once, by the PyPy developers, and every language implementor using the PyPy framework only has to write a simple bytecode interpreter and gets an optimizing native JIT compiler for free.
XRuby: the first static AOT compiler for Ruby, implemented for the JVM.
IronRuby: it started out as a pure JIT compiler without an interpreter, but an interpreter was later added, because it turned out that this actually improved performance (which is contrary to the popular myth that interpreters are slow).
unholy: a proof-of-concept AOT compiler that compiles YARV bytecode to CPython bytecode; this was hacked up by _why the lucky stiff when the Google App Engine first came out and only supported Python, the idea was that you could compile your Ruby sourcecode to YARV bytecode using YARV, compile the YARV bytecode to CPython bytecode using unholy, compile the CPython bytecode to Python sourcecode using decompyle, and then upload the Python sourcecode to GAE to run your shiny new Ruby app.
Honorable mentions go to: tinyrb, metaruby, Ruby.NET, Red Sun, HotRuby, BlueRuby, SmallRuby
A couple of interesting current research projects are:
JRuby+Truffle: this project is re-implementing JRuby's internals using the Truffle AST interpreter framework from Oracle Labs; this version, when run on a Graal-enabled JVM (another Oracle Labs research project), is able to attain performance similar to Java, sometimes even reaching (and overtaking) C.
Ruby+OMR: IBM has broken up its J9 JVM into independently re-usable, language-independent building blocks for VM implementors and released it under an open source license under the Eclipse umbrella as the Eclipse Open Managed Runtime. It's not an academic project: the Java 8 version of IBM J9 is actually implemented using OMR. The Ruby+OMR project is a proof-of-concept by the OMR developers, replacing YARV's garbage collector with OMR's, and adding OMR's JIT compiler, profiler, and debugger to YARV. It is fairly impressive just how language-independent all of it really is: the entire patch is fewer than 10,000 lines, and that is not just the glue code; it actually includes all the required OMR components as well. (There's also an equivalent Python+OMR project, but that's still non-public.)
Last but not least, you may sometimes hear about "Rite". Rite was used as a codename for a complete re-write of MRI for over a decade. Matz said that when he wrote MRI he didn't actually know anything about language implementation, so he wanted to do it "right" (get it?) a second time. At the same time, there was also a lot of talk about Ruby 2.0, wanting to fix some long-standing design deficiencies in the language. The two were lumped together, so Rite was talked about as the new implementation of Ruby 2.0. However, YARV came along and was so good that Matz decided he didn't need to write his own VM after all, and he basically decided that "YARV is Rite".
Now, however, he has written his own VM after all, which is why you will sometimes hear MRuby (or its VM component) referred to as "Rite".
It's a bytecode interpreter called YARV, written by Sasada Koichi.
Here's one example of how it looks:
puts RubyVM::InstructionSequence.compile("1+1").disasm
== disasm: #<ISeq:<compiled>#<compiled>>================================
0000 trace 1 ( 1)
0002 putobject_OP_INT2FIX_O_1_C_
0003 putobject_OP_INT2FIX_O_1_C_
0004 opt_plus <callinfo!mid:+, argc:1, ARGS_SIMPLE>, <callcache>
0007 leave
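The same CRuby-specific API can also disassemble a method that has already been defined, via RubyVM::InstructionSequence.of. A minimal sketch; the add method is just an example:

```ruby
# CRuby-only: disassemble the bytecode YARV generated for an existing method.
def add(a, b)
  a + b
end

listing = RubyVM::InstructionSequence.of(method(:add)).disasm
puts listing  # includes the specialized opt_plus instruction for a + b
```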
Further reading:
YARV instruction set
While MRI doesn't have a JIT yet, there's the Ruby+OMR project, which is trying to add a JIT compiler based on Eclipse OMR:
Ruby+OMR JIT Compiler: What’s next?
Ruby now has a VM-generated JIT compiler!
Since the 2.6.0-preview1 branch was merged, Ruby has a basic JIT compiler called "MJIT" (YARV MRI JIT). It is inspired by the work of Vladimir Makarov, who proposed an RTL (Register Transfer Language) based instruction set instead of a stack-based one. The speedups are not yet apparent, because not all instruction paths are handled by MJIT, but the branch lays the groundwork for future work.
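Whether a JIT is actually active depends on the Ruby version and on command-line flags. A defensive sketch, assuming RubyVM::MJIT.enabled? (CRuby 2.6 to 3.2) and RubyVM::YJIT.enabled? (CRuby 3.1 and later, YJIT being a later JIT not discussed above); every check is guarded, so it degrades gracefully on versions or implementations that lack these modules. The method name jit_status is made up for this example.

```ruby
# Report which JIT (if any) this CRuby process has enabled.
def jit_status
  return "not CRuby (#{RUBY_ENGINE})" unless RUBY_ENGINE == "ruby"

  if defined?(RubyVM::YJIT) && RubyVM::YJIT.respond_to?(:enabled?) && RubyVM::YJIT.enabled?
    "YJIT enabled"
  elsif defined?(RubyVM::MJIT) && RubyVM::MJIT.respond_to?(:enabled?) && RubyVM::MJIT.enabled?
    "MJIT enabled"
  else
    "interpreter only (no JIT enabled)"
  end
end

puts jit_status
```

On Ruby 2.6 through 3.2, running with ruby --jit should make this report MJIT.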

Why does truffleruby need C extensions?

Current status of truffleruby says:
TruffleRuby is progressing fast but is currently probably not ready for you to try running your full Ruby application on. Support for critical C extensions such as OpenSSL and Nokogiri is missing.
Why does TruffleRuby need C extensions? It's built on GraalVM, which is built on top of the JVM; in fact, it is a fork of JRuby:
TruffleRuby is a fork of JRuby, combining it with code from the Rubinius project, and also containing code from the standard implementation of Ruby, MRI.
Can't they use JRuby world gems instead of depending on their C variants?
Running C extensions is hard because the C extension API is just the entire internals of MRI exposed as a header file. You aren't programming against a clean API - you're programming against all the implementation details and internal design decisions of MRI.
JRuby's Java extensions have exactly the same problem! The JRuby Java extension API is just the entire internals of JRuby, and you aren't programming against an API; instead, you're programming against all the implementation details and design decisions of JRuby.
We plan to eventually tackle both problems in the same way - which is to add another level of abstraction over the C or Java code using an interpreter which we can intercept and redirect when required, so that it believes it is running against MRI or JRuby internals, but really we redirect these to our internals.
We think C extensions are more important, so we're tackling those first. We haven't really started on Java extensions yet, but we have started the underlying interpreter for Java that we'll use.
This video explains all
https://youtu.be/YLtjkP9bD_U?t=1562
You have already gotten a good answer by the project lead himself, but I want to offer a different point of view:
Why does truffleruby need C extensions?
It doesn't need them. But they do exist and there is code out there which uses them, and it would sure be nice to be able to run that code.

Rubinius in RubySL

I use Ruby at a user level and don't really deal with the internals. I have known Rubinius as 'Ruby in Ruby', which I assumed was a generalization. Recently, I got an error from Rubinius in RubySL (no, I don't have the error message).
I started looking at RubySL and was a little surprised to see Rubinius everywhere. I really like Ruby and was just curious why Rubinius is in most of the RubySL? It seems to be used with things like locks / unlocks (such as https://github.com/rubysl/rubysl-thread/blob/2.0/lib/rubysl/thread/thread.rb ). Definitely not questioning it, just curious.
RubySL is short for Ruby Standard Library. It is a basic part of the shipped code bundle that forms what is generally known as Ruby. The standard library provides rather basic functionality that you often need but that doesn't need to be part of the core language.
For example, the implementation of Hash or Array, the language keywords, how assignment works, ... are part of the core language. These are often implemented in a language other than Ruby. MRI (the common C Ruby) implements them mostly in C; JRuby implements them in Java. Rubinius implements them partly in C++ but mostly in Ruby itself. It can do this by bootstrapping itself from a very simple base VM and gradually adding more functionality in Ruby.
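A hypothetical sketch of that layering, not Rubinius' actual code: a stand-in "VM primitive" container supplies only size and at, and richer collection methods are written in plain Ruby on top of them. All class, module and method names here are made up for illustration.

```ruby
# Collection methods written in plain Ruby, relying only on #size and #at.
module RubyLevelCore
  def my_each
    i = 0
    while i < size
      yield at(i)
      i += 1
    end
    self
  end

  def my_map
    result = []
    my_each { |item| result << yield(item) }
    result
  end

  def my_select
    result = []
    my_each { |item| result << item if yield(item) }
    result
  end
end

# Stand-in for a container whose primitives a real VM would implement natively.
class PrimitiveList
  include RubyLevelCore

  def initialize(*items)
    @items = items
  end

  def size
    @items.size
  end

  def at(index)
    @items[index]
  end
end

list = PrimitiveList.new(1, 2, 3, 4)
p list.my_map { |x| x * 10 }
p list.my_select(&:even?)
```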
The standard library, however, is mostly implemented in Ruby in all implementations (with some exceptions, mostly for performance reasons). Right now, each Ruby implementation has its own implementation of the Ruby standard library, so they can differ in minor details.
Rubinius' approach to implementing a standard library was to implement it as separate gems. The idea was to one day provide a common standard library that could be used by other implementations (including MRI). This is in line with the efforts of, especially, the Rubinius community to drive the RubySpec project in order to provide a common language specification and test suite for all Ruby implementations.
The RubySpec project was eventually abandoned, and right now it doesn't seem as if other Ruby implementations are moving to the RubySL gems for implementing their standard library.
Thus (and this is the TL;DR), the RubySL gems implement the Ruby Standard Library for the Rubinius project, so it is to be expected that you see Rubinius all over the place there: it is Rubinius' code, which is generally not used by other Ruby implementations.

Is it possible to compile Ruby to byte code as with Python?

In Python, if I want to give out an application without sources I can compile it into bytecode .pyc, is there a way to do something like it in Ruby?
I wrote a much more detailed answer to this question in the question "Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?"
The answer is: it depends. The Ruby Language has no provisions for compiling to bytecode and/or running bytecode. It also has no specification of a bytecode format. The reason for this is simple: it would be much too restricting for language implementors if they were forced to use a specific bytecode format, or even bytecodes at all. For example, XRuby and JRuby compile to JVM bytecode, Ruby.NET and IronRuby compile to CIL bytecode, Cardinal compiles to PAST, SmallRuby compiles to Smalltalk/X bytecode, MagLev compiles to GemStone/S bytecode. For all of these implementations it would be plain stupid to use any other bytecode format than the one they currently use, since their whole point is interoperating with other language implementations that use the same bytecode format.
Similarly for MacRuby: it compiles to native code, not bytecode. Again, using bytecode would be stupid, since one of the goals is to run Ruby on the iPhone, which pretty much requires native code.
And of course there is MRI, which is a pure AST-walking script interpreter and thus doesn't have a bytecode format.
That being said, there are some Ruby Implementations which allow compiling to and loading from bytecode. Rubinius allows that, for example. (Indeed, it has to have that functionality since its Ruby compiler is written in Ruby, and thus the compiler must be compiled to Rubinius bytecode first, in order to solve the Catch-22.)
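YARV can also save and load bytecode, although at the time of writing, the loading functionality was disabled until a bytecode verifier is implemented that prevents users from loading manipulated bytecode that could crash or otherwise subvert the interpreter. (Since Ruby 2.3, CRuby does expose RubyVM::InstructionSequence#to_binary and RubyVM::InstructionSequence.load_from_binary, but the documentation warns that the loader has no verifier, so loading broken or modified bytecode can crash the VM.)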
But, of course, both of these have their own bytecode formats and don't understand each other's (nor tinyrb's, nor RubyGoLightly's, nor ...). Also, neither of those formats is understood by a JVM or a CLR, and vice versa.
However, the whole point is irrelevant because, as Mark points out, you can always reverse engineer the byte code anyway, especially in cases like CPython, PyPy, Rubinius, YARV, tinyrb, RubyGoLightly, where the bytecode format was specifically designed to be very close to the source language.
In general it is simply impossible to protect code that way. The reason is simple: you want the machine to be able to execute the code. (Otherwise what's the point in writing it in the first place?) However, in order to execute the code, the machine must understand the code. Since machines are much dumber than humans, it follows that any code that can be understood by a machine can just as well be understood by a human, no matter whether that code happens to be in source form, bytecode, assembly, native code or a deck of punch cards.
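A small CRuby-only illustration of that point (Ruby 2.3 or later): ship only the binary bytecode, and the recipient can load it and disassemble it, recovering the method name and the string literal. The license_ok? method and the key string are hypothetical placeholders.

```ruby
# A "secret" we might hope to hide by shipping only bytecode.
source = <<~'RUBY'
  def license_ok?(key)
    key == "SECRET-KEY-1234"
  end
RUBY

# What gets shipped: the YARV binary, no source.
shipped = RubyVM::InstructionSequence.compile(source).to_binary

# What the recipient can do: load it and print human-readable instructions,
# including the nested method body with its string literal.
recovered = RubyVM::InstructionSequence.load_from_binary(shipped).disasm
puts recovered
```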
There is only one workable technical solution: if you control the entire execution pipeline, i.e. build your own CPU, your own computer, your own operating system, your own compiler, your own interpreter, and so forth, and use strong cryptography to protect all of those, then and only then might you be able to protect your code. However, as e.g. Microsoft found out the hard way with the Xbox 360, even doing all of that and hiring some of the smartest cryptographers and mathematicians on the planet doesn't guarantee success.
The only real solution is not a technical but a social one: as soon as you have written your code, it is automatically fully protected by copyright law, without you having to do one single thing. That's it. Your code is protected.
The short answer is "YES":
check out rubini.us (Rubinius).
It will solve your problem.
Here is how to compile Ruby code with it:
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Although Ruby 1.9's YARV VM is a bytecode compiler, I don't believe it can dump the bytecode to disk. You might want to look at the alternative implementation, Rubinius; I believe it has this ability. You should note, though, that bytecode .pyc files (and, I imagine, the Ruby equivalent) can be pretty easily "decompiled".
Not with the MRI interpreter, no.
Some newer VM's are being worked on where this is on the table, but these aren't widely used (or even ready to be used) at this point.
If you use JRuby, you can compile your Ruby code (including your Rails app) into Java .class files and execute them with (Open)JDK out of the box!
You can even package your complete app into a .war file and deploy it on Apache Tomcat or JBoss with a tool called "warbler":
https://rubygems.org/gems/warbler/
Depends on your ruby.
JRuby - https://github.com/jruby/jruby/wiki/JRubyCompiler
MRuby - http://mruby.org/docs/articles/executing-ruby-code-with-mruby.html
MRI (C)Ruby - https://devtechnica.com/ruby-language/compile-ruby-code-to-binary-and-execute-it

Resources