Does the Ruby VM define the file format that runs on it? - ruby

Like the ".class" file that runs on the JVM, does the Ruby VM (MRI or YARV) define a file format that runs on it?
I have read in some articles that YARV bytecode is considered an internal format, which means that there is little documentation or specification for it. Is that true?
Thanks!

tl;dr No, Ruby does not define a runnable format other than the language itself.
It is true that Ruby doesn't have a standard bytecode format like Java, Ethereum, and others. Every Ruby implementation (JRuby, MRI Ruby, TruffleRuby, etc.) gets to decide exactly how it runs Ruby code.
Ruby implementations can choose to implement their own bytecode format. Those formats would likely be private to that implementation and unlikely to be seen outside of its ecosystem. (However, as we've seen with the languages that have sprung up on the JVM, other people can piggyback on your language's VM if they'd like.)
It's completely valid for a Ruby implementation to run inside the JVM on Java bytecode (JRuby), to introduce a compilation step that produces native machine code (ruby-llvm), or even to sometimes produce native machine code at runtime with a JIT and sometimes use a more traditional interpreter (MRI Ruby).

Related

Ruby Application with metaprogramming - How to get bytecode or binary or something which can be ported

I have a Ruby application that uses metaprogramming. The application's job is to read multiple input files (which can contain user-defined functions and data). It parses them and then executes the functions.
Is there a way to get an executable, bytecode, or something similar once parsing is done, so that I can export it and run the "execute" part of the processing on a different machine (same configuration)?
Env detail:
Ruby 2.7.2
OS: Red Hat Enterprise Linux (7.7)
Not really. Because of how dynamic Ruby is, the only way to meaningfully know that you have valid code is to run it. The C API almost supports this through the ruby_exec_node function, but last I checked there is not much support for this style of running Ruby. Even if that was more convenient to use, I don't know how much skipping the parsing step would really save you.
Most of the Ruby VMs that produce bytecode are JIT-style, meaning they only generate byte code after they have executed the code once via the VM. This works best for Ruby because again it is so dynamic that it's hard to say what it is going to do without actually running it.

Ruby precompiled libraries

For example, I installed the devise gem in my Ruby project and I can see all its source code. Is it possible to have a library without source code, in the form of a precompiled binary, like an assembly in .NET? And how do I add it to the project manually, without the gem package manager?
No, this is not possible in Ruby. The closest you'll come in Ruby is extensions that wrap precompiled libraries, for example Nokogiri or bcrypt-ruby.
The short answer is no.
Ruby is not a compiled language. Although YARV compiles the source code on the fly, it does not generate bytecode files that you could ship in place of the source. The only compiled implementation of Ruby, Rubinius, does not promise bytecode compatibility among different versions (even among minor versions).
Ruby does not have a portable format for code other than the Ruby Language itself. The only other portable format it has is the Marshal format, but that is only for data; it cannot serialize code, i.e. all methods, Procs, lambdas, and blocks will be left out and/or cause an error.
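To make the Marshal limitation concrete, here is a minimal sketch. The hash contents and the helper name dumpable? are made up for illustration; only Marshal.dump and Marshal.load are real API.

```ruby
# Plain data round-trips through Marshal without trouble.
data = { "name" => "devise", "versions" => [1, 2, 3] }
restored = Marshal.load(Marshal.dump(data))

# Hypothetical helper: reports whether Marshal accepts the object.
# Dumping a code object such as a Proc raises TypeError.
def dumpable?(object)
  Marshal.dump(object)
  true
rescue TypeError
  false
end

puts restored == data
puts dumpable?(data)
puts dumpable?(proc { 1 + 1 })
puts dumpable?(-> { 1 + 1 })
```

On MRI the data cases should report true, and the Proc and lambda cases should report false.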
Note that this is actually no different from other languages. E.g. the Java Language and the JVM bytecode language are two distinct languages defined in two distinct specifications. There is no guarantee that an implementation of Java also includes an implementation of JVML and vice versa. For example, Avian only implements the JVML, it does not implement Java. And GWT only implements Java, but not the JVML.
For example, Java applications that rely on being able to execute JVM bytecode on the fly (e.g. JRuby with its JIT compiler or the Kilim concurrency framework) won't work on Android. JRuby solves this by disabling the JIT on Android and running purely interpreted.
JRuby and IronRuby both have Ahead-Of-Time compilers that compile Ruby to JVML bytecode and CLI CIL bytecode, respectively. Opal has an Ahead-Of-Time compiler that compiles Ruby to ECMAScript.
YARV has an Ahead-Of-Time compiler that compiles Ruby to YARV bytecode, however, that bytecode is usually fed directly to the YARV bytecode VM and never persisted or exposed anywhere. And there is a good reason for that: YARV bytecode is unsafe, the YARV VM implicitly trusts that the compiler will only generate code which doesn't corrupt the VM. That's a reasonable assumption to make if the compiler is part of the VM, but if you allow bytecode to be read in from external sources, then you don't know what compiler produced it, and you can get the VM in an inconsistent state.
In order to prevent this, either the bytecode has to be changed to be safe, or the VM needs a bytecode verifier.
You can actually access the bytecode, and, with some work, it is possible to read it from a file and execute, but for the reasons I outlined, that is unsafe.
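As a rough sketch of what that looks like on a recent MRI: this assumes Ruby 2.3 or later, where RubyVM::InstructionSequence gained to_binary and load_from_binary. The source string and the temp-file name are only illustrative. The loader does not verify its input, so this is only reasonable for bytecode you produced yourself with the same Ruby build.

```ruby
require "tempfile"

# Illustrative source; any valid Ruby would do.
source = <<~'RUBY'
  [1, 2, 3].map { |n| n * 2 }.inject(:+)
RUBY

# Compile to a YARV instruction sequence and serialize it to a binary string.
iseq = RubyVM::InstructionSequence.compile(source)
binary = iseq.to_binary

result = nil
Tempfile.create("cached_iseq") do |file|
  # Persist the bytecode, then read it back as raw bytes.
  file.binmode
  file.write(binary)
  file.flush

  # No verification happens here: only load bytecode you trust.
  loaded = RubyVM::InstructionSequence.load_from_binary(File.binread(file.path))
  result = loaded.eval
end

puts result
```

The binary format is tied to the MRI version and platform that produced it, which matches the point above that it is not a distribution format.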
Rubinius supports writing and reading bytecode to and from files, but that is not really intended for distributing bytecode archives. Rubinius uses it for caching the compiled bytecode as a latency optimization (similar to how CPython does). There used to be a feature in Rubinius similar to JVM .class and .jar files (.rbc and .rba), where you could load code from an .rba archive, but I'm not sure it still exists.
So, several Ruby implementations have several degrees of support of some form of bytecode compilation, but none that work robustly, and none that are portable across Ruby implementations.

Compile ruby script for faster use

I have a Ruby script of around 2200 lines which is used repeatedly. Is there a way to convert it into a binary or compile it so that it runs faster?
It seems that only JRuby has a compiler, which is good news if Java is your target platform and no help if not.
Perhaps you could re-architect your solution to include the Ruby interpreter in a pipeline so that your script can be launched once and run continually as it receives input?
If you are using MRI, your best bet is to optimize your code, as the JIT compilation already delivers proven performance. You can also switch to version 1.9, as it is faster in various cases.
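A minimal sketch of that "launch once, keep reading input" idea follows. The process method is a hypothetical stand-in for whatever the real script does per request, and the in-memory IO objects are only there to keep the example self-contained.

```ruby
require "stringio"

# Hypothetical stand-in for the expensive per-request work of the real script.
def process(line)
  line.strip.upcase
end

# Handle one request per input line until the input is closed, so the
# interpreter and the script are loaded only once.
def serve(input, output)
  input.each_line do |line|
    output.puts(process(line))
  end
end

# Demonstration with in-memory IO; in real use you would pass $stdin and $stdout.
out = StringIO.new
serve(StringIO.new("first job\nsecond job\n"), out)
puts out.string
```

With $stdin and $stdout in place of the StringIO objects, the script can sit at the end of a shell pipeline or behind a named pipe and pay the startup cost a single time.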

Disabling Ruby 1.9.x's YARV compiler

There is a very noticeable difference in application initiation time between running my specs from the command line with ruby 1.9.x vs. 1.8.7. My application initiates much faster with ruby 1.8.7 than with ruby 1.9.1 or 1.9.2. The application initiation difference is approximately 18 seconds. It takes about 5 seconds for my app to initialize with 1.8.7 and 23 seconds with 1.9.1 and 1.9.2.
Application initialization time is not a big deal for production, but it is a very big deal for BDD development. Every time I change my code and run my specs, I have to wait an additional 18 seconds per iteration.
I assume this application initialization time is attributed to YARV compiling bytecode as my application initializes.
Am I right about YARV slowing down my application initialization, and is there a way to disable YARV on the command line? It would be very nice to be able to disable YARV only when I am running my specs.
YARV is purely a compiler for Ruby. If you disable it, there's nothing left.
More precisely: YARV is a multi-phase implementation, where each of the phases is single-mode. It consists of a Ruby-to-YARV compiler and a YARV interpreter. If you remove the compiler, the only thing you are left with is the YARV bytecode interpreter. Unless you want to start writing your app in YARV bytecode, that interpreter is not going to be of much use to you.
This is in contrast to mixed-mode implementations such as JRuby and IronRuby which implement multiple execution modes (in particular, both a compiler and an interpreter) within a single phase. If you turn off the compiler in JRuby or IronRuby, you are still left with a usable execution engine, because they both also contain an interpreter. In fact, JRuby actually started out as a pure interpreter and added the compiler later and IronRuby started out as pure compiler and they added an interpreter exactly because of the same problem that you are seeing: compiling unit tests is simply a waste of time.
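The two YARV phases described above (compile first, then interpret the bytecode) can be observed from Ruby itself through RubyVM::InstructionSequence, which is specific to MRI. This is just a sketch; the exact disassembly listing differs between Ruby versions.

```ruby
# Phase 1: the Ruby-to-YARV compiler turns source text into an instruction sequence.
iseq = RubyVM::InstructionSequence.compile("1 + 2")

# The compiled bytecode can be inspected as a human-readable listing.
listing = iseq.disasm

# Phase 2: the YARV bytecode interpreter executes the instruction sequence.
value = iseq.eval

puts listing
puts value
```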
The only interpreted implementation of Ruby 1.9 right now is JRuby. Of course, there you have the whole JVM overhead to deal with. The best thing you can do is try how fast you can get JRuby to start up (use the nightly 1.6.0.dev builds from http://CI.JRuby.Org/snapshots/ since both 1.9 support and startup time are heavily worked on right this very moment) using either some very fast starting desktop-oriented JVM like IBM J9 or try JRuby's Nailgun support, which keeps a JVM running in the background.
You could also try to get rid of RubyGems, which generally eats up quite a lot of startup time, especially on YARV. (Use the --disable-gem commandline option to truly get rid of it.)
There's currently no way to disable YARV, simply because MRI 1.9 only includes the virtual machine, and not an interpreter. Maintaining both would be way too much work for the core team.
In the future there will probably be ways to cache the bytecode YARV generates (like Rubinius does). At the moment there is no way to load such bytecode through Ruby (see #971), but you could easily write a C extension which accomplishes it.
However, I would say that 18 seconds is way too much and it's probably a bug. I know there are some threads on ruby-core which discuss the slowness of require; maybe you'll find something interesting there!
The next RC of 1.9.2 might be faster, as it doesn't preload $: with all your gem lib dirs.
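Before blaming the compiler, it may be worth measuring where the startup time actually goes. The following is only a sketch: time_requires is a hypothetical helper, and the library names are placeholders for your application's own dependencies.

```ruby
require "benchmark"

# Hypothetical helper: time each require and return [library, seconds]
# pairs, slowest first. A library that is already loaded will report
# close to zero.
def time_requires(libraries)
  libraries.map { |lib| [lib, Benchmark.realtime { require lib }] }
           .sort_by { |_, seconds| -seconds }
end

# Placeholder library names; substitute the ones your app loads.
timings = time_requires(%w[set json time])
timings.each { |lib, seconds| puts format("%-6s %.4fs", lib, seconds) }
```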

Is it possible to compile Ruby to byte code as with Python?

In Python, if I want to give out an application without sources I can compile it into bytecode .pyc, is there a way to do something like it in Ruby?
I wrote a much more detailed answer to this question in the question "Can Ruby, PHP, or Perl create a pre-compiled file for the code like Python?"
The answer is: it depends. The Ruby Language has no provisions for compiling to bytecode and/or running bytecode. It also has no specification of a bytecode format. The reason for this is simple: it would be much too restricting for language implementors if they were forced to use a specific bytecode format, or even bytecodes at all. For example, XRuby and JRuby compile to JVM bytecode, Ruby.NET and IronRuby compile to CIL bytecode, Cardinal compiles to PAST, SmallRuby compiles to Smalltalk/X bytecode, MagLev compiles to GemStone/S bytecode. For all of these implementations it would be plain stupid to use any other bytecode format than the one they currently use, since their whole point is interoperating with other language implementations that use the same bytecode format.
Similar for MacRuby: it compiles to native code, not bytecode. Again, using bytecode would be stupid, since one of the goals is to run Ruby on the iPhone, which pretty much requires native code.
And of course there is MRI, which is a pure AST-walking script interpreter and thus doesn't have a bytecode format.
That being said, there are some Ruby Implementations which allow compiling to and loading from bytecode. Rubinius allows that, for example. (Indeed, it has to have that functionality since its Ruby compiler is written in Ruby, and thus the compiler must be compiled to Rubinius bytecode first, in order to solve the Catch-22.)
YARV also can save and load bytecode, although the loading functionality is currently disabled until a bytecode verifier is implemented that prevents users from loading manipulated bytecode that could crash or otherwise subvert the interpreter.
But, of course, both of these have their own bytecode formats and don't understand each other's (nor tinyrb's or RubyGoLightly's or ...). Also, neither of those formats is understood by a JVM or a CLR and vice versa.
However, the whole point is irrelevant because, as Mark points out, you can always reverse engineer the byte code anyway, especially in cases like CPython, PyPy, Rubinius, YARV, tinyrb, RubyGoLightly, where the bytecode format was specifically designed to be very close to the source language.
In general it is simply impossible to protect code that way. The reason is simple: you want the machine to be able to execute the code. (Otherwise what's the point in writing it in the first place?) However, in order to execute the code, the machine must understand the code. Since machines are much dumber than humans, it follows that any code that can be understood by a machine can just as well be understood by a human, no matter whether that code happens to be in source form, bytecode, assembly, native code or a deck of punch cards.
There is only one workable technical solution: if you control the entire execution pipeline, i.e. build your own CPU, your own computer, your own operating system, your own compiler, your own interpreter, and so forth and use strong cryptography to protect all of those, then and only then might you be able to protect your code. However, as e.g. Microsoft found out the hard way with the XBox 360, even doing all of that and hiring some of the smartest cryptographers and mathematicians on the planet, doesn't guarantee success.
The only real solution is not a technical but a social one: as soon as you have written your code, it is automatically fully protected by copyright law, without you having to do one single thing. That's it. Your code is protected.
The short answer is "YES",
check rubini.us
It will solve your problem.
Here is how to compile ruby code:
http://rubini.us/2011/03/17/running-ruby-with-no-ruby/
Although Ruby 1.9's YARV VM is a bytecode compiler, I don't believe it can dump the bytecode to disk. You might want to look at the alternative implementation, Rubinius; I believe it has this ability. You should note, though, that bytecode .pyc files (and, I imagine, the Ruby equivalent) can be pretty easily "decompiled".
Not with the MRI interpreter, no.
Some newer VMs are being worked on where this is on the table, but these aren't widely used (or even ready to be used) at this point.
If you use JRuby, you can compile your Ruby code into Java .class files (including your Rails stuff) and execute them with (Open)JDK out of the box!
You can even compile your complete application into a .war file and deploy it on Apache Tomcat or JBoss with a tool called "warbler":
https://rubygems.org/gems/warbler/
Depends on your ruby.
JRuby - https://github.com/jruby/jruby/wiki/JRubyCompiler
MRuby - http://mruby.org/docs/articles/executing-ruby-code-with-mruby.html
MRI (C)Ruby - https://devtechnica.com/ruby-language/compile-ruby-code-to-binary-and-execute-it

Resources