About the assertion "Your ruby code is never translated into machine language" - ruby

I'm reading the book Ruby under a Microscope, and I don't understand the quoted part in the second chapter:
From what I've understood, the process to run a Ruby program is roughly the following:
Read the file and tokenize it
Using grammar rules, transform those tokens into instructions in an Abstract Syntax Tree.
Walking through the nodes, transform them into YARV bytecode (this step is called compiling?).
And this last one is the one that troubles me:
"Ruby never compiles your Ruby code all the way to machine language.
[...] Ruby interprets the bytecode instructions."
My question is, for those bytecode instructions to be understood and executed, shouldn't I need to translate them into assembly/machine code before? If not, how does the machine understand them?
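For what it's worth, MRI ships with introspection hooks that let you watch each of these stages yourself. A quick sketch, assuming CRuby with the standard library's Ripper available:

```ruby
require 'ripper'

src = "puts 1 + 2"

# Step 1: tokenize the source.
pp Ripper.lex(src)    # => [[[1, 0], :on_ident, "puts", ...], ...]

# Step 2: parse the tokens into an AST (returned as nested arrays).
pp Ripper.sexp(src)   # => [:program, [[:command, ...]]]

# Step 3: compile to YARV bytecode and print the instruction listing.
puts RubyVM::InstructionSequence.compile(src).disasm
```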

"Ruby never compiles your Ruby code all the way to machine language. [...] Ruby interprets the bytecode instructions."
That quote is – if not outright wrong – terribly misleading.
First off, Ruby is a programming language. Programming languages don't compile or interpret anything. Compilers and interpreters do that. Programming languages are just a set of abstract mathematical rules and restrictions.
Secondly, there are many different implementations of Ruby: Rubinius, JRuby, IronRuby, MacRuby, MRuby, Topaz, Cardinal, Opal, MagLev, YARV, … And they all work very differently. For example, Rubinius compiles Ruby to Rubinius bytecode, then it collects some statistics while interpreting that bytecode, and then it uses those statistics to compile the bytecode to efficient, performant, native machine code. JRuby interprets the JRuby AST and at the same time collects statistics, then it compiles JRuby AST to JRuby compiler IR, uses the statistics to optimize it, then compiles it further to JVM bytecode. What the JVM then does with that bytecode is up to the specific JVM implementation, but most JVMs will eventually compile JVM bytecode to efficient, performant, native machine code. Opal compiles Ruby code to ECMAScript code, and most ECMAScript implementations will eventually compile ECMAScript source code to efficient, performant, native machine code.
Thirdly, what does "machine language" even mean? YARV bytecode is the machine language of the YARV Machine, is it not? There are CPUs which can execute JVM bytecode directly, does that mean that JVM bytecode is machine language? There are interpreters running on the JVM that can interpret x86 object code, does that mean x86 object code is not machine language? What if I run an x86 interpreter on top of IKVM (a JVM running on top of .NET) on top of .NET on an ARM machine? What is machine language then?
So, to recap:
Ruby is a language, not an implementation, the statement doesn't even make sense.
Most Ruby implementations do (Rubinius, Topaz, MacRuby, MagLev) or at least can (JRuby, IronRuby, Opal, Cardinal) end up with native machine code.
The term "machine language" is ill-defined anyway.
My question is, for those bytecode instructions to be understood and executed, shouldn't I need to translate them into assembly/machine code before?
No, the interpreter understands and executes them. If it translated them into something else, it would be a compiler, not an interpreter.
A compiler translates, but doesn't run. An interpreter runs, but doesn't translate. You need an interpreter somewhere, you cannot get a program running with just a compiler. The compiler simply translates the program from one language to another language. Period. If you want to actually run the program, you need an interpreter. That interpreter may be implemented in hardware, in which case we call it a "CPU", but it's still just an interpreter.
See also Understanding the differences: traditional interpreter, JIT compiler, JIT interpreter and AOT compiler over on Programmers.SE.
If not, how does the machine understand them?
It doesn't. The interpreter understands them. It understands them in the same way that a compiler understands them, except that instead of generating code that corresponds to the semantics of the input program, it runs code that corresponds to the semantics of the input program.
See also Does an interpreter produce machine code? and How Does An Interpreter Work? over on Programmers.SE.

No, the statement you are puzzled by really does mean what it says. The Ruby bytecode interpreter performs the evaluations called for by the bytecode and passes the results (in most cases) on to the next set of bytecode instructions to be evaluated.
It is more complicated than that, but think of it as a processing layer between the Ruby bytecode and the native machine.
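As a purely illustrative sketch (a made-up instruction set, nothing like real YARV), a bytecode interpreter is in essence just a loop that looks at each instruction and performs the corresponding work itself, without ever emitting machine code:

```ruby
# A toy stack-based bytecode interpreter -- purely illustrative, nothing like
# the real YARV instruction set or its dispatch loop.
def run(bytecode)
  stack = []
  bytecode.each do |op, arg|
    case op
    when :push  then stack.push(arg)                     # literal onto the stack
    when :add   then stack.push(stack.pop + stack.pop)   # consume two, push sum
    when :print then puts stack.pop                      # side effect, no codegen
    end
  end
end

# Roughly what `puts 2 + 3` might look like in this made-up instruction set:
run([[:push, 2], [:push, 3], [:add], [:print]])          # prints 5
```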

Related

Why don't most interpreted languages like ruby provide an optional compiler?

Are all interpreted languages not eventually machine code? I'm curious whether the reason is that companies don't think it's worth the effort, that there is an inherent conflict that makes it impossible, or something else. Is there a way to capture the script's executed machine code myself?
Edit
I was speaking loosely because I didn't want the title to be too long. I understand there are no "interpreted languages". I'm talking about languages that are generally interpreted (not C++, not Rust, etc.).
"Are all interpreted languages not eventually machine code?" - this was a rhetorical question. The answer is a simple "yes". Because that's how computers work.
I am curious why most companies who create a language with an interpreter don't also supplement it with a compiler (that compiles to native machine code). And I'm curious if I can record the executed machine code myself.
Also, Jörg W Mittag's answer is misleading (and arrogant). Out of "all" these compilers I don't see one that compiles to native machine code. I don't think one of them even exists anymore (go to the Rubinius website). I've also worked with some of them and they have limitations (I can't arbitrarily take a non-trivial script that works with the standard ruby interpreter and compile it).
Why don't most interpreted languages like ruby provide an optional compiler?
There is no such thing as an "interpreted language". Interpretation and compilation are traits of the interpreter or compiler (duh!) not the language. A language is just a set of abstract mathematical rules and restrictions. It is neither interpreted nor compiled. It just is.
Those two terms belong to two completely different levels of abstraction. If English were a typed language, the term "interpreted language" would be a Type Error. The term "interpreted language" is not even wrong, it is nonsensical.
Every language can be implemented by a compiler, and every language can be implemented by an interpreter. Most languages have both interpreted and compiled implementations. Many modern high-performance language implementations combine compilers and interpreters.
Are all interpreted languages not eventually machine code?
In some sense, every language is machine code for an abstract machine corresponding to that language, yes. I.e. Ruby is machine language for the "Ruby Abstract Machine" which is the machine whose execution semantics match exactly the execution semantics of Ruby and whose machine language syntax matches exactly the syntax of Ruby.
if there is an inherent conflict that makes it impossible
All currently existing Ruby implementations (with one caveat) have at least one compiler. Most have more than one. At least one has no interpreter at all.
Opal is purely compiled. It never interprets. There is no interpreter in Opal, only a compiler.
YARV compiles Ruby to YARV byte code. This byte code then gets interpreted by the YARV VM. Code that has been executed more than a certain number of times then gets compiled to native machine code for the underlying architecture (i.e. when running the AMD64 version of YARV, it gets compiled to AMD64 machine code, when running the ARM version, it gets compiled to ARM machine code, and so on).
Artichoke is … somewhat complicated, but suffice to say, it does not interpret Ruby.
MRuby compiles Ruby to MRuby byte code. This byte code then gets interpreted by the MRuby VM.
Rubinius compiles Ruby to Rubinius byte code. This byte code then gets interpreted by the Rubinius VM. Code that has been executed more than a certain number of times then gets compiled to native machine code for the underlying architecture (i.e. when running the AMD64 version of Rubinius, it gets compiled to AMD64 machine code, when running the ARM version, it gets compiled to ARM machine code, and so on). [Note: there are a couple of different versions of Rubinius. The original version had a native code compiler. This was then removed, and is in the process of being rewritten.]
JRuby compiles Ruby to JRuby IR. This IR then gets interpreted by the JRuby IR interpreter. Code that has been executed more than a certain number of times then gets compiled to JRuby compiler IR. This compiler IR then gets further compiled to JVM byte code. What happens to this JVM byte code depends on the JVM. On the HotSpot JVM, the JVM byte code will be interpreted by the HotSpot interpreter, which will profile the code, and then compile the code that is executed often to native machine code.
TruffleRuby parses Ruby to Truffle AST. This AST then gets interpreted by the Truffle AST interpreter framework. The Truffle AST interpreter framework will then specialize the AST nodes while it is interpreting them, including possibly compiling them to native machine code using Graal.
The last major, mainstream Ruby implementation that was purely interpreted and didn't have a compiler was the original MRI, which was abandoned years ago.
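If you're curious whether the native-code tier described above for YARV is actually active in your own CRuby process, something along these lines works as a best-effort check (the constant names vary by version, so treat this as a sketch rather than a stable API):

```ruby
# Best-effort check for a native-code JIT tier in the running CRuby.
# (Constant names differ across versions; treat this as a sketch.)
jit =
  if defined?(RubyVM::YJIT) && RubyVM::YJIT.enabled?
    "YJIT"
  elsif defined?(RubyVM::MJIT) && RubyVM::MJIT.enabled?
    "MJIT"
  else
    "none"
  end
puts "Native JIT in use: #{jit}"
```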

Is Ruby a scripting language or an interpreted language?

I just noticed that on the Wikipedia page for Ruby, the language is defined as an interpreted language.
I realized there's probably something missing in my background. I have always known the difference between an interpreted language, which doesn't need a compiler, and a compiled language, which requires compilation before its programs can be executed. But what characterizes a scripting language?
Is Ruby definable as a scripting language?
Thank you, and forgive me for the blackout.
Things aren't just black and white. At the very least, they're also big and small, loud and quiet, blue and orange, grey and gray, long and short, right and wrong, etc.
Interpreted/compiled is just one way to categorize languages, and it's completely independent from (among countless other things) whether you call the same language a "scripting language" or not. To top it off, it's also a broken classification:
Interpreted/compiled depends on the language implementation, not on the language (this is not just theory, there are indeed quite a few languages for which both interpreters and compilers exist)
There are language implementations (lots of them, including most Ruby implementations) that are compilers, but "only" compile to bytecode and interpret that bytecode.
There are also implementations that switch between interpreting and compiling to native code (JIT compilers).
You see, reality is a complex beast ;) Ruby is, as mentioned above, frequently compiled. The output of that compilation is then interpreted, at least in some cases - there are also implementations that JIT-compile (Rubinius, and IIRC JRuby compiles to Java bytecode after a while). The reference implementation has been a compiler for a long time, and IIRC still is. So is Ruby interpreted or compiled? Neither term is meaningful unless you define it ;)
But back to the question: "Scripting language" isn't a property of the language either, it depends on how the language is used - namely, whether the language is used for scripting tasks. If you're looking for a definition, the Wikipedia page on "Scripting language" may help (just don't let them confuse you with the notes on implementation details such as that scripts are usually interpreted). There are indeed a few programs that use Ruby for scripting tasks, and there are doubtless numerous free-standing Ruby programs that would likely qualify as scripts (web scraping, system administration, etc).
So yes, I guess one can call Ruby a scripting language. Of course that doesn't mean a ruby on rails web app is just a script.
Yes.
Detailed response:
A scripting language is typically used to control applications that are often not written in this language. For example, shell scripts etc. can call arbitrary console applications.
Ruby is a general purpose dynamic language that is frequently used for scripting.
You can make arbitrary system calls using backtick notation like below.
`<system command>`
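For example (the command itself is just a placeholder; any shell command works the same way):

```ruby
# Backticks run the command in a subshell and return its stdout as a String.
listing = `ls -l`
puts listing

# $? holds the Process::Status of the most recently finished child process.
puts $?.exitstatus
```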
There are also many excellent Ruby gems such as Watir and RAutomation for automating web and native GUIs.
For definition of scripting language, see here.
The term 'scripting language' is very broad, and it can include both interpreted and compiled languages. Ruby in particular might be compiled or interpreted depending on which implementation we're using: for instance, JRuby gets compiled to bytecode, whereas CRuby (Ruby's reference implementation) is interpreted.

Convert Ruby to low level languages?

I have all kind of scripting with Ruby:
rails (symfony)
ruby (php, bash)
rb-appscript (applescript)
Is it possible to replace low level languages with Ruby too?
I write in Ruby and it converts it to Java, C++, or C.
People say that when it comes to more performance-critical tasks in Ruby, you can extend it with C. But the word "extend" means that you write C files that you then call from your Ruby code. I wonder: could I instead write Ruby and convert it to C source code, which is then compiled to machine code? Then I could "extend" it with C, but written in Ruby.
That is what this post is about. Write everything in Ruby but get the performance of C (or Java).
The second advantage is that you don't have to learn other languages.
Just like HipHop for PHP.
Are there implementations for this?
Such a compiler would be an enormous piece of work. Even if it works, it still has to
include the ruby runtime
include the standard library (which wasn't built for performance but for usability)
allow for metaprogramming
do dynamic dispatch
etc.
All of these inflict tremendous runtime penalties, because a C compiler can neither understand nor optimize such abstractions. Ruby and other dynamic languages are not only slower because they are interpreted (or compiled to bytecode which is then interpreted), but also because they are dynamic.
Example
In C++, a method call can be inlined in most cases, because the compiler knows the exact type of this. If a subtype is passed, the method still can't change unless it is virtual, in which case a still very efficient lookup table is used.
In Ruby, classes and methods can change in any way at any time, thus a (relatively expensive) lookup is required every time.
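A small sketch of why that lookup can't simply be compiled away: the method a given call site ends up invoking can change while the program is running.

```ruby
class Greeter
  def greet; "hello"; end
end

g = Greeter.new
puts g.greet    # => hello

# Re-open the class at runtime and redefine the method.
class Greeter
  def greet; "bonjour"; end
end

puts g.greet    # => bonjour -- same call site, different method body
```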
Languages like Ruby, Python or Perl have many features that simply are expensive, and most if not all relevant programs rely heavily on these features (of course, they are extremely useful!), so they cannot be removed or inlined.
Simply put: Dynamic languages are very hard to optimize, simply doing what an interpreter would do and compiling that to machine code doesn't cut it. It's possible to get incredible speed out of dynamic languages, as V8 proves, but you have to throw huge piles of money and offices full of clever programmers at it.
There is the Ruby To C compiler at https://github.com/seattlerb/ruby_to_c. It actually only takes in a subset of Ruby, though. I believe the main missing part is the metaprogramming features.
In a recent interview (November 16th, 2012) Yukihiro "Matz" Matsumoto (creator of Ruby) talked about compiling Ruby to C
(...) In University of Tokyo a research student is working on an academic research project that compiles Ruby code to C code before compiling the binary code. The process involves techniques such as type inference, and in optimal scenarios the speed could reach up to 90% of typical hand-written C code. So far there is only a paper published, no open source code yet, but I’m hoping next year everything will be revealed... (from interview)
Just one student is not much, but it might be an interesting project. Probably a long way to go to full support of Ruby.
"Low level" is highly subjective. Many people draw the line differently, so for the sake of this argument, I'm just going to assume you mean compiling Ruby down to an intermediate form which can then be turned into machine code for your particular platform. I.e., compiling ruby to C or LLVM IR, or something of that nature.
The short answer is yes this is possible.
The longer answer goes something like this:
Several languages (Objective-C most notably) exist as a thin layer over other languages. ObjC syntax is really just a loose wrapper around the objc_*() libobjc runtime calls, for all practical purposes.
Knowing this, then what does the compiler do? Well, it basically works as any C compiler would, but also takes the objc-specific stuff, and generates the appropriate C function calls to interact with the objc runtime.
A ruby compiler could be implemented in similar terms.
It should also be noted however, that just by converting one language to a lower level form does not mean that language is instantly going to perform better, though it does not mean it will perform worse either. You really have to ask yourself why you're wanting to do it, and if that is a good reason.
There is also JRuby, if you still consider Java a low level language. Actually, the language itself has little to do here: it is possible to compile to JVM bytecode, which is independent of the language.
Performance doesn't come solely from "low level" compiled languages. Cross-compiling your Ruby program to convoluted, automatically generated C code isn't going to help either. This will likely just confuse things, include long compile times, etc. And there are much better ways.
But you first say "low level languages" and then mention Java. Java is not a low-level language. It's just one step below Ruby in terms of high- and low-level languages. But if you look at how Java works (the JVM, bytecode, and just-in-time compilation), you can see how high-level languages can be fast(er). Ruby is currently doing something similar. MRI 1.8 was a pure interpreter and had some performance problems. 1.9 is much faster; it uses a bytecode interpreter. I'm not sure if it'll ever happen on MRI, but Ruby is just one step away from JIT on MRI.
I'm not sure about the technologies behind jRuby and IronRuby, but they may already be doing this. However, both have their own advantages and disadvantages. I tend to stick with MRI, it's fast enough and it works just fine.
It is probably feasible to design a compiler that converts Ruby source code to C++. Ruby programs can be compiled to Python using the unholy compiler, so they could be compiled from Python to C++ using the Nuitka compiler.
The unholy compiler was developed more than a decade ago, but it might still work with current versions of Python.
Ruby2Cextension is another compiler that translates a subset of Ruby to C++, though it hasn't been updated since 2008.

Is Ruby really an interpreted language if all of its implementations are compiled into bytecode?

In the chosen answer for this question about Blue Ruby, Chuck says:
All of the current Ruby implementations are compiled to bytecode. Contrary to SAP's claims, as of Ruby 1.9, MRI itself includes a bytecode compiler, though the ability to save the compiled bytecode to disk disappeared somewhere in the process of merging the YARV virtual machine. JRuby is compiled into Java .class files. I don't have a lot of details on MagLev, but it seems safe to say it will take that road as well.
I'm confused about this compilation/interpretation issue with respect to Ruby.
I learned that Ruby is an interpreted language and that's why when I save changes to my Ruby files I don't need to re-build the project.
But if all of the Ruby implementations now are compiled, is it still fair to say that Ruby is an interpreted language? Or am I misunderstanding something?
Nearly every language is "compiled" nowadays, if you count bytecode as being compiled. Even Emacs Lisp is compiled. Ruby was a special case because until recently, it wasn't compiled into bytecode.
I think you're right to question the utility of characterizing languages as "compiled" vs. "interpreted." One useful distinction, though, is whether the language creates machine code (e.g. x86 assembler) directly from user code. C, C++, many Lisps, and Java with JIT enabled do, but Ruby, Python, and Perl do not.
People who don't know better will call any language that has a separate manual compilation step "compiled" and ones that don't "interpreted."
Yes, Ruby's still an interpreted language, or more precisely, Matz's Ruby Interpreter (MRI), which is what people usually talk about when they talk about Ruby, is still an interpreter. The compilation step is simply there to reduce the code to something that's faster to execute than interpreting and reinterpreting the same code time after time.
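As a concrete illustration of that compilation step in MRI (and, incidentally, later CRuby versions, roughly 2.3 onwards, reintroduced a limited way to serialize the compiled bytecode mentioned in the quote above):

```ruby
# Compile a snippet to YARV bytecode, serialize it, load it back, and run it.
# (InstructionSequence#to_binary / .load_from_binary exist from roughly 2.3 on.)
iseq  = RubyVM::InstructionSequence.compile("6 * 7")
blob  = iseq.to_binary                                    # bytecode as a String
again = RubyVM::InstructionSequence.load_from_binary(blob)
puts again.eval                                           # => 42
```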
A subtle question indeed...
It used to be that "interpreted" languages were parsed and transformed into an intermediate form which was faster to execute, but the "machine" executing them was a pretty language-specific program. "Compiled" languages were translated instead into the machine code instructions supported by the computer on which they were run. An early distinction was very basic: static vs. dynamic scope. In a statically scoped language, a variable reference could pretty much be resolved to a memory address in a few machine instructions; you knew exactly where in the calling frame the variable lived. In dynamically scoped languages you had to search (up an A-list or up a calling frame) for the reference. With the advent of object-oriented programming, the non-immediate nature of a reference expanded to many more concepts: classes (types), methods (functions), even syntactical interpretation (embedded DSLs like regex).
The distinction, in fact, going back to maybe the late 70's was not so much between compiled and interpreted languages, but whether they were run in a compiled or interpreted environment.
For example, Pascal (the first high-level language I studied) ran at UC Berkeley first on Bill Joy's pxp interpreter, and later on the compiler he wrote pcc. Same language, available in both compiled and interpreted environments.
Some languages are more dynamic than others, the meaning of something--a type, a method, a variable--is dependent on the run-time environment. This means that compiled or not there is substantial run-time mechanism associated with executing a program. Forth, Smalltalk, NeWs, Lisp, all were examples of this. Initially, these languages required so much mechanism to execute (versus a C or a Fortran) that they were a natural for interpretation.
Even before Java, there were attempts to speed up execution of complex, dynamic languages with tricks, techniques which became threaded compilation, just-in-time compilation, and so on.
I think it was Java, though, which was the first widespread language that really muddied the compiler/interpreter gap, ironically not so that it would run faster (though, that too) but so that it would run everywhere. By defining its own machine language and "machine" (the Java bytecode and VM), Java attempted to become a language compiled into something close to any basic machine, but not actually any real machine.
Modern languages marry all these innovations. Some have the dynamic, open-ended, you-don't-know-what-you-get-until-runtime nature of traditional "interpreted" languages (Ruby, Lisp, Smalltalk, Python, Perl(!)); some try to have the rigor of specification allowing deep type-based static error detection of traditional compiled languages (Java, Scala). All compile to actual machine-independent representations (JVM) to get write-once-run-anywhere semantics.
So, compiled vs. interpreted? Best of both, I'd say. All the code's around in source (with documentation), change anything and the effect is immediate, simple operations run almost as fast as the hardware can do them, complex ones are supported and fast enough, hardware and memory models are consistent across platforms.
The bigger polemic in languages today is probably whether they are statically or dynamically typed, which is to say not how fast will they run, but will the errors be found by the compiler beforehand (at the cost of the programmer having to specify pretty complex typing information) or will the errors come up in testing and production.
You can run Ruby programs interactively using irb, the Interactive Ruby Shell. While it may generate intermediate bytecode, it's certainly not a "compiler" in the traditional sense.
A compiled language is generally compiled into machine code, as opposed to just byte code. Some byte code generators can actually further compile the byte code into machine code though.
Byte code itself is just an intermediate step between the literal code written by the user and the virtual machine, it still needs to be interpreted by the virtual machine though (as it's done with Java in a JVM and PHP with an opcode cache).
This is possibly a little off topic but...
IronRuby is a .NET-based implementation of Ruby and therefore is usually compiled to byte code and then JIT-compiled to machine language at runtime (i.e. not interpreted). Also (at least with other .NET languages, so I assume with Ruby) NGen can be used to generate a compiled native binary ahead of time, so that's effectively a machine-code-compiled version of Ruby code.
According to the information I got from RubyConf 2011 in Shanghai, Matz is developing 'MRuby' (which stands for Matz's Ruby), targeting embedded devices. Matz said that MRuby will provide the ability to compile Ruby code into machine code to boost speed and decrease the usage of the (limited) resources on embedded devices. So there are various kinds of Ruby implementations, and definitely not all of them are just interpreted at runtime.

Interpreted languages - leveraging the compiled language behind the interpreter

If there are any language designers out there (or people simply in the know), I'm curious about the methodology behind creating standard libraries for interpreted languages. Specifically, what seems to be the best approach? Defining standard functions/methods in the interpreted language, or performing the processing of those calls in the compiled language in which the interpreter is written?
What got me to thinking about this was the SO question about a stripslashes()-like function in Python. My first thought was "why not define your own and just call it when you need it", but it raised the question: is it preferable, for such a function, to let the interpreted language handle that overhead, or would it be better to write an extension and leverage the compiled language behind the interpreter?
The line between "interpreted" and "compiled" languages is really fuzzy these days. For example, the first thing Python does when it sees source code is compile it into a bytecode representation, essentially the same as what Java does when compiling class files. This is what *.pyc files contain. Then, the python runtime executes the bytecode without referring to the original source. Traditionally, a purely interpreted language would refer to the source code continuously when executing the program.
When building a language, it is a good approach to build a solid foundation on which you can implement the higher level functions. If you've got a solid, fast string handling system, then the language designer can (and should) implement something like stripslashes() outside the base runtime. This is done for at least a few reasons:
The language designer can show that the language is flexible enough to handle that kind of task.
The language designer actually writes real code in the language, which has tests and therefore shows that the foundation is solid.
Other people can more easily read, borrow, and even change the higher level function without having to be able to build or even understand the language core.
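To make the "implement it in the language itself" option concrete, here is a tiny stripslashes-style helper (sketched in Ruby rather than Python, since Ruby is this document's language; the function name mirrors PHP's stripslashes and is purely illustrative). It needs nothing from the runtime beyond basic string handling:

```ruby
# A stripslashes-style helper in pure Ruby: "\\'" becomes "'", "\\\\" becomes "\\".
def stripslashes(str)
  str.gsub(/\\(.)/, '\1')   # drop each backslash, keep the escaped character
end

puts stripslashes("It\\'s a backslash: \\\\")   # => It's a backslash: \
```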
Just because a language like Python compiles to bytecode and executes that doesn't mean it is slow. There's no reason why somebody couldn't write a Just-In-Time (JIT) compiler for Python, along the lines of what Java and .NET already do, to further increase the performance. In fact, IronPython compiles Python directly to .NET bytecode, which is then run using the .NET system including the JIT.
To answer your question directly, the only time a language designer would implement a function in the language behind the runtime (eg. C in the case of Python) would be to maximise the performance of that function. This is why modules such as the regular expression parser are written in C rather than native Python. On the other hand, a module like getopt.py is implemented in pure Python because it can all be done there and there's no benefit to using the corresponding C library.
There's also an increasing trend of reimplementing languages that are traditionally considered "interpreted" onto a platform like the JVM or CLR -- and then allowing easy access to "native" code for interoperability. So from Jython and JRuby, you can easily access Java code, and from IronPython and IronRuby, you can easily access .NET code.
In cases like these, the ability to "leverage the compiled language behind the interpreter" could be described as the primary motivator for the new implementation.
See the 'Papers' section at www.lua.org.
Especially The Implementation of Lua 5.0
Lua defines all standard functions in the underlying (ANSI C) code. I believe this is mostly for performance reasons. Recently, the 'string.*' functions, for example, got an alternative implementation in pure Lua, which may prove vital for subprojects where Lua is run on top of a .NET or Java runtime (where C code cannot be used).
As long as you are using a portable API for the compiled code base like the ANSI C standard library or STL in C++, then taking advantage of those functions would keep you from reinventing the wheel and likely provide a smaller, faster interpreter. Lua takes this approach and it is definitely small and fast as compared to many others.
