I'm doing some research into how different Ruby interpreters do method execution (e.g. when you call a method in ruby, what steps does the interpreter take to find and execute it, and which structures are involved in this). I am trying to compare the performance of the different approaches being used.
The interpreters I'm looking into are: MRI, YARV, JRuby, Rubinius, Ruby EE
I am looking for any general pointers about which files in the interpreter source I should check out, and any other general information about this topic that you guys can provide.
Thanks!
This article is a really good description of method dispatching in JRuby. It is nicely complemented by the JRuby Wiki page describing its internals.
Related
I found one webpage that describes how Ruby's execution stack looks like. It says that Ruby has seven stacks:
Is this article true?
This article focuses on the way ruby works in versions from 1.7 to 1.8. With introduction of YARV things have changed a lot. To better understand how Ruby works internally I'd recommend Ruby Under a Microscope. There are chapters on how Ruby execution stack works
No, this does not describe how Ruby works. This describes how MRI works. MRI is only one of many implementations of Ruby. The Ruby Programming Language does not specify any particular implementation strategy for memory management. It is perfectly valid to implement Ruby without any stack at all.
There are many implementations of Ruby. The most widely-used one currently is YARV, but there's also MRuby, JRuby, MagLev, Ruby+OMR, TruffleRuby, Rubinius (those last three are the most interesting IMO). MRI isn't even maintained any more. In the past, there were also IronRuby, IronRuby (yes, actually, there were two different implementations with that name), Ruby.NET, tinyrb, XRuby, SmallRuby, BlueRuby, Cardinal, and many others.
AFAIK, none of those works in the way that is described here, only MRI does.
So, I have a few questions that I have to ask, I did browse the internet, but there weren't too many reliable answers. Mostly blog posts that would cancel each-other out because they both praised different things and had benchmarks to "prove their viewpoint" (I have never seen so many contradicting benchmarks in my life).
Anyway, my questions are:
Is Rubinius really faster? I was pretty impressed by this apparently honest pro-Rubinius presentation. Another thing that confuses me a little is that a lot of Rubinius is written in Ruby itself, yet somehow it is faster than C-Ruby? It must be a pretty damn good implementation of the language, then!
Does EventMachine work with Ruinius? As far as I know, EventMachine partially relies on Fibers (correct me if I'm wrong) which weren't implemented until 1.9. I know Rubinius will eventually support 1.9, too; I don't mind waiting a little.
Do C extensions work in Rubinius? I have written a C extension which "serializes" binary messages received from a TCP stream into Ruby Objects and vice-versa (I suppose the details are not important, but if it helps answer this question I will update the post). This can be a lot of messages! I managed to write the same code in Ruby (although, it made little sense after a month), but it proved to be a real bottle-neck in the application. So, I had to use C as a "solution" to my problem.
EDIT: I just remembered, I use C for another task, it is a hit-test method for Arrays. Basically it just checks if a "point" is inside an a polygon, it was impossibly slow in CRuby.
If the previous answer was a "No," is there then an alternative for C extensions in Rubinus? I gather the VM is written in C++, so that then.
A few "bonus" questions:
Will C-Ruby (2.0+, YARV) ever get rid of GIL? Or at least modify it so CRuby supports true parallelism?
What is exactly mruby? I see matz is working on it, and as far as the description goes it seems pretty awesome. How different is it from CRuby (performance-wise)?
I apologize for this text-storm I unleashed upon you! ♥
Is Rubinius really faster?
In most benchmarks, yes.
But benchmarks are... dumb. Apps are what we really care about. So the best thing to do is benchmark your app & see how well it performs. The 2 areas where Rubinius will real shine over MRI are parallelism & memory usage. Rubinius has no GIL, so you can utilize all available threads. It also has a much more sophisticated GC, so in general it could perform better with respect to GC.
I did those benchmarks back in Oct '11 for my talk on MagLev at RubyConf
Does EventMachine work with Rubinius?
Yes, and if there are parts that don't work, then the issue should be reported. With that said, currently the EM tests don't pass on any Ruby implementation.
Do C extensions work in Rubinius?
Yes. I maintain the compatibility issue for C-exts, so if there is one you have that is tested on Travis, Rubinius would like to see it pass against rbx. Rubinius has historically had good support for the C-api and C-exts, though it would be nice if someday Rubinius could run Ruby so fast one would not need C-exts or the C-api.
Will C-Ruby (2.0+, YARV) ever get rid of GIL? Or at least modify it so CRuby supports true parallelism?
No, most likely not. Jesse Storimer has a succinct writeup of Matz's opinion (or lack thereof) on threads from RubyConf 2012. Koichi Sasada tried to remove the GIL once and MRI perf just tanked. Evan Phoenix also tried once, before he created Rubinius, but didn't have good results.
What is exactly mruby?
An embeddable Ruby interpreter, akin to Lua. Matt Aimonetti has a few articles that might shed some light for you.
I am not too much into Ruby but I might be able to answer the first question.
Is Rubinius really faster?
I've seen different Benchmarks telling different things. However, the fact that Rubinius is partially written in Ruby does not have to mean that it is slower. I thought the same about PyPy which is Python in Python. After some research and the right classes in college I knew why.
As far as I know both are written in a subset of their language which should be much simpler. An (e.g. C) interpreter can be be optimized much easier for such a subset than the whole language.
Writing the Ruby/Python interpreter in its own language allows much more flexibility and quicker prototyping of new interpretation algorithms. The whole point of the existence of Ruby and Python are among others that algorithms can be implemented much quicker than in e.g. C or even assembler. A faster algorithm outweighs the little overhead of an interpreter a lot of the time.
Btw. writing an interpreter for a language in the same language is also a common academic practice to show how mighty the language is. In one class we've written Lisp in Lisp in Lisp.
I was just listening to a lecture at coursera (https://www.coursera.org/saas/) and the professor was saying that everything in Ruby is an object and that every method call is calling a send method on an object, passing some params to it. This includes numbers, arrays and other basic classes.
I went on Google and looked for efficiency benchmarks and I found the following: http://benchmarksgame.alioth.debian.org/u32/which-programs-are-fastest.html
While it's not shocking that a compiled language is faster than an interpreted one, the performance difference between (Ruby, Python) and Java for instance is shocking.
Even if there's a way to compile ruby code (I have not researched this topic), I think the efficiency problem would still be there due to the core "problem" in the language:
Basic operations are being too heavy: 1+1 takes many more CPU cycles to complete.
I love Ruby. I love the high level aspect of meta programming and I think this is where the future should be heading, and I agree, sometimes we need to compromise certain thing in order to be more effective: I don't see myself optimizing my code in assembly in order to save a few extra milliseconds. However, when we do 1+1 in C, it's not exponentially increasing the amount of time a basic operation is taking!
My question is how are you guys dealing with operation intensive programs? We have a Ruby on Rails project we have been developing for about a year now and we're at a point we'll start doing some machine learning with geolocation traversal and prioritization.
I hope you understand my concerns and offer reasonable suggestions :-)
This is not such a bad question as it seems looking at the comments. The only problem is the wrongly used word exponential, but the described problem is real.
I have similar Ruby usage patterns as you describe - I'm doing a lot of natural language processing in Ruby, which involves machine learning as well.
I use the following techniques to overcome Ruby performance issues:
Use C libraries with Ruby interface whenever applicable. E.g. I would not use Ruby implementation of SVM or decision trees, since there are much faster implementations in C available.
Write my own Ruby wrappers for C implementation if they are not available. Usually this is not much problematic - I use RubyInline gem extensively to glue Ruby with C code.
Patch Ruby memory management or manually control its garbage collector.
Consider JRuby as your Ruby platform - you would get easy access to fast Java libraries for machine learning (Weka) and the like and general performance of your application would be preserved or even better.
Was curious, but are any NoSQL DBMS written in Ruby?
And if not, would it be unwise to create one in Ruby?
Was curious, but are any NoSQL DBMS written in Ruby?
In 2007, Anthony Eden played around with RDDB, a CouchDB-inspired document-oriented database. He still keeps a copy of the code in his GitHub account.
I vaguely remember that at or around the same time, someone else was also playing around with a database in Ruby. I think it was either inspired by or a reaction to RDDB.
Last but not least, there is the PStore library in the stdlib, which – depending on your definition – may or may not count as a database.
And if not, would it be unwise to create one in Ruby?
The biggest problem I see in Ruby are its concurrency primitives. Threads and locks are so 1960s. If you want to support multiple concurrent users, then you obviously need concurrency, although if you want to build an embedded in-process database, then this is much less of a concern.
Other than that, there are some not-so-stellar implementations of Ruby, but that is not a limitation of Ruby but of those particular implementations, and it applies to pretty much every other programming language as well. Rubinius (especially the current development trunk, which adds Ruby 1.9 compatibility and removes the Global Interpreter Lock) and JRuby would both be fine choices.
As an added bonus, Rubinius comes with a built-in actors library for concurrency and JRuby gives you access to e.g. Clojure's concurrency libraries or the Akka actors library.
Performance isn't really much of a concern, I think. Rubinius's Hash class, which is written in 100% pure Ruby, performs comparably to YARV's Hash class, which is written in 100% hand-optimized C. This shows you that Ruby code, at least when it is carefully written, can be just as fast as C, especially since databases tend to be long-running and thus Rubinius's or JRuby's (and in the latter case specifically also the JVM's) dynamic optimizers (which C compilers typically do not have) can really get to work.
Ruby is just too slow for any type of DBMS
c/c++/erlang are generally the best choice.
You generally shouldn't care in what programming language was a DBMS implemented as long it has all the features and is available for use from your application programming language of choice.
So, the real question here is do you need one written in Ruby or available for use in Ruby.
In first case, I doubt you'll find a DBMS natively written in Ruby (any correction of this statement will be appreciated).
In second case, you should be able to find Ruby bindings/wrappers for any decent DBMS relational or not.
I have all kind of scripting with Ruby:
rails (symfony)
ruby (php, bash)
rb-appscript (applescript)
Is it possible to replace low level languages with Ruby too?
I write in Ruby and it converts it to java, c++ or c.
Cause People say that when it comes to more performance critical tasks in Ruby, you could extend it with C. But the word extend means that you write C files that you just call in your Ruby code. I wonder, could I instead use Ruby and convert it to C source code which will be compiled to machine code. Then I could "extend" it with C but in Ruby code.
That is what this post is about. Write everything in Ruby but get the performance of C (or Java).
The second advantage is that you don't have to learn other languages.
Just like HipHop for PHP.
Are there implementations for this?
Such a compiler would be an enormous piece of work. Even if it works, it still has to
include the ruby runtime
include the standard library (which wasn't built for performance but for usability)
allow for metaprogramming
do dynamic dispatch
etc.
All of these inflict tremendous runtime penalties, because a C compiler can neither understand nor optimize such abstractions. Ruby and other dynamic languages are not only slower because they are interpreted (or compiled to bytecode which is then interpreted), but also because they are dynamic.
Example
In C++, a method call can be inlined in most cases, because the compiler knows the exact type of this. If a subtype is passed, the method still can't change unless it is virtual, in which case a still very efficient lookup table is used.
In Ruby, classes and methods can change in any way at any time, thus a (relatively expensive) lookup is required every time.
Languages like Ruby, Python or Perl have many features that simply are expensive, and most if not all relevant programs rely heavily on these features (of course, they are extremely useful!), so they cannot be removed or inlined.
Simply put: Dynamic languages are very hard to optimize, simply doing what an interpreter would do and compiling that to machine code doesn't cut it. It's possible to get incredible speed out of dynamic languages, as V8 proves, but you have to throw huge piles of money and offices full of clever programmers at it.
There is https://github.com/seattlerb/ruby_to_c Ruby To C compiler. It actually only takes in a subset of Ruby though. I believe the main missing part is the Meta Programming features
In a recent interview (November 16th, 2012) Yukihiro "Matz" Matsumoto (creator of Ruby) talked about compiling Ruby to C
(...) In University of Tokyo a research student is working on an academic research project that compiles Ruby code to C code before compiling the binary code. The process involves techniques such as type inference, and in optimal scenarios the speed could reach up to 90% of typical hand-written C code. So far there is only a paper published, no open source code yet, but I’m hoping next year everything will be revealed... (from interview)
Just one student is not much, but it might be an interesting project. Probably a long way to go to full support of Ruby.
"Low level" is highly subjective. Many people draw the line differently, so for the sake of this argument, I'm just going to assume you mean compiling Ruby down to an intermediate form which can then be turned into machine code for your particular platform. I.e., compiling ruby to C or LLVM IR, or something of that nature.
The short answer is yes this is possible.
The longer answer goes something like this:
Several languages (Objective-C most notably) exist as a thin layer over other languages. ObjC syntax is really just a loose wrapper around the objc_*() libobjc runtime calls, for all practical purposes.
Knowing this, then what does the compiler do? Well, it basically works as any C compiler would, but also takes the objc-specific stuff, and generates the appropriate C function calls to interact with the objc runtime.
A ruby compiler could be implemented in similar terms.
It should also be noted however, that just by converting one language to a lower level form does not mean that language is instantly going to perform better, though it does not mean it will perform worse either. You really have to ask yourself why you're wanting to do it, and if that is a good reason.
There is also JRuby, if you still consider Java a low level language. Actually, the language itself has little to do here: it is possible to compile to JVM bytecode, which is independent of the language.
Performance doesn't come solely from "low level" compiled languages. Cross-compiling your Ruby program to convoluted, automatically generated C code isn't going to help either. This will likely just confuse things, include long compile times, etc. And there are much better ways.
But you first say "low level languages" and then mention Java. Java is not a low-level language. It's just one step below Ruby in terms of high- or low-level languages. But if you look at how Java works, the JVM, bytecode and just-in-time compilation, you can see how high level languages can be fast(er). Ruby is currently doing something similar. MRI 1.8 was an interpreted language, and had some performance problems. 1.9 is much faster, it's using a bytecode interpreter. I'm not sure if it'll ever happen on MRI, but Ruby is just one step away from JIT on MRI.
I'm not sure about the technologies behind jRuby and IronRuby, but they may already be doing this. However, both have their own advantages and disadvantages. I tend to stick with MRI, it's fast enough and it works just fine.
It is probably feasible to design a compiler that converts Ruby source code to C++. Ruby programs can be compiled to Python using the unholy compiler, so they could be compiled from Python to C++ using the Nuitka compiler.
The unholy compiler was developed more than a decade ago, but it might still work with current versions of Python.
Ruby2Cextension is another compiler that translates a subset of Ruby to C++, though it hasn't been updated since 2008.