Ruby Performance

I'm pretty keen to develop my first Ruby app, as my company has finally blessed its use internally.
In everything I've read about Ruby up to v1.8, there is never anything positive said about performance, but I've found nothing about version 1.9. The last figures I saw about 1.8 had it drastically slower than just about everything out there, so I'm hoping this was addressed in 1.9.
Has performance drastically improved? Are there some concrete things that can be done with Ruby apps (or things to avoid) to keep performance at the best possible level?

There are some benchmarks of 1.8 vs 1.9 at http://www.rubychan.de/share/yarv_speedups.html. Overall, it looks like 1.9 is a lot faster in most cases.
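Published benchmarks only go so far, so it can help to time a workload yourself on whichever interpreters you have installed. The following is a minimal sketch using the standard library's Benchmark module; the `fib` workload and the `best_time` helper are hypothetical names chosen for illustration, not part of the linked benchmark suite.

```ruby
require 'benchmark'

# Hypothetical CPU-bound workload, used only for illustration.
def fib(n)
  n < 2 ? n : fib(n - 1) + fib(n - 2)
end

# Runs the given block several times and returns the best (lowest)
# wall-clock time in seconds.
def best_time(runs = 3)
  (1..runs).map { Benchmark.realtime { yield } }.min
end

elapsed = best_time { fib(20) }
puts "#{RUBY_DESCRIPTION}: fib(20) took #{(elapsed * 1000).round(2)} ms"
```

Running the same file under each interpreter (MRI 1.8, 1.9, JRuby, and so on) gives numbers that reflect your own hardware rather than someone else's.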

If scalability and performance are really important to you, you can also check out Ruby Enterprise Edition. It's a custom implementation of the Ruby interpreter that's supposed to be much better about memory allocation and garbage collection. I haven't seen any objective metrics comparing it directly to JRuby, but all of the anecdotal evidence I've heard has been very good.
This is from the same company that created Passenger (aka mod_rails), which you should definitely check out as a Rails deployment solution if you decide not to go the JRuby route.

Matz's Ruby 1.8.6 is much slower, and 1.9 and JRuby do a lot to speed it up. But the performance isn't so poor that it will prevent you from doing anything you want in a web application. There are many large Ruby on Rails sites that do just fine with the "slower interpreted" language. When you get to scaling out web apps, there are many more pressing performance issues than the speed of the language you are writing them in.

I've actually heard really good things about the performance of the JVM implementation, JRuby. Completely anecdotal, but perhaps worth looking into.
See also http://en.wikipedia.org/wiki/JRuby#Performance

Check out "Writing Efficient Ruby Code" from Addison Wesley Professional:
http://safari.oreilly.com/9780321540034
I found some very helpful and interesting insights in this short work. And if you sign up for the free 10-day trial you could read it for free. (It's 50 pages and the trial gets you (AFAIR) 100 page views.)
https://ssl.safaribooksonline.com/promo

I am not a Ruby programmer, but I have been pretty closely involved in a JRuby deployment lately and can thus draw some conclusions. Do not expect too much from JRuby's performance. In interpreted mode, it seems to be somewhere in the range of C Ruby. JIT mode might be faster, but only in theory. In practice, we tried JIT mode on Glassfish for a decently sized Rails application on a medium-sized server (dual core, 8GB RAM). The truth is, the JIT compilation took so much time that the server needed 20-30 minutes before it answered the first request. Memory usage was astronomical, and profiling did not work because the whole system ground to a halt with a profiler attached.
Bottom line: JRuby has its merits (multithreading, solid platform, easy Java integration), but given that interpreted mode is the only mode that worked for us in practice, it may be expected to be no better performance-wise than C Ruby.

I'd second the recommendation to use Passenger: it makes deployment and management of Rails applications trivial.

Related

Are there still benefits to running JRuby vs. the latest MRI with Puma?

I'm considering switching our Ruby interpreter to JRuby. It's been quite a headache because we've had to remove any 2.x-specific syntax from our app and fall back to Ruby 1.9.3 compatibility, which isn't the end of the world.
When it came time to run the app, I found out that we cannot use Puma in clustered mode. The question is, given all the fixes and changes to MRI in the past few years, are the benefits of having "real threads" still valid?
update
To make this more objective, the question is, "Does the latest version of MRI negate the need to adopt JRuby to achieve the same benefits that native threads give you?"
"Does the latest version of MRI negate the need to adopt JRuby to achieve the same benefits that native threads give you?"
The answer is no. It does not negate the need, and it depends on your application as mentioned in other answers.
Also, JRuby does not allow you to run in cluster mode, but that is not really a problem in regards to your question, because it is multithreaded and parallel.
Simply run in Single mode with as many threads as you need. It should be perfectly fine, if not even more lightweight.
Let me give you some references that give more insight and allow you to dig further.
This answer discusses experiments with MRI and JRuby testing concurrent requests using Puma (up to 40 threads). It is quite comprehensive.
The experiments are available on GitHub, MRI and JRuby.
The caveat is that it only tests concurrent requests and does not include a race condition in the controller. However, I think you could implement the test from the article "Removing config.threadsafe!" without too much effort.
The difference between JRuby and MRI is that JRuby can execute code in parallel. MRI is limited by the GIL, so only one thread can execute at a time. You can read more about the GIL in the article "Nobody understands the GIL".
The results are quite surprising. MRI is faster than JRuby. Feel free to improve and add race conditions.
Note that both are multi-threaded and not thread safe. The difference really is that MRI cannot execute code in parallel and JRuby can.
You might ask why I answer "No" when the experiment shows that MRI is faster.
I think we need more experiments and in particular real world applications.
If you believe that JRuby should be faster because it can execute code in parallel, then the reasons could be:
The experiments should be executed in a highly parallel environment to be able to leverage the potential of JRuby.
It could be the web server itself. Maybe Puma does not leverage the full potential of JRuby. MRI has a GIL, so why is it faster than JRuby in handling requests?
Other, more in-depth factors might be relevant that we have not discovered yet.
It really depends on your scenario with the web server (which you should understand better than anyone). If you feel your production setup is serving just fine under MRI, then you probably do not have that much concurrency. Puma's README pretty much explains what you get under MRI compared to Rubinius/JRuby:
On MRI, there is a Global Interpreter Lock (GIL) that ensures only one thread can be run at a time. But if you're doing a lot of blocking IO (such as HTTP calls to external APIs like Twitter), Puma still improves MRI's throughput by allowing blocking IO to be run concurrently (EventMachine-based servers such as Thin turn off this ability, requiring you to use special libraries). Your mileage may vary. In order to get the best throughput, it is highly recommended that you use a Ruby implementation with real threads like Rubinius or JRuby
So, in one sentence: you can have multiple threads under MRI, but you have no parallelism.
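The point about blocking IO can be checked with a small experiment. The sketch below is an illustration only: `sleep` stands in for a blocking call such as a DB query or an HTTP request, and `time_blocking_threads` is a hypothetical helper name. Because a thread releases the GIL while it is blocked, the waits overlap even on MRI.

```ruby
require 'benchmark'

# Starts `count` threads that each block for `seconds` (sleep stands in for
# blocking IO) and returns the total wall-clock time in seconds.
def time_blocking_threads(count, seconds)
  Benchmark.realtime do
    threads = Array.new(count) { Thread.new { sleep(seconds) } }
    threads.each(&:join)
  end
end

elapsed = time_blocking_threads(4, 0.2)
puts "4 threads, each blocking for 0.2s, took #{elapsed.round(2)}s in total"
```

If the waits were serialized, the total would be close to the sum of the individual waits; when they overlap, it stays close to a single wait. Swapping the `sleep` for a CPU-bound loop is where MRI and JRuby would be expected to diverge.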
IMHO, it depends on what your application does.
I've tested both MRI/YARV and JRuby on my Rails application.
Since most of what the app does is route HTTP requests, fetch from the DB, apply simple business logic and write to the DB, parallelism isn't much of an issue. Puma on MRI does handle multi-threading for blocking IO operations (DB, API). Tasks that fall outside this scope (image processing, crunching report data, calls to external APIs, etc.) should probably be handled by background jobs anyway (I recommend https://github.com/brandonhilkert/sucker_punch).
Depending on your deployment needs, memory consumption might be more of an issue, and JRuby is very hungry for memory: almost 2x the memory in my case.
If you're deploying your application on Heroku you might find that you get more bang for the buck by being able to run 2 instances concurrently on 1 dyno.
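To make the background-job idea concrete, here is a minimal in-process worker built only on the standard library's Thread and Queue. It is a sketch of the general pattern, not sucker_punch's API; the `BackgroundWorker` class and its method names are hypothetical.

```ruby
require 'thread'

# Minimal in-process background worker (stdlib only). This is an
# illustrative stand-in and NOT the API of sucker_punch or any other gem.
class BackgroundWorker
  def initialize
    @queue = Queue.new
    @results = Queue.new
    @thread = Thread.new do
      # A nil job is the shutdown signal.
      while (job = @queue.pop)
        @results << job.call
      end
    end
  end

  # Hands a block to the worker thread and returns immediately.
  def enqueue(&job)
    @queue << job
    self
  end

  # Signals the worker to stop, waits for it, and returns results in order.
  def shutdown
    @queue << nil
    @thread.join
    Array.new(@results.size) { @results.pop }
  end
end

worker = BackgroundWorker.new
worker.enqueue { (1..1_000).reduce(:+) }  # stands in for crunching report data
worker.enqueue { :thumbnail_done }        # stands in for image processing
puts worker.shutdown.inspect
```

In a real application the request handler would call something like `enqueue` and respond right away, leaving the slow work to the worker.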

Slow loading of rails environment

Similar issue to Slow loading rails environment
Loading the rails environment takes quite a bit of time and I'm not sure exactly why.
time ruby -r./config/environment.rb -e ""
real 0m18.590s
user 0m17.200s
sys 0m1.320s
Are there any tools/ways that can help me find why it is spending so much time to load the environment?
The project is fairly large, so I am assuming that it is coming from all the gem dependencies, but I would think that it would be able to be improved somehow.
If you are using Ruby 1.9, see this blog post; it may describe the issue you are experiencing. If so, it has to do with the number of requires in your project and the way that method is implemented in 1.9. There is a patch available to improve this performance.
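One low-tech way to see where load time goes is to time each require individually. The sketch below uses only the standard library; `time_requires` is a hypothetical helper, and the library names passed to it are stdlib placeholders (in a Rails project you would pass the gems from your own Gemfile).

```ruby
require 'benchmark'

# Times each `require` and returns [name, seconds] pairs, slowest first.
# Libraries that are already loaded will report close to zero.
def time_requires(names)
  timings = names.map do |name|
    [name, Benchmark.realtime { require name }]
  end
  timings.sort_by { |_, seconds| -seconds }
end

# Placeholder library names, chosen only so the example runs anywhere.
time_requires(%w[set json time]).each do |name, seconds|
  puts format('%-10s %.4fs', name, seconds)
end
```

The slowest entries at the top of the list are the first candidates to investigate or to load lazily.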
I tried patching my Ruby with the rhnh patch cited above as well as the rvm-patchsets (on independent Ruby installs, of course) but didn't pick up a lot of performance. Some people do, it seems, so maybe it's a Ruby version or lower-level issue.
My current workaround, at least in my dev environment, is to use rails-sh to preload the environment one time and then reuse it for rails/rake commands. It's a big performance pickup. I wrote more details on it in this answer.

Is it a good idea to use Ruby for socket programming?

My language of choice is Ruby, but I know because of Twitter that Ruby can't handle a lot of requests. Is it a good idea to use it for socket development, or should I use a functional language like Erlang, Haskell, or Scala, as the Twitter developers did?
The company I work for uses Ruby for our web site. We have so far handled a little over 34,000,000,000 hits. We have no problem handling around 10,000,000 hits per day. Peak hits have exceeded 40,000,000 hits per day.
Scalability depends on a lot of factors. Our databases do a disproportionately high percentage of writes compared to reads, for example. While most websites do about 90% reads to 10% writes, we are closer to 50%-50%. My point is that scalability is affected by a lot of factors. If you are database-limited, as is often the case for web apps, it won't matter what language you use, you'll be waiting on your database.
There's a lot to think about if you are looking at handling large scales: sharding databases, memcached, etc. The language you use for your application is just one aspect, and often, though not always, a small aspect of scalability.
Ruby may be a good option for you, but there's a lot to like in other languages. Erlang tries hard to make it easier to recover from errors, for example.
I'm not sure that any "lessons" that the Twitter team has learned about Ruby (more specifically, Rails) and scaling would apply to your project. They're looking at WAY more traffic than most people can reasonably expect to see.
As far as sockets and Ruby go, check out "I like Unicorn because it's Unix". It's quite an interesting read about doing sockets in Ruby.
I'd like to provide a bit of context first. I'm pretty active with the Scala community, and I would choose Scala over Ruby for any project.
So, having said that, stay with Ruby unless you actually hit a barrier. If Ruby is your language of choice, it might just be that you'll never be happy with the choices you mention, particularly the statically typed ones.
It might be good to learn a new language, to have something to fall back on if you need an alternative. In your case, I'd recommend Clojure or Erlang. Scala is a good statically typed, OO language with functional programming perks. It might be easier to learn than the others, but people who really like dynamic typing don't convert to static typing easily.
As for Haskell, it's one of the most awesome languages out there (and much better supported and more popular than the equally awesome alternatives), and can open your mind like nothing else. It's also tough to master.
If Ruby is your favorite language, yes, it is a good idea. It is always better to use what you know and what you like.
While you may get better performance from a functional language such as Erlang, the suitability of Ruby will really depend on what you are trying to achieve. For example, how many requests you are going to be handling is probably the first question. If the performance benefits of using Erlang don't make much difference, use something you are comfortable with. Why learn a new language if you don't have to?
You at least have the option of staying in your favorite high level language if you use a fast, concurrent language like Haskell, Erlang or Scala. With Ruby, performance bottlenecks will mean switching to compiled C (or Haskell, or ...) for speed anyway.
Ruby has the advantage of good frontend frameworks.
I have also used Ruby for many projects, though I've recently moved to Scala and like it quite a bit. One thing that I've heard good things about (but never tried myself) for network stuff in Ruby is EventMachine. It uses the Reactor pattern just like Twisted and it seems quite solid.
The key is to have a low-level library in C/C++ that does the socket multiplexing for you. Socket multiplexing is what makes a TCP server process truly multi-user. Such libraries in C (which is what you want) could be libevent/libev... and in C++ boost::asio. Python has Twisted, which does it behind the scenes.
If you get such a library and use it in Ruby, you should be able to implement most socket programs fairly well. This is especially true on Unix OSes, which favour multi-process over multi-threading.
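To show what socket multiplexing looks like in Ruby itself, here is a minimal single-threaded echo server. Note the swap: it uses the standard library's `IO.select` rather than libevent/libev or EventMachine, so it sketches the idea and is not a high-performance implementation. The helper names `run_echo_server` and `echo_round_trip` are hypothetical, and the server stops after a fixed number of clients so the example terminates.

```ruby
require 'socket'

# Single-threaded echo server: IO.select watches the listening socket and all
# connected clients at once, so one process can serve many connections.
# Returns after `max_clients` clients have disconnected.
def run_echo_server(server, max_clients)
  clients = []
  closed = 0
  while closed < max_clients
    readable, = IO.select([server] + clients)
    readable.each do |io|
      if io.equal?(server)
        clients << server.accept          # new connection
      else
        begin
          io.write(io.readpartial(1024))  # echo back whatever arrived
        rescue EOFError, Errno::ECONNRESET
          clients.delete(io)              # client went away
          io.close
          closed += 1
        end
      end
    end
  end
end

# Starts the server on a free local port, sends each word over its own
# connection, and returns the echoed replies.
def echo_round_trip(words)
  server = TCPServer.new('127.0.0.1', 0)  # port 0: let the OS pick a port
  port = server.addr[1]
  worker = Thread.new { run_echo_server(server, words.size) }
  replies = words.map do |word|
    TCPSocket.open('127.0.0.1', port) do |sock|
      sock.write(word)
      sock.readpartial(1024)
    end
  end
  worker.join
  replies
ensure
  server.close if server
end

puts echo_round_trip(%w[hello world]).inspect
```

A production server would also need non-blocking writes and timeouts, which is the kind of work libraries such as EventMachine take care of.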
Having recently written (actually, I'm still writing it now) a project using sockets with Ruby and Java, I would say no. The Ruby socket implementation is poorly documented unless you plan on writing a basic blocking chat server. I found writing in C or Java simpler; Ruby wraps up native sockets and you're kind of left wondering how the hell to use it. I have previously written plenty of socket code on Windows, Linux and other platforms in C, with less stress.
My Ruby code is now very small and works well, but getting to that point was a real pain.

MS Velocity vs Memcached for Windows?

I've been paying some attention to Microsoft's fairly recent promoting of Velocity as a distributed caching solution that would compete with the likes of Memcached.
I've been looking for a 64bit version of Memcached for Windows for some time now with no luck, and since everything about the ASP.Net MVC project I'm working on is 64bit, it doesn't make sense to use anything but 64bit.
Now we're already hedging our bets with ASP.NET MVC in Beta (RTM soon hopefully), but StackOverflow doesn't seem to be doing too badly, so I have limited concerns there. But Velocity is still very much an unknown quantity and will still be Beta (or CTP) for ages - but it does have 64bit!
Does anyone have relevant experience or point of view to offer in this situation? Should we bide our time for Velocity - is it even anywhere near good enough to compete with a giant like Memcached, or should we invest time trying to get a 64bit version of Memcached going?
We have recently done a fair amount of comparison between Velocity and Memcached. In a nutshell, we found Velocity to be 3x - 5x slower than Memcached, and (even more crucially) it does not currently support a multi-get operation. So at the moment, I would recommend going with Memcached. Another lesson we learned was that the slowest operation in distributed caching is serialization and deserialization (at least in ASP.NET). The in-process ASP.NET cache is orders of magnitude faster. So you have to choose caching strategies much more carefully.
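The serialization point is not specific to ASP.NET, and a rough feel for it can be had in any language. The sketch below is in Ruby rather than .NET and involves no cache server at all: a plain Hash stands in for an in-process cache, and a `Marshal` round trip stands in for the serialization a distributed cache client performs on every read. `compare_cache_costs` and the sample record are hypothetical.

```ruby
require 'benchmark'

# Compares an in-process Hash lookup with a Marshal round trip, which stands
# in for the (de)serialization step of a distributed cache. Network time is
# not modelled at all.
def compare_cache_costs(value, iterations = 10_000)
  local_cache = { 'key' => value }
  in_process = Benchmark.realtime { iterations.times { local_cache['key'] } }
  serialized = Benchmark.realtime do
    iterations.times { Marshal.load(Marshal.dump(value)) }
  end
  { in_process: in_process, serialized: serialized }
end

record = { id: 42, name: 'example', tags: %w[a b c] } # hypothetical cached value
result = compare_cache_costs(record)
puts format('in-process: %.4fs, with serialization: %.4fs',
            result[:in_process], result[:serialized])
```

The absolute numbers depend on the machine and the shape of the cached object, so treat them as a way to compare the two paths rather than as a benchmark of any particular cache product.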
If you don't mind paying for a license, you can use Scale Out State Server, which I talk about in my answer to a similar question here. They have both 32- and 64-bit versions.
EDIT: Despite the name of the product, it handles both Session State and distributed caching.
Memcached is open source, if I'm not mistaken, so if you want to go the 64bit route, can you not just recompile?
I evaluated Velocity when it first arrived but came to the conclusion it was a bit undeveloped at that stage. Being able to run memcached on non-windows servers is also a bonus.

Why does Ruby seem to have fewer projects than other programming languages?

I've found Ruby to be very attractive; I like the fact that everything is an object and its syntax is very appealing.
I was hoping that it would gain a lot of popularity this year, but I don't see a lot of activity in Ruby.
For instance, if we look at the tags on SO, there are only about 700 questions tagged "ruby". This may be because:
1. Ruby is so easy, no one has any questions.
2. This site attracts more from the .NET community and Ruby developers ignore its existence.
3. There are not as many Ruby projects as there are projects in other programming languages.
4. Other resources show Ruby is not as popular as other programming languages.
What reasons do you think are behind this?
Links:
TIOBE Programming Community Index for October 2008
StackOverflow tags
Ohloh
You're mistakenly attributing something to Ruby. RubyForge alone reports over 1,000 open-source projects, let alone all Ruby on Rails apps that exist, and the projects hosted on Github, Sourceforge, and elsewhere.
Unless you spend a lot of time on other sites (Reddit is a good example) you will be unaware of just how .NET/Oracle/SQL Server/etc.-centric Stack Overflow is. (I use a Greasemonkey plugin to hide a broad swathe of these Windows- and "Enterprise"-centric technologies, because they don't interest me.)
I actually had the complementary experience to you: I started spending time on Stack Overflow, and had something of a "woah" moment when I realized just how many people spend their days futzing with ASP.NET. That's not a world in which I'd spent any time, so I had underestimated its size.
Some parts of the internet (e.g., Reddit) are primarily concerned with free software and its associated languages: Perl, Python, Ruby, PHP.
Some parts (e.g., Lambda the Ultimate) are concerned with more esoteric languages: Haskell, Lisp, Joy, Coq.
Other parts (e.g., Stack Overflow) are more mainstream: Java, .NET.
You cannot draw any conclusions about the popularity of a language by sampling just one of these 'pools'.
Ruby had its moment in the sun in 2005 - 2006 when Rails was making its way through the community and Apple decided that it would package it with OS X. So to pick 2008 as the year for Ruby to gain a lot of popularity seems amiss to me.
The Ruby language itself is, as you stated, very attractive. Its syntax and OOP model are what make it a hit with developers. You get essentially the same product as you would with another language, but with what feels like less time wasted on internals.
Rails is really what I think is holding Ruby in the mainstream right now, more or less because of its ease-of-use and database handling. Web developers love it for that.
If you really want to see sites that have a lot of Ruby (on Rails) chatter on them, you could check out http://refactormycode.com or http://pastie.org. Those websites are built on RoR and are used very frequently by Ruby (on Rails) users.
Regardless of any real numbers, one thing I do know: When I go to look for a Ruby library for something I'm working on, I find something satisfactory over 90% of the time. And for some of the remaining 10%, it isn't that hard to write something myself. I do believe that 90% figure will rise over the next few years, too.
If I get what I need, I don't really care whether PHP or Python or C# has sixty bajillion applications and libraries written for them. :)
I find Ruby very attractive in several ways, but it has some issues holding it back.
The biggest, I think, is that Python already covers much of the same ground, has a larger library of projects, and thus far has better performance.
The other main problem I've had is also the thing that keeps it so popular: Rails. I think there are a lot of people that don't even think of Ruby as a standalone language. While I appreciate that Rails is supposed to be pretty great, it is not anything I deal with, and thus I get annoyed at having to wade through so much discussion of Rails to find an answer to a question in Ruby as a standalone language.
One last thing that has made me skittish about it is the 'more than one way to do it' philosophy it shares with Perl. I was not a fan of that.
It is really a matter of there already being a few hammers around, and Ruby's main distinguishing feature that most people tout currently is Rails.
There's a lot of activity with Ruby in web-based development. You just have to join the right communities and lists.
I don't think it will ever be as popular as C/C++ (because of the existence of already deployed code and a developer base) or Java (because I imagine it isn't quite as easy to understand at first).
2 - Not many Rubyists come here. If you look, there is a TONNE of Ruby projects. Just not here so much.
Check out what is happening on GitHub, RubyForge, etc. I mean, Rails for starters is massive.
Here's my theory:
Industry Adoption - Although ruby is used in the real world, other languages (e.g. Java, C++, C#, etc.) have been accepted as "safe languages". No one ever got fired for picking Java, or C#, but CIOs' eyebrows have been known to raise when ruby is mentioned.
Talent Pool - When selecting a language, you want to know that you can find a good pool of talent. The more popular the language, the larger the pool, and the greater number of experts (statistically) (statistics do lie 50% of the time ;) ).
My hopes:
I believe the ruby talent pool will grow over time and the productivity offered by ruby will present a huge incentive for its adoption.
More and more colleges will teach it.
Please don't take TIOBE too seriously. Checking search engines for instances of "language-name programming" as some sort of indicator of popularity isn't very meaningful.
More than likely because it is younger than a lot of other languages and, on the web side of things, isn't as easy to implement as PHP and Python. Ruby has also gained notoriety as a web scripting language due to Rails which may be turning off some developers who are looking for client-based languages to work with.
Is Ruby not popular? I think it is but it hasn't really reached a critical mass yet to be widely accepted.

Resources