Clojure number crunching performance

I'm not sure whether this belongs on StackOverflow or in the Clojure Google group. But the group seems to be busy discussing numeric improvements for Clojure 1.2, so I'll try here:
http://shootout.alioth.debian.org/ has a number of performance benchmarks for various languages.
I noticed that Clojure was missing, so I made a Clojure version of the n-body problem.
The fastest code I was able to produce can be found here, and benchmarking it suggests that for number crunching Clojure is:
a factor of ~10 quicker than Python/Ruby/Perl
a factor of ~4 slower than C/Java/Scala/Ada
approximately on par with OCaml, Erlang and Go
I'm quite happy with that level of performance.
My questions to the Clojure gurus are:
Are there obvious improvements I have missed, either in terms of speed or in terms of code brevity or readability (without sacrificing speed)?
Do you consider this to be representative of Clojure performance vs Python/Ruby/Perl on one hand and Java/C on the other?
Update
More Clojure 1.1 benchmark programs for the shootout here, including the n-body problem.

Not a flood of responses here :) but apparently some interest, so I'll try to answer my own question with what I've learned over the past few days:
With the 1.1 optimization approach (Java primitives and mutable arrays), ~4x slower than optimized Java is about as fast as it goes.
The 1.2 constructs definterface and deftype are more than twice as fast, coming within ~1.7x (+70%) of Java with shorter, simpler and cleaner code than for 1.1.
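For intuition about why deftype helps: a deftype with primitive fields compiles down to a plain Java class with unboxed fields, so field access and arithmetic stay primitive. A rough Java analogue (my sketch, assuming mutable position/velocity fields; not the actual shootout code):

    // Rough Java analogue of a Clojure 1.2 deftype with primitive fields:
    // a final class with unboxed doubles, so arithmetic needs no boxing.
    final class Body {
        double x, y, z;    // position
        double vx, vy, vz; // velocity
        final double mass;

        Body(double x, double y, double z,
             double vx, double vy, double vz, double mass) {
            this.x = x; this.y = y; this.z = z;
            this.vx = vx; this.vy = vy; this.vz = vz;
            this.mass = mass;
        }
    }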
Here are the implementations:
Clojure 1.1 approach
Clojure 1.2 approach
More details including "lessons learned", JVM version and profiling screenshots.
Subjectively speaking, optimizing the 1.2 code was a breeze compared to optimizing 1.1, so this is very good news for Clojure number crunching. (Actually close to amazing :)
The 1.2 testing used the current master branch, I did not try any of the new numeric branches. From what I can gather the new ideas currently being discussed
may make non-optimized numerics faster
may speed up the 1.1 version of this benchmark
will probably not speed up the 1.2 version, it is already as "close to the metal" as it is likely to get.
Disclaimers:
Clojure 1.2 is not released yet, so 1.2 benchmark results are preliminary.
This is one particular benchmark on physics calculations. It is relevant to floating point number crunching, but irrelevant to performance in areas like string parsing, concurrency or web request handling.

I wonder if Cantor might be of use to you -- it's a high performance math library for Clojure. Also see this thread on the Google group, which is about a similar project in the context of the new primitive arithmetic stuff.

This is a slightly old question and the existing answers are somewhat out of date, so I'd like to add an update, as of mid-2013, for those interested in "number crunching" in Clojure.
There has been a lot happening in the Clojure numerical computing space:
Clojure 1.5 is now out, with much-improved support for numerical operations. In most cases it's now possible to get very close to pure Java speed.
A dedicated newsgroup - Numerical Clojure
core.matrix now provides an idiomatic API for matrix maths / numerical computing that supports multiple backend implementations (including native BLAS libraries)
Disclaimer: I'm a maintainer / contributor to several of the above.

Related

Why is golang slower than scala? [closed]

In this test, we can see that Go is sometimes much slower than Scala. In my opinion, since Go code is compiled directly to a native binary (like C/C++), while Scala code is compiled to JVM bytecode, Go should have much better performance, especially for the computation-intensive algorithms in this benchmark. Is my understanding incorrect?
http://benchmarksgame.alioth.debian.org/u64/chartvs.php?r=eNoljskRAEEIAlPCA48ozD%2Bb1dkX1UIhzELXeGcih5BqXeksDvbs8Vgi9HFr23iGiD82SgxJqRWkKNctgkMVUfwlHXnZWDkut%2BMK1nGawoYeDLlYQ8eLG1tvF91Dd8NVGm4sBfGaYo0Pok0rWQ%3D%3D&m=eNozMFFwSU1WMDIwNFYoNTNRyAMAIvoEBA%3D%3D&w=eNpLz%2FcvTk7MSQQADkoDKg%3D%3D
Here's what I think is going on in the four benchmarks where the Go solutions are slowest compared to the Scala solutions.
mandelbrot: the Scala implementation has its inner loop unrolled once (see the unrolling sketch after this list). It may also be that the JVM can vectorise a calculation written like this, which I believe the Go compiler doesn't yet do. So this is good manual optimisation plus better JVM support for fast arithmetic.
regex-dna: the Scala implementation isn't doing what the benchmark requires: it's asked to """(one pattern at a time) match-replace the pattern in the redirect file, and record the sequence length""", but it's just calculating the length and printing that. The Go version does the match-replace, so it is slower.
k-nucleotide: the Scala implementation has been optimised by using bit-twiddling to pack nucleotides into a long rather than using chars (see the packing sketch after this list). It's a good optimisation that could also be applied to the Go code.
binary-trees: this tests GC performance by filling RAM. It's true that Java's GC is much faster than Go's, but the argument for this not being the top priority for Go is that in real programs one can usually avoid GC by not producing garbage in the first place.
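To illustrate the unrolling trick from the mandelbrot point above, here is a Java sketch of the general technique (my illustration, not the Scala benchmark source):

    // Summing an array with the loop body unrolled once: two elements per
    // iteration halves the loop overhead and gives the JIT/CPU two
    // independent accumulators to work on in parallel.
    static double sumUnrolled(double[] a) {
        double s0 = 0.0, s1 = 0.0;
        int i = 0;
        for (; i + 1 < a.length; i += 2) {
            s0 += a[i];
            s1 += a[i + 1];
        }
        if (i < a.length) s0 += a[i]; // odd leftover element
        return s0 + s1;
    }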
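And the bit-packing trick from the k-nucleotide point, again as a hedged Java sketch of the general idea:

    // Each nucleotide (A, C, G, T) fits in 2 bits, so a k-mer of up to
    // 32 bases packs into a single long. Hashing and comparing longs is
    // far cheaper than hashing char arrays or strings.
    static long packKmer(CharSequence seq, int start, int k) {
        long key = 0L;
        for (int i = 0; i < k; i++) {
            long code;
            switch (seq.charAt(start + i)) {
                case 'A': code = 0; break;
                case 'C': code = 1; break;
                case 'G': code = 2; break;
                default:  code = 3; break; // 'T'
            }
            key = (key << 2) | code;
        }
        return key;
    }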
This chart is from the Programming Shootout. You should read the disclaimers on the Shootout page before taking the benchmarks as gospel. At best, these benchmarks are only useful for indicating broad expectations of performance.
That said, the JVM has a decade of well-funded optimization behind it and, apart from startup time, provides excellent performance for running code. Go is still a young language. The fact that Go comes within spitting distance of a JVM language is impressive. If you enjoy programming in Go, you should not reject it over one benchmark.
This is discussed in the Go FAQ:
One of Go's design goals is to approach the performance of C for comparable programs, yet on some benchmarks it does quite poorly, including several in test/bench/shootout. The slowest depend on libraries for which versions of comparable performance are not available in Go. For instance, pidigits.go depends on a multi-precision math package, and the C versions, unlike Go's, use GMP (which is written in optimized assembler). Benchmarks that depend on regular expressions (regex-dna.go, for instance) are essentially comparing Go's native regexp package to mature, highly optimized regular expression libraries like PCRE.
Benchmark games are won by extensive tuning and the Go versions of most of the benchmarks need attention. If you measure comparable C and Go programs (reverse-complement.go is one example), you'll see the two languages are much closer in raw performance than this suite would indicate.
Still, there is room for improvement. The compilers are good but could be better, many libraries need major performance work, and the garbage collector isn't fast enough yet. (Even if it were, taking care not to generate unnecessary garbage can have a huge effect.)
As an aside, consider the 10x (!) speed difference between different versions of a benchmark for a given language: C gcc #7 is 8.3 times slower than C gcc #5, and Ada #3 is almost 10 times slower than Ada #5. These benchmarks provide a rough idea of how languages compare, but the difference between Go and Scala is within one order of magnitude, which means any 'intrinsic' variation between the runtimes is likely to be dwarfed by differences in the implementations: this post describes how a program was sped up 11x by performing smarter memory allocation. Maybe the compiler/runtime should handle this kind of optimisation automatically (as the JVM does, to a certain extent), but I am not sure you can really draw the conclusion that 'Go is slower (resp. faster) than Scala' in the general case from these figures. Just my opinion though :)
Since you seem keen on looking at these (biased) benchmarks, let's take a real example for a real scenario, not some Fibonacci implementations.
Take a look at these rankings for web framework benchmarks. The testing was done using native clients where available, and sometimes using OSS web frameworks; they also test many packages for the same language. The tests range from requests for raw strings to querying a database through an ORM.
It is clear that Scala's performance is nowhere close to Go's; in all of the tests, Scala ranked below Go. Having said that, benchmarks are nothing close to reality, and I suggest you evaluate a language from a tools/features perspective, or simply by what would best solve your problem.
As Brad pointed out, these results are from one particular benchmark suite. This provides some information, but don't assume it's the whole picture. It would be helpful to know whether the source code in each case is well enough written to achieve the fastest speed, the least memory use, or some other target goal.
Perhaps we might compare with another website that ranks languages. Take a look at http://www.techempower.com/benchmarks/ in which web service code is compared. In spite of being a young language, Go ranks among the best in some of these benchmarks.
As in all benchmarks, it always depends what you strive for and how you measure it.

High Frequency Trading in the JVM with Scala/Akka

Let's imagine a hypothetical HFT system in Java, requiring (very) low latency, with lots of short-lived small objects somewhat due to immutability (Scala?), thousands of connections per second, and an obscene number of messages passing around in an event-driven architecture (Akka and AMQP?).
For the experts out there, what would (hypothetically) be the best tuning for JVM 7? What type of code would make it happy? Would Scala and Akka be ready for this kind of system?
Note: There has been some similar questions, like this one, but I've yet to find one covering Scala (which has its own idiosyncratic footprint in the JVM).
It is possible to achieve very good performance in Java. However, the question needs to be more specific to provide a credible answer. Your main sources of latency will come from the following non-exhaustive list:
How much garbage you create and the work the GC must do to collect and promote it. Immutable designs in my experience do not fit well with low latency. GC tuning needs to be a big focus.
Warm up the JVM so that classes are loaded and the JIT has had time to do its work.
Design your algorithms to be O(1) or at least O(log2 n), and have performance tests that assert this.
Your design needs to be lock-free and follow the "Single Writer Principle".
A significant effort needs to be put into understanding the whole stack and showing mechanical sympathy in its use.
Design your algorithms and data structures to be cache friendly. Cache misses these days are the biggest cost. This is closely related to processor affinity, which if not set up correctly can result in significant cache pollution. It will involve sympathy for the OS and even some JNI code in some cases. (See the false-sharing sketch after this list.)
Ensure you have sufficient cores so that any thread that needs to run has a core available without having to wait.
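To make the cache-friendliness point concrete, here is a minimal Java sketch of avoiding false sharing (my illustration, assuming 64-byte cache lines; not from the original answer):

    // Hypothetical sketch: manual cache-line padding to avoid false sharing.
    // Without the padding, the two hot counters could share one 64-byte
    // cache line, and every write by one thread would invalidate that
    // line in the other thread's core.
    final class PaddedCounters {
        volatile long producerCount;
        long p1, p2, p3, p4, p5, p6, p7; // 56 bytes of padding
        volatile long consumerCount;
    }

Real implementations also guard against the JVM reordering or eliminating the padding fields; newer JDKs provide an internal @Contended annotation for the same purpose.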
I recently blogged about a case study of such an exercise.
On my laptop, the average latency of ping messages between Akka 2.3.7 actors is ~300 ns, which is much less than the latency to be expected from GC pauses on JVMs.
Code (incl. JVM options) & test results for Akka and other actors on Intel Core i7-2640M here.
P.S. You can find lots of principles and tips for low-latency computing on Dmitry Vyukov's site and in Martin Thompson's blog.
You may find that using a ring buffer for message passing will surpass what can be done with Akka. The main ring buffer implementation people use on the JVM for financial applications is one called the Disruptor, which is carefully tuned for efficiency (power-of-two size), for the JVM (no GC, no locks) and for modern CPUs (no false sharing of cache lines).
Here is an intro presentation from a Scala point of view http://scala-phase.org/talks/jamie-allen-sdisruptor/index.html#1 and there are links on the last slide to the original LMAX stuff.
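To make the ring buffer idea concrete, here is a minimal single-producer/single-consumer sketch (my illustration, in the spirit of the Disruptor but NOT its actual API, which is far more sophisticated):

    import java.util.concurrent.atomic.AtomicLong;

    // Lock-free SPSC ring buffer: power-of-two capacity so index wrapping
    // is a cheap bit mask, no locks, and slots are reused so the steady
    // state produces no garbage.
    final class SpscRing<T> {
        private final Object[] slots;
        private final int mask;
        private final AtomicLong head = new AtomicLong(); // consumer position
        private final AtomicLong tail = new AtomicLong(); // producer position

        SpscRing(int capacityPowerOfTwo) {
            slots = new Object[capacityPowerOfTwo];
            mask = capacityPowerOfTwo - 1;
        }

        boolean offer(T item) {                  // producer thread only
            long t = tail.get();
            if (t - head.get() == slots.length) return false; // full
            slots[(int) (t & mask)] = item;
            tail.lazySet(t + 1);                 // ordered store, no lock
            return true;
        }

        @SuppressWarnings("unchecked")
        T poll() {                               // consumer thread only
            long h = head.get();
            if (h == tail.get()) return null;    // empty
            T item = (T) slots[(int) (h & mask)];
            slots[(int) (h & mask)] = null;      // let the item be collected
            head.lazySet(h + 1);
            return item;
        }
    }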
There are HFT systems written in Java, lots of them, but I've never heard of a single one written in Scala. Why?
JVM languages, where garbage collection is commonplace, are not exactly a good choice for HFT. But Java is a garbage-collected language. So, what is the trick? The trick is that people write HFT code in Java as if they were using C; I mean, they trade away some of the niceties Java offers for tighter control of memory allocation. The trick is basically replacing the JCF (Java Collections Framework) with mundane arrays pre-allocated in memory, or some other creative solutions (see the sketch below).
Scala is a functional programming language that promotes heavy use of functional programming techniques, most of them backed by the collections framework. The point is: if you are going to get rid of the collections framework (per item 1 above), you are basically not using the basic FP building blocks baked into Scala. And if you are not doing that in the first place, what is the point of using Scala at all?
Akka does not look like a good option because... well... it is written in Scala. So, by items (1) and (2), we can rule out Akka too.
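As a hedged illustration of the array trick in item (1) -- hypothetical names, not from any real HFT codebase:

    // Hot path with zero allocation: a pre-allocated primitive array in
    // place of an ArrayList<Double>, which would box every value and feed
    // the garbage collector on each append.
    final class PriceBuffer {
        private static final int SIZE = 1 << 16;          // power of two
        private final double[] prices = new double[SIZE]; // allocated once
        private long count;

        void record(double price) {
            prices[(int) (count++ & (SIZE - 1))] = price; // overwrite oldest
        }

        double latest() { // assumes record() has been called at least once
            return prices[(int) ((count - 1) & (SIZE - 1))];
        }
    }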
The point is: technology has evolved since the question was asked in 2012. My answer comes 10 years later. We now have far more interesting options than C or C++, which were the only realistic options in 2012.
OK... maybe not options (in plural)... but one option: Rust.
Rust is not a garbage-collected language; it is a safe programming language, it is productive, it offers a rich asynchronous library, and it is pretty fast.
The only "problem" with Rust is that the people responsible for steering technology in big banks and financial institutions are, in general, averse to innovation, preferring things they have known since the beginning of their long careers. So, in a nutshell, it will take some time until we see HFT systems written in Rust.

Performance of Google's Go?

So has anyone used Google's Go? I was wondering how the mathematical performance (e.g. flops) is compared to other languages with a garbage collector... like Java or .NET?
Has anyone investigated this?
Theoretical performance: The theoretical performance of pure Go programs is somewhere between C/C++ and Java. This assumes an advanced optimizing compiler and it also assumes the programmer takes advantage of all features of the language (be it C, C++, Java or Go) and refactors the code to fit the programming language.
Practical performance (as of July 2011): The standard Go compiler (5g/6g/8g) is currently unable to generate efficient instruction streams for high-performance numerical code, so the performance will be lower than C/C++ or Java. There are multiple reasons for this: each function call has an overhead of a couple of additional instructions (compared to C/C++ or Java), no function inlining, average-quality register allocation, an average-quality garbage collector, limited ability to erase bounds checks, no access to vector instructions from Go, no compiler support for SSE2 on 32-bit x86 CPUs, etc.
Bottom line: As a rule of thumb, expect the performance of numerical code implemented in pure Go, compiled by 5g/6g/8g, to be about 2 times lower than C/C++ or Java. Expect the performance to get better in the future.
Practical performance (September 2013): Compared to the older Go from July 2011, Go 1.1.2 is capable of generating more efficient numerical code, but it remains slightly slower than C/C++ and Java. The compiler now utilizes SSE2 instructions even on 32-bit x86 CPUs, which makes 32-bit numerical code run much faster, most likely thanks to better register allocation. The compiler also implements function inlining and escape analysis. The garbage collector has been improved as well, but it remains less advanced than Java's. There is still no support for accessing vector instructions from Go.
Bottom line: The performance gap seems sufficiently small for Go to be an alternative to C/C++ and Java in numerical computing, unless the competing implementation is using vector instructions.
The Go math package is largely written in assembler for performance.
Benchmarks are often unreliable and are subject to interpretation. For example, Robert Hundt's paper Loop Recognition in C++/Java/Go/Scala looks flawed. The Go blog post on Profiling Go Programs dissects Hundt's claims.
You're actually asking several different questions. First of all, Go's math performance is going to be about as fast as anything else. Any language that compiles down to native code (which arguably includes even JIT'd languages like .NET) is going to perform extremely well at raw math -- as fast as the machine can go. Simple math operations are very easy to compile into a zero-overhead form. This is the area where compiled (including JIT'd) languages have an advantage over interpreted ones.
The other question you asked was about garbage collection. This is, to a certain extent, a bit of a side issue if you're talking about heavy math. That's not to say that GC doesn't impact performance -- actually it impacts quite a bit. But the common solution for tight loops is to avoid or minimize GC sweeps. This is often quite simple in a tight loop -- you just re-use your old variables instead of constantly allocating and discarding them, which can speed up your code by several orders of magnitude.
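A minimal sketch of that reuse pattern (Java here for concreteness; the same idea applies in Go):

    // The scratch array is allocated once outside the loop and overwritten
    // in place on every pass, so the hot loop produces no garbage at all.
    static double simulate(int steps, int n) {
        double[] scratch = new double[n]; // one allocation, reused forever
        double acc = 0.0;
        for (int s = 0; s < steps; s++) {
            for (int i = 0; i < n; i++) {
                scratch[i] = Math.sin(i * 0.001) * s;
            }
            acc += scratch[n - 1];
        }
        return acc;
    }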
As for the GC implementations themselves, Go and .NET both use mark-and-sweep garbage collection. Microsoft has put a lot of focus and engineering into their GC engine, and I'm inclined to think that it's quite good, all things considered. Go's GC engine is a work in progress, and while it doesn't feel any slower than .NET's, the Golang folks insist that it needs some work. The fact that Go's specification disallows destructors goes a long way toward speeding things up, which may be why it doesn't seem that slow.
Finally, in my own anecdotal experience, I've found Go to be extremely fast. I've written very simple and easy programs that have stood up in my own benchmarks against highly-optimized C code from some long-standing and well-respected open source projects that pride themselves on performance.
The catch is that not all Go code is going to be efficient, just like not all C code is efficient. You've got to build it correctly, which often means doing things differently than what you're used to from other languages. The profiling blog post mentioned here several times is a good example of that.
Google did a study comparing Go to some other popular languages (C++, Java, Scala). They concluded it was not as strong performance-wise:
https://days2011.scala-lang.org/sites/days2011/files/ws3-1-Hundt.pdf
Quote from the Conclusion, about Go:
Go offers interesting language features, which also allow for a concise and standardized notation. The compilers for this language are still immature, which reflects in both performance and binary sizes.

What is the most performant lisp on the JVM

What is the most performant (fastest) lisp implementation on the JVM?
By lisp implementation I consider all implementations of any language in lisp family, like Common Lisp, Scheme, Clojure, ...
I know that Clojure can be made pretty fast using type hints, that ABCL is in general not considered to be fast. I don't have experience using any Scheme on JVM, but heard that Kawa is pretty fast too.
With Clojure you can get to the speed of Java (with type hints, of course), and you cannot get faster than Java (except in some very rare cases). I don't know about the other lisps; they are maybe the same speed, but not faster.
So much for the raw speed of calls and such.
Clojure's data structures are not always as fast as possible, but they really make up for that with their other properties, like thread safety, immutability and fast reading.
To make the data structures faster, Rich invented transients, which make them mutable in a way that is still functional (and they are a LOT faster); he is already working on the next big thing (read about Rich's Emerging Languages camp talk).
It's much easier to write concurrent code in Clojure, and that is really important for making fast programs.
The next thing is math. There are three levels of speed for math on the JVM: boxed types, primitive types with overflow checking, and primitives without overflow checking. Clojure provides all of these, so there is no limit there; the sketch below shows the three tiers in Java terms.
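For illustration (my sketch; Clojure exposes analogous choices through its own operators):

    // The three speed tiers for math on the JVM, in Java terms:
    static Long boxedAdd(Long a, Long b) {
        return a + b;               // unbox, add, re-box: slowest
    }
    static long checkedAdd(long a, long b) {
        return Math.addExact(a, b); // primitive add + overflow check (Java 8+)
    }
    static long uncheckedAdd(long a, long b) {
        return a + b;               // raw primitive add: fastest
    }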
The next thing is how fast you can work with Java. If you have to go through wrappers you won't perform as well, and Java calls are frequent in most JVM languages. To implement Clojure in Clojure, Clojure needed to add low-level constructs so that you can interact with Java without any overhead.
So Clojure is as fast as it gets on the JVM.
P.S.
Protocols are like really fast multimethods, which are not as generic but which dispatch fast enough to be used in clojure.core (and thus no longer depend on Java). Look them up; they are way cool.
There's not a whole lot of good data, though this and some other benchmarks seem to indicate the obvious: immutable languages suffer slightly at mutation-heavy tasks, and mutable languages suffer at highly parallel tasks.
When considering these questions, it helps to consider the fallback option: Clojure can fall back to Java for any part of your code that the profiler tells you is not pulling its weight.
In short: I vote Clojure :)
I'd be surprised if it is not Clojure. No other JVM lisp I know of has had as much attention paid to performance. The fastest Scheme is probably SISC -- it's compiled to a FASL format, but still not at "native" JVM instruction level.
I've been testing various lisps over the last few weeks. Kawa is the fastest JVM implementation I've tried so far in terms of startup time, REPL response time and running basic scripts. The author posted some performance statistics in 2010, which show that Kawa outperforms Clojure by a clear margin. YMMV.
http://per.bothner.com/blog/2010/Kawa-in-shootout/

What makes Ruby slow? [closed]

Ruby is slow at certain things. But what parts of it are the most problematic?
How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.
I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how Ruby implements basic math?
Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?
Has there been recent work that has significantly improved performance?
Ruby is slow. But what parts of it are the most problematic?
It does "late lookup" for methods, to allow for flexibility. This slows it down quite a bit. It also has to remember variable names per context to allow for eval, so its frames and method calls are slower. Also it lacks a good JIT compiler currently, though MRI 1.9 has a bytecode compiler (which is better), and jruby compiles it down to java bytecode, which then (can) compile via the HotSpot JVM's JIT compiler, but it ends up being about the same speed as 1.9.
How much does the garbage collector affect performance? I know I've had times when running the garbage collector alone took several seconds, especially when working with OpenGL libraries.
From some of the graphs at http://www.igvita.com/2009/06/13/profiling-ruby-with-googles-perftools/ I'd say it takes about 10%, which is quite a bit -- you can decrease that hit by increasing malloc_limit in gc.c and recompiling.
I've used matrix math libraries with Ruby that were particularly slow. Is there an issue with how ruby implements basic math?
Ruby 1.8 "didn't" implement basic math it implemented Numeric classes and you'd call things like Fixnum#+ Fixnum#/ once per call--which was slow. Ruby 1.9 cheats a bit by inlining some of the basic math ops.
Are there any dynamic features in Ruby that simply cannot be implemented efficiently? If so, how do other languages like Lua and Python solve these problems?
Things like eval are hard to implement efficiently, though much work can be done, I'm sure. The kicker for Ruby is that it has to accommodate somebody in another thread spontaneously changing the definition of a class, so it has to be very conservative.
Has there been recent work that has significantly improved performance?
1.9 is like a 2x speedup. It's also more space efficient. JRuby is constantly trying to improve speed-wise [and probably spends less time in the GC than KRI]. Beyond that, I'm not aware of much except the little hobby things I've been working on. Note also that 1.9's strings are at times slower because of encoding friendliness.
Ruby is very good for delivering solutions quickly. Less so for delivering quick solutions. It depends what kind of problem you're trying to solve. I'm reminded of the discussions on the old CompuServe MSBASIC forum in the early 90s: when asked which was faster for Windows development, VB or C, the usual answer was "VB, by about 6 months".
In its MRI 1.8 form, Ruby is - relatively - slow to perform some types of computationally-intensive tasks. Pretty much any interpreted language suffers in that way in comparison to most mainstream compiled languages.
The reasons are several: some fairly easily addressable (the primitive garbage collection in 1.8, for example), some less so.
1.9 addresses some of the issues, although it's probably going to be some time before it becomes generally available. Some of the other implementation that target pre-existing runtimes, JRuby, IronRuby, MagLev for example, have the potential to be significantly quicker.
Regarding mathematical performance, I wouldn't be surprised to see fairly slow throughput: it's part of the price you pay for arbitrary precision. Again, pick your problem. I've solved 70+ of the Project Euler problems in Ruby with almost no solution taking more than a minute to run. How fast do you need it to run, and how soon do you need it?
The most problematic part is "everyone".
Bonus points if that "everyone" didn't really use the language, ever.
Seriously, 1.9 is much faster and is now on par with Python, and JRuby is faster than Jython.
Garbage collectors are everywhere; for example, Java has one, and it's faster than C++ at dynamic memory handling. Ruby isn't well suited for number crunching, but few languages are, so if you have computation-intensive parts in your program, in any language, you'd better rewrite them in C. (Java is fast at math thanks to its primitive types, but it paid dearly for them; they're clearly the #1 ugliest part of the language.)
As for dynamic features: they aren't fast, but code without them in static languages can be even slower. For example, Java would use an XML config where Ruby would use a DSL, and that would likely be SLOWER, since XML parsing is costly.
Hmm - I worked on a project a few years ago where I scraped the barrel with Ruby performance, and I'm not sure much has changed since. Right now it's caveat emptor - you have to know not to do certain things, and frankly games / realtime applications would be one of them (since you mention OpenGL).
The culprit for killing interactive performance is the garbage collector - others here mention that Java and other environments have garbage collection too, but Ruby's has to stop the world to run. That is to say, it has to stop running your program, scan through every register and memory pointer from scratch, mark the memory that's still in use, and free the rest. The process can't be interrupted while this happens, and as you might have noticed, it can take hundreds of milliseconds.
Its frequency and length of execution are proportional to the number of objects you create and destroy, but unless you disable it altogether, you have no control. My experience was that there were several unsatisfactory strategies to smooth out my Ruby animation loop:
GC.disable / GC.enable around critical animation loops, and maybe an opportunistic GC.start to force it to run when it can't do any harm. (Because my target platform at the time was a 64MB Windows NT machine, this caused the system to run out of memory occasionally. Fundamentally it's a bad idea: unless you can pre-calculate how much memory you might need, you're risking memory exhaustion.)
Reduce the number of objects you create so the GC has less work to do (reduces the frequency / length of its execution)
Rewrite your animation loop in C (a cop-out, but the one I went with!)
These days I would probably also see if JRuby would work as an alternative runtime, as I believe it relies on Java's more sophisticated garbage collector.
The other major performance issue I've found is basic I/O, from when I tried to write a TFTP server in Ruby a while back (yeah, I pick all the best languages for my performance-critical projects; this was just an experiment). The absolute simplest, tightest loop to simply respond to one UDP packet with another, containing the next piece of a file, must have been about 20x slower than the stock C version. I suspect there might have been some improvements to make based around using low-level I/O (sysread etc.), but the slowness might just be due to the fact that there is no low-level byte data type -- every little read is copied out into a String. This is just speculation though; I didn't take the project much further, but it warned me off relying on snappy I/O.
The main recent speed increase, though I'm not fully up to date here, is that the virtual machine implementation was redone for 1.9, resulting in faster code execution. However, I don't think the GC has changed, and I'm pretty sure there's nothing new on the I/O front. But I'm not fully up to date on bleeding-edge Ruby, so someone else might want to chip in here.
I assume that you're asking, "what particular techniques in Ruby tend to be slow."
One is object instantiation. If you are doing large amounts of it, you want to look at (reasonable) ways of reducing it, such as using the flyweight pattern, even if memory usage is not a problem. In one library that I reworked to stop creating a lot of very similar objects over and over again, I doubled the overall speed of the library.
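For reference, a minimal flyweight sketch (Java here for concreteness, since the pattern is language-agnostic; the Glyph example is hypothetical):

    import java.util.HashMap;
    import java.util.Map;

    // Flyweight: identical immutable objects are created once and shared,
    // instead of instantiating a new one for every use.
    final class GlyphFactory {
        private final Map<Character, Glyph> cache = new HashMap<>();

        Glyph get(char c) {
            return cache.computeIfAbsent(c, Glyph::new);
        }
    }

    final class Glyph {
        final char symbol;
        Glyph(char symbol) { this.symbol = symbol; }
    }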
Steve Dekorte: "Writing a Mandelbrot set calculator in a high level language is like trying to run the Indy 500 in a bus."
http://www.dekorte.com/blog/blog.cgi?do=item&id=4047
I recommend learning various tools in order to use the right tool for the job. Matrix transformations can be done efficiently using a high-level API that wraps tight loops of arithmetic-intensive computation. See the RubyInline gem for an example of embedding C or C++ code in a Ruby script.
There is also the Io language, which is much slower than Ruby but efficiently renders movies at Pixar and outperforms raw C on vector arithmetic by using SIMD acceleration.
http://iolanguage.com
https://renderman.pixar.com/products/tools/it.html
http://iolanguage.com/scm/git/checkout/Io/docs/IoGuide.html#Primitives-Vector
Ruby 1.9.1 is about twice as fast as PHP, and a little bit faster than Perl, according to some benchmarks.
(Update: My source is this (screenshot). I don't know what his source is, though.)
Ruby is not slow. The old 1.8 is, but the current Ruby isn't.
Ruby is slow because it was designed to optimize the programmer's experience, not the program's execution time. Slowness is just a symptom of that design decision. If you prefer performance to pleasure, you should probably use a different language. Ruby's not for everything.
IMO, dynamic languages are all slow in general. They do at runtime what static languages do at compile time.
Syntax checking, interpreting, and things like type checking and conversion: this is inevitable, therefore Ruby is slower than C/C++/Java. Correct me if I am wrong.
