Ruby performance in .each loops - ruby

Consider the following two peices of ruby code
Example 1
name = user.first_name
round_number = rounds.count
users.each do |u|
puts "#{name} beat #{u.first_name} in round #{round_number}"
end
Example 2
users.each do |u|
puts "#{user.first_name} beat #{u.first_name} in #{rounds.count}"
end
For both pieces of code imagine
#user.rb
def first_name
name.split.first
end
So in a classical analysis of algorithms, the first piece of code would be more efficient, however in most modern compiled languages, modern compilers would optimize the second piece of code to make it look like the first, eliminating the need to optimize code in such maner.
Will ruby optimize or cache values for this code before execution? Should my ruby code look like example 1 or example 2?

Example 1 will run faster, as first_name() is only called once, and it's value stored in the variable.
In Example 2 Ruby will not memoize this value automatically, since the value could have changed between iterations for the each() loop.
Therefor expensive-to-calculate methods should be explicitly memoized if they are expected to be used more than once without the return value changing.
Making use of Ruby's Benchmark Module can be useful when making decisions like this. It will likely only be worth memoizing if there are a lot of values in users, or if first_name() is expensive to calculate.

A compiler can only perform this optimization if it can prove that the method has no side effects. This is even more difficult in Ruby than most languages, as everything is mutable and can be overridden at runtime. Whether it happens or not is implementation dependent, but since it's hard to do in Ruby, most do not. I actually don't know of any that do at the time of this posting.

Related

Bubble Sort method

I am just learning ruby and KevinC's response (in this link) makes sense to me with one exception. I don't understand why the code is encompassed in the arr.each do |i| #while... end That part seems redundant to me as the 'while' loop is already hitting each of the positions? Can someone explain?
The inner loop finds a bubble and carries it up; if it finds another, lighter bubble, it switches them around and carries the lighter one. So you need several passes through the array to find all the bubbles and carry them to the correct place, since you can't float several bubbles at the same time.
EDIT:
The each is really misused in KevinC's code, since it is not used for its normal purpose: yielding elements of the collection. Instead of arr.each, it would be better to use arr.size.times - as it would be more informative to the reader. Redefining the i within the block is adding insult to injury. While none of this will cause the code to be wrong as such, it is misleading.
The other problem with the code is the fact that it does not provide the early termination condition (swapped in most other answers on that question). In theory, bubble sort could find the array sorted in the first pass; the other size - 1 steps are unnecesary. KevinC's code would still dry-hump the already sorted array, never realising it is done.
As for rewrite into block-less code, it is certainly possible, but you need to understand that blocks syntax is very idiomatic in Ruby, and non-block loops are almost unheard of in Ruby world. While Ruby has for, it is pretty much never used in Ruby. But...
arr.each do |i|
...
end
is equivalent to
for i in arr
...
end
which is, again, at least for the array case, equivalent to
index = 0
while index < arr.size
i = arr[index]
...
index += 1
end

ruby tdd best practice in dealing with obsolete tests

I'm running through a very basic challenge at Code Wars. The challenge is to test-drive a method that returns the sum of an array of squares.
So far, my tests are:
describe "square method" do
it "should return the square of a number" do
Test.assert_equals(squareSum(4), [16])
end
it "should return the square of multiple numbers" do
Test.assert_equals(squareSum(4, 2, 3), [16, 4, 9])
end
end
and my code is:
def squareSum(*numbers)
numbers.map { |num| num ** 2 }
end
Now I'm at the point where I need to change it so that it adds the sum. Which, in my mind, necessarily negates the two previous tests. As far as TDD best practices go, was I being ridiculous testing those first two scenarios, given that they aren't what I'm trying to get the method to do? How should I proceed with my next test?
Should I:
delete the previous two tests, since they will fail once I change the method?
find a way to make it so that the two previous tests don't fail even once I've changed it?
In approaching this problem, should I have not worried about the first two tests? I am having a fair amount of difficulty phrasing this question. Basically, what I know I want to end up with is:
describe "squareSum method" do
it "should return the sum of the squares of the numbers passed" do
Test.assert_equals(squareSum(1, 2, 2), 9)
end
end
with the code to make it work. I'm just wondering what the best practices are in regards to test-driving this particular kind of problem, given that I wanted to test that I could return squares for multiple numbers before returning the sum. My "final" code will render the initial tests obsolete. This is a, "How much of my work should be present in the final solution?", picky and kind of anal-retentive question, I think. But I am curious.
Since the tests are specifications for the software you intend to write, the question is why you wrote a specification for something you didn't want (since the task of the challenge was not write a function that squares its arguments)?
You should have written a specification for "a method that returns the sum of an array of squares" in the first place. This test starts out being red.
Then you may decide that you need a function that squares its arguments (or the elements of a given array) as an intermediate step. Write a test for this function. Make it green by implementing such a function.
Eventually put all the things together and make your initial test green (your main function could use the helper function and sum-up its return values).
No you can refactor. If during refactoring you decide that you no longer need the helper function: delete the function and delete its tests.
Also when your specifications are changing, you need to rewrite your tests, write new ones or even delete some of them.
But the general rule is: The tests are always a specification for the current state of your software. They should specify exactly what your software is intended to to. Nothing more, nothing less.

Speed differences between proc, Proc.new, lambda, and stabby lambda

Procs and lambdas differ with respect to method scoping and the effect of the return keyword. I am rather interested in the performance differences between them. I wrote a test as shown below:
def time(&block)
start = Time.now
block.call
p "that took #{Time.now - start}"
end
def test(proc)
time{(0..10000000).each{|n| proc.call(n)}}
end
def test_block(&block)
time{(0..10000000).each{|n| block.call(n)}}
end
def method_test
time{(1..10000000).each{|n| my_method(n)}}
end
proc1 = Proc.new{|x| x*x}
proc2 = proc{|x| x*x}
lam1 = lambda{|x| x*x}
lam2 = ->x{x*x}
def my_method(x)
x*x
end
test(proc1)
test(proc2)
test(lam1)
test(lam2)
test_block{|x| x*x}
test(method(:my_method))
method_test
The result of this code is shown below.
"that took 0.988388739"
"that took 0.963193172"
"that took 0.943111226"
"that took 0.950506263"
"that took 0.960760843"
"that took 1.090146951"
"that took 0.644500627"
method(:my_method) is the slowest, which is because it checks a look up table for each iteration in the loop.
Similarly, another test as below:
def test2(&block)
time{(0..1000000).each{block.call}}
end
test2{Proc.new{|x| x*x}}
test2{proc{|x| x*x}}
test2{lambda{|x| x*x}}
test2{->(x){x*x}}
returns this result:
"that took 0.415290453"
"that took 0.378787963"
"that took 0.3888118"
"that took 0.391414639"
Proc.new is the slowest creation method, which is because we have the overhead of creating an entire object to wrap our proc.
I assert that the execution time of the procs and lambdas are the same as one another regardless of their creation method.
Why is normal method invocation so much faster than procs and lambdas (1/3 time reduction)?
Do people think this is likely to change with different block functions etc.?
Are there any other (performance based) reasons to chose between the different approaches?
So it seems you have three questions. The middle one is unclear to me, so I will address the other two:
Why is normal method invocation so much faster?
This is the easier of the questions.
First realize that the times involved here are for function call overhead. I did my own timings based on your code (but with an identity function instead of multiplication), and non-direct invocations took 49% longer. With one multiplication, non-direct invocations took only 43% longer. In other words, one reason why you're seeing a large disparity is that your function itself is doing almost nothing. Even a single multiplication makes 6% of the difference "vanish". In a method of any reasonable complexity, the method call overhead is usually going to be a relatively small percentage of the overall time.
Next, remember that a proc/block/lambda is essentially a chunk of code that is being carried around (though a block literal cannot be saved into a variable). This implies one more level of indirection than a method call...meaning that at the very least the CPU is going to have to traverse a pointer to something.
Also, remember that Ruby supports closures, and I'm betting there is some overhead in deciding which environment the indirect code should run in.
On my machine, running a C program that invokes a function directly has 10% less overhead than one that uses a pointer to a function. An interpreted language like Ruby, where closures are also involved, is definitely going to use more.
My measurements (Ruby 2.2) indicate a direct method invocation takes about as long as 6 multiplications, and an indirect invocation takes about as long as 10.
So while the overhead is nearly twice as large, remember that the overhead in both cases is often relatively small.
Are there any other (performance based) reasons to chose between the different approaches?
I'd say given the above data the answer is usually no: you're much better off using the construct that gives you the most maintainable code.
There are definitely good reasons to choose one over the other. To be honest, I'm surprised about the difference I see between lambdas and blocks (on my machine, lambdas have 20% less overhead). Lambdas create anonymous methods that include parameter list checking, so if anything I would expect it to be slightly slower, but my measurements put it ahead.
That the direct invocation is faster simply isn't surprising at all.
The place where this kind of thing makes a difference is in very frequently called code where the overhead adds up to be noticeable in wall-clock kinds of ways. In this case, it can make sense to do all manner of ugly optimizations to try to squeeze a bit more speed, all the way up to inlining code (dodge the function-call overhead altogether).
For example, say your application uses a database framework. You profile it and find a frequently-called, small (e.g. less than 20 multiplications worth of work) function that copies the data from the database result into data structures. Such a function might comprise the lion's share of the result processing, and simply inlining that function might shave off significant amounts of CPU time at the expense of some code clarity.
Simply inlining your "square function" into a long numeric calculation with a billion steps could save you dramatic amounts of time because the operation itself takes a lot less time than even a direct method invocation.
In most cases, though, you're better off with clean, clear code.

fixnum and prime numbers in ruby

Before I set about to writing this myself, has anyone seen a ruby implementation of the following behavior?
puts 7.nextprime(); #=> 11
puts 7.previousprime(); #=> 5
puts 7.isprime(); #=> true
Obviously this kind of thing would be ugly for large numbers but for integers never exceeding a few thousand (the common instance for me) a sensible implementation is doable, hence the question.
Ruby comes with a built-in Prime class that allows you to iterate through primes starting at 1, but I see no way to initialize it with a starting value other than 1, nor a predicate check to determine whether or not a number is prime. I'd say go for it, though you should keep in mind that math in Ruby can be slow and if performance is a factor you may be better off considering writing it as a C or Java extension. Here's an example of how to use RubyInline to generate primes in C.
Also, I suggest you avoid using the method name 7.isprime - the convention in Ruby is 7.prime?.
Take a look at the snippets found here. They could give you a headstart.

Does Ruby perform Tail Call Optimization?

Functional languages lead to use of recursion to solve a lot of problems, and therefore many of them perform Tail Call Optimization (TCO). TCO causes calls to a function from another function (or itself, in which case this feature is also known as Tail Recursion Elimination, which is a subset of TCO), as the last step of that function, to not need a new stack frame, which decreases overhead and memory usage.
Ruby obviously has "borrowed" a number of concepts from functional languages (lambdas, functions like map and so forth, etc.), which makes me curious: Does Ruby perform tail call optimization?
No, Ruby doesn't perform TCO. However, it also doesn't not perform TCO.
The Ruby Language Specification doesn't say anything about TCO. It doesn't say you have to do it, but it also doesn't say you can't do it. You just can't rely on it.
This is unlike Scheme, where the Language Specification requires that all Implementations must perform TCO. But it is also unlike Python, where Guido van Rossum has made it very clear on multiple occasions (the last time just a couple of days ago) that Python Implementations should not perform TCO.
Yukihiro Matsumoto is sympathetic to TCO, he just doesn't want to force all Implementations to support it. Unfortunately, this means that you cannot rely on TCO, or if you do, your code will no longer be portable to other Ruby Implementations.
So, some Ruby Implementations perform TCO, but most don't. YARV, for example, supports TCO, although (for the moment) you have to explicitly uncomment a line in the source code and recompile the VM, to activate TCO – in future versions it is going to be on by default, after the implementation proves stable. The Parrot Virtual Machine supports TCO natively, therefore Cardinal could quite easily support it, too. The CLR has some support for TCO, which means that IronRuby and Ruby.NET could probably do it. Rubinius could probably do it, too.
But JRuby and XRuby don't support TCO, and they probably won't, unless the JVM itself gains support for TCO. The problem is this: if you want to have a fast implementation, and fast and seamless integration with Java, then you should be stack-compatible with Java and use the JVM's stack as much as possible. You can quite easily implement TCO with trampolines or explicit continuation-passing style, but then you are no longer using the JVM stack, which means that everytime you want to call into Java or call from Java into Ruby, you have to perform some kind of conversion, which is slow. So, XRuby and JRuby chose to go with speed and Java integration over TCO and continuations (which basically have the same problem).
This applies to all implementations of Ruby that want to tightly integrate with some host platform that doesn't support TCO natively. For example, I guess MacRuby is going to have the same problem.
Update: Here's nice explanation of TCO in Ruby: http://nithinbekal.com/posts/ruby-tco/
Update: You might want also check out the tco_method gem: http://blog.tdg5.com/introducing-the-tco_method-gem/
In Ruby MRI (1.9, 2.0 and 2.1) you can turn TCO on with:
RubyVM::InstructionSequence.compile_option = {
:tailcall_optimization => true,
:trace_instruction => false
}
There was a proposal to turn TCO on by default in Ruby 2.0. It also explains some issues that come with that: Tail call optimization: enable by default?.
Short excerpt from the link:
Generally, tail-recursion optimization includes another optimization technique - "call" to "jump" translation. In my opinion,
it is difficult to apply this optimization because recognizing
"recursion" is difficult in Ruby's world.
Next example. fact() method invocation in "else" clause is not a "tail
call".
def fact(n)
if n < 2
1
else
n * fact(n-1)
end
end
If you want to use tail-call optimization on fact() method, you need
to change fact() method as follows (continuation passing style).
def fact(n, r)
if n < 2
r
else
fact(n-1, n*r)
end
end
It can have but is not guaranteed to:
https://bugs.ruby-lang.org/issues/1256
TCO can also be compiled in by tweaking a couple variables in vm_opts.h before compiling:
https://github.com/ruby/ruby/blob/trunk/vm_opts.h#L21
// vm_opts.h
#define OPT_TRACE_INSTRUCTION 0 // default 1
#define OPT_TAILCALL_OPTIMIZATION 1 // default 0
This builds on Jörg's and Ernest's answers.
Basically it depends on implementation.
I couldn't get Ernest's answer to work on MRI, but it is doable.
I found this example that works for MRI 1.9 to 2.1. This should print a very large number. If you don't set TCO option to true, you should get the "stack too deep" error.
source = <<-SOURCE
def fact n, acc = 1
if n.zero?
acc
else
fact n - 1, acc * n
end
end
fact 10000
SOURCE
i_seq = RubyVM::InstructionSequence.new source, nil, nil, nil,
tailcall_optimization: true, trace_instruction: false
#puts i_seq.disasm
begin
value = i_seq.eval
p value
rescue SystemStackError => e
p e
end

Resources