Why does RuboCop suggest replacing .times.map with Array.new?

RuboCop suggests:
Use Array.new with a block instead of .times.map.
In the docs for the cop:
This cop checks for .times.map calls. In most cases such calls can be replaced with an explicit array creation.
Examples:
# bad
9.times.map do |i|
  i.to_s
end

# good
Array.new(9) do |i|
  i.to_s
end
I know it can be replaced, but I feel 9.times.map is closer to English grammar, and it's easier to understand what the code does.
Why should it be replaced?

Array.new is more performant; here is an explanation from the pull request where this cop was added:
It checks for calls like this:
9.times.map { |i| f(i) }
9.times.collect(&foo)
and suggests using this instead:
Array.new(9) { |i| f(i) }
Array.new(9, &foo)
The new code has approximately the same size, but uses fewer method
calls, consumes less memory, works a tiny bit faster and in my opinion
is more readable.
I've seen many occurrences of times.{map,collect} in different
well-known projects: Rails, GitLab, Rubocop and several closed-source
apps.
Benchmarks:
require 'benchmark/ips'

Benchmark.ips do |x|
  x.report('times.map') { 5.times.map {} }
  x.report('Array.new') { Array.new(5) {} }
  x.compare!
end
__END__
Calculating -------------------------------------
           times.map    21.188k i/100ms
           Array.new    30.449k i/100ms
-------------------------------------------------
           times.map    311.613k (± 3.5%) i/s - 1.568M
           Array.new    590.374k (± 1.2%) i/s - 2.954M

Comparison:
           Array.new:   590373.6 i/s
           times.map:   311612.8 i/s - 1.89x slower
I'm not sure now that Lint is the correct namespace for the cop. Let
me know if I should move it to Performance.
Also I didn't implement autocorrection because it can potentially
break existing code, e.g. if someone has Fixnum#times method redefined
to do something fancy. Applying autocorrection would break their code.

If you feel it is more readable, go with it.
This is a performance rule, and most code paths in your application are probably not performance-critical. Personally, I am always happy to favor readability over premature optimization.
That said
100.times.map { ... }
times creates an Enumerator object
map enumerates over that object without being able to optimize: the size of the resulting array is not known upfront, so it may have to reallocate more space as it grows, and it has to walk the values via Enumerable#each, since that is how map is implemented
Whereas
Array.new(100) { ... }
new allocates an array of size N
And then uses a native loop to fill in the values
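To make the difference concrete, here is a small irb-style sketch (the exact inspect output may vary by Ruby version):

enum = 5.times              # => #<Enumerator: 5:times> (no array exists yet)
enum.map { |i| i * 2 }      # map walks the enumerator via #each
# => [0, 2, 4, 6, 8]

Array.new(5) { |i| i * 2 }  # the array of size 5 is allocated up front, then filled
# => [0, 2, 4, 6, 8]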

When you need to map the result of a block invoked a fixed number of times, you can choose between:
Array.new(n) { ... }
and:
n.times.map { ... }
The latter one is about 60% slower for n = 10, which goes down to around 40% for n > 1_000. (The original answer illustrates this with a benchmark chart on a logarithmic scale.)
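If you want to reproduce that comparison yourself, a minimal sketch along these lines would do (exact ratios depend on the Ruby version and machine):

require 'benchmark/ips'

# Compare the two constructions at a small and a larger size; the relative
# gap is expected to shrink as n grows.
[10, 1_000].each do |n|
  Benchmark.ips do |x|
    x.report("#{n}.times.map")  { n.times.map { |i| i } }
    x.report("Array.new(#{n})") { Array.new(n) { |i| i } }
    x.compare!
  end
end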

Related

Array difference by explicitly specified method or block

If I have Arrays a and b, the expression a - b returns an Array with all those elements of a which are not in b. "Not in" means inequality (!=) here.
In my case, both arrays only contain elements of the same type (or, from the duck-typing perspective, only elements which understand an "equality" method f).
Is there an easy way to specify this f as a criterion of equality, in a similar way to how I can provide my own comparator when doing sort? Currently, I implemented this explicitly:
# Get the difference a-b, based on 'f':
a.select { |ael| b.all? {|bel| ael.f != bel.f} }
This works, but I wonder if there is an easier way.
UPDATE: From the comments to this question, I get the impression, that a concrete example would be appreciated. So, here we go:
class Dummy; end
# Create an Array of Dummy objects.
a = Array.new(99) { Dummy.new }
# Pick some of them at random
b = Array.new(10) { a.sample }
# Now I want to get those elements from a, which are not in b.
diff = a.select { |ael| b.all? {|bel| ael.object_id != bel.object_id} }
Of course in this case, I could also have said !ael.eql?(bel), but in my general solution, this is not the case.
The "normal" object equality for e.g. Hashes and set operations on Arrays (such as the - operation) uses the output of the Object#hash method of the contained objects along with the semantics of the a.eql?(b) comparison.
This is used to improve performance. Ruby requires that two eql? objects return the same hash value (and consequently assumes that two objects returning different hash values are not eql?).
For a normal a - b operation, this can thus be used to first calculate the hash value of each object once and then only compare those values. This is quite fast.
Now, if you have a custom equality, your best bet is to override the objects' hash method so that it returns suitable values for those semantics.
A common approach is to build an array containing all data taking part in the object's identity and to get its hash, e.g.
class MyObject
  # ...
  attr_accessor :foo, :bar

  def hash
    [self.class, foo, bar].hash
  end
end
In your object's hash method, you would then include all data that is currently considered by your f comparison method. Instead of actually using f, you then rely on the default semantics of all Ruby objects and again get quick set operations with your objects.
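For illustration, here is a minimal sketch of that idea (MyObject and its attributes are made up; note that Array#- consults both #eql? and #hash, so the two are defined together here):

class MyObject
  attr_reader :foo, :bar

  def initialize(foo, bar)
    @foo, @bar = foo, bar
  end

  # identity is defined by the class plus the data below
  def hash
    [self.class, foo, bar].hash
  end

  def eql?(other)
    other.is_a?(self.class) && foo == other.foo && bar == other.bar
  end
end

a = [MyObject.new(1, :x), MyObject.new(2, :y)]
b = [MyObject.new(2, :y)]
a - b  # => only the object with foo == 1 remains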
If however this is not feasible (e.g. because you need different equality semantics based on the use-case), you could emulate what Ruby does on your own.
With your f method, you could then perform your set operation as follows:
def f_difference(a, b)
  a_map = a.each_with_object({}) do |a_el, hash|
    hash[a_el.f] = a_el
  end

  b.each do |b_el|
    a_map.delete b_el.f
  end

  a_map.values
end
With this approach, you only need to calculate the f value of each of your objects once. We first build a hash map of all f values and elements from a, then remove from it the entries matching elements of b according to their f values. The remaining values are the result.
This approach saves you from having to loop over b for each object in a, which can be slow if you have a lot of objects. If, however, you only have a few objects in each of your arrays, your original approach should already be fine.
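For a sense of how f_difference is called, here is a hypothetical usage (Item and its f attribute are made up for the example):

Item = Struct.new(:f, :label)

a = [Item.new(1, "one"), Item.new(2, "two"), Item.new(3, "three")]
b = [Item.new(2, "TWO")]  # same f value as "two", so that element is removed

f_difference(a, b).map(&:label)  # => ["one", "three"]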
Let's have a look at a benchmark where I use the standard hash method in place of your custom f to have a comparable result.
require 'benchmark/ips'

def question_diff(a, b)
  a.select { |ael| b.all? { |bel| ael.hash != bel.hash } }
end

def answer_diff(a, b)
  a_map = a.each_with_object({}) do |a_el, hash|
    hash[a_el.hash] = a_el
  end

  b.each do |b_el|
    a_map.delete b_el.hash
  end

  a_map.values
end

A = Array.new(100) { rand(10_000) }
B = Array.new(10) { A.sample }

Benchmark.ips do |x|
  x.report("question") { question_diff(A, B) }
  x.report("answer")   { answer_diff(A, B) }
  x.compare!
end
With Ruby 2.7.1, I get the following result on my machine, showing that the original approach from the question is about 5.9 times slower than the optimized version from my answer:
Warming up --------------------------------------
            question     1.304k i/100ms
              answer     7.504k i/100ms
Calculating -------------------------------------
            question    12.779k (± 2.0%) i/s -  63.896k in 5.002006s
              answer    74.898k (± 3.3%) i/s - 375.200k in 5.015239s

Comparison:
              answer:   74898.0 i/s
            question:   12779.3 i/s - 5.86x (± 0.00) slower

How are Symbols faster than Strings in Hash lookups?

I understand one aspect of why Symbols should be used as opposed to Strings in Hashes. Namely that there is only one instance of a given Symbol in memory whereas there could be multiple instances of a given String with the same value.
What I don't understand is how Symbols are faster than Strings in a Hash lookup. I've looked at the answers here, but I still don't quite get it.
If :foo.hash == :foo.object_id returned true, then it would've made some sense because then it'd be able to use the object id as the value for the hash and wouldn't have to compute it every time. However this isn't the case and :foo.object_id is not equal to :foo.hash. Hence my confusion.
There's no obligation for hash to be equivalent to object_id. Those two things serve entirely different purposes. The point of hash is to be as deterministic and yet random as possible so that the values you're inserting into your hash are evenly distributed. The point of object_id is to define a unique object identifier, though there's no requirement that these be random or evenly distributed. In fact, randomizing them is counter-productive, that'd just make things slower for no reason.
The reason symbols tend to be faster is because the memory for them is allocated once (garbage collection issues aside) and recycled for all instances of the same symbol. Strings are not like that. They can be constructed in a multitude of ways, and even two strings that are byte-for-byte identical are likely to be different objects. In fact, it's safer to presume they are than otherwise unless you know for certain they're the same object.
Now, when it comes to computing hash, the value must differ widely even if the string changes only a little. Since a symbol can't change, computing its hash can be optimized more. You could just compute a hash of the object_id, for example, since that won't change, while the string needs to take its own content into account, which is presumably dynamic.
Try benchmarking things:
require 'benchmark'

count = 100_000_000

Benchmark.bm do |bm|
  bm.report('Symbol:') do
    count.times { :symbol.hash }
  end
  bm.report('String:') do
    count.times { "string".hash }
  end
end
This gives me results like this:
              user     system      total        real
Symbol:   6.340000   0.020000   6.360000 (  6.420563)
String:  11.380000   0.040000  11.420000 ( 11.454172)
Which, in this most trivial case, is easily 2x faster. Based on some basic testing, the performance of the string code degrades O(N) as the strings get longer, but the symbol timings remain constant.
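To see that length dependence, one could benchmark hashing strings of different sizes against a symbol; a rough sketch (numbers depend on Ruby version and machine):

require 'benchmark/ips'

short = "s" * 10
long  = "s" * 10_000

Benchmark.ips do |x|
  x.report('symbol')       { :symbol.hash }
  x.report('short string') { short.hash }
  x.report('long string')  { long.hash }
  x.compare!
end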
Just want to add that I do not entirely agree with the numbers that tadman came up with. In my testing it is at most 1.5x faster to calculate #hash for a Symbol. I used benchmark/ips to test performance.
require 'benchmark/ips'

Benchmark.ips do |bm|
  bm.compare!
  bm.report('Symbol:') do
    :symbol.hash
  end
  bm.report('String:') do
    'string'.hash
  end
end
And this results in
Comparison:
Symbol:: 10741305.8 i/s
String:: 7051559.3 i/s - 1.52x slower
Also, if you enable 'frozen string literals' (which will be the default in future Ruby versions), the difference drops to a factor of 1.2:
# frozen_string_literal: true
Comparison:
Symbol:: 9014176.3 i/s
String:: 7532196.9 i/s - 1.20x slower
An additional overhead for strings as hash keys is that, since strings are mutable and also commonly used as hash keys, the Hash class makes a copy of all string keys (effectively a dup that is then frozen) in order to protect the integrity of the hash from key mutation.
Consider:
irb(main):001:0> a = {}
=> {}
irb(main):002:0> b = "fred"
=> "fred"
irb(main):003:0> a[b] = 42
=> 42
irb(main):004:0> a
=> {"fred"=>42}
irb(main):005:0> b << " flintstone"
=> "fred flintstone"
irb(main):006:0> a
=> {"fred"=>42}
irb(main):007:0> b
=> "fred flintstone"
irb(main):008:0> b.object_id
=> 17350536
irb(main):009:0> a.keys[0].object_id
=> 15113052
Symbols are immutable and need no such drastic measures.
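As a rough illustration of that copying (MRI behaviour; details may differ between versions): an unfrozen string key is duplicated and frozen, while an already-frozen key is typically stored as-is.

h = {}

mutable = "fred"
h[mutable] = 42
h.keys.last.equal?(mutable)  # => false, a frozen copy was stored instead

frozen = "wilma".freeze
h[frozen] = 42
h.keys.last.equal?(frozen)   # => true, no copy was needed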

What is the preferred way to loop in Ruby?

Why is each loop preferred over for loop in Ruby? Is there a difference in time complexity or are they just syntactically different?
Yes, these are two different ways of iterating, but I hope this benchmark helps.
require 'benchmark'

a = Array(1..100_000_000)
sum = 0
Benchmark.realtime {
  a.each { |x| sum += x }
}
This takes 5.866932 sec
a = Array(1..100_000_000)
sum = 0
Benchmark.realtime {
  for x in a
    sum += x
  end
}
This takes 6.146521 sec.
Though this is not a rigorous way to do benchmarking, and there are other constraints too, on a single machine each seems to be a bit faster than for.
The variable referencing an item in the iteration is temporary and has no significance outside of the iteration, so it is better if it is hidden from the outside. With external iterators, such a variable sits outside of the iteration block. In the following, e is useful only within do ... end, but it is separated from the block and written outside of it; it does not read naturally to a programmer:
for e in [:foo, :bar] do
  ...
end
With internal iterators, the block variable is defined right inside the block, where it is used. It is easier to read:
[:foo, :bar].each do |e|
  ...
end
This visibility issue is not just for a programmer. With respect to visibility in the sense of scope, the variable for an external iterator is accessible outside of the iteration:
for e in [:foo] do; end
e # => :foo
whereas in internal iterator, a block variable is invisible from outside:
[:foo].each do |e|; end
e # => undefined local variable or method `e'
The latter is better from the point of view of encapsulation.
When you want to nest the loops, the order of variables would be somewhat backwards with external iterators:
for a in [[:foo, :bar]] do
  for e in a do
    ...
  end
end
but with internal iterators, the order is more straightforward:
[[:foo, :bar]].each do |a|
  a.each do |e|
    ...
  end
end
With external iterators, you can only use hard-coded Ruby syntax, and you also have to remember the matching between the keyword and the method that is internally called (for calls each), whereas with internal iterators you can define your own, which gives flexibility.
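As a sketch of that "define your own" point (Playlist and every_other are made-up names), a custom internal iterator is just a method that yields, whereas for is wired to #each only:

class Playlist
  def initialize(*songs)
    @songs = songs
  end

  # a hand-rolled internal iterator
  def every_other
    @songs.each_with_index { |song, i| yield song if i.even? }
  end
end

Playlist.new("a", "b", "c", "d").every_other { |s| puts s }
# prints "a" and "c"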
each is the Ruby way. It implements the Iterator pattern, which has decoupling benefits.
Check also this: "for" vs "each" in Ruby
An interesting question. There are several ways of looping in Ruby. I have noted that there is a design principle in Ruby that when there are multiple ways of doing the same thing, there are usually subtle differences between them, and each case has its own unique use, its own problem that it solves. So in the end you end up needing to be able to write (and not just read) all of them.
As for the question about the for loop, this is similar to my earlier question of whether the for loop is a trap.
Basically, there are two main explicit ways of looping. One is by iterators (or, more generally, blocks), such as:
[1, 2, 3].each { |e| puts e * 10 }
[1, 2, 3].map  { |e| e * 10 }
# etc., see Array and Enumerable documentation for more iterator methods.
Connected to this way of iterating is the class Enumerator, which you should strive to understand.
The other way is Pascal-ish looping by while, until and for loops.
for y in [1, 2, 3]
  puts y
end

x = 0
while x < 3
  puts x; x += 1
end
# same for until loop
Like if and unless, while and until have their tail form, such as
a = 'alligator'
a.chop! until a.chars.last == 'g'
#=> 'allig'
The third very important way of looping is implicit looping, or looping by recursion. Ruby is extremely malleable, all classes are modifiable, hooks can be set up for various events, and this can be exploited to produce most unusual ways of looping. The possibilities are so endless that I don't even know where to start talking about them. Perhaps a good place is the blog by Yusuke Endoh, a well known artist working with Ruby code as his artistic material of choice.
To demonstrate what I mean, consider this loop
class Object
  def method_missing(sym)
    s = sym.to_s
    if s.chars.last == 'g' then s else eval s.chop end
  end
end

alligator
#=> "allig"
Aside from readability issues, the for loop iterates in Ruby land whereas each does it from native code, so in principle each should be more efficient when iterating over all elements of an array.
Loop with each:
arr.each {|x| puts x}
Loop with for:
for i in 0...arr.length
  puts arr[i]
end
In the each case we are just passing a code block to a method implemented in the machine's native code (fast code), whereas in the for case, all code must be interpreted and run taking into account all the complexity of the Ruby language.
However for is more flexible and lets you iterate in more complex ways than each does, for example, iterating with a given step.
EDIT
I wasn't aware that you can step over a range by using the step() method before calling each(), so the flexibility I claimed for the for loop is actually unjustified.
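For reference, stepping with the built-in iterators looks like this (a small sketch):

(0..10).step(2).each { |i| print i, " " }  # prints: 0 2 4 6 8 10
0.step(10, 3).to_a                         # => [0, 3, 6, 9]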

Chaining partition, keep_if etc

[1,2,3].partition.inject(0) do |acc, x|
  x > 2    # this line is intended to be used by `partition`
  acc += x # this line is intended to be used by `inject`
end
I know that I can write the above stanza using different methods, but that is not important here.
What I want to ask is why somebody would want to use partition (or other methods like keep_if, delete_if) at the beginning of the "chain"?
In my example, after I chained inject I couldn't use partition. I can write the above stanza using each:
[1,2,3].each.inject(0) do |acc, x|
  x > 2    # this line is intended to be used by `partition`
  acc += x # this line is intended to be used by `inject`
end
and it will be the same, right?
I know that x>2 will be discarded (and not used) by partition. Only acc+=x will do the job (sum all elements in this case).
I only wrote that to show my "intention": I want to use partition in the chain like this [].partition.inject(0).
I know that above code won't work as I intended and I know that I can chain after block( }.map as mentioned by Neil Slater).
I wanted to know why, and when, partition (and other methods like keep_if, delete_if etc.) becomes each (i.e. just returns the elements of the array, as partition does in the above cases).
In my example, partition.inject, partition became each because partition cannot take the condition (x > 2).
However partition.with_index (as mentioned by Boris Stitnicky) works (I can partition the array and use the index for whatever I want):
shuffled_array
  .partition
  .with_index { |element, index|
    element > index
  }
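For a concrete (made-up) input, that chain behaves like this:

shuffled_array = [3, 1, 4, 2]
shuffled_array
  .partition
  .with_index { |element, index| element > index }
# => [[3, 4], [1, 2]]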
PS: This is not a question about how to get the sum of elements that are bigger than 2.
This is an interesting situation. Looking at your code examples, you are obviously new to Ruby and perhaps also to programming. Yet you managed to ask a very difficult question that basically concerns the Enumerator class, one of the least publicly understood classes, especially since Enumerator::Lazy was introduced. To me, your question is difficult enough that I am not able to provide a comprehensive answer. Yet the remarks about your code would not fit into a comment under the OP. That's why I'm adding this non-answer.
First of all, let us notice a few awful things in your code:
Useless lines. In both blocks, the x > 2 line is useless, because its return value is discarded.
[1,2,3].partition.inject(0) do |x, acc|
  x > 2  # <---- return value of this line is never used
  acc += x
end

[1,2,3].each.inject(0) do |x, acc|
  x > 2  # <---- return value of this line is never used
  acc += x
end
I will ignore this useless line when discussing your code examples further.
Useless #each method. It is useless to write
[1,2,3].each.inject(0) do |x, acc|
  acc += x
end

This is enough:

[1,2,3].inject(0) do |x, acc|
  acc += x
end
Useless use of #partition method. Instead of:
[1,2,3].partition.inject(0) do |x, acc|
  acc += x
end

You can just write this:

[1,2,3].inject(0) do |x, acc|
  acc += x
end
Or, as I would write it, this:
[ 1, 2, 3 ].inject :+
But then, you ask a deep question about using the #partition method in enumerator mode. Having discussed the trivial newbie problems of your code, we are left with the question of how exactly the enumerator-returning versions of #partition, #keep_if etc. should be used, or rather, what the interesting ways of using them are, because everyone knows that we can use them for chaining:
array = [ *1..6 ]
shuffled_array = array.shuffle    # randomly shuffles the array elements

shuffled_array
  .partition                      # partition enumerator comes into play
  .with_index { |element, index|  # method Enumerator#with_index comes into play
    element > index               # and partitions elements into those greater
  }                               # than their index, and those smaller
And also like this:
e = partition_enumerator_of_array = array.partition

# And then, we can partition the array in many ways:
e.each &:even?          # partitions into odd / even numbers
e.each { rand() > 0.5 } # partitions the array randomly
# etc.
An easily understood advantage is that instead of writing longer:
array.partition &:even?
You can write shorter:
e.each &:even?
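To make that concrete, here is a small sketch of reusing one partition enumerator with different blocks (the array contents are made up):

e = [*1..6].partition
e.each(&:even?)       # => [[2, 4, 6], [1, 3, 5]]
e.each { |n| n > 3 }  # => [[4, 5, 6], [1, 2, 3]]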
But I am basically sure that enumerators provide more power to the programmer than just chaining collection methods and shortening code a little bit. Because different enumerators do very different things. Some, such as #map! or #reject!, can even modify the collection on which they operate. In this case, it is imaginable that one could combine different enumerators with the same block to do different things. This ability to vary not just the blocks, but also the enumerators to which they are passed, gives combinatorial power, which can very likely be used to make some otherwise lengthy code very concise. But I am unable to provide a very useful concrete example of this.
In sum, the Enumerator class is here mainly for chaining, and to use chaining, programmers do not really need to understand Enumerator in detail. But I suspect that the correct habits regarding the use of Enumerator might be as difficult to learn as, for instance, correct habits of parametrized subclassing. I suspect I have not grasped the most powerful ways to use enumerators yet.
I think that the result [3, 3] is what you are looking for here - partitioning the array into smaller and larger numbers then summing each group. You seem to be confused about how you give the block "rules" to the two different methods, and have merged what should be two blocks into one.
If you need the net effects of many methods that each take a block, then you can chain after any block, by adding the .method after the close of the block like this: }.each or end.each
Also note that if you create partitions, you probably want to sum over each partition separately. To do that you will need an extra link in the chain (in this case a map):
[1,2,3].partition { |x| x > 2 }.map do |part|
  part.inject(0) do |acc, x|
    x + acc
  end
end
# => [3, 3]
(You also got the accumulator and current value wrong way around in the inject, and there is no need to assign to the accumulator, Ruby does that for you).
The .inject is no longer in a method chain, instead it is inside a block. There is no problem with blocks inside other blocks, in fact you will see this very often in Ruby code.
I have chained .partition and .map in the above example. You could also write the above like this:
[1,2,3].partition do |x|
  x > 2
end.map do |part|
  part.inject(0) do |acc, x|
    x + acc
  end
end
...although when chaining with short blocks, I personally find it easier to use the { } syntax instead of do ... end, especially at the start of a chain.
If it all starts to look complex, there is not usually a high cost to assigning the results of the first part of a chain to a local variable, in which case there is no chain at all.
parts = [1,2,3].partition { |x| x > 2 }

parts.map do |part|
  part.inject(0) do |acc, x|
    x + acc
  end
end

How to sum properties of the objects within an array in Ruby

I understand that in order to sum array elements in Ruby one can use the inject method, i.e.
array = [1,2,3,4,5];
puts array.inject(0, &:+)
But how do I sum the properties of objects within an object array e.g.?
There's an array of objects and each object has a property "cash" for example. So I want to sum their cash balances into one total. Something like...
array.cash.inject(0, &:+) # (but this doesn't work)
I realise I could probably make a new array composed only of the property cash and sum this, but I'm looking for a cleaner method if possible!
array.map(&:cash).inject(0, &:+)
or
array.inject(0){|sum,e| sum + e.cash }
In Ruby On Rails you might also try:
array.sum(&:cash)
It's a shortcut for the inject business and seems more readable to me.
http://api.rubyonrails.org/classes/Enumerable.html
#reduce takes a block (the &:+ is a shortcut to create a proc/block that does +). This is one way of doing what you want:
array.reduce(0) { |sum, obj| sum + obj.cash }
Most concise way:
array.map(&:cash).sum
If the resulting array from the map has nil items:
array.map(&:cash).compact.sum
If the start value for the summation is 0, then sum alone is identical to inject:
array.map(&:cash).sum
And I would prefer the block version:
array.sum { |a| a.cash }
Because the Proc from symbol is often too limited (no parameters, etc.).
(Needs ActiveSupport; since Ruby 2.4, Enumerable#sum is also part of core Ruby.)
Here are some interesting benchmarks:
require 'benchmark/ips'
require 'ostruct'

array = Array.new(1000) { OpenStruct.new(property: rand(1000)) }

Benchmark.ips do |x|
  x.report('map.sum')   { array.map(&:property).sum }
  x.report('inject(0)') { array.inject(0) { |sum, x| sum + x.property } }
  x.compare!
end
And the results:
Calculating -------------------------------------
             map.sum   249.000 i/100ms
           inject(0)   268.000 i/100ms
-------------------------------------------------
             map.sum     2.947k (± 5.1%) i/s - 14.691k
           inject(0)     3.089k (± 5.4%) i/s - 15.544k

Comparison:
           inject(0):   3088.9 i/s
             map.sum:   2947.5 i/s - 1.05x slower
As you can see, inject is a little bit faster.
There's no need to pass an initial value to inject, and the plus operation can be written more concisely:
array.map(&:cash).inject(:+)
