I understand that in order to sum array elements in Ruby one can use the inject method, i.e.
array = [1,2,3,4,5];
puts array.inject(0, &:+)
But how do I sum a property of the objects in an array?
There's an array of objects and each object has a property "cash" for example. So I want to sum their cash balances into one total. Something like...
array.cash.inject(0, &:+) # (but this doesn't work)
I realise I could probably make a new array composed only of the property cash and sum this, but I'm looking for a cleaner method if possible!
array.map(&:cash).inject(0, &:+)
or
array.inject(0){|sum,e| sum + e.cash }
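As a minimal, self-contained sketch of the two forms above (the Account struct and its values are made up for illustration and are not from the question):

```ruby
# Hypothetical stand-in for the objects in the question; only `cash` matters.
Account = Struct.new(:cash)

accounts = [Account.new(10), Account.new(25), Account.new(7)]

# Build an intermediate array of cash values, then fold it with +.
total_via_map = accounts.map(&:cash).inject(0, &:+)

# Fold directly over the objects, without an intermediate array.
total_via_block = accounts.inject(0) { |sum, e| sum + e.cash }

puts total_via_map    # 42
puts total_via_block  # 42
```

The block form avoids allocating the temporary array, which may matter for very large collections.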
In Ruby On Rails you might also try:
array.sum(&:cash)
It's a shortcut for the inject business and seems more readable to me.
http://api.rubyonrails.org/classes/Enumerable.html
#reduce takes a block (the &:+ is a shortcut to create a proc/block that does +). This is one way of doing what you want:
array.reduce(0) { |sum, obj| sum + obj.cash }
Most concise way:
array.map(&:cash).sum
If the resulting array from the map has nil items:
array.map(&:cash).compact.sum
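A small sketch of the nil case, assuming Ruby 2.4 or later (where sum is built in) and a made-up Wallet struct:

```ruby
# Hypothetical objects; one of them has no cash value at all.
Wallet = Struct.new(:cash)

wallets = [Wallet.new(10), Wallet.new(nil), Wallet.new(5)]

# compact drops the nil before summing.
with_compact = wallets.map(&:cash).compact.sum

# Alternative: treat a missing balance as 0 inside the block.
with_default = wallets.sum { |w| w.cash || 0 }

puts with_compact  # 15
puts with_default  # 15
```

Without compact (or a default), the sum would raise a TypeError as soon as it tries to add nil to an Integer.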
If start value for the summation is 0, then sum alone is identical to inject:
array.map(&:cash).sum
And I would prefer the block version:
array.sum { |a| a.cash }
Because the Proc from symbol is often too limited (no parameters, etc.).
(Needs ActiveSupport on Ruby versions before 2.4; since Ruby 2.4, sum is built into Array and Enumerable.)
Here are some interesting benchmarks:
require 'benchmark/ips'
require 'ostruct'

array = Array.new(1000) { OpenStruct.new(property: rand(1000)) }

Benchmark.ips do |x|
  x.report('map.sum') { array.map(&:property).sum }
  # The inner block variable is named e so it does not shadow the job object x.
  x.report('inject(0)') { array.inject(0) { |sum, e| sum + e.property } }
  x.compare!
end
And results
Calculating -------------------------------------
map.sum 249.000 i/100ms
inject(0) 268.000 i/100ms
-------------------------------------------------
map.sum 2.947k (± 5.1%) i/s - 14.691k
inject(0) 3.089k (± 5.4%) i/s - 15.544k
Comparison:
inject(0): 3088.9 i/s
map.sum: 2947.5 i/s - 1.05x slower
As you can see, inject is slightly faster, although the difference is about the size of the measurement error.
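There's no need to pass an initial value to inject, and the plus operation can be written more briefly (note that without an initial value, inject returns nil for an empty array):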
array.map(&:cash).inject(:+)
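One thing worth checking before dropping the initial value is how the empty case behaves. A quick sketch:

```ruby
empty = []

without_initial = empty.inject(:+)     # nothing to fold, so the result is nil
with_initial    = empty.inject(0, :+)  # the initial value is returned as-is
non_empty       = [1, 2, 3].inject(:+)

p without_initial  # nil
p with_initial     # 0
p non_empty        # 6
```

If the array can be empty and the caller expects a number, keeping the 0 is the safer choice.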
I want to know which approach is more efficient in comparing two classes.
Approach 1:
a = '123'
a.class.name == 'String'
Approach 2:
a = '123'
a.kind_of? String
Any pointers would be really appreciated. Thanks!
I would argue is_a? is the correct one. Not because of performance but because of correctness.
Because it is perfectly fine to have multiple String classes in your app, it is not enough to just compare the class name. .class.name == 'String' only tells you whether the name of the class is "String"; it does not tell you whether it is the same String class that you get when you reference String in the current context.
kind_of? (an alias of is_a?), by contrast, checks against the actual class object. It also returns true if a is an instance of a subclass of String.
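To make the difference concrete, here is a small sketch. Both classes are made up for illustration: Label is a subclass of the core String, and FakeString is an unrelated class that merely reports the name "String".

```ruby
# Hypothetical subclass of the core String class.
class Label < String; end

# Hypothetical unrelated class whose name method claims to be "String".
FakeString = Class.new do
  def self.name
    "String"
  end
end

label = Label.new("123")
fake  = FakeString.new

label_name_match = label.class.name == "String"  # false, the name is "Label"
label_is_a       = label.is_a?(String)           # true, subclass instance
label_kind_of    = label.kind_of?(String)        # true, same check as is_a?
fake_name_match  = fake.class.name == "String"   # true, but misleading
fake_is_a        = fake.is_a?(String)            # false, not a String at all

p [label_name_match, label_is_a, label_kind_of, fake_name_match, fake_is_a]
```

The name comparison gets both cases wrong, while is_a? and kind_of? answer the question that is usually being asked.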
You asked which approach is most efficient but did not say how you define efficient in the context of your question. is_a? is the shortest to write, which I would argue is efficient. If you were thinking about performance, have a look at this:
require 'benchmark/ips'

string = "123"

Benchmark.ips do |x|
  x.report("name == name") { string.class.name == "String" }
  x.report("kind_of?") { string.kind_of?(String) }
  x.report("is_a?") { string.is_a?(String) }
end
Warming up --------------------------------------
name == name 585.361k i/100ms
kind_of? 1.173M i/100ms
is_a? 1.299M i/100ms
Calculating -------------------------------------
name == name 5.870M (± 4.6%) i/s - 29.853M in 5.099899s
kind_of? 12.803M (± 3.0%) i/s - 64.514M in 5.043457s
is_a? 12.971M (± 3.6%) i/s - 64.935M in 5.012808s
If I have Arrays a and b, the expression a-b returns an Array with all those elements in a which are not in b. "Not in" means inequality (!=) here.
In my case, both arrays only contain elements of the same type (or, from the duck-typing perspective, only elements which understand an "equality" method f).
Is there an easy way to specify this f as the criterion of equality, similar to how I can provide my own comparator when sorting? Currently, I have implemented this explicitly:
# Get the difference a-b, based on 'f':
a.select { |ael| b.all? {|bel| ael.f != bel.f} }
This works, but I wonder if there is an easier way.
UPDATE: From the comments to this question, I get the impression, that a concrete example would be appreciated. So, here we go:
class Dummy; end
# Create an Array of Dummy objects.
a = Array.new(99) { Dummy.new }
# Pick some of them at random
b = Array.new(10) { a.sample }
# Now I want to get those elements from a, which are not in b.
diff = a.select { |ael| b.all? {|bel| ael.object_id != bel.object_id} }
Of course, in this case I could also have said !ael.eql?(bel), but that does not work for my general problem.
The "normal" object equality for e.g. Hashes and set operations on Arrays (such as the - operation) uses the output of the Object#hash method of the contained objects along with the semantics of the a.eql?(b) comparison.
This can be used to improve performance. Ruby assumes here that two objects which are eql? return the same value from their respective hash methods (and consequently that two objects returning different hash values are not eql?).
For a normal a - b operation, this can thus be used to first calculate the hash value of each object once and then only compare those values. This is quite fast.
Now, if you have a custom equality, your best bet would be to override the objects' hash method (together with eql?) so that they return suitable values for those semantics.
A common approach is to build an array containing all data that makes up the object's identity and take its hash, e.g.
class MyObject
  #...
  attr_accessor :foo, :bar

  def hash
    [self.class, foo, bar].hash
  end

  # Set operations also call eql?, so it has to agree with hash.
  def eql?(other)
    other.class == self.class && other.foo == foo && other.bar == bar
  end
end
In your object's hash method, you would then include all data that is currently considered by your f comparison method. Instead of actually using f, you are then using the default semantics of all Ruby objects and can again achieve quick set operations with your objects.
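As a sketch of what this buys you, the following uses a made-up Point class (not from the question) whose identity is its x and y values. It defines both hash and eql?, since Ruby's set-like Array operations rely on that pair:

```ruby
# Hypothetical value-like class; two points with equal coordinates
# should count as "the same" for Array#-.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def hash
    [self.class, x, y].hash
  end

  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
end

a = [Point.new(1, 1), Point.new(2, 2), Point.new(3, 3)]
b = [Point.new(2, 2)]  # a different object with equal coordinates

diff = a - b
p diff.map { |pt| [pt.x, pt.y] }  # [[1, 1], [3, 3]]
```

Without the eql? definition, the two Point.new(2, 2) objects would be treated as different and nothing would be removed.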
If however this is not feasible (e.g. because you need different equality semantics based on use-case), you could emulate what ruby does on your own.
With your f method, you could then perform your set operation as follows:
def f_difference(a, b)
  a_map = a.each_with_object({}) do |a_el, hash|
    hash[a_el.f] = a_el
  end

  b.each do |b_el|
    a_map.delete b_el.f
  end

  a_map.values
end
With this approach, you only need to calculate the f value of each of your objects once. We first build a hash map with all f values and elements from a, and then remove the entries whose f values match those of the elements in b. The remaining values are the result.
This approach saves you from having to loop over b for each object in a, which can be slow if you have a lot of objects. If, however, you only have a few objects in each of your arrays, your original approach should already be fine.
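Here is a small usage sketch with a made-up Item struct whose f is simply a stored number. One caveat is visible in it: because the hash map is keyed by f, elements of a that share an f value collapse into one entry. The second method, f_difference_keep_dups, is my own hypothetical variant (not part of the approach above) that keeps such duplicates by only indexing b.

```ruby
require 'set'

# Hypothetical element type; `f` stands in for the custom equality key.
Item = Struct.new(:label, :f)

# The hash-map approach described above.
def f_difference(a, b)
  a_map = a.each_with_object({}) { |a_el, hash| hash[a_el.f] = a_el }
  b.each { |b_el| a_map.delete(b_el.f) }
  a_map.values
end

# Variant (an assumption, for illustration): index only b's keys in a Set
# and filter a, which preserves duplicates and the order of a.
def f_difference_keep_dups(a, b)
  b_keys = b.map(&:f).to_set
  a.reject { |a_el| b_keys.include?(a_el.f) }
end

a = [Item.new("a1", 1), Item.new("a2", 2), Item.new("a3", 3), Item.new("a4", 3)]
b = [Item.new("b1", 2)]

collapsed = f_difference(a, b).map(&:label)
kept      = f_difference_keep_dups(a, b).map(&:label)

p collapsed  # ["a1", "a4"]
p kept       # ["a1", "a3", "a4"]
```

Both versions compute f once per element; which one fits depends on whether elements of a with equal f values should be treated as one.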
Let's have a look at a benchmark where I use the standard hash method in place of your custom f to have a comparable result.
require 'benchmark/ips'

def question_diff(a, b)
  a.select { |ael| b.all? { |bel| ael.hash != bel.hash } }
end

def answer_diff(a, b)
  a_map = a.each_with_object({}) do |a_el, hash|
    hash[a_el.hash] = a_el
  end

  b.each do |b_el|
    a_map.delete b_el.hash
  end

  a_map.values
end

A = Array.new(100) { rand(10_000) }
B = Array.new(10) { A.sample }

Benchmark.ips do |x|
  x.report("question") { question_diff(A, B) }
  x.report("answer") { answer_diff(A, B) }
  x.compare!
end
With Ruby 2.7.1, I get the following result on my machine, showing that the original approach from the question is about 5.9 times slower than the optimized version from my answer:
Warming up --------------------------------------
question 1.304k i/100ms
answer 7.504k i/100ms
Calculating -------------------------------------
question 12.779k (± 2.0%) i/s - 63.896k in 5.002006s
answer 74.898k (± 3.3%) i/s - 375.200k in 5.015239s
Comparison:
answer: 74898.0 i/s
question: 12779.3 i/s - 5.86x (± 0.00) slower
I have a class that handles a large array of IDs. I need to pass them to initialize, but will only use them in certain circumstances.
Will instantiating an object with an array of 6 million ids be slower than instantiating an object with an array of 4 ids? (in the scenario where this data is not used)
It does not matter how big your array is. Variables in Ruby are references to objects, which means that in both cases you pass a reference as the argument, not the actual data.
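A quick way to convince yourself of this, sketched with a made-up Holder class: the object stored in the instance variable is the very same object as the one passed in, so nothing is copied.

```ruby
# Hypothetical class that just keeps the array it is given.
class Holder
  attr_reader :ids

  def initialize(ids)
    @ids = ids  # stores a reference; the array is not duplicated
  end
end

big = Array.new(1_000_000) { |i| i }
holder = Holder.new(big)

same_object = holder.ids.equal?(big)  # identity check, not content equality

big[0] = -1                           # mutate through the outer variable
seen_inside = holder.ids[0]           # the change is visible via the holder

p same_object  # true
p seen_inside  # -1
```

This also shows the flip side: if the object must not observe later changes, it has to dup the array itself, and that copy is what would cost time.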
require 'benchmark/ips'

class A
  def initialize(arr)
    @arr = arr
  end

  def mutate
    @arr[0] = 11 # Once this runs, it changes the original array object as well.
  end
end

# Do not create test data in the bench!
big_one = 6_000_000.times.map { |_| rand(10) }
small_one = [1, 2, 3, 4]

Benchmark.ips do |x|
  x.report do
    A.new(big_one)
  end

  x.report do
    A.new(small_one)
  end

  x.compare!
end
So, result:
Warming up --------------------------------------
125.218k i/100ms
128.972k i/100ms
Calculating -------------------------------------
3.422M (± 0.7%) i/s - 17.155M in 5.014048s
3.485M (± 0.5%) i/s - 17.540M in 5.033405s
Note: do not call methods like #to_a inside the benchmark block. Converting the range (1..6_000_000) to an array is a slow operation, and it would distort the final score.
Yes, it is bad for performance. To prove this, I created this test, where I made a class with an instance variable for the ids. I then ran a simple unrelated method adding a number to itself.
require 'benchmark/ips'

class TestSize
  def initialize(ids)
    @ids = ids
  end

  def simple_task(n)
    n + n
  end
end

Benchmark.ips do |x|
  x.report('4 ids') do
    test = TestSize.new((1..4).to_a)
    test.simple_task(3)
  end

  x.report('6 million') do
    test = TestSize.new((1..6_000_000).to_a)
    test.simple_task(3)
  end

  x.compare!
end
Here are the results
Warming up --------------------------------------
4 ids 112.545k i/100ms
6 million 1.000 i/100ms
Calculating -------------------------------------
4 ids 1.557M (± 5.0%) i/s - 7.766M in 5.001166s
6 million 5.947 (± 0.0%) i/s - 30.000 in 5.077560s
Comparison:
4 ids: 1557013.9 i/s
6 million: 5.9 i/s - 261822.49x slower
So you can see it is much slower and uses much more memory.
I use benchmark-ips all the time to test ideas like this. https://github.com/evanphx/benchmark-ips
I'm running OSX 10.12.6, ruby 2.4.1p111
RuboCop suggests:
Use Array.new with a block instead of .times.map.
In the docs for the cop:
This cop checks for .times.map calls. In most cases such calls can be replaced with an explicit array creation.
Examples:
# bad
9.times.map do |i|
  i.to_s
end
# good
Array.new(9) do |i|
  i.to_s
end
I know it can be replaced, but I feel 9.times.map is closer to English grammar, and it's easier to understand what the code does.
Why should it be replaced?
The latter is more performant; here is the explanation from the pull request where this cop was added:
It checks for calls like this:
9.times.map { |i| f(i) }
9.times.collect(&foo)
and suggests using this instead:
Array.new(9) { |i| f(i) }
Array.new(9, &foo)
The new code has approximately the same size, but uses fewer method
calls, consumes less memory, works a tiny bit faster and in my opinion
is more readable.
I've seen many occurrences of times.{map,collect} in different
well-known projects: Rails, GitLab, Rubocop and several closed-source
apps.
Benchmarks:
Benchmark.ips do |x|
  x.report('times.map') { 5.times.map{} }
  x.report('Array.new') { Array.new(5){} }
  x.compare!
end
__END__
Calculating -------------------------------------
times.map 21.188k i/100ms
Array.new 30.449k i/100ms
-------------------------------------------------
times.map 311.613k (± 3.5%) i/s - 1.568M
Array.new 590.374k (± 1.2%) i/s - 2.954M
Comparison:
Array.new: 590373.6 i/s
times.map: 311612.8 i/s - 1.89x slower
I'm not sure now that Lint is the correct namespace for the cop. Let
me know if I should move it to Performance.
Also I didn't implement autocorrection because it can potentially
break existing code, e.g. if someone has Fixnum#times method redefined
to do something fancy. Applying autocorrection would break their code.
If you feel it is more readable, go with it.
This is a performance rule and most codepaths in your application are probably not performance critical. Personally, I am always open to favor readability over premature optimization.
That said, the two forms do different amounts of work:
100.times.map { ... }
times creates an Enumerator object
map enumerates over that object without being able to optimize: the size of the array is not known up front, so it might have to reallocate more space dynamically, and it has to obtain each value by calling each on the enumerator, since map is implemented that way
Whereas
Array.new(100) { ... }
new allocates an array of size N
And then uses a native loop to fill in the values
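The difference in the objects involved can be seen in a short sketch. The squaring block is arbitrary and only there to produce some values:

```ruby
# times without a block returns an Enumerator that map then has to drive.
enum = 5.times
enum_class = enum.class

via_times = 5.times.map { |i| i * i }

# Array.new with a block builds the array directly, passing each index.
via_new = Array.new(5) { |i| i * i }

p enum_class            # Enumerator
p via_times             # [0, 1, 4, 9, 16]
p via_times == via_new  # true
```

Both forms produce the same array, so switching between them is a matter of performance and taste rather than behavior.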
When you need to collect the results of a block invoked a fixed number of times, you can choose between:
Array.new(n) { ... }
and:
n.times.map { ... }
The latter is about 60% slower for n = 10, and the gap narrows to around 40% for n > 1_000.
[Benchmark chart not reproduced here; note that it uses a logarithmic scale.]
I'm working on a mini project for a summer class. I'd like some feedback on the code I have written, especially part 3.
Here's the question:
Create an array called numbers containing the integers 1 - 10 and assign it to a variable.
Create an empty array called even_numbers.
Create a method that iterates over the array. Place all even numbers in the array even_numbers.
Print the array even_numbers.
Here's my code, so far:
numbers = [1,2,3,4,5,6,7,8,9,10]
print numbers[3]
even_numbers.empty?
def even_numbers
numbers.sort!
end
Rather than doing explicit iteration, the best way is likely Array#select thus:
even_numbers = numbers.select { |n| n.even? }
which will run the block given on each element in the array numbers and produce an array containing all elements for which the block returned true.
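Put together as a complete snippet for the exercise (the range is just a shorter way of writing out 1 to 10, and &:even? is shorthand for the block above):

```ruby
numbers = (1..10).to_a

# select keeps the elements for which the block (here: even?) returns true.
even_numbers = numbers.select(&:even?)

p even_numbers  # [2, 4, 6, 8, 10]
```

Note that numbers itself is left unchanged; select returns a new array.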
Or, as an alternative solution following the structure of your exercise:
def get_even_numbers(array)
  even_num = []
  array.each do |n|
    even_num << n if n.even?
  end
  even_num
end
Going for the select method is usually preferable, of course.