I've often heard Ruby's inject method criticized as being "slow." As I rather like the function, and see equivalents in other languages, I'm curious if it's merely Ruby's implementation of the method that's slow, or if it is inherently a slow way to do things (e.g. should be avoided for non-small collections)?
inject is like fold, and fold can be very efficient in other languages (fold_left specifically, since it's tail-recursive).
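To make the fold comparison concrete, here is a minimal sketch showing that inject (also available as reduce) is a left fold. The fold_left method is a hypothetical helper written only for illustration; it is not part of Ruby's standard library.

```ruby
numbers = [1, 2, 3, 4]

# inject folds from the left: ((((0 + 1) + 2) + 3) + 4)
sum = numbers.inject(0) { |acc, x| acc + x }

# Hypothetical helper (not a Ruby API) that spells out what a left fold does.
def fold_left(enumerable, initial)
  acc = initial
  enumerable.each { |x| acc = yield(acc, x) }
  acc
end

sum_by_hand = fold_left(numbers, 0) { |acc, x| acc + x }

# The left-to-right order is visible with a non-commutative operation.
difference = numbers.inject(0) { |acc, x| acc - x }
trace = numbers.inject("0") { |acc, x| "(#{acc} - #{x})" }

puts trace
```

Running it prints ((((0 - 1) - 2) - 3) - 4), which is the shape of a left fold.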
It's mostly an implementation issue, but this gives you a good idea of the comparison:
$ ruby -v
ruby 1.8.7 (2008-08-11 patchlevel 72) [i486-linux]
$ ruby exp/each_v_inject.rb
Rehearsal -----------------------------------------------------
loop 0.000000 0.000000 0.000000 ( 0.000178)
fixnums each 0.790000 0.280000 1.070000 ( 1.078589)
fixnums each add 1.010000 0.290000 1.300000 ( 1.297733)
Enumerable#inject 1.900000 0.430000 2.330000 ( 2.330083)
-------------------------------------------- total: 4.700000sec
user system total real
loop 0.000000 0.000000 0.000000 ( 0.000178)
fixnums each 0.760000 0.300000 1.060000 ( 1.079252)
fixnums each add 1.030000 0.280000 1.310000 ( 1.305888)
Enumerable#inject 1.850000 0.490000 2.340000 ( 2.340341)
exp/each_v_inject.rb
require 'benchmark'
total = (ENV['TOTAL'] || 1_000).to_i
fixnums = Array.new(total) {|x| x}
Benchmark.bmbm do |x|
  x.report("loop") do
    total.times { }
  end
  x.report("fixnums each") do
    total.times do |i|
      fixnums.each {|x| x}
    end
  end
  x.report("fixnums each add") do
    total.times do |i|
      v = 0
      fixnums.each {|x| v += x}
    end
  end
  x.report("Enumerable#inject") do
    total.times do |i|
      fixnums.inject(0) {|a,x| a + x }
    end
  end
end
So yes it is slow, but as improvements occur in the implementation it should become a non-issue. There is nothing inherent about WHAT it is doing that requires it to be slower.
each_with_object may be faster than inject, if you're mutating an existing object rather than creating a new object in each block.
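A small sketch of the two forms side by side, with a made-up word list and no timing claims. It shows the practical difference when the block mutates an existing object: inject's block must return the accumulator on every iteration, while each_with_object hands the same object to every iteration and returns it at the end. Note also that the block parameters come in opposite orders.

```ruby
words = %w[apple avocado banana blueberry cherry]

# inject: the block's return value becomes the next accumulator,
# so a mutating block has to return the hash explicitly.
counts_inject = words.inject(Hash.new(0)) do |acc, word|
  acc[word[0]] += 1
  acc # without this line the next iteration would receive an Integer
end

# each_with_object: the same hash is passed in every time and returned
# at the end, whatever the block itself returns.
counts_ewo = words.each_with_object(Hash.new(0)) do |word, acc|
  acc[word[0]] += 1
end

p counts_ewo
```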
I did some benchmarks:
require 'benchmark'
words = File.open('/usr/share/dict/words', 'r') do |file|
  file.each_line.take(1_000_000).map(&:chomp)
end
Benchmark.bmbm(20) do |x|
  GC.start
  x.report(:map) do
    words.map do |word|
      word.size if word.size > 5
    end.compact
  end
  GC.start
  x.report(:each_with_object) do
    words.each_with_object([]) do |word, long_sizes|
      long_sizes << word.size if word.size > 5
    end
  end
end
Output (ruby 2.3.0):
Rehearsal --------------------------------------------------------
map 0.020000 0.000000 0.020000 ( 0.016906)
each_with_object 0.020000 0.000000 0.020000 ( 0.024695)
----------------------------------------------- total: 0.040000sec
user system total real
map 0.010000 0.000000 0.010000 ( 0.015004)
each_with_object 0.020000 0.000000 0.020000 ( 0.024183)
I don't understand this, because I thought each_with_object should be faster: it needs only one loop and one new array, whereas combining map and compact needs two loops and two new arrays.
Any ideas?
Array#<< has to reallocate memory whenever the array's current buffer doesn't have enough room for the new item. See the implementation, especially this line:
VALUE target_ary = ary_ensure_room_for_push(ary, 1);
Array#map, on the other hand, never has to reallocate along the way, because it knows the size of the result array up front. See the implementation, especially
collect = rb_ary_new2(RARRAY_LEN(ary));
which allocates room for as many elements as the original array has.
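For reference, a minimal sketch (with a made-up word list) confirming that the two versions from the question compute the same thing, so the benchmark compares like with like. The allocation notes in the comments restate the answer above; they are not something this snippet measures.

```ruby
words = %w[a ruby inject benchmark each_with_object map]

# map: result array sized like `words` up front (per the answer above),
# then compact builds a second array without the nils.
via_map = words.map { |word| word.size if word.size > 5 }.compact

# each_with_object: a single array grown with <<, which may reallocate as it grows.
via_ewo = words.each_with_object([]) do |word, long_sizes|
  long_sizes << word.size if word.size > 5
end

p via_map
p via_ewo
```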
I recently installed Ruby 2.0.0 and found that it now has a lazy method for the Enumerable mixin. From previous experience in functional languages, I know that this makes for more efficient code.
I did a benchmark (not sure if it is meaningful) of lazy versus eager and found that lazy was consistently faster. Why is this? What makes lazy evaluation better for large input?
Benchmark code:
#!/usr/bin/env ruby
require 'benchmark'
num = 1000
arr = (1..50000).to_a
Benchmark.bm do |rep|
  rep.report('lazy') { num.times { arr.lazy.map { |x| x * 2 } } }
  rep.report('eager') { num.times { arr.map { |x| x * 2 } } }
end
Benchmark report sample:
user system total real
lazy 0.000000 0.000000 0.000000 ( 0.009502)
eager 5.550000 0.480000 6.030000 ( 6.231269)
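It's so lazy it's not even doing the work, because you never use the results of the operation: arr.lazy.map only returns an Enumerator::Lazy, and the block runs only when something consumes it (for example first, to_a or force). Put a sleep() in there to confirm: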
> Benchmark.bm do |rep|
    rep.report('lazy') { num.times { arr.lazy.map { |x| sleep(5) } } }
    rep.report('notlazy') { 1.times { [0,1].map { |x| sleep(5) } } }
  end
user system total real
lazy 0.010000 0.000000 0.010000 ( 0.007130)
notlazy 0.000000 0.000000 0.000000 ( 10.001788)
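The same point can be shown with a counter instead of sleep: a minimal sketch in which the block runs only as far as something consumes the lazy enumerator. The second half is a fairer version of the original benchmark that forces the lazy result so both sides do the same work. No timings are claimed; for a single map over an in-memory array, the extra lazy machinery is likely to make it slower rather than faster.

```ruby
require 'benchmark'

arr = (1..50_000).to_a
calls = 0

# Building the lazy pipeline does not run the block at all.
pipeline = arr.lazy.map { |x| calls += 1; x * 2 }
calls_after_build = calls

# Consuming only the first three results maps only three elements.
first_three = pipeline.first(3)
calls_after_first = calls

# force (same as to_a on a lazy enumerator) evaluates the whole pipeline.
doubled = pipeline.force

# Both sides now build the full result array.
Benchmark.bm(6) do |rep|
  rep.report('lazy')  { 100.times { arr.lazy.map { |x| x * 2 }.force } }
  rep.report('eager') { 100.times { arr.map { |x| x * 2 } } }
end
```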
I have learned two array sorting methods in Ruby:
array = ["one", "two", "three"]
array.sort.reverse!
or:
array = ["one", "two", "three"]
array.sort { |x,y| y<=>x }
And I am not able to differentiate between the two. Which method is better and how exactly are they different in execution?
Both lines do the same thing: they create a new, reverse-sorted array. The main arguments are about readability and performance. array.sort.reverse! is more readable than array.sort{|x,y| y<=>x}; I think we can agree on that.
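A quick sketch confirming that both forms from the question return the same new array and leave the original untouched. The reverse! in array.sort.reverse! mutates the temporary array returned by sort, not array itself.

```ruby
array = ["one", "two", "three"]

by_reverse = array.sort.reverse
by_block   = array.sort { |x, y| y <=> x }
by_bang    = array.sort.reverse! # mutates the new array from sort, not `array`

p by_reverse
p array
```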
For the performance part, I created a quick benchmark script, which gives the following on my system (ruby 1.9.3p392 [x86_64-linux]):
user system total real
array.sort.reverse 1.330000 0.000000 1.330000 ( 1.334667)
array.sort.reverse! 1.200000 0.000000 1.200000 ( 1.198232)
array.sort!.reverse! 1.200000 0.000000 1.200000 ( 1.199296)
array.sort{|x,y| y<=>x} 5.220000 0.000000 5.220000 ( 5.239487)
Run times are pretty constant for multiple executions of the benchmark script.
array.sort.reverse (with or without !) is way faster than array.sort{|x,y| y<=>x}. Thus, I recommend that.
Here is the script for reference:
#!/usr/bin/env ruby
require 'benchmark'
Benchmark.bm do |b|
  master = (1..1_000_000).map(&:to_s).shuffle
  a = master.dup
  b.report("array.sort.reverse ") do
    a.sort.reverse
  end
  a = master.dup
  b.report("array.sort.reverse! ") do
    a.sort.reverse!
  end
  a = master.dup
  b.report("array.sort!.reverse! ") do
    a.sort!.reverse!
  end
  a = master.dup
  b.report("array.sort{|x,y| y<=>x} ") do
    a.sort{|x,y| y<=>x}
  end
end
There really is no difference here. Both methods return a new array.
For the purposes of this example, simpler is better. I would recommend array.sort.reverse because it is much more readable than the alternative. Passing blocks to methods like sort should be saved for arrays of more complex data structures and user-defined classes.
Edit: While destructive methods (anything ending in a !) are handy for performance games, it was pointed out that a bang method is not guaranteed to return the updated array. Array#reverse! and Array#sort! do return the receiver, but others such as uniq!, compact! and flatten! return nil when they have nothing to change, so relying on the return value of a chained bang call is fragile. If you wish to use a destructive method on a newly generated array, prefer calling it on a separate line instead of writing a one-liner.
Example:
array = array.sort
array.reverse!
should be preferred to
array = array.sort.reverse!
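A short sketch of which bang methods return what, using only core Array methods. It also runs the two-line form recommended above, which never depends on a bang method's return value.

```ruby
# reverse! returns the receiver itself...
sorted = [3, 1, 2].sort
returned = sorted.reverse!

# ...but some bang methods return nil when they have nothing to change.
uniq_changed   = [1, 1, 2].uniq!
uniq_unchanged = [1, 2, 3].uniq!
compact_none   = [1, 2].compact!

# The two-line form: the result never comes from a bang method's return value.
array = ["one", "two", "three"]
array = array.sort
array.reverse!

p array
```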
Reverse! is Faster
There's often no substitute for benchmarking. While it probably makes no difference in short scripts, sorting and then calling #reverse! is significantly faster than sorting with a "spaceship" (<=>) block. For example, on MRI Ruby 2.0, with the following benchmark code:
require 'benchmark'
array = ["one", "two", "three"]
loops = 1_000_000
Benchmark.bmbm do |bm|
  bm.report('reverse!') { loops.times { array.sort.reverse! } }
  bm.report('spaceship') { loops.times { array.sort { |x, y| y <=> x } } }
end
the system reports that #reverse! is almost twice as fast as using the combined comparison operator.
user system total real
reverse! 0.340000 0.000000 0.340000 ( 0.344198)
spaceship 0.590000 0.010000 0.600000 ( 0.595747)
My advice: use whichever is more semantically meaningful in a given context, unless you're running in a tight loop.
With a comparison as simple as your example there is not much difference, but as the comparison gets more complicated, it is better to avoid a <=> block, because the block is evaluated once per comparison rather than once per element, which causes redundant work. Consider this:
array.sort{|x, y| some_expensive_method(x) <=> some_expensive_method(y)}
In this case, some_expensive_method is called twice for every comparison the sort makes (on the order of n log n comparisons), so each element's value is recomputed many times.
In your particular case, you can avoid the <=> block by computing each key once with sort_by and then reversing:
array.sort_by{|x| some_expensive_method(x)}.reverse
This technique is called the Schwartzian transform.
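A minimal sketch that counts how often the key is computed in each style. The expensive_key lambda is a hypothetical stand-in for some_expensive_method that only increments a counter; the data is a fixed shuffle of 1..1000.

```ruby
data = (1..1_000).to_a.shuffle(random: Random.new(42))
calls = 0

# Hypothetical stand-in for some_expensive_method: counts its own calls.
expensive_key = ->(x) { calls += 1; x * 3 }

# Block form: the key is computed twice per comparison.
by_block = data.sort { |x, y| expensive_key.(y) <=> expensive_key.(x) }
block_calls = calls

# sort_by form: the key is computed once per element, then the result is reversed.
calls = 0
by_key = data.sort_by { |x| expensive_key.(x) }.reverse
sort_by_calls = calls

puts "sort with block: #{block_calls} key calls"
puts "sort_by:         #{sort_by_calls} key calls"
```

Both produce the same descending order; the difference is only in how many times the key function runs.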
In playing with tessi's benchmarks on my machine, I've gotten some interesting results. I'm running ruby 2.0.0p195 [x86_64-darwin12.3.0], i.e., the latest release of Ruby 2 at the time, on an OS X system. I used bmbm rather than bm from the Benchmark module. My timings are:
Rehearsal -------------------------------------------------------------
array.sort.reverse: 1.010000 0.000000 1.010000 ( 1.020397)
array.sort.reverse!: 0.810000 0.000000 0.810000 ( 0.808368)
array.sort!.reverse!: 0.800000 0.010000 0.810000 ( 0.809666)
array.sort{|x,y| y<=>x}: 0.300000 0.000000 0.300000 ( 0.291002)
array.sort!{|x,y| y<=>x}: 0.100000 0.000000 0.100000 ( 0.105345)
---------------------------------------------------- total: 3.030000sec
user system total real
array.sort.reverse: 0.210000 0.000000 0.210000 ( 0.208378)
array.sort.reverse!: 0.030000 0.000000 0.030000 ( 0.027746)
array.sort!.reverse!: 0.020000 0.000000 0.020000 ( 0.020082)
array.sort{|x,y| y<=>x}: 0.110000 0.000000 0.110000 ( 0.107065)
array.sort!{|x,y| y<=>x}: 0.110000 0.000000 0.110000 ( 0.105359)
First, note that in the Rehearsal phase, sort! with a comparison block comes in as the clear winner. Matz must have tuned the heck out of it in Ruby 2!
The other thing that I found exceedingly weird was how much improvement array.sort.reverse! and array.sort!.reverse! exhibited in the production pass. It was so extreme it made me wonder whether I had somehow screwed up and passed them already-sorted data, so I added explicit checks for sorted or reverse-sorted data prior to performing each benchmark.
My variant of tessi's script follows:
#!/usr/bin/env ruby
require 'benchmark'
class Array
  def sorted?
    (1...length).each {|i| return false if self[i] < self[i-1] }
    true
  end
  def reversed?
    (1...length).each {|i| return false if self[i] > self[i-1] }
    true
  end
end
master = (1..1_000_000).map(&:to_s).shuffle
Benchmark.bmbm(25) do |b|
  a = master.dup
  puts "uh-oh!" if a.sorted?
  puts "oh-uh!" if a.reversed?
  b.report("array.sort.reverse:") { a.sort.reverse }
  a = master.dup
  puts "uh-oh!" if a.sorted?
  puts "oh-uh!" if a.reversed?
  b.report("array.sort.reverse!:") { a.sort.reverse! }
  a = master.dup
  puts "uh-oh!" if a.sorted?
  puts "oh-uh!" if a.reversed?
  b.report("array.sort!.reverse!:") { a.sort!.reverse! }
  a = master.dup
  puts "uh-oh!" if a.sorted?
  puts "oh-uh!" if a.reversed?
  b.report("array.sort{|x,y| y<=>x}:") { a.sort{|x,y| y<=>x} }
  a = master.dup
  puts "uh-oh!" if a.sorted?
  puts "oh-uh!" if a.reversed?
  b.report("array.sort!{|x,y| y<=>x}:") { a.sort!{|x,y| y<=>x} }
end
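One likely explanation for these numbers, assuming the standard library's Benchmark implementation: unlike bm, bmbm does not run a report block when report is called. It collects all the blocks first and runs them afterwards (rehearsal, then the real pass). The a = master.dup lines and the sorted?/reversed? checks therefore all execute up front, before any sorting, and every block closes over the same variable a, which ends up holding the last copy. The sort! jobs then leave that shared array in descending order for every job that follows. The sketch below demonstrates the shared variable, then shows one way to give each job fresh data by copying inside the timed block. N_SORT is a hypothetical smaller size so the sketch runs quickly.

```ruby
require 'benchmark'

# Part 1: report blocks registered with bmbm run later and share the variable `a`.
master = (1..10_000).map(&:to_s).shuffle
seen_ids = []
Benchmark.bmbm do |b|
  a = master.dup
  b.report("first") { seen_ids << a.object_id }
  a = master.dup # rebinds the same local that the first block closes over
  b.report("second") { seen_ids << a.object_id }
end
# Each block ran twice (rehearsal + real pass) and always saw the last dup.
shared = seen_ids.uniq.size == 1

# Part 2: copy inside each timed block so every job, in both passes,
# starts from the same shuffled data.
N_SORT = 100_000 # hypothetical size; the original used 1_000_000
data = (1..N_SORT).map(&:to_s).shuffle
Benchmark.bmbm(25) do |b|
  b.report("array.sort.reverse:")       { data.dup.sort.reverse }
  b.report("array.sort.reverse!:")      { data.dup.sort.reverse! }
  b.report("array.sort!.reverse!:")     { data.dup.sort!.reverse! }
  b.report("array.sort{|x,y| y<=>x}:")  { data.dup.sort { |x, y| y <=> x } }
  b.report("array.sort!{|x,y| y<=>x}:") { data.dup.sort! { |x, y| y <=> x } }
end
```

Copying inside the block adds the cost of one dup to every job, but that cost is the same for all five.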
I read in the documentation for the String class that eql? is a strict equality operator, without type conversion, and == is an equality operator which tries to convert its second argument to a String, and the C source code for these methods confirms that:
The eql? source code:
static VALUE
rb_str_eql(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (TYPE(str2) != T_STRING) return Qfalse;
    return str_eql(str1, str2);
}
The == source code:
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
    if (str1 == str2) return Qtrue;
    if (TYPE(str2) != T_STRING) {
        if (!rb_respond_to(str2, rb_intern("to_str"))) {
            return Qfalse;
        }
        return rb_equal(str2, str1);
    }
    return str_eql(str1, str2);
}
But when I tried to benchmark these methods, I was surprised that == is faster than eql? by up to 20%!
My benchmark code is:
require "benchmark"
RUN_COUNT = 100_000_000
first_string = "Woooooha"
second_string = "Woooooha"
time = Benchmark.measure do
  RUN_COUNT.times do |i|
    first_string.eql?(second_string)
  end
end
puts time
time = Benchmark.measure do
  RUN_COUNT.times do |i|
    first_string == second_string
  end
end
puts time
And results:
Ruby 1.9.3-p125:
26.420000 0.250000 26.670000 ( 26.820762)
21.520000 0.200000 21.720000 ( 21.843723)
Ruby 1.9.2-p290:
25.930000 0.280000 26.210000 ( 26.318998)
19.800000 0.130000 19.930000 ( 19.991929)
So, can anyone explain why the simpler eql? method is slower than the == method when I run it on two equal strings?
The reason you are seeing a difference is not related to the implementation of == vs eql? but is due to the fact that Ruby optimizes operators (like ==) to avoid going through the normal method lookup when possible.
We can verify this in two ways:
Create an alias for == and call that instead. You'll get similar results to eql? and thus slower results than ==.
Compare using send :== and send :eql? instead and you'll get similar timings; the speed difference disappears because Ruby only applies the optimization to direct calls to the operators, not to calls made through send or __send__.
Here's code that shows both:
require 'fruity'
first = "Woooooha"
second = "Woooooha"
class String
  alias same_value? ==
end
compare do
  with_operator   { first == second }
  with_same_value { first.same_value? second }
  with_eql        { first.eql? second }
end
compare do
  with_send_op  { first.send :==, second }
  with_send_eql { first.send :eql?, second }
end
Results:
with_operator is faster than with_same_value by 2x ± 0.1
with_same_value is similar to with_eql
with_send_eql is similar to with_send_op
If you're curious, the optimizations for operators are in insns.def.
Note: this answer applies only to Ruby MRI; I would be surprised if there were a speed difference in JRuby or Rubinius, for instance.
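On MRI you can see the operator optimization directly in the generated bytecode. A minimal sketch using RubyVM::InstructionSequence, which is an MRI-only implementation detail: a direct == compiles to the specialized opt_eq instruction, while eql? and send(:==, ...) compile to an ordinary method call.

```ruby
# MRI-only: disassemble a snippet of Ruby source into YARV instructions.
def disasm(src)
  RubyVM::InstructionSequence.compile(src).disasm
end

setup = "first = 'Woooooha'; second = 'Woooooha'; "

eq_code   = disasm(setup + "first == second")
eql_code  = disasm(setup + "first.eql?(second)")
send_code = disasm(setup + "first.send(:==, second)")

puts eq_code.lines.grep(/opt_eq/)
```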
When doing benchmarks, avoid times: it calls a block on every one of the RUN_COUNT iterations, and that per-iteration block call adds overhead. The extra time affects all benchmarks equally in absolute terms, but it makes a relative difference harder to notice:
require "benchmark"
RUN_COUNT = 10_000_000
FIRST_STRING = "Woooooha"
SECOND_STRING = "Woooooha"
def times_eq_question_mark
  RUN_COUNT.times do |i|
    FIRST_STRING.eql?(SECOND_STRING)
  end
end
def times_double_equal_sign
  RUN_COUNT.times do |i|
    FIRST_STRING == SECOND_STRING
  end
end
def loop_eq_question_mark
  i = 0
  while i < RUN_COUNT
    FIRST_STRING.eql?(SECOND_STRING)
    i += 1
  end
end
def loop_double_equal_sign
  i = 0
  while i < RUN_COUNT
    FIRST_STRING == SECOND_STRING
    i += 1
  end
end
1.upto(10) do |i|
  method_names = [:times_eq_question_mark, :times_double_equal_sign, :loop_eq_question_mark, :loop_double_equal_sign]
  method_times = method_names.map {|method_name| Benchmark.measure { send(method_name) } }
  puts "Run #{i}"
  method_names.zip(method_times).each do |method_name, method_time|
    puts [method_name, method_time].join("\t")
  end
  puts
end
gives
Run 1
times_eq_question_mark 3.500000 0.000000 3.500000 ( 3.578011)
times_double_equal_sign 2.390000 0.000000 2.390000 ( 2.453046)
loop_eq_question_mark 3.110000 0.000000 3.110000 ( 3.140525)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.124932)
Run 2
times_eq_question_mark 3.531000 0.000000 3.531000 ( 3.562386)
times_double_equal_sign 2.469000 0.000000 2.469000 ( 2.484295)
loop_eq_question_mark 3.063000 0.000000 3.063000 ( 3.109276)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.140556)
Run 3
times_eq_question_mark 3.547000 0.000000 3.547000 ( 3.593635)
times_double_equal_sign 2.437000 0.000000 2.437000 ( 2.453047)
loop_eq_question_mark 3.063000 0.000000 3.063000 ( 3.109275)
loop_double_equal_sign 2.140000 0.000000 2.140000 ( 2.140557)
Run 4
times_eq_question_mark 3.547000 0.000000 3.547000 ( 3.578011)
times_double_equal_sign 2.422000 0.000000 2.422000 ( 2.437422)
loop_eq_question_mark 3.094000 0.000000 3.094000 ( 3.140524)
loop_double_equal_sign 2.140000 0.000000 2.140000 ( 2.140557)
Run 5
times_eq_question_mark 3.578000 0.000000 3.578000 ( 3.671758)
times_double_equal_sign 2.406000 0.000000 2.406000 ( 2.468671)
loop_eq_question_mark 3.110000 0.000000 3.110000 ( 3.156149)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.156181)
Run 6
times_eq_question_mark 3.562000 0.000000 3.562000 ( 3.562386)
times_double_equal_sign 2.407000 0.000000 2.407000 ( 2.468671)
loop_eq_question_mark 3.109000 0.000000 3.109000 ( 3.124900)
loop_double_equal_sign 2.125000 0.000000 2.125000 ( 2.234303)
Run 7
times_eq_question_mark 3.500000 0.000000 3.500000 ( 3.546762)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.468671)
loop_eq_question_mark 3.031000 0.000000 3.031000 ( 3.171773)
loop_double_equal_sign 2.157000 0.000000 2.157000 ( 2.156181)
Run 8
times_eq_question_mark 3.468000 0.000000 3.468000 ( 3.656133)
times_double_equal_sign 2.454000 0.000000 2.454000 ( 2.484296)
loop_eq_question_mark 3.093000 0.000000 3.093000 ( 3.249896)
loop_double_equal_sign 2.125000 0.000000 2.125000 ( 2.140556)
Run 9
times_eq_question_mark 3.563000 0.000000 3.563000 ( 3.593635)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.453047)
loop_eq_question_mark 3.125000 0.000000 3.125000 ( 3.124900)
loop_double_equal_sign 2.141000 0.000000 2.141000 ( 2.156181)
Run 10
times_eq_question_mark 3.515000 0.000000 3.515000 ( 3.562386)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.453046)
loop_eq_question_mark 3.094000 0.000000 3.094000 ( 3.140525)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.156181)
equal? is reference equality
== is value equality
eql? is value and type equality
The third method, eql?, is normally used to test whether two objects have the same value as well as the same type. For example:
puts "integer == to float: #{25 == 25.0}"
puts "integer eql? to float: #{25.eql? 25.0}"
gives:
integer == to float: true
integer eql? to float: false
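A short sketch of the three equality methods listed above, using only core Ruby; the variable names are illustrative. The Hash lookup at the end shows one place where eql?'s type strictness matters in practice, since Hash matches keys with hash and eql?.

```ruby
a = "Woooooha"
b = a.dup # same contents, different object

value_equal  = (a == b)    # same characters
strict_equal = a.eql?(b)   # same characters and both Strings
same_object  = a.equal?(b) # two distinct objects, so false

int_float_eq  = (25 == 25.0)  # numeric value comparison
int_float_eql = 25.eql?(25.0) # Integer vs Float, so false

# Hash keys are matched with eql?, so the Float 25.0 does not find the Integer key 25.
lookup = { 25 => :integer_key }[25.0]

p [value_equal, strict_equal, same_object, int_float_eq, int_float_eql, lookup]
```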
So I thought that since eql? does more checking it would be slower, and for strings it is, at least on my Ruby 1.9.3. So I figured it must be type-dependent and did some tests.
When an integer and a float are compared, eql? is a bit faster. When two integers are compared, == is much faster, by up to 2x. Wrong theory, back to the start.
The next theory, which did hold up in my tests: when both values are of the same type, == is always faster; when the types differ, eql? is faster, again by up to 2x.
I don't have the time to compare all types, and I'm sure you'll get varying results, although the same kind of comparison always gives similar results. Can somebody prove me wrong?
Here are my results, using the OP's test:
16.863000 0.000000 16.863000 ( 16.903000) 2 strings with eql?
14.212000 0.000000 14.212000 ( 14.334600) 2 strings with ==
13.213000 0.000000 13.213000 ( 13.245600) integer and floating with eql?
14.103000 0.000000 14.103000 ( 14.200400) integer and floating with ==
13.229000 0.000000 13.229000 ( 13.410800) 2 same integers with eql?
9.406000 0.000000 9.406000 ( 9.410000) 2 same integers with ==
19.625000 0.000000 19.625000 ( 19.720800) 2 different integers with eql?
9.407000 0.000000 9.407000 ( 9.405800) 2 different integers with ==
21.825000 0.000000 21.825000 ( 21.910200) integer with string with eql?
43.836000 0.031000 43.867000 ( 44.074200) integer with string with ==
I am just starting to learn Ruby (first time programming), and have a basic syntactical question with regards to variables, and various ways of writing code.
Chris Pine's "Learn to Program" taught me to write a basic program like this...
num_cars_again = 2
puts 'I own ' + num_cars_again.to_s + ' cars.'
This is fine, but then I stumbled across the tutorial on ruby.learncodethehardway.com, and was taught to write the same exact program like this...
num_cars = 2
puts "I own #{num_cars} cars."
They both output the same thing, but obviously option 2 is a much shorter way to do it.
Is there any particular reason why I should use one format over the other?
Whenever TIMTOWTDI (there is more than one way to do it), you should look for the pros and cons. Using "string interpolation" (the second) instead of "string concatenation" (the first):
Pros:
Is less typing
Automatically calls to_s for you
More idiomatic within the Ruby community
Usually faster at runtime
Cons:
Automatically calls to_s for you (maybe you thought you had a string, and the to_s representation is not what you wanted, and hides the fact that it wasn't a string)
Requires you to use " to delimit your string instead of ' (perhaps you have a habit of using ', or you previously typed a string using that and only later needed to use string interpolation)
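A minimal sketch of the "automatically calls to_s" point from both sides, using the variable from the question: String#+ refuses a non-String argument, while interpolation converts anything, including nil, which becomes an empty string without complaint.

```ruby
num_cars = 2

# Concatenation: String#+ only accepts strings, so the Integer needs an explicit to_s.
concatenated = 'I own ' + num_cars.to_s + ' cars.'

# Interpolation calls to_s on the value for you.
interpolated = "I own #{num_cars} cars."

# Forgetting to_s with + raises a TypeError...
error = begin
  'I own ' + num_cars + ' cars.'
rescue TypeError => e
  e
end

# ...while interpolation silently accepts anything, including nil.
owner = nil
silent = "Owner: #{owner}"

puts concatenated, interpolated, error.class, silent.inspect
```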
Both interpolation and concatenation have their own strengths and weaknesses. The benchmark below demonstrates where to use concatenation and where to use interpolation.
require 'benchmark'
iterations = 100_000
firstname = 'soundarapandian'
middlename = 'rathinasamy'
lastname = 'arumugam'
puts 'With dynamic new strings'
puts '===================================================='
5.times do
  Benchmark.bm(10) do |benchmark|
    benchmark.report('concatination') do
      iterations.times do
        'Mr. ' + firstname + middlename + lastname + ' aka soundar'
      end
    end
    benchmark.report('interpolaton') do
      iterations.times do
        "Mr. #{firstname} #{middlename} #{lastname} aka soundar"
      end
    end
  end
  puts '--------------------------------------------------'
end
puts 'With predefined strings'
puts '===================================================='
5.times do
  Benchmark.bm(10) do |benchmark|
    benchmark.report('concatination') do
      iterations.times do
        firstname + middlename + lastname
      end
    end
    benchmark.report('interpolaton') do
      iterations.times do
        "#{firstname} #{middlename} #{lastname}"
      end
    end
  end
  puts '--------------------------------------------------'
end
And below are the benchmark results:
With dynamic new strings
====================================================
user system total real
concatination 0.170000 0.000000 0.170000 ( 0.165821)
interpolaton 0.130000 0.010000 0.140000 ( 0.133665)
--------------------------------------------------
user system total real
concatination 0.180000 0.000000 0.180000 ( 0.180410)
interpolaton 0.120000 0.000000 0.120000 ( 0.125051)
--------------------------------------------------
user system total real
concatination 0.140000 0.000000 0.140000 ( 0.134256)
interpolaton 0.110000 0.000000 0.110000 ( 0.111427)
--------------------------------------------------
user system total real
concatination 0.130000 0.000000 0.130000 ( 0.132047)
interpolaton 0.120000 0.000000 0.120000 ( 0.120443)
--------------------------------------------------
user system total real
concatination 0.170000 0.000000 0.170000 ( 0.170394)
interpolaton 0.150000 0.000000 0.150000 ( 0.149601)
--------------------------------------------------
With predefined strings
====================================================
user system total real
concatination 0.070000 0.000000 0.070000 ( 0.067735)
interpolaton 0.100000 0.000000 0.100000 ( 0.099335)
--------------------------------------------------
user system total real
concatination 0.060000 0.000000 0.060000 ( 0.061955)
interpolaton 0.130000 0.000000 0.130000 ( 0.127011)
--------------------------------------------------
user system total real
concatination 0.090000 0.000000 0.090000 ( 0.092136)
interpolaton 0.110000 0.000000 0.110000 ( 0.110224)
--------------------------------------------------
user system total real
concatination 0.080000 0.000000 0.080000 ( 0.077587)
interpolaton 0.110000 0.000000 0.110000 ( 0.112975)
--------------------------------------------------
user system total real
concatination 0.090000 0.000000 0.090000 ( 0.088154)
interpolaton 0.140000 0.000000 0.140000 ( 0.135349)
--------------------------------------------------
Conclusion
If the strings are already defined and you are sure they will never be nil, use concatenation; otherwise use interpolation. Use whichever gives better performance for your case rather than whichever is easier to indent.
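One caveat when reading the benchmark above: the concatenation expressions join the names without the separating spaces that the interpolation expressions insert, so the two sides do not build the same string. A minimal sketch of a like-for-like comparison that first checks both expressions produce identical output; no timings are claimed here.

```ruby
require 'benchmark'

firstname  = 'soundarapandian'
middlename = 'rathinasamy'
lastname   = 'arumugam'
iterations = 100_000

# Both expressions include the separating spaces, so they build identical strings.
concatenated = 'Mr. ' + firstname + ' ' + middlename + ' ' + lastname + ' aka soundar'
interpolated = "Mr. #{firstname} #{middlename} #{lastname} aka soundar"
same_output = (concatenated == interpolated)

Benchmark.bm(13) do |bm|
  bm.report('concatenation') do
    iterations.times { 'Mr. ' + firstname + ' ' + middlename + ' ' + lastname + ' aka soundar' }
  end
  bm.report('interpolation') do
    iterations.times { "Mr. #{firstname} #{middlename} #{lastname} aka soundar" }
  end
end
```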
@user1181898 - IMHO, it's because it's easier to see what's happening. To @Phrogz's point, string interpolation automatically calls to_s for you. As a beginner, you need to see what's happening "under the hood" so that you learn the concept as opposed to just learning by rote.
Think of it like learning mathematics. You learn the "long" way in order to understand the concepts, so that you can take shortcuts once you actually know what you are doing. I speak from experience, because I'm not that advanced in Ruby yet, but I've made enough mistakes to advise people on what not to do. Hope this helps.
If you are using a string as a buffer, I found concatenation with String#<< (String#concat) to be faster.
require 'benchmark/ips'
puts "Ruby #{RUBY_VERSION} at #{Time.now}"
puts
firstname = 'soundarapandian'
middlename = 'rathinasamy'
lastname = 'arumugam'
Benchmark.ips do |x|
  # With a block parameter, benchmark-ips passes the iteration count
  # and expects the block to run exactly that many iterations itself.
  x.report("String\#<<") do |times|
    buffer = String.new
    i = 0
    while i < times
      buffer << 'Mr. ' << firstname << middlename << lastname << ' aka soundar'
      i += 1
    end
  end
  x.report("String interpolate") do |times|
    buffer = String.new
    i = 0
    while i < times
      buffer << "Mr. #{firstname} #{middlename} #{lastname} aka soundar"
      i += 1
    end
  end
  x.compare!
end
Results:
Ruby 2.3.1 at 2016-11-15 15:03:57 +1300
Warming up --------------------------------------
String#<< 230.615k i/100ms
String interpolate 234.274k i/100ms
Calculating -------------------------------------
String#<< 2.345M (± 7.2%) i/s - 11.761M in 5.041164s
String interpolate 1.242M (± 5.4%) i/s - 6.325M in 5.108324s
Comparison:
String#<<: 2344530.4 i/s
String interpolate: 1241784.9 i/s - 1.89x slower
At a guess, I'd say that interpolation generates a temporary string, which is why it's slower.
Here is a fuller benchmark which also compares Kernel#format and String#+, as those are all the methods I know for constructing a dynamic string in Ruby 🤔
require 'benchmark/ips'
firstname = 'soundarapandian'
middlename = 'rathinasamy'
lastname = 'arumugam'
FORMAT_STR = 'Mr. %<firstname>s %<middlename>s %<lastname>s aka soundar'
Benchmark.ips do |x|
  # No block parameters: a block that takes one is handed an iteration
  # count by benchmark-ips and must loop that many times itself.
  x.report("String\#<<") do
    str = String.new
    str << 'Mr. ' << firstname << ' ' << middlename << ' ' << lastname << ' aka soundar'
  end
  x.report("String\#+") do
    'Mr. ' + firstname + ' ' + middlename + ' ' + lastname + ' aka soundar'
  end
  x.report("format") do
    format(FORMAT_STR, firstname: firstname, middlename: middlename, lastname: lastname)
  end
  x.report("String interpolate") do
    "Mr. #{firstname} #{middlename} #{lastname} aka soundar"
  end
  x.compare!
end
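And results for Ruby 2.6.5. Note that these numbers come from a run in which the String#<< and String interpolate blocks were written with a |i| block parameter; benchmark-ips then passes the block an iteration count and expects it to loop that many times itself. Those two blocks did only one operation per call, so their rates in the billions are inflated and not comparable with String#+ and format: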
Warming up --------------------------------------
String#<< 94.597k i/100ms
String#+ 75.512k i/100ms
format 73.269k i/100ms
String interpolate 164.005k i/100ms
Calculating -------------------------------------
String#<< 91.385B (±16.9%) i/s - 315.981B
String#+ 905.389k (± 4.2%) i/s - 4.531M in 5.013725s
format 865.746k (± 4.5%) i/s - 4.323M in 5.004103s
String interpolate 161.694B (±11.3%) i/s - 503.542B
Comparison:
String interpolate: 161693621120.0 i/s
String#<<: 91385051886.2 i/s - 1.77x slower
String#+: 905388.7 i/s - 178590.27x slower
format: 865745.8 i/s - 186768.00x slower