Why is == faster than eql? - ruby

I read in the documentation for the String class that eql? is a strict equality operator, without type conversion, and == is an equality operator that tries to convert its second argument to a String. The C source code for these methods confirms that:
The eql? source code:
static VALUE
rb_str_eql(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (TYPE(str2) != T_STRING) return Qfalse;
return str_eql(str1, str2);
}
The == source code:
VALUE
rb_str_equal(VALUE str1, VALUE str2)
{
if (str1 == str2) return Qtrue;
if (TYPE(str2) != T_STRING) {
if (!rb_respond_to(str2, rb_intern("to_str"))) {
return Qfalse;
}
return rb_equal(str2, str1);
}
return str_eql(str1, str2);
}
But when I tried to benchmark these methods, I was surprised that == is faster than eql? by up to 20%!
My benchmark code is:
require "benchmark"
RUN_COUNT = 100000000
first_string = "Woooooha"
second_string = "Woooooha"
time = Benchmark.measure do
RUN_COUNT.times do |i|
first_string.eql?(second_string)
end
end
puts time
time = Benchmark.measure do
RUN_COUNT.times do |i|
first_string == second_string
end
end
puts time
And results:
Ruby 1.9.3-p125:
26.420000 0.250000 26.670000 ( 26.820762)
21.520000 0.200000 21.720000 ( 21.843723)
Ruby 1.9.2-p290:
25.930000 0.280000 26.210000 ( 26.318998)
19.800000 0.130000 19.930000 ( 19.991929)
So, can anyone explain why the simpler eql? method is slower than the == method when I run it on two identical strings?

The reason you are seeing a difference is not related to the implementation of == vs eql? but is due to the fact that Ruby optimizes operators (like ==) to avoid going through the normal method lookup when possible.
We can verify this in two ways:
Create an alias for == and call that instead. You'll get results similar to eql?, and thus slower than ==.
Compare using send :== and send :eql? instead, and you'll get similar timings; the speed difference disappears because Ruby only applies the optimization to direct calls to the operators, not to calls made through send or __send__.
Here's code that shows both:
require 'fruity'
first = "Woooooha"
second = "Woooooha"
class String
alias same_value? ==
end
compare do
with_operator { first == second }
with_same_value { first.same_value? second }
with_eql { first.eql? second }
end
compare do
with_send_op { first.send :==, second }
with_send_eql { first.send :eql?, second }
end
Results:
with_operator is faster than with_same_value by 2x ± 0.1
with_same_value is similar to with_eql
with_send_eql is similar to with_send_op
If you're curious, the optimizations for operators are in insns.def.
Note: this answer applies only to Ruby MRI; I would be surprised if there were a speed difference in JRuby or Rubinius, for instance.
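A third way to check this on MRI is to look at the compiled bytecode. The sketch below is MRI-only and assumes the instruction is still named opt_eq in your Ruby version (instruction names are an implementation detail); instruction_listing is a helper name made up for this example:

```ruby
# MRI-only: RubyVM is not available on JRuby or other implementations.
# instruction_listing is a hypothetical helper, not part of any library.
def instruction_listing(source)
  RubyVM::InstructionSequence.compile(source).disasm
end

operator_listing = instruction_listing('a = "x"; b = "x"; a == b')
method_listing   = instruction_listing('a = "x"; b = "x"; a.eql?(b)')

# The operator call should compile to the specialized opt_eq instruction,
# while eql? should go through a generic send-style instruction.
puts "== uses opt_eq:   #{operator_listing.include?('opt_eq')}"
puts "eql? uses opt_eq: #{method_listing.include?('opt_eq')}"
```

Printing operator_listing and method_listing in full shows the surrounding instructions as well.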

When doing benchmarks, don't use times, because that invokes a block RUN_COUNT times. The extra time this takes affects all benchmarks equally in absolute terms, which makes it harder to notice a relative difference:
require "benchmark"
RUN_COUNT = 10_000_000
FIRST_STRING = "Woooooha"
SECOND_STRING = "Woooooha"
def times_eq_question_mark
RUN_COUNT.times do |i|
FIRST_STRING.eql?(SECOND_STRING)
end
end
def times_double_equal_sign
RUN_COUNT.times do |i|
FIRST_STRING == SECOND_STRING
end
end
def loop_eq_question_mark
i = 0
while i < RUN_COUNT
FIRST_STRING.eql?(SECOND_STRING)
i += 1
end
end
def loop_double_equal_sign
i = 0
while i < RUN_COUNT
FIRST_STRING == SECOND_STRING
i += 1
end
end
1.upto(10) do |i|
method_names = [:times_eq_question_mark, :times_double_equal_sign, :loop_eq_question_mark, :loop_double_equal_sign]
method_times = method_names.map {|method_name| Benchmark.measure { send(method_name) } }
puts "Run #{i}"
method_names.zip(method_times).each do |method_name, method_time|
puts [method_name, method_time].join("\t")
end
puts
end
gives
Run 1
times_eq_question_mark 3.500000 0.000000 3.500000 ( 3.578011)
times_double_equal_sign 2.390000 0.000000 2.390000 ( 2.453046)
loop_eq_question_mark 3.110000 0.000000 3.110000 ( 3.140525)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.124932)
Run 2
times_eq_question_mark 3.531000 0.000000 3.531000 ( 3.562386)
times_double_equal_sign 2.469000 0.000000 2.469000 ( 2.484295)
loop_eq_question_mark 3.063000 0.000000 3.063000 ( 3.109276)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.140556)
Run 3
times_eq_question_mark 3.547000 0.000000 3.547000 ( 3.593635)
times_double_equal_sign 2.437000 0.000000 2.437000 ( 2.453047)
loop_eq_question_mark 3.063000 0.000000 3.063000 ( 3.109275)
loop_double_equal_sign 2.140000 0.000000 2.140000 ( 2.140557)
Run 4
times_eq_question_mark 3.547000 0.000000 3.547000 ( 3.578011)
times_double_equal_sign 2.422000 0.000000 2.422000 ( 2.437422)
loop_eq_question_mark 3.094000 0.000000 3.094000 ( 3.140524)
loop_double_equal_sign 2.140000 0.000000 2.140000 ( 2.140557)
Run 5
times_eq_question_mark 3.578000 0.000000 3.578000 ( 3.671758)
times_double_equal_sign 2.406000 0.000000 2.406000 ( 2.468671)
loop_eq_question_mark 3.110000 0.000000 3.110000 ( 3.156149)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.156181)
Run 6
times_eq_question_mark 3.562000 0.000000 3.562000 ( 3.562386)
times_double_equal_sign 2.407000 0.000000 2.407000 ( 2.468671)
loop_eq_question_mark 3.109000 0.000000 3.109000 ( 3.124900)
loop_double_equal_sign 2.125000 0.000000 2.125000 ( 2.234303)
Run 7
times_eq_question_mark 3.500000 0.000000 3.500000 ( 3.546762)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.468671)
loop_eq_question_mark 3.031000 0.000000 3.031000 ( 3.171773)
loop_double_equal_sign 2.157000 0.000000 2.157000 ( 2.156181)
Run 8
times_eq_question_mark 3.468000 0.000000 3.468000 ( 3.656133)
times_double_equal_sign 2.454000 0.000000 2.454000 ( 2.484296)
loop_eq_question_mark 3.093000 0.000000 3.093000 ( 3.249896)
loop_double_equal_sign 2.125000 0.000000 2.125000 ( 2.140556)
Run 9
times_eq_question_mark 3.563000 0.000000 3.563000 ( 3.593635)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.453047)
loop_eq_question_mark 3.125000 0.000000 3.125000 ( 3.124900)
loop_double_equal_sign 2.141000 0.000000 2.141000 ( 2.156181)
Run 10
times_eq_question_mark 3.515000 0.000000 3.515000 ( 3.562386)
times_double_equal_sign 2.453000 0.000000 2.453000 ( 2.453046)
loop_eq_question_mark 3.094000 0.000000 3.094000 ( 3.140525)
loop_double_equal_sign 2.109000 0.000000 2.109000 ( 2.156181)
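The same idea can be pushed one step further by timing an empty loop and subtracting it, so that only the cost of the comparison remains. This is only a rough sketch: ITERATIONS and the method names are made up for the example, the count is kept small so it runs quickly, and the net figures will be noisy (they can even come out negative on a busy machine).

```ruby
require 'benchmark'

ITERATIONS = 200_000 # hypothetical count, small so the sketch runs fast
FIRST  = 'Woooooha'
SECOND = 'Woooooha'

# Cost of the bare while loop, with no comparison inside.
def time_empty_loop
  Benchmark.realtime do
    i = 0
    while i < ITERATIONS
      i += 1
    end
  end
end

# Same loop, calling eql? on every iteration.
def time_eql_loop
  Benchmark.realtime do
    i = 0
    while i < ITERATIONS
      FIRST.eql?(SECOND)
      i += 1
    end
  end
end

# Same loop, using the == operator on every iteration.
def time_operator_loop
  Benchmark.realtime do
    i = 0
    while i < ITERATIONS
      FIRST == SECOND
      i += 1
    end
  end
end

baseline = time_empty_loop
puts format('eql? net: %.6f s', time_eql_loop - baseline)
puts format('==   net: %.6f s', time_operator_loop - baseline)
```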

equal? is reference equality
== is value equality
eql? is value and type equality
The third method, eql? is normally used to test if two objects have the same value as well as the same type. For example:
puts "integer == to float: #{25 == 25.0}"
puts "integer eql? to float: #{25.eql? 25.0}"
gives:
integer == to float: true
integer eql? to float: false
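To see all three predicates side by side, here is a small sketch; equality_report is a helper name invented for this example:

```ruby
# equality_report is a hypothetical helper that runs all three checks.
def equality_report(a, b)
  {
    same_object:   a.equal?(b), # reference equality
    double_equals: a == b,      # value equality, may convert types
    eql:           a.eql?(b)    # value equality without type conversion
  }
end

first  = String.new('Woooooha')
second = String.new('Woooooha')

p equality_report(25, 25.0)      # Integer vs Float
p equality_report(first, second) # two distinct but equal strings
p equality_report(first, first)  # the very same object
```

String.new is used instead of bare literals so that the two strings are guaranteed to be distinct objects.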
So I thought that since eql? does more checking it would be slower, and for strings it is, at least on my Ruby 1.9.3. So I figured it must be type dependent and did some tests.
When an integer and a float are compared, eql? is a bit faster. When integers are compared, == is much faster, by up to 2x. Wrong theory, back to the start.
The next theory, that for a given combination of types one of the two is consistently faster, proved to be true: when the values are of the same type == is always faster, and eql? is faster when the types differ, again by up to 2x.
I don't have the time to compare all types, but I'm sure you'll get varying results, although the same kind of comparison always gives similar results. Can somebody prove me wrong?
Here are my results from the test of the OP:
16.863000 0.000000 16.863000 ( 16.903000) 2 strings with eql?
14.212000 0.000000 14.212000 ( 14.334600) 2 strings with ==
13.213000 0.000000 13.213000 ( 13.245600) integer and floating with eql?
14.103000 0.000000 14.103000 ( 14.200400) integer and floating with ==
13.229000 0.000000 13.229000 ( 13.410800) 2 same integers with eql?
9.406000 0.000000 9.406000 ( 9.410000) 2 same integers with ==
19.625000 0.000000 19.625000 ( 19.720800) 2 different integers with eql?
9.407000 0.000000 9.407000 ( 9.405800) 2 different integers with ==
21.825000 0.000000 21.825000 ( 21.910200) integer with string with eql?
43.836000 0.031000 43.867000 ( 44.074200) integer with string with ==

Performance difference between MRI Ruby and jRuby

While doing some benchmarking to answer this question about the fastest way to concatenate arrays, I was surprised that when I ran the same benchmarks with JRuby the tests were a lot slower.
Does this mean that the old adage about JRuby being faster than MRI Ruby no longer holds? Or is this about how arrays are treated in JRuby?
Here the benchmark and the results in both MRI Ruby 2.3.0 and jRuby 9.1.2.0
Both run on a 64-bit Windows 7 box, with all 4 processors 50-60% busy and ± 5.5GB of memory in use. JRuby had to be started with the parameter -J-Xmx1500M to provide enough heap space. I had to remove the test with push because of a "stack level too deep" error, and I also removed the slowest methods so the tests would not take too long. Java runtime used: 1.7.0_21
require 'benchmark'
N = 100
class Array
def concat_all
self.reduce([], :+)
end
end
# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a
Benchmark.bm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a
Benchmark.bm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
This question is not about the different methods used, see the original question for that.
In both situations MRI is 7 times faster!
Can someone explain to me why?
I'm also curious how other implementations do, like RBX (Rubinius).
C:\Users\...>d:\jruby\bin\jruby -J-Xmx1500M concat3.rb
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000946)
concat 0.000000 0.000000 0.000000 ( 0.001436)
splash 0.000000 0.000000 0.000000 ( 0.001456)
concat_all 0.000000 0.000000 0.000000 ( 0.002177)
flat_map 0.010000 0.000000 0.010000 ( 0.003179)
user system total real
plus 140.166000 0.000000 140.166000 (140.158687)
concat 143.475000 0.000000 143.475000 (143.473786)
splash 139.408000 0.000000 139.408000 (139.406671)
concat_all 144.475000 0.000000 144.475000 (144.474436)
flat_map 143.519000 0.000000 143.519000 (143.517636)
C:\Users\...>ruby concat3.rb
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000074)
concat 0.000000 0.000000 0.000000 ( 0.000065)
splash 0.000000 0.000000 0.000000 ( 0.000098)
concat_all 0.000000 0.000000 0.000000 ( 0.000141)
flat_map 0.000000 0.000000 0.000000 ( 0.000122)
user system total real
plus 15.226000 6.723000 21.949000 ( 21.958854)
concat 11.700000 9.142000 20.842000 ( 20.928087)
splash 21.247000 12.589000 33.836000 ( 33.933170)
concat_all 14.508000 8.315000 22.823000 ( 22.871641)
flat_map 11.170000 8.923000 20.093000 ( 20.170945)
The general rule (as mentioned in the comments) is that JRuby/JVM needs warmup.
Usually bmbm is a good fit, although TIMES=1000 should be increased (at least for the small-array cases). Also, 1.5G might not be enough for optimal JRuby performance (I noticed a considerable change in the numbers going from -Xmx2g to -Xmx3g). Here are the results:
ruby 2.3.1p112 (2016-04-26 revision 54768) [x86_64-linux]
$ ruby concat3.rb
Rehearsal -----------------------------------------------
plus 0.000000 0.000000 0.000000 ( 0.000076)
concat 0.000000 0.000000 0.000000 ( 0.000070)
splash 0.000000 0.000000 0.000000 ( 0.000099)
concat_all 0.000000 0.000000 0.000000 ( 0.000136)
flat_map 0.000000 0.000000 0.000000 ( 0.000138)
-------------------------------------- total: 0.000000sec
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000051)
concat 0.000000 0.000000 0.000000 ( 0.000059)
splash 0.000000 0.000000 0.000000 ( 0.000083)
concat_all 0.000000 0.000000 0.000000 ( 0.000120)
flat_map 0.000000 0.000000 0.000000 ( 0.000173)
Rehearsal -----------------------------------------------
plus 43.040000 3.320000 46.360000 ( 46.351004)
concat 15.080000 3.870000 18.950000 ( 19.228059)
splash 49.680000 4.820000 54.500000 ( 54.587707)
concat_all 51.840000 5.260000 57.100000 ( 57.114867)
flat_map 17.380000 5.340000 22.720000 ( 22.716987)
------------------------------------ total: 199.630000sec
user system total real
plus 42.880000 3.600000 46.480000 ( 46.506013)
concat 17.230000 5.290000 22.520000 ( 22.890809)
splash 60.300000 7.480000 67.780000 ( 67.878534)
concat_all 54.910000 6.480000 61.390000 ( 61.404383)
flat_map 17.310000 5.570000 22.880000 ( 23.223789)
...
jruby 9.1.6.0 (2.3.1) 2016-11-09 0150a76 Java HotSpot(TM) 64-Bit Server VM 25.112-b15 on 1.8.0_112-b15 +jit [linux-x86_64]
$ jruby -J-Xmx3g concat3.rb
Rehearsal -----------------------------------------------
plus 0.010000 0.000000 0.010000 ( 0.001445)
concat 0.000000 0.000000 0.000000 ( 0.002534)
splash 0.000000 0.000000 0.000000 ( 0.001791)
concat_all 0.000000 0.000000 0.000000 ( 0.002513)
flat_map 0.010000 0.000000 0.010000 ( 0.007088)
-------------------------------------- total: 0.020000sec
user system total real
plus 0.010000 0.000000 0.010000 ( 0.002700)
concat 0.000000 0.000000 0.000000 ( 0.001085)
splash 0.000000 0.000000 0.000000 ( 0.001569)
concat_all 0.000000 0.000000 0.000000 ( 0.003052)
flat_map 0.000000 0.000000 0.000000 ( 0.002252)
Rehearsal -----------------------------------------------
plus 32.410000 0.670000 33.080000 ( 17.385688)
concat 18.610000 0.060000 18.670000 ( 11.206419)
splash 57.770000 0.330000 58.100000 ( 25.366032)
concat_all 19.100000 0.030000 19.130000 ( 13.747319)
flat_map 16.160000 0.040000 16.200000 ( 10.534130)
------------------------------------ total: 145.180000sec
user system total real
plus 16.060000 0.040000 16.100000 ( 11.737483)
concat 15.950000 0.030000 15.980000 ( 10.480468)
splash 47.870000 0.130000 48.000000 ( 22.668069)
concat_all 19.150000 0.030000 19.180000 ( 13.934314)
flat_map 16.850000 0.020000 16.870000 ( 10.862716)
... so it seems like the opposite - MRI 2.3 gets 2-5x slower than JRuby 9.1
cat concat3.rb
require 'benchmark'
N = (ENV['TIMES'] || 100).to_i
class Array
def concat_all
self.reduce([], :+)
end
end
# small arrays
a = (1..10).to_a
b = (11..20).to_a
c = (21..30).to_a
Benchmark.bmbm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
#large arrays
a = (1..10_000_000).to_a
b = (10_000_001..20_000_000).to_a
c = (20_000_001..30_000_000).to_a
Benchmark.bmbm do |r|
r.report('plus ') { N.times { a + b + c }}
r.report('concat ') { N.times { [].concat(a).concat(b).concat(c) }}
r.report('splash ') { N.times {[*a, *b, *c]} }
r.report('concat_all ') { N.times { [a, b, c].concat_all }}
r.report('flat_map ') { N.times {[a, b, c].flat_map(&:itself)} }
end
What I have learned from these comments and answers, and from the tests I did myself afterward:
the OS probably makes a difference; I would have liked more answers in different situations, so here I'm just guessing
the fastest method differs between runtimes (MRI or JRuby, 32 or 64 bit, JRE), so claiming that one method is better than another is difficult; on my system the plus method was fastest in almost all circumstances, but I didn't use Java HotSpot like kares
in 64-bit JRuby you can specify a much higher heap than in 32-bit (1.5G on my system); in 64-bit I could use more heap than I have memory (a bug somewhere?)
higher heaps speed up operations that use a lot of memory, like the huge arrays I used
use the latest Java runtime; speed is better
JRuby needs a warmup: a method needs to run a number of times before it is compiled, so use .bm and .bmbm with different repeat values to find that margin
sometimes MRI is faster, but with the right parameters and warmup JRuby was 3 to 3.5 times as fast on my system for this particular test
The last point, together with the loading time of the JVM, makes MRI better for short ad hoc scripts and JRuby better for processing-hungry, longer-running processes with methods repeated often, so JRuby would be better for running servers and services.
What I saw confirmed: do your own benchmarks for long or repeated processes.
Both implementations have made big improvements in speed compared to earlier versions, let's not forget: Ruby may be a slower runner but a faster developer and if you compare the cost of some extra hardware to some extra developers...
Thanks to all the commenters and kares for their expertise.
EDIT
Out of curiosity I also ran the test with Rubinius in a Docker container (I'm on Windows): rubinius 3.69 (2.3.1 a57071c6 2016-11-17 3.8.0) [x86_64-linux-gnu]
Only concat and flat_map are on par with MRI; I wonder whether these methods are in C and the rest in pure Ruby.
Rehearsal -----------------------------------------------
plus 0.000000 0.000000 0.000000 ( 0.000742)
concat 0.000000 0.000000 0.000000 ( 0.000093)
splash 0.000000 0.000000 0.000000 ( 0.000619)
concat_all 0.000000 0.000000 0.000000 ( 0.001357)
flat_map 0.000000 0.000000 0.000000 ( 0.001536)
-------------------------------------- total: 0.000000sec
user system total real
plus 0.000000 0.000000 0.000000 ( 0.000589)
concat 0.000000 0.000000 0.000000 ( 0.000084)
splash 0.000000 0.000000 0.000000 ( 0.000596)
concat_all 0.000000 0.000000 0.000000 ( 0.001679)
flat_map 0.000000 0.000000 0.000000 ( 0.001568)
Rehearsal -----------------------------------------------
plus 68.770000 63.320000 132.090000 (265.589506)
concat 20.300000 2.810000 23.110000 ( 23.662007)
splash 79.310000 74.090000 153.400000 (305.013934)
concat_all 83.130000 100.580000 183.710000 (378.988638)
flat_map 20.680000 0.960000 21.640000 ( 21.769550)
------------------------------------ total: 513.950000sec
user system total real
plus 65.310000 70.300000 135.610000 (273.799215)
concat 20.050000 0.610000 20.660000 ( 21.163930)
splash 79.360000 80.000000 159.360000 (316.366122)
concat_all 84.980000 99.880000 184.860000 (383.870653)
flat_map 20.940000 1.760000 22.700000 ( 22.760643)

Random string generation with same pattern

I want to generate a random string with the pattern:
number-number-letter-SPACE-letter-number-number
for example "81b t15", "12a x13". How can I generate something like this? I tried generating each char and joining them into one string, but it does not look efficient.
Nums = (0..9).to_a
Ltrs = ("A".."Z").to_a + ("a".."z").to_a
def rand_num; Nums.sample end
def rand_ltr; Ltrs.sample end
"#{rand_num}#{rand_num}#{rand_ltr} #{rand_ltr}#{rand_num}#{rand_num}"
# => "71P v33"
Have you looked at the randexp gem?
It works like this:
> /\d\d\w \w\d\d/.gen
=> "64M c82"
Ok here's another entry for the competition :D
module RandomString
LETTERS = (("A".."Z").to_a + ("a".."z").to_a)
LETTERS_SIZE = LETTERS.size
SPACE = " "
FORMAT = [:number, :number, :letter, :space, :letter, :number, :number]
class << self
def generate
chars.join
end
def generate2
"#{number}#{number}#{letter} #{letter}#{number}#{number}"
end
private
def chars
FORMAT.collect{|char_class| send char_class}
end
def letter
LETTERS[rand(LETTERS_SIZE)]
end
def number
rand 10
end
def space
SPACE
end
end
end
And you use it like:
50.times { puts RandomString.generate }
Out of curiosity, I made a benchmark of all the solutions presented here. Here are the results:
JRuby:
user system total real
kimmmo 1.490000 0.000000 1.490000 ( 0.990000)
kimmmo2 0.600000 0.010000 0.610000 ( 0.479000)
sawa 0.960000 0.040000 1.000000 ( 0.533000)
hp4k 2.050000 0.230000 2.280000 ( 1.234000)
brian 17.700000 0.170000 17.870000 ( 14.867000)
MRI 2.0
user system total real
kimmmo 0.900000 0.000000 0.900000 ( 0.908601)
kimmmo2 0.410000 0.000000 0.410000 ( 0.406443)
sawa 0.570000 0.000000 0.570000 ( 0.568935)
hp4k 4.940000 0.000000 4.940000 ( 4.945404)
brian 25.860000 0.010000 25.870000 ( 25.870011)
You can do it this way
(0..9).to_a.sample(2).join + ('a'..'z').to_a.sample + " " + ('a'..'z').to_a.sample + (0..9).to_a.sample(2).join
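One thing to keep in mind with sample(2) is that it picks two distinct elements, so a pair like "11" can never come out. If repeated digits should be possible, each position can be drawn independently. A minimal sketch, where PATTERN, DIGITS, LETTERS and random_code are names made up for this example and lowercase letters are assumed:

```ruby
DIGITS  = ('0'..'9').to_a
LETTERS = ('a'..'z').to_a
PATTERN = 'ddl ldd' # d = digit, l = letter, anything else is copied as-is

# Builds one string by drawing every position independently,
# so the same digit may appear twice in a row.
def random_code(rng = Random.new)
  PATTERN.each_char.map do |symbol|
    case symbol
    when 'd' then DIGITS.sample(random: rng)
    when 'l' then LETTERS.sample(random: rng)
    else symbol
    end
  end.join
end

5.times { puts random_code }
```

Passing a seeded Random (for example Random.new(42)) makes the output reproducible, which can be handy in tests.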

Count total characters in an Array of Strings in Ruby?

How would I count the total number of characters in an array of strings in Ruby? Assume I have the following:
array = ['peter' , 'romeo' , 'bananas', 'pijamas']
I'm trying:
array.each do |counting|
puts counting.count "array[]"
end
but, I'm not getting the desired result. It appears I am counting something other than the characters.
I searched for the count property but I haven't had any luck or found a good source of info. Basically, I'd like to get an output of the total number of characters inside the array.
Wing's Answer will work, but just for fun here are a few alternatives
['peter' , 'romeo' , 'bananas', 'pijamas'].inject(0) {|c, w| c += w.length }
or
['peter' , 'romeo' , 'bananas', 'pijamas'].join.length
The real issue is that string.count is not the method you're looking for.
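To make that concrete: String#count treats its argument as a set of characters and counts how many characters of the receiver are in that set, so count "array[]" counts occurrences of a, r, y, [ and ]. A small sketch, with WORDS, count_from_set and total_characters as names invented for this example:

```ruby
WORDS = %w[peter romeo bananas pijamas]

# How many characters of word belong to the given character set.
def count_from_set(word, set)
  word.count(set)
end

# Total number of characters across all strings in the array.
def total_characters(words)
  words.inject(0) { |total, word| total + word.length }
end

WORDS.each { |word| puts "#{word}: #{count_from_set(word, 'array[]')}" }
puts total_characters(WORDS) # 5 + 5 + 7 + 7 = 24
```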
Or...
a.map(&:size).reduce(:+) # from Andrew: reduce(0, :+)
Another alternative:
['peter' , 'romeo' , 'bananas', 'pijamas'].join('').size
An interesting result :)
>> array = []
>> 1_000_000.times { array << 'foo' }
>> Benchmark.bmbm do |x|
>> x.report('mapreduce') { array.map(&:size).reduce(:+) }
>> x.report('mapsum') { array.map(&:size).sum }
>> x.report('inject') { array.inject(0) { |c, w| c += w.length } }
>> x.report('joinsize') { array.join('').size }
>> x.report('joinsize2') { array.join.size }
>> end
Rehearsal ---------------------------------------------
mapreduce 0.220000 0.000000 0.220000 ( 0.222946)
mapsum 0.210000 0.000000 0.210000 ( 0.210070)
inject 0.150000 0.000000 0.150000 ( 0.158709)
joinsize 0.120000 0.000000 0.120000 ( 0.116889)
joinsize2 0.070000 0.000000 0.070000 ( 0.071718)
------------------------------------ total: 0.770000sec
user system total real
mapreduce 0.220000 0.000000 0.220000 ( 0.228385)
mapsum 0.210000 0.000000 0.210000 ( 0.207359)
inject 0.160000 0.000000 0.160000 ( 0.156711)
joinsize 0.120000 0.000000 0.120000 ( 0.116652)
joinsize2 0.080000 0.000000 0.080000 ( 0.069612)
so it looks like array.join.size has the lowest runtime
a = ['peter' , 'romeo' , 'bananas', 'pijamas']
count = 0
a.each {|s| count += s.length}
puts count

Ruby on Rails regexp equals-tilde vs. array include for checking a list of options

I'm using Rails 3.2.3 with Ruby 1.9.3p0.
I'm finding that I often have to determine whether a string occurs in a list of options. It seems I can use the Ruby array .include method:
<% if ['todo','pending','history'].include?(params[:category]) %>
or the regular expression equals-tilde match shorthand with vertical bars separating options:
<% if params[:category] =~ /todo|pending|history/ %>
Is one better than the other in terms of performance?
Is there an even better approach?
Summary: Array#include? with String elements wins for both accepted and rejected inputs, for your example with only three acceptable values. For a larger set to check, it looks like Set#include? with String elements might win.
How to Test
We should test this empirically.
Here are a couple of alternatives that you may want to consider as well: pre-compiled regex, a list of symbols, and a Set with String elements.
I would imagine that the performance may also depend on whether most of your inputs fall into the expected set, and are accepted, or whether most are outside the set, and are rejected.
Here's an empirical test script:
require 'benchmark'
require 'set'
strings = ['todo','pending','history']
string_set = Set.new(strings)
symbols = strings.map(&:to_sym)
regex_compiled = Regexp.new(strings.join("|"))
strings_avg_size = (strings.map(&:size).inject {|sum, n| sum + n}.to_f / strings.size).to_i
num_inputs = 1_000_000
accepted_inputs = (0...num_inputs).map { strings[rand(strings.size)] }
rejected_inputs = (0...num_inputs).map { (0..strings_avg_size).map { ('a'..'z').to_a[rand(26)] }.join }
Benchmark.bmbm(40) do |x|
x.report("Array#include?, Strings, accepted:") { accepted_inputs.map {|s| strings.include?(s) } }
x.report("Array#include?, Strings, rejected:") { rejected_inputs.map {|s| strings.include?(s) } }
x.report("Array#include?, Symbols, accepted:") { accepted_inputs.map {|s| symbols.include?(s.to_sym) } }
x.report("Array#include?, Symbols, rejected:") { rejected_inputs.map {|s| symbols.include?(s.to_sym) } }
x.report("Set#include?, Strings, accepted:") { accepted_inputs.map {|s| string_set.include?(s) } }
x.report("Set#include?, Strings, rejected:") { rejected_inputs.map {|s| string_set.include?(s) } }
x.report("Regexp#match, interpreted, accepted:") { accepted_inputs.map {|s| s =~ /todo|pending|history/ } }
x.report("Regexp#match, interpreted, rejected:") { rejected_inputs.map {|s| s =~ /todo|pending|history/ } }
x.report("Regexp#match, compiled, accepted:") { accepted_inputs.map {|s| regex_compiled.match(s) } }
x.report("Regexp#match, compiled, rejected:") { rejected_inputs.map {|s| regex_compiled.match(s) } }
end
Results
Rehearsal ---------------------------------------------------------------------------
Array#include?, Strings, accepted: 0.210000 0.000000 0.210000 ( 0.215099)
Array#include?, Strings, rejected: 0.530000 0.010000 0.540000 ( 0.543898)
Array#include?, Symbols, accepted: 0.330000 0.000000 0.330000 ( 0.337767)
Array#include?, Symbols, rejected: 1.870000 0.050000 1.920000 ( 1.923155)
Set#include?, Strings, accepted: 0.270000 0.000000 0.270000 ( 0.274774)
Set#include?, Strings, rejected: 0.460000 0.000000 0.460000 ( 0.463925)
Regexp#match, interpreted, accepted: 0.380000 0.000000 0.380000 ( 0.382060)
Regexp#match, interpreted, rejected: 0.650000 0.000000 0.650000 ( 0.660775)
Regexp#match, compiled, accepted: 1.130000 0.080000 1.210000 ( 1.220970)
Regexp#match, compiled, rejected: 0.630000 0.000000 0.630000 ( 0.640721)
------------------------------------------------------------------ total: 6.600000sec
user system total real
Array#include?, Strings, accepted: 0.210000 0.000000 0.210000 ( 0.219060)
Array#include?, Strings, rejected: 0.430000 0.000000 0.430000 ( 0.444911)
Array#include?, Symbols, accepted: 0.340000 0.000000 0.340000 ( 0.341970)
Array#include?, Symbols, rejected: 1.080000 0.000000 1.080000 ( 1.089961)
Set#include?, Strings, accepted: 0.270000 0.000000 0.270000 ( 0.281270)
Set#include?, Strings, rejected: 0.400000 0.000000 0.400000 ( 0.406181)
Regexp#match, interpreted, accepted: 0.370000 0.000000 0.370000 ( 0.366931)
Regexp#match, interpreted, rejected: 0.560000 0.000000 0.560000 ( 0.558652)
Regexp#match, compiled, accepted: 0.920000 0.000000 0.920000 ( 0.915914)
Regexp#match, compiled, rejected: 0.620000 0.000000 0.620000 ( 0.627620)
Conclusions
(see the Summary above)
It makes sense to me, upon reflection, that the Array of Symbols would be very slow for rejected inputs, because every single one of those random strings must be interned in the Symbol table before the check can be made.
It makes less sense to me, even after pondering, that the compiled Regexp would perform so badly, especially compared to the Regexp interpreted as a literal in the code. Can anyone explain why it does so badly?
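For completeness, this is roughly how the Set variant could look in application code. It is only a sketch: ALLOWED_CATEGORIES and allowed_category? are hypothetical names, and whether a Set pays off for your list size is something to measure yourself.

```ruby
require 'set'

# Built once and frozen, so the lookup structure is not
# rebuilt on every request. Hypothetical constant name.
ALLOWED_CATEGORIES = Set.new(%w[todo pending history]).freeze

# True only when value is exactly one of the allowed strings.
def allowed_category?(value)
  ALLOWED_CATEGORIES.include?(value)
end

puts allowed_category?('pending')
puts allowed_category?('archive')
```

In a view this would be used in place of the array literal, e.g. allowed_category?(params[:category]).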
@ms-tg's answer has good benchmarks, and answers your question well as far as I'm concerned. I just wanted to add a small note: be careful, because these two options won't always give the same results:
params = Hash.new
keyword_array = ['todo','pending','history']
included = nil
params[:category] = "history plus other text"
start_time = Time.now
1000.times do
included = keyword_array.include?(params[:category])
end
puts "Array.include? returned #{included} in #{(Time.now - start_time)*1000}ms"
start_time = Time.now
1000.times do
included = (params[:category] =~ /todo|pending|history/).is_a?(Integer)
end
puts "Regexp returned #{included} in #{(Time.now - start_time)*1000}ms"
Returns:
Array.include? returned false in 0.477ms
Regexp returned true in 0.953ms
Notice that the regular expression returned true in this case, but the array.include? returned false. This should be considered when building your logic.
Basically, if the string isn't in the array exactly, array.include? will be false, but if one of the keywords is anywhere in the string the regular expression will be true (regardless of whether there's other text).
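If you do want the regular expression to behave like the array check, it can be anchored so that the whole string has to match. A minimal sketch; CATEGORY_PATTERN and exact_category? are names made up for this example:

```ruby
# \A and \z anchor the match to the start and end of the whole string,
# and (?:...) groups the alternatives without capturing.
CATEGORY_PATTERN = /\A(?:todo|pending|history)\z/

# to_s guards against nil, e.g. when the parameter is missing.
def exact_category?(value)
  !(CATEGORY_PATTERN =~ value.to_s).nil?
end

puts exact_category?('history')
puts exact_category?('history plus other text')
```

With the anchors in place, "history plus other text" is rejected just as it is by include?.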

obj.nil? vs. obj == nil

Is it better to use obj.nil? or obj == nil and what are the benefits of both?
Is it better to use obj.nil? or obj == nil
It is exactly the same. It has the exact same observable effects from the outside ( pfff ) *
and what are the benefits of both.
If you like micro optimizations: every object returns false to the .nil? message except nil itself, while an object receiving the == message performs a small comparison with the other object to determine whether it is the same object.
* See comments.
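One situation where the two can give different answers is a class that overrides ==, since == is an ordinary method. The sketch below uses a deliberately silly class, AlwaysEqual, invented only to show the effect:

```ruby
# Hypothetical class whose == claims equality with everything.
class AlwaysEqual
  def ==(_other)
    true
  end
end

obj = AlwaysEqual.new

puts obj == nil # the overridden == answers true
puts obj.nil?   # Object#nil? still answers false
puts nil.nil?   # only nil itself answers true here
```

nil? could be overridden too, of course, but that is far less common than a custom ==.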
Syntax and style aside, I wanted to see how "the same" various approaches to testing for nil were. So, I wrote some benchmarks to see, and threw various forms of nil testing at it.
TL;DR - Results First
The actual results showed that using obj as a nil check is the fastest in all cases. obj is consistently faster by 30% or more than checking obj.nil?.
Surprisingly, obj performs about 3-4 times as fast as variations on obj == nil, for which there seems to be a punishing performance penalty.
Want to speed up your performance-intensive algorithm by 200%-300%? Convert all obj == nil checks to obj. Want to sandbag your code's performance? Use obj == nil everywhere that you can. (just kidding: don't sandbag your code!).
In the final analysis, always use obj. That jibes with the Ruby Style Guide rule: Don't do explicit non-nil checks unless you're dealing with boolean values.
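The "unless you're dealing with boolean values" part matters because a bare obj check is a truthiness check, and false is falsy without being nil. A short sketch, where nil_check_summary is a helper name made up for this example:

```ruby
# Compares the implicit truthiness check with the explicit nil? check.
def nil_check_summary(obj)
  { truthy: obj ? true : false, nil: obj.nil? }
end

p nil_check_summary('string') # truthy and not nil
p nil_check_summary(nil)      # falsy and nil
p nil_check_summary(false)    # falsy but NOT nil
p nil_check_summary(0)        # 0 is truthy in Ruby
```

So for a variable that may legitimately hold false, "if obj" and "unless obj.nil?" are not interchangeable.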
The Benchmark Conditions
OK, those are the results. So how is this benchmark put together, what tests were done, and what are the details of the results?
The nil checks that I came up with are:
obj
obj.nil?
!obj
!!obj
obj == nil
obj != nil
I picked various Ruby types to test, in case the results changed based on the type. These types were Fixnum, Float, FalseClass, TrueClass, String, and Regexp.
I used these nil check conditions on each of the types to see if there was a difference between them, performance-wise. For each type, I tested both nil objects and non-nil value objects (e.g. 1_000_000, 100_000.0, false, true, "string", and /\w/) to see if there's a difference in checking for nil on an object that is nil versus on an object that's not nil.
The Benchmarks
With all of that out of the way, here is the benchmark code:
require 'benchmark'
nil_obj = nil
N = 10_000_000
puts RUBY_DESCRIPTION
[1_000_000, 100_000.0, false, true, "string", /\w/].each do |obj|
title = "#{obj} (#{obj.class.name})"
puts "============================================================"
puts "Running tests for obj = #{title}"
Benchmark.bm(15, title) do |x|
implicit_obj_report = x.report("obj:") { N.times { obj } }
implicit_nil_report = x.report("nil_obj:") { N.times { nil_obj } }
explicit_obj_report = x.report("obj.nil?:") { N.times { obj.nil? } }
explicit_nil_report = x.report("nil_obj.nil?:") { N.times { nil_obj.nil? } }
not_obj_report = x.report("!obj:") { N.times { !obj } }
not_nil_report = x.report("!nil_obj:") { N.times { !nil_obj } }
not_not_obj_report = x.report("!!obj:") { N.times { !!obj } }
not_not_nil_report = x.report("!!nil_obj:") { N.times { !!nil_obj } }
equals_obj_report = x.report("obj == nil:") { N.times { obj == nil } }
equals_nil_report = x.report("nil_obj == nil:") { N.times { nil_obj == nil } }
not_equals_obj_report = x.report("obj != nil:") { N.times { obj != nil } }
not_equals_nil_report = x.report("nil_obj != nil:") { N.times { nil_obj != nil } }
end
end
The Results
The results were interesting: performance for the Fixnum, Float, and String types was virtually identical, Regexp nearly so, and FalseClass and TrueClass performed much more quickly. Testing was done on MRI versions 1.9.3, 2.0.0, 2.1.5, and 2.2.5 with very similar comparative results across the versions. The results from the MRI 2.2.5 version are shown here (and are also available in the gist):
ruby 2.2.5p319 (2016-04-26 revision 54774) [x86_64-darwin14]
============================================================
Running tests for obj = 1000000 (Fixnum)
user system total real
obj: 0.970000 0.000000 0.970000 ( 0.987204)
nil_obj: 0.980000 0.010000 0.990000 ( 0.980796)
obj.nil?: 1.250000 0.000000 1.250000 ( 1.268564)
nil_obj.nil?: 1.280000 0.000000 1.280000 ( 1.287800)
!obj: 1.050000 0.000000 1.050000 ( 1.064061)
!nil_obj: 1.070000 0.000000 1.070000 ( 1.170393)
!!obj: 1.110000 0.000000 1.110000 ( 1.122204)
!!nil_obj: 1.120000 0.000000 1.120000 ( 1.147679)
obj == nil: 2.110000 0.000000 2.110000 ( 2.137807)
nil_obj == nil: 1.150000 0.000000 1.150000 ( 1.158301)
obj != nil: 2.980000 0.010000 2.990000 ( 3.041131)
nil_obj != nil: 1.170000 0.000000 1.170000 ( 1.203015)
============================================================
Running tests for obj = 100000.0 (Float)
user system total real
obj: 0.940000 0.000000 0.940000 ( 0.947136)
nil_obj: 0.950000 0.000000 0.950000 ( 0.986488)
obj.nil?: 1.260000 0.000000 1.260000 ( 1.264953)
nil_obj.nil?: 1.280000 0.000000 1.280000 ( 1.306817)
!obj: 1.050000 0.000000 1.050000 ( 1.058924)
!nil_obj: 1.070000 0.000000 1.070000 ( 1.096747)
!!obj: 1.100000 0.000000 1.100000 ( 1.105708)
!!nil_obj: 1.120000 0.010000 1.130000 ( 1.132248)
obj == nil: 2.140000 0.000000 2.140000 ( 2.159595)
nil_obj == nil: 1.130000 0.000000 1.130000 ( 1.151257)
obj != nil: 3.010000 0.000000 3.010000 ( 3.042263)
nil_obj != nil: 1.170000 0.000000 1.170000 ( 1.189145)
============================================================
Running tests for obj = false (FalseClass)
user system total real
obj: 0.930000 0.000000 0.930000 ( 0.933712)
nil_obj: 0.950000 0.000000 0.950000 ( 0.973776)
obj.nil?: 1.250000 0.000000 1.250000 ( 1.340943)
nil_obj.nil?: 1.270000 0.010000 1.280000 ( 1.282267)
!obj: 1.030000 0.000000 1.030000 ( 1.039532)
!nil_obj: 1.060000 0.000000 1.060000 ( 1.068765)
!!obj: 1.100000 0.000000 1.100000 ( 1.111930)
!!nil_obj: 1.110000 0.000000 1.110000 ( 1.115355)
obj == nil: 1.110000 0.000000 1.110000 ( 1.121403)
nil_obj == nil: 1.100000 0.000000 1.100000 ( 1.114550)
obj != nil: 1.190000 0.000000 1.190000 ( 1.207389)
nil_obj != nil: 1.140000 0.000000 1.140000 ( 1.181232)
============================================================
Running tests for obj = true (TrueClass)
user system total real
obj: 0.960000 0.000000 0.960000 ( 0.964583)
nil_obj: 0.970000 0.000000 0.970000 ( 0.977366)
obj.nil?: 1.260000 0.000000 1.260000 ( 1.265229)
nil_obj.nil?: 1.270000 0.010000 1.280000 ( 1.283342)
!obj: 1.040000 0.000000 1.040000 ( 1.059689)
!nil_obj: 1.070000 0.000000 1.070000 ( 1.068290)
!!obj: 1.120000 0.000000 1.120000 ( 1.154803)
!!nil_obj: 1.130000 0.000000 1.130000 ( 1.155932)
obj == nil: 1.100000 0.000000 1.100000 ( 1.102394)
nil_obj == nil: 1.130000 0.000000 1.130000 ( 1.160324)
obj != nil: 1.190000 0.000000 1.190000 ( 1.202544)
nil_obj != nil: 1.200000 0.000000 1.200000 ( 1.200812)
============================================================
Running tests for obj = string (String)
user system total real
obj: 0.940000 0.000000 0.940000 ( 0.953357)
nil_obj: 0.960000 0.000000 0.960000 ( 0.962029)
obj.nil?: 1.290000 0.010000 1.300000 ( 1.306233)
nil_obj.nil?: 1.240000 0.000000 1.240000 ( 1.243312)
!obj: 1.030000 0.000000 1.030000 ( 1.046630)
!nil_obj: 1.060000 0.000000 1.060000 ( 1.123925)
!!obj: 1.130000 0.000000 1.130000 ( 1.144168)
!!nil_obj: 1.130000 0.000000 1.130000 ( 1.147330)
obj == nil: 2.320000 0.000000 2.320000 ( 2.341705)
nil_obj == nil: 1.100000 0.000000 1.100000 ( 1.118905)
obj != nil: 3.040000 0.010000 3.050000 ( 3.057040)
nil_obj != nil: 1.150000 0.000000 1.150000 ( 1.162085)
============================================================
Running tests for obj = (?-mix:\w) (Regexp)
user system total real
obj: 0.930000 0.000000 0.930000 ( 0.939815)
nil_obj: 0.960000 0.000000 0.960000 ( 0.961852)
obj.nil?: 1.270000 0.000000 1.270000 ( 1.284321)
nil_obj.nil?: 1.260000 0.000000 1.260000 ( 1.275042)
!obj: 1.040000 0.000000 1.040000 ( 1.042543)
!nil_obj: 1.040000 0.000000 1.040000 ( 1.047280)
!!obj: 1.120000 0.000000 1.120000 ( 1.128137)
!!nil_obj: 1.130000 0.000000 1.130000 ( 1.138988)
obj == nil: 1.520000 0.010000 1.530000 ( 1.529547)
nil_obj == nil: 1.110000 0.000000 1.110000 ( 1.125693)
obj != nil: 2.210000 0.000000 2.210000 ( 2.226783)
nil_obj != nil: 1.170000 0.000000 1.170000 ( 1.169347)
Personally, I prefer object.nil? as it can be less confusing on longer lines; however, I also usually use object.blank? if I'm working in Rails as that also checks to see if the variable is empty.
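Outside Rails, the "nil or empty" idea mentioned above can be approximated in plain Ruby. The following is only a rough sketch: blank_like? is a hypothetical helper name, and this is not ActiveSupport's actual implementation of blank?.

```ruby
# Hypothetical helper that approximates the idea behind Rails' blank?.
# This is NOT ActiveSupport's implementation; it is an illustration only.
def blank_like?(obj)
  # nil and false count as blank.
  return true if obj.nil? || obj == false
  # Strings that are empty or whitespace-only count as blank.
  return obj.strip.empty? if obj.is_a?(String)
  # Collections (and anything else responding to empty?) are blank when empty.
  obj.respond_to?(:empty?) ? obj.empty? : false
end

p blank_like?(nil)
p blank_like?("   ")
p blank_like?([])
p blank_like?(0)
```

Unlike nil?, a check of this kind answers "is there anything useful here?", which is a broader question than "is this nil?".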
In many cases, neither, just test the boolean truth value
Although the two operations are very different, I'm pretty sure they will always produce the same result, at least until someone out on the edge of something decides to override Object's #nil? method. (One calls the #nil? method inherited from Object or overridden in NilClass, and the other compares against the nil singleton.)
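A minimal sketch of that edge case, assuming a made-up class (LooksNil is hypothetical and exists only for this illustration) that overrides #nil?:

```ruby
# Hypothetical class that overrides #nil?; illustration only.
class LooksNil
  def nil?
    true
  end
end

# Returns [result of the #nil? call, result of the == nil comparison].
def nil_checks(obj)
  [obj.nil?, obj == nil]
end

p nil_checks(nil)           # both checks agree: [true, true]
p nil_checks(42)            # both checks agree: [false, false]
p nil_checks(LooksNil.new)  # the checks disagree: [true, false]
```

For ordinary objects the two checks agree; they only diverge once a class redefines #nil? (or ==) itself.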
I would suggest that when in doubt you go a third way, actually, and just test the truth value of an expression.
So, write if x rather than if x == nil or if x.nil?, so that the test does the right thing when the expression value is false. Working this way may also help avoid tempting someone to define FalseClass#nil? as true.
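To make the distinction concrete, here is a small sketch comparing the three tests on a few values. The helper name nil_report is an assumption made for this example and does not come from the answers above.

```ruby
# Hypothetical helper: reports how each style of test treats a value.
def nil_report(x)
  {
    truthy: x ? true : false, # what a bare "if x" sees
    nil_method: x.nil?,       # explicit method call
    equals_nil: x == nil      # comparison against the nil singleton
  }
end

# nil and false are the only falsy values in Ruby; 0 and "" are truthy.
[nil, false, 0, ""].each do |value|
  puts "#{value.inspect}: #{nil_report(value)}"
end
```

The point to notice is the false row: it is falsy under a bare truth test, yet neither nil? nor == nil treats it as nil.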
You can use Symbol#to_proc on nil?, whereas it wouldn't be practical on x == nil.
arr = [1, 2, 3]
arr.any?(&:nil?) # Can be done
arr.any?{|x| x == nil} # More verbose, and I suspect is slower on Ruby 1.9.2
! arr.all? # Check if any values are nil or false
I find myself not using .nil? at all, since you can do:
unless obj
  # do work
end
Using .nil? is actually slower, but not noticeably so. .nil? is just a method that checks whether the object is nil; apart from the visual appeal and the very small performance cost, there is little difference. Note, however, that unless obj also runs its body when obj is false, not only when it is nil.
Some might suggest that using .nil? is slower than the simple comparison, which makes sense when you think about it.
But if scale and speed are not your concern, then .nil? is perhaps more readable.

Resources