What's the fastest way to build a string in Ruby? - ruby

In Ternary operator, a person wanting to join ["foo", "bar", "baz"] with commas and an "and" cited The Ruby Cookbook as saying
If efficiency is important to you,
don't build a new string when you can
append items onto an existing string.
[And so on]... Use str << var1 << ' '
<< var2 instead.
But the book was written in 2006.
Is using appending (ie <<) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?

Use Array#join when you can, and String#<< when you can't.
The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:
Ruby 1.9.2p180 user system total real
Array#join 0.320000 0.000000 0.320000 ( 0.330224)
String#+ 1 7.730000 0.200000 7.930000 ( 8.373900)
String#+ 2 4.670000 0.600000 5.270000 ( 5.546633)
String#<< 1 1.260000 0.010000 1.270000 ( 1.315991)
String#<< 2 1.600000 0.020000 1.620000 ( 1.793415)
JRuby 1.6.1 user system total real
Array#join 0.185000 0.000000 0.185000 ( 0.185000)
String#+ 1 9.118000 0.000000 9.118000 ( 9.118000)
String#+ 2 4.544000 0.000000 4.544000 ( 4.544000)
String#<< 1 0.865000 0.000000 0.865000 ( 0.866000)
String#<< 2 0.852000 0.000000 0.852000 ( 0.852000)
Ruby 1.8.7p334 user system total real
Array#join 0.290000 0.010000 0.300000 ( 0.305367)
String#+ 1 7.620000 0.060000 7.680000 ( 7.682265)
String#+ 2 4.820000 0.130000 4.950000 ( 4.957258)
String#<< 1 1.290000 0.010000 1.300000 ( 1.304764)
String#<< 2 1.350000 0.010000 1.360000 ( 1.347226)
Rubinius (head) user system total real
Array#join 0.864054 0.008001 0.872055 ( 0.870757)
String#+ 1 9.636602 0.076005 9.712607 ( 9.714820)
String#+ 2 6.456403 0.064004 6.520407 ( 6.521633)
String#<< 1 2.196138 0.016001 2.212139 ( 2.212564)
String#<< 2 2.176136 0.012001 2.188137 ( 2.186298)
Here's the benchmarking code:
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("String#+ 1"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.

What if your source of string bits is not an array?
TLDR; even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both array.join & <<
Old hands at benchmarking may have looked at #Phrogz answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("Array#join 2"){
N.times{
arr = Array.new(WORDS.length)
arr[0] = WORDS.first
WORDS[1..-1].each{ |w| arr << w; }
s = WORDS.join(', ')
}
}
x.report("String#+ 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
}
}
x.report("String#<< 2 A"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
small words, ruby 2.1.1
user system total real
Array#join 0.130000 0.000000 0.130000 ( 0.128281)
Array#join 2 0.220000 0.000000 0.220000 ( 0.219588)
String#+ 1 1.720000 0.770000 2.490000 ( 2.478555)
String#+ 2 1.040000 0.370000 1.410000 ( 1.407190)
String#<< 1 0.370000 0.000000 0.370000 ( 0.371125)
String#<< 2 0.360000 0.000000 0.360000 ( 0.360161)
String#<< 2 A 0.310000 0.000000 0.310000 ( 0.318130)
small words, ruby 2.1.1
user system total real
Array#join 0.090000 0.000000 0.090000 ( 0.092072)
Array#join 2 0.180000 0.000000 0.180000 ( 0.180423)
String#+ 1 3.400000 0.750000 4.150000 ( 4.149934)
String#+ 2 1.740000 0.370000 2.110000 ( 2.122511)
String#<< 1 0.360000 0.000000 0.360000 ( 0.359707)
String#<< 2 0.340000 0.000000 0.340000 ( 0.343233)
String#<< 2 A 0.300000 0.000000 0.300000 ( 0.297420)
I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:
WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }
as I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<
words often longer than 23 chars, ruby 2.1.1
user system total real
Array#join 0.150000 0.000000 0.150000 ( 0.152846)
Array#join 2 0.230000 0.010000 0.240000 ( 0.231272)
String#+ 1 7.450000 5.490000 12.940000 ( 12.936776)
String#+ 2 4.200000 2.590000 6.790000 ( 6.791125)
String#<< 1 0.400000 0.000000 0.400000 ( 0.399452)
String#<< 2 0.380000 0.010000 0.390000 ( 0.389791)
String#<< 2 A 0.340000 0.000000 0.340000 ( 0.341099)
words often longer than 23 chars, ruby 1.9.3
user system total real
Array#join 0.130000 0.010000 0.140000 ( 0.132957)
Array#join 2 0.220000 0.000000 0.220000 ( 0.220181)
String#+ 1 20.060000 5.230000 25.290000 ( 25.293366)
String#+ 2 9.750000 2.670000 12.420000 ( 12.425229)
String#<< 1 0.390000 0.000000 0.390000 ( 0.397733)
String#<< 2 0.390000 0.000000 0.390000 ( 0.390540)
String#<< 2 A 0.330000 0.000000 0.330000 ( 0.333791)

Related

String length modification in Ruby

How can I fix a length of strings to be 10, i.e. an input string will always have less than or 10 characters. If it has less than 10 characters, I have to add 0's at the beginning of the string.
Example of input:
123456
1234567
Needed output:
0000123456
0001234567
Input is arbitrary.
Thanks in advance!
String#rjust does just that:
'1234567'.rjust(10, '0') # => "0001234567"
a = "123456"
b = "1234567"
"0"*(10-a.size)+a
=> "0000123456"
"0"*(10-b.size)+b
=> "0001234567"
Interestingly for speed purposes:
a = "123456"
n = 50000
Benchmark.bm do |x|
x.report{n.times do ; a.rjust(10,'0'); end}
x.report{n.times do ; "0"*(10-a.size)+a; end}
end
user system total real
0.020000 0.000000 0.020000 ( 0.016442)
0.010000 0.000000 0.010000 ( 0.015134)
or a even higher sample size:
irb(main):001:0> require 'benchmark'
=> true
irb(main):002:0> a = "123456"
=> "123456"
irb(main):003:0> n = 5_000_000
=> 5000000
irb(main):004:0> Benchmark.bm do |x|
irb(main):005:1* x.report{n.times do ; a.rjust(10,'0'); end}
irb(main):006:1> x.report{n.times do ; "0"*(10-a.size)+a; end}
irb(main):007:1> end
user system total real
1.510000 0.000000 1.510000 ( 1.519720)
1.480000 0.000000 1.480000 ( 1.486935)

Reverse an array without using a loop in ruby

I have a coding challenge to reverse a an array with 5 elements in it. How would I do this without using the reverse method?
Code:
def reverse(array)
array
end
p reverse(["a", 1, "apple", 8, 90])
You can treat array as a stack and pop the elements from the end:
def reverse(array)
rev = []
rev << array.pop until array.empty?
rev
end
or if you don't like modifying objects, use more functional-like reduce:
def reverse(array)
array.reduce([]) {|acc, x| [x] + acc}
end
Cary mentioned in the comment about the performance. The functional approach might not be the fastest way, so if you really want to do it fast, create a buffor array and just add the items from the end to begin:
def reverse(array)
reversed = Array.new(array.count)
array.each_with_index do |item, index|
reversed[-(index + 1)] = item
end
reversed
end
Gentlemen, start your engines!
[Edit: added two method from #Grych and results for n = 8_000.]
#Grych, #ArupRakshit, #konsolebox and #JörgWMittag: please check that I've written your method(s) correctly.
Methods
def grych_reduce(array)
array.reduce([]) {|acc, x| [x] + acc}
end
def grych_prebuild(array)
reversed = Array.new(array.count)
array.each_with_index do |item, index|
reversed[-(index + 1)] = item
end
reversed
end
def arup(ary)
ary.values_at(*(ary.size-1).downto(0))
end
def konsolebox(array)
t = array.pop
konsolebox(array) if array.length > 0
array.unshift t
end
def jorg_recurse(array)
return array if array.size < 2
reverse(array.drop(1)) + array.first(1)
end
def jorg_tail(array, accum=[])
return accum if array.empty?
reverse(array.drop(1), array.first(1) + accum)
end
def jorg_fold(array)
array.reduce([]) {|accum, el| [el] + accum }
end
def jorg_loop(array)
array.each_with_object([]) {|el, accum| accum.unshift(el) }
end
def cary_rotate(arr)
arr.size.times.with_object([]) { |_,a| a << arr.rotate!(-1).first }
end
def cary_boring(arr)
(arr.size-1).downto(0).with_object([]) { |i,a| a << arr[i] }
end
Benchmark
require 'benchmark'
arr = [*(1..n)]
puts "n = #{n}"
Benchmark.bm(16) do |bm|
bm.report('grych_reduce') { grych_reduce(arr) }
bm.report('grych_prebuild') { grych_prebuild(arr) }
bm.report('arup') { arup(arr) }
bm.report('konsolebox') { konsolebox(arr) }
bm.report('jorg_recurse') { jorg_recurse(arr) }
bm.report('jorg_tail') { jorg_tail(arr) }
bm.report('jorg_fold') { jorg_fold(arr) }
bm.report('jorg_loop') { jorg_loop(arr) }
bm.report('cary_rotate') { cary_rotate(arr) }
bm.report('cary_boring') { cary_boring(arr) }
bm.report('grych_destructo') { grych_destructo(arr) }
end
Wednesday: warm-up (n = 8_000)
user system total real
grych_reduce 0.060000 0.060000 0.120000 ( 0.115510)
grych_prebuild 0.000000 0.000000 0.000000 ( 0.001150)
arup 0.000000 0.000000 0.000000 ( 0.000563)
konsolebox 0.000000 0.000000 0.000000 ( 0.001581)
jorg_recurse 0.060000 0.040000 0.100000 ( 0.096417)
jorg_tail 0.210000 0.070000 0.280000 ( 0.282729)
jorg_fold 0.060000 0.080000 0.140000 ( 0.138216)
jorg_loop 0.000000 0.000000 0.000000 ( 0.001174)
cary_rotate 0.060000 0.000000 0.060000 ( 0.056863)
cary_boring 0.000000 0.000000 0.000000 ( 0.000961)
grych_destructo 0.000000 0.000000 0.000000 ( 0.000524)
Thursday: trials #1 (n = 10_000)
user system total real
grych_reduce 0.090000 0.080000 0.170000 ( 0.163276)
grych_prebuild 0.000000 0.000000 0.000000 ( 0.001500)
arup 0.000000 0.000000 0.000000 ( 0.000706)
jorg_fold 0.080000 0.060000 0.140000 ( 0.139656)
jorg_loop 0.000000 0.000000 0.000000 ( 0.001388)
cary_rotate 0.090000 0.000000 0.090000 ( 0.087327)
cary_boring 0.000000 0.000000 0.000000 ( 0.001185)
grych_destructo 0.000000 0.000000 0.000000 ( 0.000694)
konsolebox, jorg_recurse and jorg_tail eliminated (stack level too deep).
Friday: trials #2 (n = 50_000)
user system total real
grych_reduce 2.430000 3.490000 5.920000 ( 5.920393)
grych_prebuild 0.010000 0.000000 0.010000 ( 0.007000)
arup 0.000000 0.000000 0.000000 ( 0.003826)
jorg_fold 2.430000 3.590000 6.020000 ( 6.026433)
jorg_loop 0.010000 0.010000 0.020000 ( 0.008491)
cary_rotate 2.680000 0.000000 2.680000 ( 2.686009)
cary_boring 0.010000 0.000000 0.010000 ( 0.006122)
grych_destructo 0.000000 0.000000 0.000000 ( 0.003288)
Saturday: qualifications (n = 200_000)
user system total real
grych_reduce 43.720000 66.140000 109.860000 (109.901040)
grych_prebuild 0.030000 0.000000 0.030000 ( 0.028287)
jorg_fold 43.700000 66.490000 110.190000 (110.252620)
jorg_loop 0.030000 0.010000 0.040000 ( 0.030409)
cary_rotate 43.060000 0.050000 43.110000 ( 43.118151)
cary_boring 0.020000 0.000000 0.020000 ( 0.024570)
grych_destructo 0.010000 0.000000 0.010000 ( 0.013338)
arup_verse eliminated (stack level too deep); grych_reduce, jorg_fold and cary_rotate eliminated (uncompetitive).
Sunday: final (n = 10_000_000)
user system total real
grych_prebuild 1.450000 0.020000 1.470000 ( 1.478903)
jorg_loop 1.530000 0.040000 1.570000 ( 1.649403)
cary_boring 1.250000 0.040000 1.290000 ( 1.288357)
grych_destructo 0.640000 0.030000 0.670000 ( 0.689819)
Recursion indeed is the solution if you're not going to use a loop. while or until is still a loop, and using built-in methods not doing recursion may also still be using a loop internally.
#!/usr/bin/env ruby
a = [1, 2, 3]
def reverse(array)
t = array.pop
reverse(array) if array.length > 0
array.unshift t
end
puts reverse(Array.new(a)).inspect # [3, 2, 1]
Update
Naturally recursion has limits since it depends on the stack but that's the best you can have if you don't want to use a loop. Following Cary Swoveland's post, this is the benchmark on 8500 elements:
user system total real
#Grych 0.060000 0.010000 0.070000 ( 0.073179)
#ArupRakshit 0.000000 0.000000 0.000000 ( 0.000836)
#konsolebox 0.000000 0.000000 0.000000 ( 0.001771)
#JörgWMittag recursion 0.050000 0.000000 0.050000 ( 0.053475)
#Jörg tail 0.210000 0.040000 0.250000 ( 0.246849)
#Jörg fold 0.040000 0.010000 0.050000 ( 0.045788)
#Jörg loop 0.000000 0.000000 0.000000 ( 0.000924)
Cary rotate 0.060000 0.000000 0.060000 ( 0.059954)
Cary boring 0.000000 0.000000 0.000000 ( 0.001004)
One thought :-
ary = ["a", 1, "apple", 8, 90]
ary.values_at(*(ary.size-1).downto(0))
# => [90, 8, "apple", 1, "a"]
ary.size.downto(0) gives #<Enumerator: ...>. And *#<Enumerator: ...> is just a Enumerable#to_a method call which splats the Enumerator to [4, 3, 2, 1, 0]. Finally, Array#values_at is working as documented.
The obvious solution is to use recursion:
def reverse(array)
return array if array.size < 2
reverse(array.drop(1)) + array.first(1)
end
We can make this tail-recursive using the standard accumulator trick:
def reverse(array, accum=[])
return accum if array.empty?
reverse(array.drop(1), array.first(1) + accum)
end
But of course, tail recursion is isomorphic to looping.
We could use a fold:
def reverse(array)
array.reduce([]) {|accum, el| [el] + accum }
end
But fold is equivalent to a loop.
def reverse(array)
array.each_with_object([]) {|el, accum| accum.unshift(el) }
end
Really, each_with_object is an iterator and it is the side-effectful cousin of fold, so there's actually two reasons why this is equivalent to a loop.
Here's another non-destructive approach:
arr = ["a", 1, "apple", 8, 90]
arr.size.times.with_object([]) { |_,a| a << arr.rotate!(-1).first }
#=> [90, 8, "apple", 1, "a"]
arr
#=> ["a", 1, "apple", 8, 90]
Another would the most uninteresting method imaginable:
(arr.size-1).downto(0).with_object([]) { |i,a| a << arr[i] }
#=> [90, 8, "apple", 1, "a"]
arr
#=> ["a", 1, "apple", 8, 90]
Konsolebox is right. If they are asking for the method without loops, that simply means that you cannot use any kind of loop whether it is map, each, while, until or any even built in methods that use loops, like length, size and count etc.
Everything needs to be recursive:
def recursive_reversal(array)
return array if array == [] # or array.empty?
last_element = array.pop
return [last_element, recursive_reversal(array)].flatten
end
Ruby uses recursion to flatten, so flatten will not entail any kind of loop.
def reverse(array)
array.values_at(*((array.size-1).downto 0))
end

How do I create the intersection of two hashes?

I have two hashes:
hash1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
hash2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
I need a hash which contains common keys in both hashes:
hash3 = {3 => "hello" , 4 => "world"}
Is it possible to do it without any loop?
hash3 = hash1.keep_if { |k, v| hash2.key? k }
This won't have the same effect as the code in the question, instead it will return:
hash3 #=> { 3 => "c", 4 => "d" }
The order of the hashes is important here. The values will always be taken from the hash that #keep_if is send to.
hash3 = hash2.keep_if { |k, v| hash1.key? k }
#=> {3 => "hello", 4 => "world"}
I'd go with this:
hash1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
hash2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
Hash[(hash1.keys & hash2.keys).zip(hash2.values_at(*(hash1.keys & hash2.keys)))]
=> {3=>"hello", 4=>"world"}
Which can be reduced a bit to:
keys = (hash1.keys & hash2.keys)
Hash[keys.zip(hash2.values_at(*keys))]
The trick is in Array's & method. The documentation says:
Set Intersection — Returns a new array containing elements common to the two arrays, excluding any duplicates. The order is preserved from the original array.
Here are some benchmarks to show what is the most efficient way to do this:
require 'benchmark'
HASH1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
HASH2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
def tinman
keys = (HASH1.keys & HASH2.keys)
Hash[keys.zip(HASH2.values_at(*keys))]
end
def santhosh
HASH2.select {|key, value| HASH1.has_key? key }
end
def santhosh_2
HASH2.select {|key, value| HASH1[key] }
end
def priti
HASH2.select{|k,v| HASH1.assoc(k) }
end
def koraktor
HASH1.keep_if { |k, v| HASH2.key? k }
end
def koraktor2
HASH2.keep_if { |k, v| HASH1.key? k }
end
N = 1_000_000
puts RUBY_VERSION
puts "N= #{N}"
puts [:tinman, :santhosh, :santhosh_2, :priti, :koraktor, :koraktor2].map{ |s| "#{s.to_s} = #{send(s)}" }
Benchmark.bm(11) do |x|
x.report('tinman') { N.times { tinman() }}
x.report('santhosh_2') { N.times { santhosh_2() }}
x.report('santhosh') { N.times { santhosh() }}
x.report('priti') { N.times { priti() }}
x.report('koraktor') { N.times { koraktor() }}
x.report('koraktor2') { N.times { koraktor2() }}
end
Ruby 1.9.3-p448:
1.9.3
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
user system total real
tinman 2.430000 0.000000 2.430000 ( 2.430030)
santhosh_2 1.000000 0.020000 1.020000 ( 1.003635)
santhosh 1.090000 0.010000 1.100000 ( 1.104067)
priti 1.350000 0.000000 1.350000 ( 1.352476)
koraktor 0.490000 0.000000 0.490000 ( 0.484686)
koraktor2 0.480000 0.000000 0.480000 ( 0.483327)
Running under Ruby 2.0.0-p247:
2.0.0
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
user system total real
tinman 1.890000 0.000000 1.890000 ( 1.882352)
santhosh_2 0.710000 0.010000 0.720000 ( 0.735830)
santhosh 0.790000 0.020000 0.810000 ( 0.807413)
priti 1.030000 0.010000 1.040000 ( 1.030018)
koraktor 0.390000 0.000000 0.390000 ( 0.389431)
koraktor2 0.390000 0.000000 0.390000 ( 0.389072)
Koraktor's original code doesn't work, but he turned it around nicely with his second code pass, and walks away with the best speed. I added the santhosh_2 method to see what effect removing key? would have. It sped the routine up a little, but not enough to catch up to Koraktor's.
Just for documentation purposes, I tweaked Koraktor's second code to remove the key? method also, and shaved more time from it. Here's the added method and the new output:
def koraktor3
HASH2.keep_if { |k, v| HASH1[k] }
end
1.9.3
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
koraktor3 = {3=>"hello", 4=>"world"}
user system total real
tinman 2.380000 0.000000 2.380000 ( 2.382392)
santhosh_2 0.970000 0.020000 0.990000 ( 0.976672)
santhosh 1.070000 0.010000 1.080000 ( 1.078397)
priti 1.320000 0.000000 1.320000 ( 1.318652)
koraktor 0.480000 0.000000 0.480000 ( 0.488613)
koraktor2 0.490000 0.000000 0.490000 ( 0.490099)
koraktor3 0.390000 0.000000 0.390000 ( 0.389386)
2.0.0
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
koraktor3 = {3=>"hello", 4=>"world"}
user system total real
tinman 1.840000 0.000000 1.840000 ( 1.832491)
santhosh_2 0.720000 0.010000 0.730000 ( 0.737737)
santhosh 0.780000 0.020000 0.800000 ( 0.801619)
priti 1.040000 0.010000 1.050000 ( 1.044588)
koraktor 0.390000 0.000000 0.390000 ( 0.387265)
koraktor2 0.390000 0.000000 0.390000 ( 0.388648)
koraktor3 0.320000 0.000000 0.320000 ( 0.327859)
hash2.select {|key, value| hash1.has_key? key }
# => {3=>"hello", 4=>"world"}
Ruby 2.5 has added Hash#slice, which allows a compact code like:
hash3 = hash1.slice(*hash2.keys)
In older rubies this was possible in rails or projects using active support's hash extensions.

What is the complexity of Ruby's Array#insert?

What is the complexity of Ruby's Array#insert?
Is it O(1) or O(n) (memory is copied)?
Simple benchmark shows that insert is O(n):
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..100000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
end
user system total real
0.078000 0.000000 0.078000 ( 0.077023)
0.500000 0.000000 0.500000 ( 0.522345)
5.953000 0.000000 5.953000 ( 5.967949)
As long as you don't push to the end of the array, when it becomes O(1):
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..100000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..10000000).to_a
x.report { 10000.times {arr.push 1} }
end
user system total real
0.000000 0.000000 0.000000 ( 0.001002)
0.000000 0.000000 0.000000 ( 0.001000)
0.000000 0.000000 0.000000 ( 0.001001)
0.000000 0.000000 0.000000 ( 0.002001)

Check if a string variable is in a set of strings

Which one is better:
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
x =~ /abc|def|ghi/
?
Which one is better? The question can't be easily answered, because they don't all do the same things.
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
compare x against fixed strings for equality. x has to be one of those values. Between those two I tend to go with the second because it's easier to maintain. Imagine what it would look like if you had to compare against twenty, fifty or one hundred strings.
The third test:
x ~= /abc|def|ghi/
matches substrings:
x = 'xyzghi'
(x =~ /abc|def|ghi/) # => 3
so it isn't the same as the first two.
EDIT: There are some things in the benchmarks done by nash that I'd do differently. Using Ruby 1.9.2-p180 on a MacBook Pro, this tests 1,000,000 loops and compares the results of anchoring the regex, using grouping, along with not splitting the %w() array each time through the loop:
require 'benchmark'
str = "test"
n = 1_000_000
Benchmark.bm do |x|
x.report { n.times { str == 'abc' || str == 'def' || str == 'ghi' } }
x.report { n.times { %w(abc def ghi).include? str } }
x.report { ary = %w(abc def ghi); n.times { ary.include? str } }
x.report { n.times { str =~ /abc|def|ghi/ } }
x.report { n.times { str =~ /^abc|def|ghi$/ } }
x.report { n.times { str =~ /^(abc|def|ghi)$/ } }
x.report { n.times { str =~ /^(?:abc|def|ghi)$/ } }
x.report { n.times { str =~ /\b(?:abc|def|ghi)\b/ } }
end
# >> user system total real
# >> 1.160000 0.000000 1.160000 ( 1.165331)
# >> 1.920000 0.000000 1.920000 ( 1.920120)
# >> 0.990000 0.000000 0.990000 ( 0.983921)
# >> 1.070000 0.000000 1.070000 ( 1.068140)
# >> 1.050000 0.010000 1.060000 ( 1.054852)
# >> 1.060000 0.000000 1.060000 ( 1.063909)
# >> 1.060000 0.000000 1.060000 ( 1.050813)
# >> 1.050000 0.000000 1.050000 ( 1.056147)
The first might be a tad quicker, since there are no method calls and your doing straight string comparisons, but its also probably the least readable and least maintainable.
The second is definitely the grooviest, and the ruby way of going about it. It's the most maintainable, and probably the best to read.
The last way uses old school perl regex syntax. Fairly fast, not as annoying as the first to maintain, fairly readable.
I guess it depends what you mean by "better".
some benchmarks:
require 'benchmark'
str = "test"
Benchmark.bm do |x|
x.report {100000.times {if str == 'abc' || str == 'def' || str == 'ghi'; end}}
x.report {100000.times {if %w(abc def ghi).include? str; end}}
x.report {100000.times {if str =~ /abc|def|ghi/; end}}
end
user system total real
0.250000 0.000000 0.250000 ( 0.251014)
0.374000 0.000000 0.374000 ( 0.402023)
0.265000 0.000000 0.265000 ( 0.259014)
So as you can see the first way works faster then other. And the longer str, the slower the last way works:
str = "testasdasdasdasdasddkmfskjndfbdkjngdjgndksnfg"
user system total real
0.234000 0.000000 0.234000 ( 0.248014)
0.405000 0.000000 0.405000 ( 0.403023)
1.046000 0.000000 1.046000 ( 1.038059)

Resources