What is the complexity of Ruby's Array#insert? - ruby

What is the complexity of Ruby's Array#insert?
Is it O(1) or O(n) (memory is copied)?

A simple benchmark shows that insert is O(n):
require 'benchmark'
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..100000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.insert(1, 1)} }
end
user system total real
0.078000 0.000000 0.078000 ( 0.077023)
0.500000 0.000000 0.500000 ( 0.522345)
5.953000 0.000000 5.953000 ( 5.967949)
The exception is inserting at the end of the array (push), which is amortized O(1):
Benchmark.bm do |x|
arr = (1..10000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..100000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..1000000).to_a
x.report { 10000.times {arr.push 1} }
arr = (1..10000000).to_a
x.report { 10000.times {arr.push 1} }
end
user system total real
0.000000 0.000000 0.000000 ( 0.001002)
0.000000 0.000000 0.000000 ( 0.001000)
0.000000 0.000000 0.000000 ( 0.001001)
0.000000 0.000000 0.000000 ( 0.002001)
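The two benchmarks above can be explained by counting how many elements have to move. The following is only a toy model, assuming a contiguous array in which inserting at index i shifts every later element one slot to the right; CountingArray is a hypothetical class invented for this illustration, and it ignores reallocation and any interpreter optimizations.

```ruby
# Toy model (hypothetical, for illustration only): counts how many
# existing elements would have to shift for each insertion.
class CountingArray
  attr_reader :moves

  def initialize(items)
    @items = items.dup
    @moves = 0
  end

  # Inserting at `index` shifts every element from `index` onward.
  def insert(index, value)
    @moves += @items.size - index
    @items.insert(index, value)
    self
  end

  # Appending is an insert at the end, so nothing has to shift.
  def push(value)
    insert(@items.size, value)
  end
end

small = CountingArray.new((1..10).to_a)
large = CountingArray.new((1..1000).to_a)

small.insert(1, :x)   # shifts 9 elements
large.insert(1, :x)   # shifts 999 elements

pushed = CountingArray.new((1..1000).to_a)
pushed.push(:x)       # shifts 0 elements

puts small.moves, large.moves, pushed.moves
```

In this model the cost of insert(1, ...) grows with the array size while push stays constant, which matches the shape of the timings above.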

Related

Reverse an array without using a loop in ruby

I have a coding challenge to reverse an array with 5 elements in it. How would I do this without using the reverse method?
Code:
def reverse(array)
array
end
p reverse(["a", 1, "apple", 8, 90])
You can treat array as a stack and pop the elements from the end:
def reverse(array)
rev = []
rev << array.pop until array.empty?
rev
end
or if you don't like modifying objects, use more functional-like reduce:
def reverse(array)
array.reduce([]) {|acc, x| [x] + acc}
end
Cary mentioned performance in a comment. The functional approach might not be the fastest, so if you really want speed, create a buffer array and fill it from the end to the beginning:
def reverse(array)
reversed = Array.new(array.count)
array.each_with_index do |item, index|
reversed[-(index + 1)] = item
end
reversed
end
Gentlemen, start your engines!
[Edit: added two methods from @Grych and results for n = 8_000.]
@Grych, @ArupRakshit, @konsolebox and @JörgWMittag: please check that I've written your method(s) correctly.
Methods
def grych_reduce(array)
array.reduce([]) {|acc, x| [x] + acc}
end
def grych_prebuild(array)
reversed = Array.new(array.count)
array.each_with_index do |item, index|
reversed[-(index + 1)] = item
end
reversed
end
def arup(ary)
ary.values_at(*(ary.size-1).downto(0))
end
def konsolebox(array)
t = array.pop
konsolebox(array) if array.length > 0
array.unshift t
end
def jorg_recurse(array)
return array if array.size < 2
jorg_recurse(array.drop(1)) + array.first(1)
end
def jorg_tail(array, accum=[])
return accum if array.empty?
jorg_tail(array.drop(1), array.first(1) + accum)
end
def jorg_fold(array)
array.reduce([]) {|accum, el| [el] + accum }
end
def jorg_loop(array)
array.each_with_object([]) {|el, accum| accum.unshift(el) }
end
def cary_rotate(arr)
arr.size.times.with_object([]) { |_,a| a << arr.rotate!(-1).first }
end
def cary_boring(arr)
(arr.size-1).downto(0).with_object([]) { |i,a| a << arr[i] }
end
Benchmark
require 'benchmark'
arr = [*(1..n)]
puts "n = #{n}"
Benchmark.bm(16) do |bm|
bm.report('grych_reduce') { grych_reduce(arr) }
bm.report('grych_prebuild') { grych_prebuild(arr) }
bm.report('arup') { arup(arr) }
bm.report('konsolebox') { konsolebox(arr) }
bm.report('jorg_recurse') { jorg_recurse(arr) }
bm.report('jorg_tail') { jorg_tail(arr) }
bm.report('jorg_fold') { jorg_fold(arr) }
bm.report('jorg_loop') { jorg_loop(arr) }
bm.report('cary_rotate') { cary_rotate(arr) }
bm.report('cary_boring') { cary_boring(arr) }
bm.report('grych_destructo') { grych_destructo(arr) }
end
Wednesday: warm-up (n = 8_000)
user system total real
grych_reduce 0.060000 0.060000 0.120000 ( 0.115510)
grych_prebuild 0.000000 0.000000 0.000000 ( 0.001150)
arup 0.000000 0.000000 0.000000 ( 0.000563)
konsolebox 0.000000 0.000000 0.000000 ( 0.001581)
jorg_recurse 0.060000 0.040000 0.100000 ( 0.096417)
jorg_tail 0.210000 0.070000 0.280000 ( 0.282729)
jorg_fold 0.060000 0.080000 0.140000 ( 0.138216)
jorg_loop 0.000000 0.000000 0.000000 ( 0.001174)
cary_rotate 0.060000 0.000000 0.060000 ( 0.056863)
cary_boring 0.000000 0.000000 0.000000 ( 0.000961)
grych_destructo 0.000000 0.000000 0.000000 ( 0.000524)
Thursday: trials #1 (n = 10_000)
user system total real
grych_reduce 0.090000 0.080000 0.170000 ( 0.163276)
grych_prebuild 0.000000 0.000000 0.000000 ( 0.001500)
arup 0.000000 0.000000 0.000000 ( 0.000706)
jorg_fold 0.080000 0.060000 0.140000 ( 0.139656)
jorg_loop 0.000000 0.000000 0.000000 ( 0.001388)
cary_rotate 0.090000 0.000000 0.090000 ( 0.087327)
cary_boring 0.000000 0.000000 0.000000 ( 0.001185)
grych_destructo 0.000000 0.000000 0.000000 ( 0.000694)
konsolebox, jorg_recurse and jorg_tail eliminated (stack level too deep).
Friday: trials #2 (n = 50_000)
user system total real
grych_reduce 2.430000 3.490000 5.920000 ( 5.920393)
grych_prebuild 0.010000 0.000000 0.010000 ( 0.007000)
arup 0.000000 0.000000 0.000000 ( 0.003826)
jorg_fold 2.430000 3.590000 6.020000 ( 6.026433)
jorg_loop 0.010000 0.010000 0.020000 ( 0.008491)
cary_rotate 2.680000 0.000000 2.680000 ( 2.686009)
cary_boring 0.010000 0.000000 0.010000 ( 0.006122)
grych_destructo 0.000000 0.000000 0.000000 ( 0.003288)
Saturday: qualifications (n = 200_000)
user system total real
grych_reduce 43.720000 66.140000 109.860000 (109.901040)
grych_prebuild 0.030000 0.000000 0.030000 ( 0.028287)
jorg_fold 43.700000 66.490000 110.190000 (110.252620)
jorg_loop 0.030000 0.010000 0.040000 ( 0.030409)
cary_rotate 43.060000 0.050000 43.110000 ( 43.118151)
cary_boring 0.020000 0.000000 0.020000 ( 0.024570)
grych_destructo 0.010000 0.000000 0.010000 ( 0.013338)
arup eliminated (stack level too deep); grych_reduce, jorg_fold and cary_rotate eliminated (uncompetitive).
Sunday: final (n = 10_000_000)
user system total real
grych_prebuild 1.450000 0.020000 1.470000 ( 1.478903)
jorg_loop 1.530000 0.040000 1.570000 ( 1.649403)
cary_boring 1.250000 0.040000 1.290000 ( 1.288357)
grych_destructo 0.640000 0.030000 0.670000 ( 0.689819)
Recursion is indeed the solution if you're not going to use a loop. while or until is still a loop, and built-in methods that don't recurse may still use a loop internally.
#!/usr/bin/env ruby
a = [1, 2, 3]
def reverse(array)
t = array.pop
reverse(array) if array.length > 0
array.unshift t
end
puts reverse(Array.new(a)).inspect # [3, 2, 1]
Update
Naturally recursion has limits since it depends on the stack but that's the best you can have if you don't want to use a loop. Following Cary Swoveland's post, this is the benchmark on 8500 elements:
user system total real
#Grych 0.060000 0.010000 0.070000 ( 0.073179)
#ArupRakshit 0.000000 0.000000 0.000000 ( 0.000836)
#konsolebox 0.000000 0.000000 0.000000 ( 0.001771)
#JörgWMittag recursion 0.050000 0.000000 0.050000 ( 0.053475)
#Jörg tail 0.210000 0.040000 0.250000 ( 0.246849)
#Jörg fold 0.040000 0.010000 0.050000 ( 0.045788)
#Jörg loop 0.000000 0.000000 0.000000 ( 0.000924)
Cary rotate 0.060000 0.000000 0.060000 ( 0.059954)
Cary boring 0.000000 0.000000 0.000000 ( 0.001004)
One thought :-
ary = ["a", 1, "apple", 8, 90]
ary.values_at(*(ary.size-1).downto(0))
# => [90, 8, "apple", 1, "a"]
(ary.size-1).downto(0) gives #<Enumerator: ...>. The splat, *#<Enumerator: ...>, is just an Enumerable#to_a call that expands the Enumerator to [4, 3, 2, 1, 0]. Finally, Array#values_at works as documented.
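The steps described above can be spelled out one at a time. This is a small sketch; the intermediate variable names are made up for illustration.

```ruby
ary = ["a", 1, "apple", 8, 90]

# Integer#downto without a block returns an Enumerator.
indices = (ary.size - 1).downto(0)

# Splatting the Enumerator is equivalent to calling to_a on it.
index_list = indices.to_a             # [4, 3, 2, 1, 0]

# values_at picks the elements at the given indices, in that order.
reversed = ary.values_at(*index_list)

p reversed                            # [90, 8, "apple", 1, "a"]
```

Because every index becomes a separate method argument, this approach depends on how many arguments the interpreter can handle, which is consistent with it dropping out of the benchmark at n = 200_000.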
The obvious solution is to use recursion:
def reverse(array)
return array if array.size < 2
reverse(array.drop(1)) + array.first(1)
end
We can make this tail-recursive using the standard accumulator trick:
def reverse(array, accum=[])
return accum if array.empty?
reverse(array.drop(1), array.first(1) + accum)
end
But of course, tail recursion is isomorphic to looping.
We could use a fold:
def reverse(array)
array.reduce([]) {|accum, el| [el] + accum }
end
But fold is equivalent to a loop.
def reverse(array)
array.each_with_object([]) {|el, accum| accum.unshift(el) }
end
Really, each_with_object is an iterator and it is the side-effectful cousin of fold, so there's actually two reasons why this is equivalent to a loop.
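As a quick sanity check, the four variants above can be compared against the built-in Array#reverse. The method names below are renamed for this sketch so that they can coexist in one file; they are not part of the original answer.

```ruby
def reverse_recursive(array)
  return array if array.size < 2
  reverse_recursive(array.drop(1)) + array.first(1)
end

def reverse_tail(array, accum = [])
  return accum if array.empty?
  reverse_tail(array.drop(1), array.first(1) + accum)
end

def reverse_fold(array)
  array.reduce([]) { |accum, el| [el] + accum }
end

def reverse_unshift(array)
  array.each_with_object([]) { |el, accum| accum.unshift(el) }
end

input    = ["a", 1, "apple", 8, 90]
expected = input.reverse

results = [
  reverse_recursive(input),
  reverse_tail(input),
  reverse_fold(input),
  reverse_unshift(input),
]

# Every variant should agree with the built-in and leave the input alone.
all_match = results.all? { |r| r == expected }
puts all_match
```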
Here's another non-destructive approach:
arr = ["a", 1, "apple", 8, 90]
arr.size.times.with_object([]) { |_,a| a << arr.rotate!(-1).first }
#=> [90, 8, "apple", 1, "a"]
arr
#=> ["a", 1, "apple", 8, 90]
Another would be the most uninteresting method imaginable:
(arr.size-1).downto(0).with_object([]) { |i,a| a << arr[i] }
#=> [90, 8, "apple", 1, "a"]
arr
#=> ["a", 1, "apple", 8, 90]
Konsolebox is right. If they are asking for a method without loops, that simply means you cannot use any kind of loop, whether it is map, each, while or until, or even built-in methods that iterate internally.
Everything needs to be recursive:
def recursive_reversal(array)
return array if array == [] # or array.empty?
last_element = array.pop
return [last_element, recursive_reversal(array)].flatten
end
Ruby uses recursion to flatten, so flatten will not entail any kind of loop.
def reverse(array)
array.values_at(*((array.size-1).downto 0))
end

Find most common hash value

I have the following hash:
h = Hash["a","foo", "b","bar", "c","foo"]
I would like to return the most common value, in this case foo. What is the most efficient way to do this?
Similar to this question, but adapted to hashes.
You can get the values as an array and then just plug into the solution you linked.
h.values.group_by { |e| e }.values.max_by(&:size).first
#=> "foo"
We can do this:
h = Hash["a","foo", "b","bar", "c","foo", "d", "bar", 'e', 'foobar']
p h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
# >> "foo"
UPDATE(slower than my first solution)
h = Hash["a","foo", "b","bar", "c","foo"]
h.group_by { |_,v| v }.max_by{|_,v| v.size}.first
# >> "foo"
Benchmark
require 'benchmark'
def seanny123(h)
h.values.group_by { |e| e }.values.max_by(&:size).first
end
def stefan(h)
frequencies = h.each_with_object(Hash.new(0)) { |(k,v), h| h[v] += 1 }
value, count = frequencies.max_by { |k, v| v }
value
end
def yevgeniy_anfilofyev(h)
h.group_by{|(_,v)| v }.sort_by{|(_,v)| v.size }[-1][0]
end
def acts_as_geek(h)
v = h.values
max = v.map {|i| v.count(i)}.max
v.select {|i| v.count(i) == max}.uniq
end
def squiguy(h)
v = h.values
v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
end
def babai1(h)
h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
end
def babai2(h)
h.group_by { |_,v| v }.max_by{|_,v| v.size}.first
end
def benchmark(h,n)
Benchmark.bm(20) do |x|
x.report("Seanny123") { n.times { seanny123(h) } }
x.report("Stefan") { n.times { stefan(h) } }
x.report("Yevgeniy Anfilofyev") { n.times { yevgeniy_anfilofyev(h) } }
x.report("acts_as_geek") { n.times { acts_as_geek(h) } }
x.report("squiguy") { n.times { squiguy(h) } }
x.report("Babai1") { n.times { babai1(h) } }
x.report("Babai2") { n.times { babai2(h) } }
end
end
n = 10
h = {}
1000.times do |i|
h["a#{i}"] = "foo"
h["b#{i}"] = "bar"
h["c#{i}"] = "foo"
end
benchmark(h, n)
Result:-
user system total real
Seanny123 0.020000 0.000000 0.020000 ( 0.015550)
Stefan 0.040000 0.000000 0.040000 ( 0.044666)
Yevgeniy Anfilofyev 0.020000 0.000000 0.020000 ( 0.023162)
acts_as_geek 16.160000 0.000000 16.160000 ( 16.223582)
squiguy 15.740000 0.000000 15.740000 ( 15.768917)
Babai1 0.020000 0.000000 0.020000 ( 0.015430)
Babai2 0.020000 0.000000 0.020000 ( 0.025711)
You can calculate the frequencies with Enumerable#inject:
frequencies = h.inject(Hash.new(0)) { |h, (k,v)| h[v] += 1 ; h }
#=> {"foo"=>2, "bar"=>1}
Or Enumerable#each_with_object:
frequencies = h.each_with_object(Hash.new(0)) { |(k,v), h| h[v] += 1 }
#=> {"foo"=>2, "bar"=>1}
And the maximum with Enumerable#max_by:
value, count = frequencies.max_by { |k, v| v }
#=> ["foo", 2]
value
#=> "foo"
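On Ruby 2.7 or later, the frequency hash built above can also be obtained with Enumerable#tally. This is a sketch under that version assumption, not something taken from the answers here.

```ruby
h = { "a" => "foo", "b" => "bar", "c" => "foo" }

# tally counts how often each value occurs (requires Ruby 2.7+).
frequencies = h.values.tally          # {"foo"=>2, "bar"=>1}

# max_by returns the [value, count] pair with the highest count.
value, count = frequencies.max_by { |_, c| c }

p value                               # "foo"
```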
Benchmarks
With a small hash:
n = 100000
h = Hash["a","foo", "b","bar", "c","foo"]
benchmark(h, n)
Results:
user system total real
Seanny123 0.220000 0.000000 0.220000 ( 0.222342)
Stefan 0.260000 0.000000 0.260000 ( 0.263583)
Yevgeniy Anfilofyev 0.350000 0.000000 0.350000 ( 0.341685)
acts_as_geek 0.300000 0.000000 0.300000 ( 0.306601)
squiguy 0.140000 0.000000 0.140000 ( 0.139141)
Babai 0.220000 0.000000 0.220000 ( 0.218616)
With a large hash:
n = 10
h = {}
1000.times do |i|
h["a#{i}"] = "foo"
h["b#{i}"] = "bar"
h["c#{i}"] = "foo"
end
benchmark(h, n)
Results:
user system total real
Seanny123 0.060000 0.000000 0.060000 ( 0.059068)
Stefan 0.100000 0.000000 0.100000 ( 0.100760)
Yevgeniy Anfilofyev 0.080000 0.000000 0.080000 ( 0.080988)
acts_as_geek 97.020000 0.020000 97.040000 ( 97.072220)
squiguy 97.480000 0.020000 97.500000 ( 97.535130)
Babai 0.050000 0.000000 0.050000 ( 0.058653)
Benchmark code:
require 'benchmark'
def seanny123(h)
h.values.group_by { |e| e }.values.max_by(&:size).first
end
def stefan(h)
frequencies = h.each_with_object(Hash.new(0)) { |(k,v), h| h[v] += 1 }
value, count = frequencies.max_by { |k, v| v }
value
end
def yevgeniy_anfilofyev(h)
h.group_by{|(_,v)| v }.sort_by{|(_,v)| v.size }[-1][0]
end
def acts_as_geek(h)
v = h.values
max = v.map {|i| v.count(i)}.max
v.select {|i| v.count(i) == max}.uniq
end
def squiguy(h)
v = h.values
v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
end
def babai(h)
h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
end
def benchmark(h,n)
Benchmark.bm(20) do |x|
x.report("Seanny123") { n.times { seanny123(h) } }
x.report("Stefan") { n.times { stefan(h) } }
x.report("Yevgeniy Anfilofyev") { n.times { yevgeniy_anfilofyev(h) } }
x.report("acts_as_geek") { n.times { acts_as_geek(h) } }
x.report("squiguy") { n.times { squiguy(h) } }
x.report("Babai") { n.times { babai(h) } }
end
end
With group_by but without values and with sort_by:
h.group_by{|(_,v)| v }
.sort_by{|(_,v)| v.size }[-1][0]
Update: Don't use my solution. My computer had to use a lot of clock cycles to compute it. Didn't know a functional approach would be so slow here.
How about using Enumerable#reduce?
h = Hash["a","foo", "b","bar", "c","foo"]
v = h.values
most = v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
p most
As a caveat, this will only return one value if two or more share the same "max" count value in the hash. If you care about all the "max" values, this is not a solution to use.
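If all values that share the maximum count are needed, one possible sketch is to count each value once and then select every value whose count equals the maximum. The sample hash is made up for illustration; counting once also avoids calling count inside the loop for every element.

```ruby
h = { "a" => "foo", "b" => "bar", "c" => "foo", "d" => "bar", "e" => "baz" }

# Count each value in a single pass over the hash.
counts = h.each_with_object(Hash.new(0)) { |(_, v), acc| acc[v] += 1 }

# Keep every value whose count ties for the maximum.
max = counts.values.max
most_common = counts.select { |_, c| c == max }.keys

p most_common                         # ["foo", "bar"]
```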
Here is a benchmark.
#!/usr/bin/env ruby
require 'benchmark'
MULT = 10
arr = []
h = {}
letters = ("a".."z").to_a
10000.times do
arr << letters.sample
end
10000.times do |i|
h[i] = arr[i]
end
Benchmark.bm do |rep|
rep.report("Seanny123") {
MULT.times do
h.values.group_by { |e| e }.values.max_by(&:size).first
end
}
rep.report("squiguy") {
MULT.times do
v = h.values
most = v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
end
}
rep.report("acts_as_geek") {
MULT.times do
v = h.values
max = v.map {|i| v.count(i)}.max
v.select {|i| v.count(i) == max}.uniq
end
}
rep.report("Yevgeniy Anfilofyev") {
MULT.times do
h.group_by{|(_,v)| v }
.sort_by{|(_,v)| v.size }[-1][0]
end
}
rep.report("Stefan") {
MULT.times do
frequencies = h.inject(Hash.new(0)) { |h, (k,v)| h[v] += 1 ; h }
value, count = frequencies.max_by { |k, v| v }
end
}
rep.report("Babai") {
MULT.times do
h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
end
}
end

How do I create the intersection of two hashes?

I have two hashes:
hash1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
hash2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
I need a hash which contains common keys in both hashes:
hash3 = {3 => "hello" , 4 => "world"}
Is it possible to do it without any loop?
hash3 = hash1.keep_if { |k, v| hash2.key? k }
This won't have the same effect as the code in the question, instead it will return:
hash3 #=> { 3 => "c", 4 => "d" }
The order of the hashes is important here. The values will always be taken from the hash that #keep_if is sent to. Note that #keep_if also modifies that hash in place.
hash3 = hash2.keep_if { |k, v| hash1.key? k }
#=> {3 => "hello", 4 => "world"}
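Since keep_if changes its receiver, a non-destructive variant may be preferable when the original hashes are still needed. The sketch below contrasts Hash#select, which returns a new hash, with Hash#keep_if on a copy; the variable names are illustrative.

```ruby
hash1 = { 1 => "a", 2 => "b", 3 => "c", 4 => "d" }
hash2 = { 3 => "hello", 4 => "world", 5 => "welcome" }

# select returns a new hash and leaves hash2 untouched.
selected = hash2.select { |k, _| hash1.key?(k) }

# keep_if deletes entries from its receiver and returns that same object.
copy = hash2.dup
kept = copy.keep_if { |k, _| hash1.key?(k) }

p selected          # {3=>"hello", 4=>"world"}
p hash2.size        # 3, unchanged
p kept.equal?(copy) # true, the copy itself was modified
```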
I'd go with this:
hash1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
hash2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
Hash[(hash1.keys & hash2.keys).zip(hash2.values_at(*(hash1.keys & hash2.keys)))]
=> {3=>"hello", 4=>"world"}
Which can be reduced a bit to:
keys = (hash1.keys & hash2.keys)
Hash[keys.zip(hash2.values_at(*keys))]
The trick is in Array's & method. The documentation says:
Set Intersection — Returns a new array containing elements common to the two arrays, excluding any duplicates. The order is preserved from the original array.
Here are some benchmarks to show what is the most efficient way to do this:
require 'benchmark'
HASH1 = {1 => "a" , 2 => "b" , 3 => "c" , 4 => "d"}
HASH2 = {3 => "hello", 4 => "world" , 5 => "welcome"}
def tinman
keys = (HASH1.keys & HASH2.keys)
Hash[keys.zip(HASH2.values_at(*keys))]
end
def santhosh
HASH2.select {|key, value| HASH1.has_key? key }
end
def santhosh_2
HASH2.select {|key, value| HASH1[key] }
end
def priti
HASH2.select{|k,v| HASH1.assoc(k) }
end
def koraktor
HASH1.keep_if { |k, v| HASH2.key? k }
end
def koraktor2
HASH2.keep_if { |k, v| HASH1.key? k }
end
N = 1_000_000
puts RUBY_VERSION
puts "N= #{N}"
puts [:tinman, :santhosh, :santhosh_2, :priti, :koraktor, :koraktor2].map{ |s| "#{s.to_s} = #{send(s)}" }
Benchmark.bm(11) do |x|
x.report('tinman') { N.times { tinman() }}
x.report('santhosh_2') { N.times { santhosh_2() }}
x.report('santhosh') { N.times { santhosh() }}
x.report('priti') { N.times { priti() }}
x.report('koraktor') { N.times { koraktor() }}
x.report('koraktor2') { N.times { koraktor2() }}
end
Ruby 1.9.3-p448:
1.9.3
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
user system total real
tinman 2.430000 0.000000 2.430000 ( 2.430030)
santhosh_2 1.000000 0.020000 1.020000 ( 1.003635)
santhosh 1.090000 0.010000 1.100000 ( 1.104067)
priti 1.350000 0.000000 1.350000 ( 1.352476)
koraktor 0.490000 0.000000 0.490000 ( 0.484686)
koraktor2 0.480000 0.000000 0.480000 ( 0.483327)
Running under Ruby 2.0.0-p247:
2.0.0
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
user system total real
tinman 1.890000 0.000000 1.890000 ( 1.882352)
santhosh_2 0.710000 0.010000 0.720000 ( 0.735830)
santhosh 0.790000 0.020000 0.810000 ( 0.807413)
priti 1.030000 0.010000 1.040000 ( 1.030018)
koraktor 0.390000 0.000000 0.390000 ( 0.389431)
koraktor2 0.390000 0.000000 0.390000 ( 0.389072)
Koraktor's original code doesn't work, but he turned it around nicely with his second code pass, and walks away with the best speed. I added the santhosh_2 method to see what effect removing key? would have. It sped the routine up a little, but not enough to catch up to Koraktor's.
Just for documentation purposes, I tweaked Koraktor's second code to remove the key? method also, and shaved more time from it. Here's the added method and the new output:
def koraktor3
HASH2.keep_if { |k, v| HASH1[k] }
end
1.9.3
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
koraktor3 = {3=>"hello", 4=>"world"}
user system total real
tinman 2.380000 0.000000 2.380000 ( 2.382392)
santhosh_2 0.970000 0.020000 0.990000 ( 0.976672)
santhosh 1.070000 0.010000 1.080000 ( 1.078397)
priti 1.320000 0.000000 1.320000 ( 1.318652)
koraktor 0.480000 0.000000 0.480000 ( 0.488613)
koraktor2 0.490000 0.000000 0.490000 ( 0.490099)
koraktor3 0.390000 0.000000 0.390000 ( 0.389386)
2.0.0
N= 1000000
tinman = {3=>"hello", 4=>"world"}
santhosh = {3=>"hello", 4=>"world"}
santhosh_2 = {3=>"hello", 4=>"world"}
priti = {3=>"hello", 4=>"world"}
koraktor = {3=>"c", 4=>"d"}
koraktor2 = {3=>"hello", 4=>"world"}
koraktor3 = {3=>"hello", 4=>"world"}
user system total real
tinman 1.840000 0.000000 1.840000 ( 1.832491)
santhosh_2 0.720000 0.010000 0.730000 ( 0.737737)
santhosh 0.780000 0.020000 0.800000 ( 0.801619)
priti 1.040000 0.010000 1.050000 ( 1.044588)
koraktor 0.390000 0.000000 0.390000 ( 0.387265)
koraktor2 0.390000 0.000000 0.390000 ( 0.388648)
koraktor3 0.320000 0.000000 0.320000 ( 0.327859)
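One caveat about replacing key? with a plain lookup, as santhosh_2 and koraktor3 do: the two are only equivalent when the first hash holds no nil or false values. A small sketch with made-up data shows the difference.

```ruby
hash1 = { 3 => nil, 4 => false, 5 => "x" }
hash2 = { 3 => "hello", 4 => "world", 6 => "welcome" }

# A plain lookup treats nil and false values as "key not present".
by_lookup = hash2.select { |k, _| hash1[k] }

# key? only checks for the presence of the key.
by_key = hash2.select { |k, _| hash1.key?(k) }

p by_lookup   # {}
p by_key      # {3=>"hello", 4=>"world"}
```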
hash2.select {|key, value| hash1.has_key? key }
# => {3=>"hello", 4=>"world"}
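Ruby 2.5 added Hash#slice, which allows compact code like:
hash3 = hash1.slice(*hash2.keys)
Note that this takes the values from hash1; swap the two hashes to get the values from hash2, as in the question.
In older Rubies this was possible in Rails or in projects using Active Support's hash extensions.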
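Which hash supplies the values depends on the receiver of slice. A short check, assuming Ruby 2.5 or later:

```ruby
hash1 = { 1 => "a", 2 => "b", 3 => "c", 4 => "d" }
hash2 = { 3 => "hello", 4 => "world", 5 => "welcome" }

# Values come from the receiver; keys missing from it are ignored.
from_hash1 = hash1.slice(*hash2.keys)
from_hash2 = hash2.slice(*hash1.keys)

p from_hash1   # {3=>"c", 4=>"d"}
p from_hash2   # {3=>"hello", 4=>"world"}
```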

What's the fastest way to build a string in Ruby?

In Ternary operator, a person wanting to join ["foo", "bar", "baz"] with commas and an "and" cited The Ruby Cookbook as saying
If efficiency is important to you,
don't build a new string when you can
append items onto an existing string.
[And so on]... Use str << var1 << ' '
<< var2 instead.
But the book was written in 2006.
Is appending (i.e. <<) still the fastest way to build a large string given an array of smaller strings, in all major implementations of Ruby?
Use Array#join when you can, and String#<< when you can't.
The problem with using String#+ is that it must create an intermediary (unwanted) string object, while String#<< mutates the original string. Here are the time results (in seconds) of joining 1,000 strings with ", " 1,000 times, via Array#join, String#+, and String#<<:
Ruby 1.9.2p180 user system total real
Array#join 0.320000 0.000000 0.320000 ( 0.330224)
String#+ 1 7.730000 0.200000 7.930000 ( 8.373900)
String#+ 2 4.670000 0.600000 5.270000 ( 5.546633)
String#<< 1 1.260000 0.010000 1.270000 ( 1.315991)
String#<< 2 1.600000 0.020000 1.620000 ( 1.793415)
JRuby 1.6.1 user system total real
Array#join 0.185000 0.000000 0.185000 ( 0.185000)
String#+ 1 9.118000 0.000000 9.118000 ( 9.118000)
String#+ 2 4.544000 0.000000 4.544000 ( 4.544000)
String#<< 1 0.865000 0.000000 0.865000 ( 0.866000)
String#<< 2 0.852000 0.000000 0.852000 ( 0.852000)
Ruby 1.8.7p334 user system total real
Array#join 0.290000 0.010000 0.300000 ( 0.305367)
String#+ 1 7.620000 0.060000 7.680000 ( 7.682265)
String#+ 2 4.820000 0.130000 4.950000 ( 4.957258)
String#<< 1 1.290000 0.010000 1.300000 ( 1.304764)
String#<< 2 1.350000 0.010000 1.360000 ( 1.347226)
Rubinius (head) user system total real
Array#join 0.864054 0.008001 0.872055 ( 0.870757)
String#+ 1 9.636602 0.076005 9.712607 ( 9.714820)
String#+ 2 6.456403 0.064004 6.520407 ( 6.521633)
String#<< 1 2.196138 0.016001 2.212139 ( 2.212564)
String#<< 2 2.176136 0.012001 2.188137 ( 2.186298)
Here's the benchmarking code:
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("String#+ 1"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
s = WORDS.first
WORDS[1..-1].each{ |w| s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
Results obtained on Ubuntu under RVM. Results from Ruby 1.9.2p180 from RubyInstaller on Windows are similar to the 1.9.2 shown above.
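The difference between String#+ and String#<< described above can be observed directly by checking object identity. This is a minimal sketch; dup is used so the example also works when string literals are frozen.

```ruby
# << appends to the existing string object.
a = "foo".dup
a_before = a
a << "bar"
same_after_append = a.equal?(a_before)   # true

# += builds a new string and rebinds the variable to it.
b = "foo".dup
b_before = b
b += "bar"
same_after_plus = b.equal?(b_before)     # false

p same_after_append, same_after_plus
p b_before                               # "foo", the old object is unchanged
```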
What if your source of string bits is not an array?
TL;DR: even when your source of string bits is not a giant array, you are still much better off constructing an array first and using join. + is not as bad in 2.1.1 as in 1.9.3, but it's still bad (for this use case). 1.9.3 is actually slightly faster at both Array#join and <<.
Old hands at benchmarking may have looked at @Phrogz's answer and thought "but but but..." because the join benchmark doesn't have the array enumeration overhead that the others do. I was curious to see how much difference it made, so...
WORDS = (1..1000).map{ rand(10000).to_s }
N = 1000
require 'benchmark'
Benchmark.bmbm do |x|
x.report("Array#join"){
N.times{ s = WORDS.join(', ') }
}
x.report("Array#join 2"){
N.times{
arr = []
arr << WORDS.first
WORDS[1..-1].each{ |w| arr << w }
s = arr.join(', ')
}
}
x.report("String#+ 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", "; s += w }
}
}
x.report("String#+ 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first
WORDS[1..-1].each{ |w| arr << w; s += ", " + w }
}
}
x.report("String#<< 1"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", "; s << w }
}
}
x.report("String#<< 2"){
N.times{
arr = Array.new(WORDS.length)
s = WORDS.first.dup
WORDS[1..-1].each{ |w| arr << w; s << ", " << w }
}
}
x.report("String#<< 2 A"){
N.times{
s = WORDS.first.dup
WORDS[1..-1].each{ |w| s << ", " << w }
}
}
end
small words, ruby 2.1.1
user system total real
Array#join 0.130000 0.000000 0.130000 ( 0.128281)
Array#join 2 0.220000 0.000000 0.220000 ( 0.219588)
String#+ 1 1.720000 0.770000 2.490000 ( 2.478555)
String#+ 2 1.040000 0.370000 1.410000 ( 1.407190)
String#<< 1 0.370000 0.000000 0.370000 ( 0.371125)
String#<< 2 0.360000 0.000000 0.360000 ( 0.360161)
String#<< 2 A 0.310000 0.000000 0.310000 ( 0.318130)
small words, ruby 1.9.3
user system total real
Array#join 0.090000 0.000000 0.090000 ( 0.092072)
Array#join 2 0.180000 0.000000 0.180000 ( 0.180423)
String#+ 1 3.400000 0.750000 4.150000 ( 4.149934)
String#+ 2 1.740000 0.370000 2.110000 ( 2.122511)
String#<< 1 0.360000 0.000000 0.360000 ( 0.359707)
String#<< 2 0.340000 0.000000 0.340000 ( 0.343233)
String#<< 2 A 0.300000 0.000000 0.300000 ( 0.297420)
I was also curious how the benchmark would be affected by string bits that are (sometimes) longer than 23 characters so I reran with:
WORDS = (1..1000).map{ rand(100000).to_s * (rand(15)+1) }
As I expected, the impact on + was quite significant, but I was pleasantly surprised that it had very little impact on join or <<.
words often longer than 23 chars, ruby 2.1.1
user system total real
Array#join 0.150000 0.000000 0.150000 ( 0.152846)
Array#join 2 0.230000 0.010000 0.240000 ( 0.231272)
String#+ 1 7.450000 5.490000 12.940000 ( 12.936776)
String#+ 2 4.200000 2.590000 6.790000 ( 6.791125)
String#<< 1 0.400000 0.000000 0.400000 ( 0.399452)
String#<< 2 0.380000 0.010000 0.390000 ( 0.389791)
String#<< 2 A 0.340000 0.000000 0.340000 ( 0.341099)
words often longer than 23 chars, ruby 1.9.3
user system total real
Array#join 0.130000 0.010000 0.140000 ( 0.132957)
Array#join 2 0.220000 0.000000 0.220000 ( 0.220181)
String#+ 1 20.060000 5.230000 25.290000 ( 25.293366)
String#+ 2 9.750000 2.670000 12.420000 ( 12.425229)
String#<< 1 0.390000 0.000000 0.390000 ( 0.397733)
String#<< 2 0.390000 0.000000 0.390000 ( 0.390540)
String#<< 2 A 0.330000 0.000000 0.330000 ( 0.333791)

How to sort an array in descending order in Ruby

I have an array of hashes:
[
{ :foo => 'foo', :bar => 2 },
{ :foo => 'foo', :bar => 3 },
{ :foo => 'foo', :bar => 5 },
]
I am trying to sort this array in descending order according to the value of :bar in each hash.
I am using sort_by to sort above array:
a.sort_by { |h| h[:bar] }
However, this sorts the array in ascending order. How do I make it sort in descending order?
One solution was to do following:
a.sort_by { |h| -h[:bar] }
But that negative sign does not seem appropriate.
It's always enlightening to do a benchmark on the various suggested answers. Here's what I found out:
#!/usr/bin/ruby
require 'benchmark'
ary = []
1000.times {
ary << {:bar => rand(1000)}
}
n = 500
Benchmark.bm(20) do |x|
x.report("sort") { n.times { ary.sort{ |a,b| b[:bar] <=> a[:bar] } } }
x.report("sort reverse") { n.times { ary.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse } }
x.report("sort_by -a[:bar]") { n.times { ary.sort_by{ |a| -a[:bar] } } }
x.report("sort_by a[:bar]*-1") { n.times { ary.sort_by{ |a| a[:bar]*-1 } } }
x.report("sort_by.reverse!") { n.times { ary.sort_by{ |a| a[:bar] }.reverse! } }
end
user system total real
sort 3.960000 0.010000 3.970000 ( 3.990886)
sort reverse 4.040000 0.000000 4.040000 ( 4.038849)
sort_by -a[:bar] 0.690000 0.000000 0.690000 ( 0.692080)
sort_by a[:bar]*-1 0.700000 0.000000 0.700000 ( 0.699735)
sort_by.reverse! 0.650000 0.000000 0.650000 ( 0.654447)
I think it's interesting that @Pablo's sort_by{...}.reverse! is fastest. Before running the test I thought it would be slower than "-a[:bar]", but negating the value turns out to take longer than reversing the entire array in one pass. It's not much of a difference, but every little speed-up helps.
Please note that these results are different in Ruby 1.9.
Here are results for Ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]:
user system total real
sort 1.340000 0.010000 1.350000 ( 1.346331)
sort reverse 1.300000 0.000000 1.300000 ( 1.310446)
sort_by -a[:bar] 0.430000 0.000000 0.430000 ( 0.429606)
sort_by a[:bar]*-1 0.420000 0.000000 0.420000 ( 0.414383)
sort_by.reverse! 0.400000 0.000000 0.400000 ( 0.401275)
These are on an old MacBook Pro. Newer, or faster machines, will have lower values, but the relative differences will remain.
Here's a slightly updated version on newer hardware and Ruby 2.1.1:
#!/usr/bin/ruby
require 'benchmark'
puts "Running Ruby #{RUBY_VERSION}"
ary = []
1000.times {
ary << {:bar => rand(1000)}
}
n = 500
puts "n=#{n}"
Benchmark.bm(20) do |x|
x.report("sort") { n.times { ary.dup.sort{ |a,b| b[:bar] <=> a[:bar] } } }
x.report("sort reverse") { n.times { ary.dup.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse } }
x.report("sort_by -a[:bar]") { n.times { ary.dup.sort_by{ |a| -a[:bar] } } }
x.report("sort_by a[:bar]*-1") { n.times { ary.dup.sort_by{ |a| a[:bar]*-1 } } }
x.report("sort_by.reverse") { n.times { ary.dup.sort_by{ |a| a[:bar] }.reverse } }
x.report("sort_by.reverse!") { n.times { ary.dup.sort_by{ |a| a[:bar] }.reverse! } }
end
# >> Running Ruby 2.1.1
# >> n=500
# >> user system total real
# >> sort 0.670000 0.000000 0.670000 ( 0.667754)
# >> sort reverse 0.650000 0.000000 0.650000 ( 0.655582)
# >> sort_by -a[:bar] 0.260000 0.010000 0.270000 ( 0.255919)
# >> sort_by a[:bar]*-1 0.250000 0.000000 0.250000 ( 0.258924)
# >> sort_by.reverse 0.250000 0.000000 0.250000 ( 0.245179)
# >> sort_by.reverse! 0.240000 0.000000 0.240000 ( 0.242340)
New results running the above code using Ruby 2.2.1 on a more recent Macbook Pro. Again, the exact numbers aren't important, it's their relationships:
Running Ruby 2.2.1
n=500
user system total real
sort 0.650000 0.000000 0.650000 ( 0.653191)
sort reverse 0.650000 0.000000 0.650000 ( 0.648761)
sort_by -a[:bar] 0.240000 0.010000 0.250000 ( 0.245193)
sort_by a[:bar]*-1 0.240000 0.000000 0.240000 ( 0.240541)
sort_by.reverse 0.230000 0.000000 0.230000 ( 0.228571)
sort_by.reverse! 0.230000 0.000000 0.230000 ( 0.230040)
Updated for Ruby 2.7.1 on a Mid-2015 MacBook Pro:
Running Ruby 2.7.1
n=500
user system total real
sort 0.494707 0.003662 0.498369 ( 0.501064)
sort reverse 0.480181 0.005186 0.485367 ( 0.487972)
sort_by -a[:bar] 0.121521 0.003781 0.125302 ( 0.126557)
sort_by a[:bar]*-1 0.115097 0.003931 0.119028 ( 0.122991)
sort_by.reverse 0.110459 0.003414 0.113873 ( 0.114443)
sort_by.reverse! 0.108997 0.001631 0.110628 ( 0.111532)
Regarding the claim that "...the reverse method doesn't actually return a reversed array - it returns an enumerator that just starts at the end and works backwards": that is not what happens.
The source for Array#reverse is:
static VALUE
rb_ary_reverse_m(VALUE ary)
{
long len = RARRAY_LEN(ary);
VALUE dup = rb_ary_new2(len);
if (len > 0) {
const VALUE *p1 = RARRAY_CONST_PTR_TRANSIENT(ary);
VALUE *p2 = (VALUE *)RARRAY_CONST_PTR_TRANSIENT(dup) + len - 1;
do *p2-- = *p1++; while (--len > 0);
}
ARY_SET_LEN(dup, RARRAY_LEN(ary));
return dup;
}
do *p2-- = *p1++; while (--len > 0); is copying the pointers to the elements in reverse order if I remember my C correctly, so the array is reversed.
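The same point can be checked from Ruby itself: Array#reverse returns a new Array, while the method that returns an enumerator when called without a block is Array#reverse_each. A short sketch:

```ruby
arr = [1, 2, 3]

# reverse builds and returns a new Array; arr is left as it was.
reversed = arr.reverse

# reverse_each without a block returns an Enumerator instead.
backwards = arr.reverse_each

p reversed.class    # Array
p backwards.class   # Enumerator
p backwards.to_a    # [3, 2, 1]
```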
Just a quick thing, that denotes the intent of descending order.
descending = -1
a.sort_by { |h| h[:bar] * descending }
(Will think of a better way in the meantime) ;)
a.sort_by { |h| h[:bar] }.reverse!
You could do:
a.sort{|a,b| b[:bar] <=> a[:bar]}
I see that we have (beside others) basically two options:
a.sort_by { |h| -h[:bar] }
and
a.sort_by { |h| h[:bar] }.reverse
While both give the same result when the sorting keys are unique, keep in mind that the reverse approach also reverses the relative order of elements whose keys are equal.
Example:
a = [{foo: 1, bar: 1},{foo: 2,bar: 1}]
a.sort_by {|h| -h[:bar]}
=> [{:foo=>1, :bar=>1}, {:foo=>2, :bar=>1}]
a.sort_by {|h| h[:bar]}.reverse
=> [{:foo=>2, :bar=>1}, {:foo=>1, :bar=>1}]
You often don't need to care about this, but sometimes you do. Ruby's sort_by is not guaranteed to be stable, so to get a deterministic order you can introduce a second sorting key (one that is unique at least among the items that share the same first key):
a.sort_by {|h| [-h[:bar],-h[:foo]]}
=> [{:foo=>2, :bar=>1}, {:foo=>1, :bar=>1}]
a.sort_by {|h| [h[:bar],h[:foo]]}.reverse
=> [{:foo=>2, :bar=>1}, {:foo=>1, :bar=>1}]
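If there is no natural second key, the element's original position can play that role. The following is a minimal sketch, not something from the answers above: sort_desc_keep_ties is a hypothetical helper name, and it assumes the sort key is numeric so that it can be negated.

```ruby
# Hypothetical helper: sort hashes by a numeric `key` in descending order,
# keeping the original relative order of elements whose keys are equal.
def sort_desc_keep_ties(items, key)
  items.each_with_index                                # pairs of [item, original_index]
       .sort_by { |item, index| [-item[key], index] }  # the index breaks ties
       .map(&:first)                                   # drop the index again
end

a = [{ foo: 1, bar: 1 }, { foo: 2, bar: 1 }, { foo: 3, bar: 5 }]
p sort_desc_keep_ties(a, :bar)
```

Because the index is unique for every element, no two composite keys compare as equal, so the result does not depend on how the underlying sort treats ties.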
What about:
a.sort {|x,y| y[:bar]<=>x[:bar]}
It works!!
irb
>> a = [
?> { :foo => 'foo', :bar => 2 },
?> { :foo => 'foo', :bar => 3 },
?> { :foo => 'foo', :bar => 5 },
?> ]
=> [{:bar=>2, :foo=>"foo"}, {:bar=>3, :foo=>"foo"}, {:bar=>5, :foo=>"foo"}]
>> a.sort {|x,y| y[:bar]<=>x[:bar]}
=> [{:bar=>5, :foo=>"foo"}, {:bar=>3, :foo=>"foo"}, {:bar=>2, :foo=>"foo"}]
For those folks who like to measure speed in IPS ;)
require 'benchmark/ips'
ary = []
1000.times {
ary << {:bar => rand(1000)}
}
Benchmark.ips do |x|
x.report("sort") { ary.sort{ |a,b| b[:bar] <=> a[:bar] } }
x.report("sort reverse") { ary.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse }
x.report("sort_by -a[:bar]") { ary.sort_by{ |a| -a[:bar] } }
x.report("sort_by a[:bar]*-1") { ary.sort_by{ |a| a[:bar]*-1 } }
x.report("sort_by.reverse!") { ary.sort_by{ |a| a[:bar] }.reverse! }
x.compare!
end
And results:
Warming up --------------------------------------
sort 93.000 i/100ms
sort reverse 91.000 i/100ms
sort_by -a[:bar] 382.000 i/100ms
sort_by a[:bar]*-1 398.000 i/100ms
sort_by.reverse! 397.000 i/100ms
Calculating -------------------------------------
sort 938.530 (± 1.8%) i/s - 4.743k in 5.055290s
sort reverse 901.157 (± 6.1%) i/s - 4.550k in 5.075351s
sort_by -a[:bar] 3.814k (± 4.4%) i/s - 19.100k in 5.019260s
sort_by a[:bar]*-1 3.732k (± 4.3%) i/s - 18.706k in 5.021720s
sort_by.reverse! 3.928k (± 3.6%) i/s - 19.850k in 5.060202s
Comparison:
sort_by.reverse!: 3927.8 i/s
sort_by -a[:bar]: 3813.9 i/s - same-ish: difference falls within error
sort_by a[:bar]*-1: 3732.3 i/s - same-ish: difference falls within error
sort: 938.5 i/s - 4.19x slower
sort reverse: 901.2 i/s - 4.36x slower
Regarding the benchmark suite mentioned, these results also hold for sorted arrays.
sort_by/reverse it is:
# foo.rb
require 'benchmark'
NUM_RUNS = 1000
# arr = []
arr1 = 3000.times.map { { num: rand(1000) } }
arr2 = 3000.times.map { |n| { num: n } }.reverse
Benchmark.bm(20) do |x|
{ 'randomized' => arr1,
'sorted' => arr2 }.each do |label, arr|
puts '---------------------------------------------------'
puts label
x.report('sort_by / reverse') {
NUM_RUNS.times { arr.sort_by { |h| h[:num] }.reverse }
}
x.report('sort_by -') {
NUM_RUNS.times { arr.sort_by { |h| -h[:num] } }
}
end
end
And the results:
$: ruby foo.rb
user system total real
---------------------------------------------------
randomized
sort_by / reverse 1.680000 0.010000 1.690000 ( 1.682051)
sort_by - 1.830000 0.000000 1.830000 ( 1.830359)
---------------------------------------------------
sorted
sort_by / reverse 0.400000 0.000000 0.400000 ( 0.402990)
sort_by - 0.500000 0.000000 0.500000 ( 0.499350)
A simple way to go from ascending to descending order (and vice versa) is to sort and then reverse:
STRINGS
str = ['ravi', 'aravind', 'joker', 'poker']
asc_string = str.sort # => ["aravind", "joker", "poker", "ravi"]
asc_string.reverse # => ["ravi", "poker", "joker", "aravind"]
DIGITS
digit = [234,45,1,5,78,45,34,9]
asc_digit = digit.sort # => [1, 5, 9, 34, 45, 45, 78, 234]
asc_digit.reverse # => [234, 78, 45, 45, 34, 9, 5, 1]
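The same idea carries over to the array-of-hashes case from the question when the key is a string, where the negation trick (which is meant for numeric keys) does not help. A small sketch, with sort_desc_by and sort_desc_by_reverse as hypothetical helper names:

```ruby
# Hypothetical helpers: sort hashes by `key` in descending order.
# Both work for any key type that implements <=> (strings, numbers, ...).
def sort_desc_by(items, key)
  items.sort { |x, y| y[key] <=> x[key] }  # swapped operands give descending order
end

def sort_desc_by_reverse(items, key)
  items.sort_by { |h| h[key] }.reverse     # ascending sort, then reverse
end

people = [{ name: 'ravi' }, { name: 'aravind' }, { name: 'joker' }]
p sort_desc_by(people, :name)
p sort_desc_by_reverse(people, :name)
```

With unique keys both helpers return the same order; according to the benchmarks above, the sort_by plus reverse form is the faster of the two.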
