Sorting array of string by numbers - ruby

I want to sort an array like the following:
["10a","10b","9a","9b","8a","8b"]
When I call,
a = a.sort {|a,b| a <=> b}
it will sort like the following:
["10a","10b","8a","8b","9a","9b"]
The 10 is a string and is not handled as a number. When I first sort by integer and then by string, it will just do the same. Does anybody know how I can handle the 10 as a 10 without making it into an integer? That would mess up the letters a, b etc.

When I first sort by integer and then by string, it will just do the same.
That would have been my first instinct, and it seems to work perfectly:
%w[10a 10b 9a 9b 8a 8b].sort_by {|el| [el.to_i, el] }
# => ['8a', '8b', '9a', '9b', '10a', '10b']

I'd do something like this:
ary = ["10a","10b","9a","9b","8a","8b"]
sorted_ary = ary.sort_by{ |e|
/(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]
}
ary # => ["10a", "10b", "9a", "9b", "8a", "8b"]
sorted_ary # => ["8a", "8b", "9a", "9b", "10a", "10b"]
sort_by is going to be faster than sort for this sort of problem. Because the value being sorted isn't a direct comparison and we need to dig into it to get the values to use for collation, a normal sort would have to do it multiple times for each element. Instead, using sort_by caches the computed value, and then sorts based on it.
/(?<digit>\d+)(?<alpha>\D+)/ =~ e isn't what you'll normally see for a regular expression. The named-captures ?<digit> and ?<alpha> define the names of local variables that can be accessed immediately, when used in that form.
[digit.to_i, alpha] returns an array consisting of the leading number converted to an integer, followed by the trailing characters. That array is then used for comparison by sort_by.
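A minimal sketch of the key-building step described above. The helper name natural_key and its fallback for strings without digits are assumptions added for illustration; they are not part of the original answer.

```ruby
# Hypothetical helper: builds the [integer, letters] sort key.
def natural_key(str)
  # The regex literal has to be on the left of =~ for the named captures
  # to be assigned to the local variables digit and alpha.
  if /(?<digit>\d+)(?<alpha>\D*)/ =~ str
    [digit.to_i, alpha]
  else
    [0, str] # assumption: strings without any digits sort first
  end
end

ary = %w[10a 10b 9a 9b 8a 8b]
p natural_key("10a")                 # => [10, "a"]
p ary.sort_by { |s| natural_key(s) } # => ["8a", "8b", "9a", "9b", "10a", "10b"]
```

Arrays compare element by element, so the integer decides first and the letters only break ties.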
Benchmarking sort vs. sort_by using Fruity: I added some length to the array being sorted to push the routines a bit harder for better time resolution.
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 1000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test 2 times. Test will take about 1 second.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 19.999999999999996% ± 1.0%
# >> Running each test once. Test will take about 1 second.
# >> jorge_sort_by is faster than jorg_sort by 10.000000000000009% ± 1.0%
Ruby's sort_by uses a Schwartzian Transform, which can make a major difference in sort speed when dealing with objects where we have to compute the value to be sorted.
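To make the caching visible, here is a rough sketch that counts how often the key is computed by sort with a block, by sort_by, and by a hand-written decorate/sort/undecorate. The method name sort_three_ways and the counting lambda are made up for this illustration.

```ruby
# Hypothetical helper: sorts the same words three ways and counts key computations.
def sort_three_ways(words)
  calls = Hash.new(0)
  key = lambda do |label, s|
    calls[label] += 1
    [s.to_i, s]
  end
  results = {
    sort:    words.sort { |x, y| key.(:sort, x) <=> key.(:sort, y) },
    sort_by: words.sort_by { |s| key.(:sort_by, s) },
    # decorate each element with its key, sort, then strip the key again
    by_hand: words.map { |s| [key.(:by_hand, s), s] }.sort.map(&:last)
  }
  [results, calls]
end

results, calls = sort_three_ways(%w[10a 10b 9a 9b 8a 8b])
p results.values.uniq.size # => 1
p calls[:sort_by]          # => 6
p calls[:by_hand]          # => 6
p calls[:sort] > 6         # => true
```

sort_by and the manual version compute each key once per element; the block form recomputes both keys on every comparison.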
Could you run your benchmark for 100_000 instead of 1_000 in the definition of ARY?
require 'fruity'
ARY = (%w[10a 10b 9a 9b 8a 8b] * 100_000).shuffle
compare do
cary_to_i_sort_by { ARY.sort_by { |s| s.to_i(36) } }
cary_to_i_sort { ARY.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)} }
end
compare do
jorge_sort_by { ARY.sort_by {|el| [el.to_i, el] } }
jorg_sort { ARY.map {|el| [el.to_i, el] }.sort.map(&:last) }
end
# >> Running each test once. Test will take about 10 seconds.
# >> cary_to_i_sort_by is faster than cary_to_i_sort by 2x ± 1.0
# >> Running each test once. Test will take about 26 seconds.
# >> jorg_sort is similar to jorge_sort_by
The Wikipedia article has a good efficiency analysis and example that explains why sort_by is preferred for costly comparisons.
Ruby's sort_by documentation also covers this well.
I don't think the size of the array will make much difference. If anything, as the array size grows, if the calculation for the intermediate value is costly, sort_by will still be faster because of its caching. Remember, sort_by is all compiled code, whereas using a Ruby-script-based transform is subject to slower execution as the array is transformed, handed off to sort and then the original object is plucked from the sub-arrays. A larger array means it just has to be done more times.

▶ a = ["10a","10b","9a","9b","8a","8b"]
▶ a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
#=> [
# [0] "8a",
# [1] "8b",
# [2] "9a",
# [3] "9b",
# [4] "10a",
# [5] "10b"
#]
Hope it helps.

Two ways that don't use String#to_i (but rely on the assumption that each string consists of one or more digits followed by one lower case letter).
ary = ["10a","10b","9a","9b","8a","8b","100z", "96b"]
#1
mx = ary.map(&:size).max
ary.sort_by { |s| s.rjust(mx) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
#2
ary.sort_by { |s| s.to_i(36) }
#=> ["8a", "8b", "9a", "9b", "10a", "10b", "96b", "100z"]
Hmmm, I wonder if:
ary.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
or
ary.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
would be faster.
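For readers wondering why these two keys sort correctly, a small sketch follows. The method names are invented here, and the sketch relies on the same assumption as above (digits followed by one lower-case letter, no leading zeros).

```ruby
# Hypothetical wrappers around the two keys shown above.
def sort_by_padding(ary)
  width = ary.map(&:size).max
  # Left-padding with spaces lines the digits up; a space sorts before any digit.
  ary.sort_by { |s| s.rjust(width) }
end

def sort_by_base36(ary)
  # In base 36 the characters 0-9 and a-z are all valid digits.
  ary.sort_by { |s| s.to_i(36) }
end

ary = %w[10a 10b 9a 9b 8a 8b 100z 96b]
p "9a".rjust(4)                     # => "  9a"
p "10a".to_i(36)                    # => 1306
p "9b".to_i(36)                     # => 335
p "10A".to_i(36) == "10a".to_i(36)  # => true
p "010a".to_i(36).to_s(36)          # => "10a"
p sort_by_padding(ary) == sort_by_base36(ary) # => true
```

The last three lines hint at the limits: base-36 parsing ignores case, and the round trip through to_s(36) drops leading zeros, so the map/sort/map variant only returns the original strings when the input matches the stated assumption.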

["10a","10b","9a","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9a", "9b", "10a", "10b"]
["10a3","10a4", "9", "9aa","9b","8a","8b"]
.sort_by{|s| s.split(/(\D+)/).map.with_index{|s, i| i.odd? ? s : s.to_i}}
#=> ["8a", "8b", "9", "9aa", "9b", "10a3", "10a4"]
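A short sketch of what the block above produces for each string. The helper name chunk_key is an assumption; its body is the same expression.

```ruby
# Hypothetical helper wrapping the sort_by block used above.
def chunk_key(str)
  # split with a capture group keeps the separators, so digit runs land on
  # even indexes and non-digit runs on odd indexes.
  str.split(/(\D+)/).map.with_index { |part, i| i.odd? ? part : part.to_i }
end

p "10a3".split(/(\D+)/)  # => ["10", "a", "3"]
p chunk_key("10a3")      # => [10, "a", 3]
p chunk_key("9")         # => [9]
p chunk_key("9aa")       # => [9, "aa"]
p([9] <=> [9, "aa"])     # => -1
```

Because a shorter array that is a prefix of a longer one compares as smaller, "9" sorts before "9aa".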

Gentlemen, start your engines!
I decided to benchmark the various solutions that have been offered. One of the things I was curious about was the effect of converting sort_by solutions to sort solutions. For example, I compared my method
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
to
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
This always involves mapping the original array to the transformed values within the sort_by block, sorting that array, then mapping the results back to the elements in the original array (when that can be done).
I tried this sort_by-to-sort conversion with some of the methods that use sort_by. Not surprisingly, the conversion to sort was generally faster, though the amount of improvement varied quite a bit.
Methods compared
module Methods
def mudasobwa(a)
a.sort { |a,b| a.to_i == b.to_i ? a <=> b : a.to_i <=> b.to_i }
end
def jorg(a)
a.sort_by {|el| [el.to_i, el] }
end
def jorg_sort(a)
a.map {|el| [el.to_i, el] }.sort.map(&:last)
end
def the(a)
a.sort_by {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha] }
end
def the_sort(a)
a.map {|e| /(?<digit>\d+)(?<alpha>\D+)/ =~ e
[digit.to_i, alpha]}.sort.map {|d,a| d.to_s+a }
end
def engineer(a) a.sort_by { |s|
s.scan(/(\d+)(\D+)/).flatten.tap{ |a| a[0] = a[0].to_i } }
end
def sawa(a) a.sort_by { |s|
s.split(/(\D+)/).map.with_index { |s, i| i.odd? ? s : s.to_i } }
end
def cary_rjust(a)
mx = a.map(&:size).max
a.sort_by {|s| s.rjust(mx)}
end
def cary_rjust_sort(a)
mx = a.map(&:size).max
a.map { |s| s.rjust(mx) }.sort.map(&:lstrip)
end
def cary_to_i(a)
a.sort_by { |s| s.to_i(36) }
end
def cary_to_i_sort(a)
a.map { |s| s.to_i(36) }.sort.map {|i| i.to_s(36)}
end
end
include Methods
methods = Methods.instance_methods(false)
#=> [:mudasobwa, :jorg, :jorg_sort, :the, :the_sort, :engineer, :sawa,
# :cary_rjust, :cary_rjust_sort, :cary_to_i, :cary_to_i_sort]
Test data and helper
def test_data(n)
a = 10_000.times.to_a.map(&:to_s)
b = [*'a'..'z']
n.times.map { a.sample + b.sample }
end
def compute(m,a)
send(m,a)
end
Confirm methods return the same values
a = test_data(1000)
puts "All methods correct: #{methods.map { |m| compute(m,a) }.uniq.size == 1}"
Benchmark code
require 'benchmark'
indent = methods.map { |m| m.to_s.size }.max
n = 500_000
a = test_data(n)
puts "\nSort random array of size #{n}"
Benchmark.bm(indent) do |bm|
methods.each do |m|
bm.report m.to_s do
compute(m,a)
end
end
end
Test
Sort random array of size 500000
user system total real
mudasobwa 4.760000 0.000000 4.760000 ( 4.765170)
jorg 2.870000 0.020000 2.890000 ( 2.892359)
jorg_sort 2.980000 0.020000 3.000000 ( 3.010344)
the 9.040000 0.100000 9.140000 ( 9.160944)
the_sort 4.570000 0.090000 4.660000 ( 4.668146)
engineer 10.110000 0.070000 10.180000 ( 10.198117)
sawa 27.310000 0.160000 27.470000 ( 27.504958)
cary_rjust 1.080000 0.010000 1.090000 ( 1.087788)
cary_rjust_sort 0.740000 0.000000 0.740000 ( 0.746132)
cary_to_i 0.570000 0.000000 0.570000 ( 0.576570)
cary_to_i_sort 0.460000 0.020000 0.480000 ( 0.477372)
Addendum
@theTinMan demonstrated that the comparisons between the sort_by and sort methods are sensitive to the choice of test data. Using the data he used:
def test_data(n)
(%w[10a 10b 9a 9b 8a 8b] * (n/6)).shuffle
end
I got these results:
Sort random array of size 500000
user system total real
mudasobwa 0.620000 0.000000 0.620000 ( 0.622566)
jorg 0.620000 0.010000 0.630000 ( 0.636018)
jorg_sort 0.640000 0.010000 0.650000 ( 0.638493)
the 8.790000 0.090000 8.880000 ( 8.886725)
the_sort 2.670000 0.070000 2.740000 ( 2.743085)
engineer 3.150000 0.040000 3.190000 ( 3.184534)
sawa 3.460000 0.040000 3.500000 ( 3.506875)
cary_rjust 0.360000 0.010000 0.370000 ( 0.367094)
cary_rjust_sort 0.480000 0.010000 0.490000 ( 0.499956)
cary_to_i 0.190000 0.010000 0.200000 ( 0.187136)
cary_to_i_sort 0.200000 0.000000 0.200000 ( 0.203509)
Notice that the absolute times are also affected.
Can anyone explain the reason for the difference in the benchmarks?

How to "sum" enumerables in Ruby

Is it possible to "sum" different enumerables when they are in string mode?
For example, like this? (Well, I know this doesn't work.)
(( 'a'..'z') + ('A'..'Z')).to_a
Note:
I am asking about getting an array of string chars from a to z and from A to Z all together.
By "string mode" I mean that the chars will appear like ["a", "b", ..... , "Y", "Z"]
You can use the splat operator:
[*('A'..'Z'), *( 'a'..'z')]
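As a rough generalisation of the splat idea, the sketch below concatenates any number of enumerables in the order given. The helper name concat_enums is invented for this example, and the last line assumes Ruby 2.6 or later, where Enumerable#chain is available.

```ruby
# Hypothetical helper: concatenates enumerables in the given order.
def concat_enums(*enums)
  enums.flat_map(&:to_a)
end

letters = concat_enums('a'..'z', 'A'..'Z')
p letters.size      # => 52
p letters.first(2)  # => ["a", "b"]
p letters.last(2)   # => ["Y", "Z"]
p letters == [*('a'..'z'), *('A'..'Z')]        # => true
p letters == ('a'..'z').chain('A'..'Z').to_a   # => true
```

Note that the order of the arguments decides whether the lower-case or the upper-case letters come first.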
Like this?
[('a'..'z'), ('A'..'Z')].map(&:to_a).flatten
Or this?
('a'..'z').to_a + ('A'..'Z').to_a
Not an answer, but a benchmark of the answers:
require 'benchmark'
n = 100000
Benchmark.bm do |x|
x.report("flat_map : ") { n.times do ; [('A'..'Z'), ('a'..'z')].flat_map(&:to_a) ; end }
x.report("map.flatten: ") { n.times do ; [('A'..'Z'), ('a'..'z')].map(&:to_a).flatten ; end }
x.report("splat : ") { n.times do ; [*('A'..'Z'), *( 'a'..'z')] ; end }
x.report("concat arr : ") { n.times do ; ('A'..'Z').to_a + ('a'..'z').to_a ; end }
end
Result:
#=> user system total real
#=> flat_map : 0.858000 0.000000 0.858000 ( 0.883630)
#=> map.flatten: 1.170000 0.016000 1.186000 ( 1.200421)
#=> splat : 0.858000 0.000000 0.858000 ( 0.857728)
#=> concat arr : 0.812000 0.000000 0.812000 ( 0.822861)
Since you want the elements from the first Range to be at the end of the output Array and the elements of the last Range to be at the beginning of the output Array, but still keep the same order within each Range, I would do it like this (which also generalizes nicely to more than two Enumerables):
def backwards_concat(*enums)
enums.reverse.map(&:to_a).inject([], &:concat)
end
backwards_concat('A'..'Z', 'a'..'z')
[*'a'..'z'].concat([*'A'..'Z'])
This is probably the quickest way to do this.
By "string mode" I mean that the chars will appear like ["a", "b", ..... , "Y", "Z"]
To answer the above:
Array('a'..'z').concat Array('A'..'Z')

Find most common hash value

I have the following hash:
h = Hash["a","foo", "b","bar", "c","foo"]
I would like to return the most common value, in this case foo. What is the most efficient way to do this?
Similar to this question, but adapted to hashes.
You can get the values as an array and then just plug into the solution you linked.
h.values.group_by { |e| e }.values.max_by(&:size).first
#=> "foo"
We can do this:
h = Hash["a","foo", "b","bar", "c","foo", "d", "bar", 'e', 'foobar']
p h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
# >> "foo"
UPDATE(slower than my first solution)
h = Hash["a","foo", "b","bar", "c","foo"]
h.group_by { |_,v| v }.max_by{|_,v| v.size}.first
# >> "foo"
Benchmark
require 'benchmark'
def seanny123(h)
h.values.group_by { |e| e }.values.max_by(&:size).first
end
def stefan(h)
frequencies = h.each_with_object(Hash.new(0)) { |(k,v), h| h[v] += 1 }
value, count = frequencies.max_by { |k, v| v }
value
end
def yevgeniy_anfilofyev(h)
h.group_by{|(_,v)| v }.sort_by{|(_,v)| v.size }[-1][0]
end
def acts_as_geek(h)
v = h.values
max = v.map {|i| v.count(i)}.max
v.select {|i| v.count(i) == max}.uniq
end
def squiguy(h)
v = h.values
v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
end
def babai1(h)
h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
end
def babai2(h)
h.group_by { |_,v| v }.max_by{|_,v| v.size}.first
end
def benchmark(h,n)
Benchmark.bm(20) do |x|
x.report("Seanny123") { n.times { seanny123(h) } }
x.report("Stefan") { n.times { stefan(h) } }
x.report("Yevgeniy Anfilofyev") { n.times { yevgeniy_anfilofyev(h) } }
x.report("acts_as_geek") { n.times { acts_as_geek(h) } }
x.report("squiguy") { n.times { squiguy(h) } }
x.report("Babai1") { n.times { babai1(h) } }
x.report("Babai2") { n.times { babai2(h) } }
end
end
n = 10
h = {}
1000.times do |i|
h["a#{i}"] = "foo"
h["b#{i}"] = "bar"
h["c#{i}"] = "foo"
end
benchmark(h, n)
Result:
user system total real
Seanny123 0.020000 0.000000 0.020000 ( 0.015550)
Stefan 0.040000 0.000000 0.040000 ( 0.044666)
Yevgeniy Anfilofyev 0.020000 0.000000 0.020000 ( 0.023162)
acts_as_geek 16.160000 0.000000 16.160000 ( 16.223582)
squiguy 15.740000 0.000000 15.740000 ( 15.768917)
Babai1 0.020000 0.000000 0.020000 ( 0.015430)
Babai2 0.020000 0.000000 0.020000 ( 0.025711)
You can calculate the frequencies with Enumerable#inject:
frequencies = h.inject(Hash.new(0)) { |h, (k,v)| h[v] += 1 ; h }
#=> {"foo"=>2, "bar"=>1}
Or Enumerable#each_with_object:
frequencies = h.each_with_object(Hash.new(0)) { |(k,v), h| h[v] += 1 }
#=> {"foo"=>2, "bar"=>1}
And the maximum with Enumerable#max_by:
value, count = frequencies.max_by { |k, v| v }
#=> ["foo", 2]
value
#=> "foo"
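On Ruby 2.7 or later, the counting step can likely be shortened with Enumerable#tally. The sketch below is an addition, not part of the benchmarked answers, and the helper name most_common_values is hypothetical; unlike max_by it returns every value that shares the top count.

```ruby
# Hypothetical helper: returns all values that share the highest count.
# Assumes Ruby 2.7+ for Enumerable#tally.
def most_common_values(hash)
  counts = hash.values.tally               # value => number of occurrences
  top = counts.values.max
  counts.select { |_value, count| count == top }.keys
end

h = { "a" => "foo", "b" => "bar", "c" => "foo" }
p h.values.tally == { "foo" => 2, "bar" => 1 }  # => true
p most_common_values(h)                         # => ["foo"]
p most_common_values("x" => 1, "y" => 2)        # => [1, 2]
```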
Benchmarks
With a small hash:
n = 100000
h = Hash["a","foo", "b","bar", "c","foo"]
benchmark(h, n)
Results:
user system total real
Seanny123 0.220000 0.000000 0.220000 ( 0.222342)
Stefan 0.260000 0.000000 0.260000 ( 0.263583)
Yevgeniy Anfilofyev 0.350000 0.000000 0.350000 ( 0.341685)
acts_as_geek 0.300000 0.000000 0.300000 ( 0.306601)
squiguy 0.140000 0.000000 0.140000 ( 0.139141)
Babai 0.220000 0.000000 0.220000 ( 0.218616)
With a large hash:
n = 10
h = {}
1000.times do |i|
h["a#{i}"] = "foo"
h["b#{i}"] = "bar"
h["c#{i}"] = "foo"
end
benchmark(h, n)
Results:
user system total real
Seanny123 0.060000 0.000000 0.060000 ( 0.059068)
Stefan 0.100000 0.000000 0.100000 ( 0.100760)
Yevgeniy Anfilofyev 0.080000 0.000000 0.080000 ( 0.080988)
acts_as_geek 97.020000 0.020000 97.040000 ( 97.072220)
squiguy 97.480000 0.020000 97.500000 ( 97.535130)
Babai 0.050000 0.000000 0.050000 ( 0.058653)
Benchmark code:
require 'benchmark'
def seanny123(h)
h.values.group_by { |e| e }.values.max_by(&:size).first
end
def stefan(h)
frequencies = h.each_with_object(Hash.new(0)) { |(k,v), h| h[v] += 1 }
value, count = frequencies.max_by { |k, v| v }
value
end
def yevgeniy_anfilofyev(h)
h.group_by{|(_,v)| v }.sort_by{|(_,v)| v.size }[-1][0]
end
def acts_as_geek(h)
v = h.values
max = v.map {|i| v.count(i)}.max
v.select {|i| v.count(i) == max}.uniq
end
def squiguy(h)
v = h.values
v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
end
def babai(h)
h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
end
def benchmark(h,n)
Benchmark.bm(20) do |x|
x.report("Seanny123") { n.times { seanny123(h) } }
x.report("Stefan") { n.times { stefan(h) } }
x.report("Yevgeniy Anfilofyev") { n.times { yevgeniy_anfilofyev(h) } }
x.report("acts_as_geek") { n.times { acts_as_geek(h) } }
x.report("squiguy") { n.times { squiguy(h) } }
x.report("Babai") { n.times { babai(h) } }
end
end
With group_by but without values and with sort_by:
h.group_by{|(_,v)| v }
.sort_by{|(_,v)| v.size }[-1][0]
Update: Don't use my solution. My computer had to use a lot of clock cycles to compute it. Didn't know a functional approach would be so slow here.
How about using Enumerable#reduce?
h = Hash["a","foo", "b","bar", "c","foo"]
v = h.values
most = v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
p most
As a caveat, this will only return one value if two or more share the same "max" count value in the hash. If you care about all the "max" values, this is not a solution to use.
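The slowness on large hashes comes from calling count inside reduce: each count scans the whole array, so the work grows quadratically. A possible variant, sketched here with made-up method names, counts once up front and keeps the same comparison logic.

```ruby
# The reduce-based approach from above, wrapped in a method.
def most_common_scanning(values)
  values.reduce { |memo, val| values.count(memo) > values.count(val) ? memo : val }
end

# Hypothetical variant: build the counts once, then reuse them in reduce.
def most_common_precounted(values)
  counts = Hash.new(0)
  values.each { |v| counts[v] += 1 }
  values.reduce { |memo, val| counts[memo] > counts[val] ? memo : val }
end

values = %w[foo bar foo]
p most_common_scanning(values)    # => "foo"
p most_common_precounted(values)  # => "foo"
```

Both methods still return only one value when several share the top count.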
Here is a benchmark.
#!/usr/bin/env ruby
require 'benchmark'
MULT = 10
arr = []
h = {} # the hash under test; it has to exist before it is filled below
letters = ("a".."z").to_a
10000.times do
arr << letters.sample
end
10000.times do |i|
h[i] = arr[i]
end
Benchmark.bm do |rep|
rep.report("Seanny123") {
MULT.times do
h.values.group_by { |e| e }.values.max_by(&:size).first
end
}
rep.report("squiguy") {
MULT.times do
v = h.values
most = v.reduce do |memo, val|
v.count(memo) > v.count(val) ? memo : val
end
end
}
rep.report("acts_as_geek") {
MULT.times do
v = h.values
max = v.map {|i| v.count(i)}.max
v.select {|i| v.count(i) == max}.uniq
end
}
rep.report("Yevgeniy Anfilofyev") {
MULT.times do
h.group_by{|(_,v)| v }
.sort_by{|(_,v)| v.size }[-1][0]
end
}
rep.report("Stefan") {
MULT.times do
frequencies = h.inject(Hash.new(0)) { |h, (k,v)| h[v] += 1 ; h }
value, count = frequencies.max_by { |k, v| v }
end
}
rep.report("Babai") {
MULT.times do
h.values.group_by { |e| e }.max_by{|_,v| v.size}.first
end
}
end

How to merge array of hashes to get hash of arrays of values

This is the opposite of Turning a Hash of Arrays into an Array of Hashes in Ruby.
Elegantly and/or efficiently turn an array of hashes into a hash where the values are arrays of all values:
hs = [
{ a:1, b:2 },
{ a:3, c:4 },
{ b:5, d:6 }
]
collect_values( hs )
#=> { :a=>[1,3], :b=>[2,5], :c=>[4], :d=>[6] }
This terse code almost works, but fails to create an array when there are no duplicates:
def collect_values( hashes )
hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
end
collect_values( hs )
#=> { :a=>[1,3], :b=>[2,5], :c=>4, :d=>6 }
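The reason is that Hash#merge only calls its block for keys present in both hashes, so a key seen once keeps its bare value. One possible fix, sketched below with hypothetical method names and assuming Ruby 2.4+ for transform_values, is to wrap every value in an array before merging.

```ruby
# The terse version from above: the block runs only on duplicate keys.
def collect_with_merge_block(hashes)
  hashes.inject({}) { |acc, h| acc.merge(h) { |_key, old, incoming| [*old, *incoming] } }
end

# Hypothetical variant: wrap values first, so every result value is an array.
def collect_wrapping_first(hashes)
  hashes.inject({}) do |acc, h|
    acc.merge(h.transform_values { |v| [v] }) { |_key, old, incoming| old + incoming }
  end
end

hs = [{ a: 1, b: 2 }, { a: 3, c: 4 }, { b: 5, d: 6 }]
p collect_with_merge_block(hs)[:c]  # => 4
p collect_wrapping_first(hs)[:c]    # => [4]
p collect_wrapping_first(hs)[:a]    # => [1, 3]
```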
This code works, but can you write a better version?
def collect_values( hashes )
# Requires Ruby 1.8.7+ for Object#tap
Hash.new{ |h,k| h[k]=[] }.tap do |result|
hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
end
end
Solutions that only work in Ruby 1.9 are acceptable, but should be noted as such.
Here are the results of benchmarking the various answers below (and a few more of my own), using three different arrays of hashes:
one where each hash has distinct keys, so no merging ever occurs:
[{:a=>1}, {:b=>2}, {:c=>3}, {:d=>4}, {:e=>5}, {:f=>6}, {:g=>7}, ...]
one where every hash has the same key, so maximum merging occurs:
[{:a=>1}, {:a=>2}, {:a=>3}, {:a=>4}, {:a=>5}, {:a=>6}, {:a=>7}, ...]
and one that is a mix of unique and shared keys:
[{:c=>1}, {:d=>1}, {:c=>2}, {:f=>1}, {:c=>1, :d=>1}, {:h=>1}, {:c=>3}, ...]
user system total real
Phrogz 2a 0.577000 0.000000 0.577000 ( 0.576000)
Phrogz 2b 0.624000 0.000000 0.624000 ( 0.620000)
Glenn 1 0.640000 0.000000 0.640000 ( 0.641000)
Phrogz 1 0.671000 0.000000 0.671000 ( 0.668000)
Michael 1 0.702000 0.000000 0.702000 ( 0.700000)
Michael 2 0.717000 0.000000 0.717000 ( 0.726000)
Glenn 2 0.765000 0.000000 0.765000 ( 0.764000)
fl00r 0.827000 0.000000 0.827000 ( 0.836000)
sawa 0.874000 0.000000 0.874000 ( 0.868000)
Tokland 1 0.873000 0.000000 0.873000 ( 0.876000)
Tokland 2 1.077000 0.000000 1.077000 ( 1.073000)
Phrogz 3 2.106000 0.093000 2.199000 ( 2.209000)
The fastest code is this method that I added:
def collect_values(hashes)
{}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end
I've accepted "glenn mcdonald's answer" as it was competitive in terms of speed, reasonably terse, but (most importantly) because it pointed out the danger of using a Hash with a self-modifying default proc for convenient construction, as this may introduce bad changes when the user is indexing it later on.
Finally, here's the benchmark code, in case you want to run your own comparisons:
require 'prime' # To generate the third hash
require 'facets' # For tokland1's map_by
AZSYMBOLS = (:a..:z).to_a
TESTS = {
'26 Distinct Hashes' => AZSYMBOLS.zip(1..26).map{|a| Hash[*a] },
'26 Same-Key Hashes' => ([:a]*26).zip(1..26).map{|a| Hash[*a] },
'26 Mixed-Keys Hashes' => (2..27).map do |i|
factors = i.prime_division.transpose
Hash[AZSYMBOLS.values_at(*factors.first).zip(factors.last)]
end
}
def phrogz1(hashes)
Hash.new{ |h,k| h[k]=[] }.tap do |result|
hashes.each{ |h| h.each{ |k,v| result[k]<<v } }
end
end
def phrogz2a(hashes)
{}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end
def phrogz2b(hashes)
hashes.each_with_object({}){ |h,r| h.each{ |k,v| (r[k]||=[]) << v } }
end
def phrogz3(hashes)
result = hashes.inject({}){ |a,b| a.merge(b){ |_,x,y| [*x,*y] } }
result.each{ |k,v| result[k] = [v] unless v.is_a? Array }
end
def glenn1(hs)
hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}
end
def glenn2(hs)
hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}
end
def fl00r(hs)
h = Hash.new{|h,k| h[k]=[]}
hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}
h
end
def sawa(a)
a.map(&:to_a).flatten(1).group_by{|k,v| k}.each_value{|v| v.map!{|k,v| v}}
end
def michael1(hashes)
h = Hash.new{|h,k| h[k]=[]}
hashes.each_with_object(h) do |h, result|
h.each{ |k, v| result[k] << v }
end
end
def michael2(hashes)
h = Hash.new{|h,k| h[k]=[]}
hashes.inject(h) do |result, h|
h.each{ |k, v| result[k] << v }
result
end
end
def tokland1(hs)
hs.map(&:to_a).flatten(1).map_by{ |k, v| [k, v] }
end
def tokland2(hs)
Hash[hs.map(&:to_a).flatten(1).group_by(&:first).map{ |k, vs|
[k, vs.map{|o|o[1]}]
}]
end
require 'benchmark'
N = 10_000
Benchmark.bm do |x|
x.report('Phrogz 2a'){ TESTS.each{ |n,h| N.times{ phrogz2a(h) } } }
x.report('Phrogz 2b'){ TESTS.each{ |n,h| N.times{ phrogz2b(h) } } }
x.report('Glenn 1 '){ TESTS.each{ |n,h| N.times{ glenn1(h) } } }
x.report('Phrogz 1 '){ TESTS.each{ |n,h| N.times{ phrogz1(h) } } }
x.report('Michael 1'){ TESTS.each{ |n,h| N.times{ michael1(h) } } }
x.report('Michael 2'){ TESTS.each{ |n,h| N.times{ michael2(h) } } }
x.report('Glenn 2 '){ TESTS.each{ |n,h| N.times{ glenn2(h) } } }
x.report('fl00r '){ TESTS.each{ |n,h| N.times{ fl00r(h) } } }
x.report('sawa '){ TESTS.each{ |n,h| N.times{ sawa(h) } } }
x.report('Tokland 1'){ TESTS.each{ |n,h| N.times{ tokland1(h) } } }
x.report('Tokland 2'){ TESTS.each{ |n,h| N.times{ tokland2(h) } } }
x.report('Phrogz 3 '){ TESTS.each{ |n,h| N.times{ phrogz3(h) } } }
end
Take your pick:
hs.reduce({}) {|h,pairs| pairs.each {|k,v| (h[k] ||= []) << v}; h}
hs.map(&:to_a).flatten(1).reduce({}) {|h,(k,v)| (h[k] ||= []) << v; h}
I'm strongly against messing with the defaults for hashes, as the other suggestions do, because then checking for a value modifies the hash, which seems very wrong to me.
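A small sketch of the side effect in question, plus one way to keep the convenience while building and switch it off before returning. The method name collect_values_detached is invented for this example.

```ruby
auto = Hash.new { |h, k| h[k] = [] }
auto[:a] << 1
auto[:typo]               # a plain read of a missing key...
p auto.keys               # => [:a, :typo]  ...has inserted that key
p auto.key?(:other)       # => false (key? does not trigger the default proc)

# Hypothetical variant: use the default proc only during construction.
def collect_values_detached(hashes)
  result = Hash.new { |h, k| h[k] = [] }
  hashes.each { |h| h.each { |k, v| result[k] << v } }
  result.default_proc = nil # later lookups of missing keys return nil
  result
end

collected = collect_values_detached([{ a: 1 }, { a: 3, c: 4 }])
p collected[:a]           # => [1, 3]
p collected[:missing]     # => nil
p collected.keys          # => [:a, :c]
```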
h = Hash.new{|h,k| h[k]=[]}
hs.map(&:to_a).flatten(1).each{|v| h[v[0]] << v[1]}
How's this?
def collect_values(hashes)
h = Hash.new{|h,k| h[k]=[]}
hashes.each_with_object(h) do |h, result|
h.each{ |k, v| result[k] << v }
end
end
Edit - Also possible with inject, but IMHO not as nice:
def collect_values( hashes )
h = Hash.new{|h,k| h[k]=[]}
hashes.inject(h) do |result, h|
h.each{ |k, v| result[k] << v }
result
end
end
Same with some other answers using map(&:to_a).flatten(1). The problem is how to modify the values of the hash. I used the fact that arrays are mutable.
def collect_values a
a.map(&:to_a).flatten(1).group_by{|k, v| k}.
each_value{|v| v.map!{|k, v| v}}
end
Facets' Enumerable#map_by comes in handy for these cases. This implementation will no doubt be slower than the others, but modular and compact code is always easier to maintain:
require 'facets'
hs.flat_map(&:to_a).map_by { |k, v| [k, v] }
#=> {:b=>[2, 5], :d=>[6], :c=>[4], :a=>[1, 3]}
I thought it might be interesting to compare the winner:
def phrogz2a(hashes)
{}.tap{ |r| hashes.each{ |h| h.each{ |k,v| (r[k]||=[]) << v } } }
end
with a slight variant:
def phrogz2ai(hashes)
Hash.new {|h,k| h[k]=[]}.tap {|r| hashes.each {|h| h.each {|k,v| r[k] << v}}}
end
because one can often employ either approach (typically to create an empty array or hash).
Using Phrogz's benchmark code, here's how they compare here:
user system total real
Phrogz 2a 0.440000 0.010000 0.450000 ( 0.444435)
Phrogz 2ai 0.580000 0.010000 0.590000 ( 0.580248)
What about this one?
hs.reduce({}, :merge)
Shortest! But performance is pretty bad:
user system total real
Phrogz 2a 0.240000 0.010000 0.250000 ( 0.247337)
Phrogz 2b 0.280000 0.000000 0.280000 ( 0.274985)
Glenn 1 0.290000 0.000000 0.290000 ( 0.290370)
Phrogz 1 0.310000 0.000000 0.310000 ( 0.315548)
Michael 1 0.360000 0.000000 0.360000 ( 0.356760)
Michael 2 0.360000 0.000000 0.360000 ( 0.360119)
Glenn 2 0.370000 0.000000 0.370000 ( 0.369354)
fl00r 0.390000 0.000000 0.390000 ( 0.385883)
sawa 0.410000 0.000000 0.410000 ( 0.408190)
Tokland 1 0.410000 0.000000 0.410000 ( 0.410097)
Tokland 2 0.490000 0.000000 0.490000 ( 0.497325)
Ich 1.410000 0.000000 1.410000 ( 1.413176) # <<-- new
Phrogz 3 1.760000 0.010000 1.770000 ( 1.762979)
[{'a' => 1}, {'b' => 2}, {'c' => 3}].reduce Hash.new, :merge

Check if a string variable is in a set of strings

Which one is better:
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
x =~ /abc|def|ghi/
?
Which one is better? The question can't be easily answered, because they don't all do the same things.
x == 'abc' || x == 'def' || x == 'ghi'
%w(abc def ghi).include? x
compare x against fixed strings for equality. x has to be one of those values. Between those two I tend to go with the second because it's easier to maintain. Imagine what it would look like if you had to compare against twenty, fifty or one hundred strings.
The third test:
x =~ /abc|def|ghi/
matches substrings:
x = 'xyzghi'
(x =~ /abc|def|ghi/) # => 3
so it isn't the same as the first two.
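If a regex should behave like the equality tests, it needs to be anchored to the whole string. The sketch below uses hypothetical constant and method names and assumes Ruby 2.4+ for match?; note that in Ruby ^ and $ anchor to lines, while \A and \z anchor to the string.

```ruby
ALLOWED = %w[abc def ghi].freeze
EXACT_RE = /\A(?:abc|def|ghi)\z/

# Hypothetical helper: true only when x is exactly one of the allowed strings.
def allowed?(x)
  ALLOWED.include?(x)
end

p("xyzghi" =~ /abc|def|ghi/)                 # => 3
p allowed?("xyzghi")                         # => false
p EXACT_RE.match?("xyzghi")                  # => false
p EXACT_RE.match?("ghi")                     # => true
p(/^(?:abc|def|ghi)$/.match?("zzz\nabc"))    # => true, ^ and $ match per line
p EXACT_RE.match?("zzz\nabc")                # => false
p(/^abc|def|ghi$/.match?("abcxyz"))          # => true, ^ binds to abc only
```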
EDIT: There are some things in the benchmarks done by nash that I'd do differently. Using Ruby 1.9.2-p180 on a MacBook Pro, this tests 1,000,000 loops and compares the results of anchoring the regex, using grouping, along with not splitting the %w() array each time through the loop:
require 'benchmark'
str = "test"
n = 1_000_000
Benchmark.bm do |x|
x.report { n.times { str == 'abc' || str == 'def' || str == 'ghi' } }
x.report { n.times { %w(abc def ghi).include? str } }
x.report { ary = %w(abc def ghi); n.times { ary.include? str } }
x.report { n.times { str =~ /abc|def|ghi/ } }
x.report { n.times { str =~ /^abc|def|ghi$/ } }
x.report { n.times { str =~ /^(abc|def|ghi)$/ } }
x.report { n.times { str =~ /^(?:abc|def|ghi)$/ } }
x.report { n.times { str =~ /\b(?:abc|def|ghi)\b/ } }
end
# >> user system total real
# >> 1.160000 0.000000 1.160000 ( 1.165331)
# >> 1.920000 0.000000 1.920000 ( 1.920120)
# >> 0.990000 0.000000 0.990000 ( 0.983921)
# >> 1.070000 0.000000 1.070000 ( 1.068140)
# >> 1.050000 0.010000 1.060000 ( 1.054852)
# >> 1.060000 0.000000 1.060000 ( 1.063909)
# >> 1.060000 0.000000 1.060000 ( 1.050813)
# >> 1.050000 0.000000 1.050000 ( 1.056147)
The first might be a tad quicker, since there are no method calls and you're doing straight string comparisons, but it's also probably the least readable and least maintainable.
The second is definitely the grooviest, and the ruby way of going about it. It's the most maintainable, and probably the best to read.
The last way uses old school perl regex syntax. Fairly fast, not as annoying as the first to maintain, fairly readable.
I guess it depends what you mean by "better".
some benchmarks:
require 'benchmark'
str = "test"
Benchmark.bm do |x|
x.report {100000.times {if str == 'abc' || str == 'def' || str == 'ghi'; end}}
x.report {100000.times {if %w(abc def ghi).include? str; end}}
x.report {100000.times {if str =~ /abc|def|ghi/; end}}
end
user system total real
0.250000 0.000000 0.250000 ( 0.251014)
0.374000 0.000000 0.374000 ( 0.402023)
0.265000 0.000000 0.265000 ( 0.259014)
So as you can see, the first way works faster than the others. And the longer str is, the slower the last way works:
str = "testasdasdasdasdasddkmfskjndfbdkjngdjgndksnfg"
user system total real
0.234000 0.000000 0.234000 ( 0.248014)
0.405000 0.000000 0.405000 ( 0.403023)
1.046000 0.000000 1.046000 ( 1.038059)

How to sort an array in descending order in Ruby

I have an array of hashes:
[
{ :foo => 'foo', :bar => 2 },
{ :foo => 'foo', :bar => 3 },
{ :foo => 'foo', :bar => 5 },
]
I am trying to sort this array in descending order according to the value of :bar in each hash.
I am using sort_by to sort above array:
a.sort_by { |h| h[:bar] }
However, this sorts the array in ascending order. How do I make it sort in descending order?
One solution was to do following:
a.sort_by { |h| -h[:bar] }
But that negative sign does not seem appropriate.
It's always enlightening to do a benchmark on the various suggested answers. Here's what I found out:
#!/usr/bin/ruby
require 'benchmark'
ary = []
1000.times {
ary << {:bar => rand(1000)}
}
n = 500
Benchmark.bm(20) do |x|
x.report("sort") { n.times { ary.sort{ |a,b| b[:bar] <=> a[:bar] } } }
x.report("sort reverse") { n.times { ary.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse } }
x.report("sort_by -a[:bar]") { n.times { ary.sort_by{ |a| -a[:bar] } } }
x.report("sort_by a[:bar]*-1") { n.times { ary.sort_by{ |a| a[:bar]*-1 } } }
x.report("sort_by.reverse!") { n.times { ary.sort_by{ |a| a[:bar] }.reverse } }
end
user system total real
sort 3.960000 0.010000 3.970000 ( 3.990886)
sort reverse 4.040000 0.000000 4.040000 ( 4.038849)
sort_by -a[:bar] 0.690000 0.000000 0.690000 ( 0.692080)
sort_by a[:bar]*-1 0.700000 0.000000 0.700000 ( 0.699735)
sort_by.reverse! 0.650000 0.000000 0.650000 ( 0.654447)
I think it's interesting that @Pablo's sort_by{...}.reverse! is fastest. Before running the test I thought it would be slower than "-a[:bar]", but negating the value turns out to take longer than it does to reverse the entire array in one pass. It's not much of a difference, but every little speed-up helps.
Please note that these results are different in Ruby 1.9
Here are results for Ruby 1.9.3p194 (2012-04-20 revision 35410) [x86_64-darwin10.8.0]:
user system total real
sort 1.340000 0.010000 1.350000 ( 1.346331)
sort reverse 1.300000 0.000000 1.300000 ( 1.310446)
sort_by -a[:bar] 0.430000 0.000000 0.430000 ( 0.429606)
sort_by a[:bar]*-1 0.420000 0.000000 0.420000 ( 0.414383)
sort_by.reverse! 0.400000 0.000000 0.400000 ( 0.401275)
These are on an old MacBook Pro. Newer, or faster machines, will have lower values, but the relative differences will remain.
Here's a slightly updated version on newer hardware and the 2.1.1 version of Ruby:
#!/usr/bin/ruby
require 'benchmark'
puts "Running Ruby #{RUBY_VERSION}"
ary = []
1000.times {
ary << {:bar => rand(1000)}
}
n = 500
puts "n=#{n}"
Benchmark.bm(20) do |x|
x.report("sort") { n.times { ary.dup.sort{ |a,b| b[:bar] <=> a[:bar] } } }
x.report("sort reverse") { n.times { ary.dup.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse } }
x.report("sort_by -a[:bar]") { n.times { ary.dup.sort_by{ |a| -a[:bar] } } }
x.report("sort_by a[:bar]*-1") { n.times { ary.dup.sort_by{ |a| a[:bar]*-1 } } }
x.report("sort_by.reverse") { n.times { ary.dup.sort_by{ |a| a[:bar] }.reverse } }
x.report("sort_by.reverse!") { n.times { ary.dup.sort_by{ |a| a[:bar] }.reverse! } }
end
# >> Running Ruby 2.1.1
# >> n=500
# >> user system total real
# >> sort 0.670000 0.000000 0.670000 ( 0.667754)
# >> sort reverse 0.650000 0.000000 0.650000 ( 0.655582)
# >> sort_by -a[:bar] 0.260000 0.010000 0.270000 ( 0.255919)
# >> sort_by a[:bar]*-1 0.250000 0.000000 0.250000 ( 0.258924)
# >> sort_by.reverse 0.250000 0.000000 0.250000 ( 0.245179)
# >> sort_by.reverse! 0.240000 0.000000 0.240000 ( 0.242340)
New results running the above code using Ruby 2.2.1 on a more recent MacBook Pro. Again, the exact numbers aren't important; what matters is their relationships:
Running Ruby 2.2.1
n=500
user system total real
sort 0.650000 0.000000 0.650000 ( 0.653191)
sort reverse 0.650000 0.000000 0.650000 ( 0.648761)
sort_by -a[:bar] 0.240000 0.010000 0.250000 ( 0.245193)
sort_by a[:bar]*-1 0.240000 0.000000 0.240000 ( 0.240541)
sort_by.reverse 0.230000 0.000000 0.230000 ( 0.228571)
sort_by.reverse! 0.230000 0.000000 0.230000 ( 0.230040)
Updated for Ruby 2.7.1 on a Mid-2015 MacBook Pro:
Running Ruby 2.7.1
n=500
                           user     system      total        real
sort                   0.494707   0.003662   0.498369 (  0.501064)
sort reverse           0.480181   0.005186   0.485367 (  0.487972)
sort_by -a[:bar]       0.121521   0.003781   0.125302 (  0.126557)
sort_by a[:bar]*-1     0.115097   0.003931   0.119028 (  0.122991)
sort_by.reverse        0.110459   0.003414   0.113873 (  0.114443)
sort_by.reverse!       0.108997   0.001631   0.110628 (  0.111532)
Regarding the claim that "...the reverse method doesn't actually return a reversed array - it returns an enumerator that just starts at the end and works backwards":
The source for Array#reverse is:
static VALUE
rb_ary_reverse_m(VALUE ary)
{
    long len = RARRAY_LEN(ary);
    VALUE dup = rb_ary_new2(len);
    if (len > 0) {
        const VALUE *p1 = RARRAY_CONST_PTR_TRANSIENT(ary);
        VALUE *p2 = (VALUE *)RARRAY_CONST_PTR_TRANSIENT(dup) + len - 1;
        do *p2-- = *p1++; while (--len > 0);
    }
    ARY_SET_LEN(dup, RARRAY_LEN(ary));
    return dup;
}
do *p2-- = *p1++; while (--len > 0); is copying the pointers to the elements in reverse order if I remember my C correctly, so the array is reversed.
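A short sketch of that distinction, assuming only core Ruby: Array#reverse returns a new Array, reverse! reverses the receiver in place, and it is reverse_each (called without a block) that returns an Enumerator. The variable names are purely illustrative.

```ruby
original = [1, 2, 3]

reversed = original.reverse       # a new Array; original is left untouched
backward = original.reverse_each  # an Enumerator, since no block was given

in_place = original.dup
returned = in_place.reverse!      # reverses in_place itself and returns it

p reversed.class             # Array
p backward.class             # Enumerator
p original                   # [1, 2, 3]
p returned.equal?(in_place)  # true
```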
Just a quick thing that makes the intent of descending order explicit:
descending = -1
a.sort_by { |h| h[:bar] * descending }
(Will think of a better way in the mean time) ;)
a.sort_by { |h| h[:bar] }.reverse!
You could do:
a.sort{|a,b| b[:bar] <=> a[:bar]}
I see that we have (beside others) basically two options:
a.sort_by { |h| -h[:bar] }
and
a.sort_by { |h| h[:bar] }.reverse
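While both ways give you the same result when every sorting key is unique, keep in mind that the reverse way also reverses the relative order of elements whose keys are equal. (Ruby does not guarantee that sort_by is stable, so the tie order shown here is what you typically see, not a promise.)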
Example:
a = [{foo: 1, bar: 1},{foo: 2,bar: 1}]
a.sort_by {|h| -h[:bar]}
=> [{:foo=>1, :bar=>1}, {:foo=>2, :bar=>1}]
a.sort_by {|h| h[:bar]}.reverse
=> [{:foo=>2, :bar=>1}, {:foo=>1, :bar=>1}]
While you often don't need to care about this, sometimes you do. To avoid such behavior, you can introduce a second sorting key, which must be unique at least among the items that share the same first key:
a.sort_by {|h| [-h[:bar],-h[:foo]]}
=> [{:foo=>2, :bar=>1}, {:foo=>1, :bar=>1}]
a.sort_by {|h| [h[:bar],h[:foo]]}.reverse
=> [{:foo=>2, :bar=>1}, {:foo=>1, :bar=>1}]
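If there is no naturally unique second key, the element's original position can serve as the tie-breaker. This is a minimal sketch, assuming numeric :bar values; each_with_index supplies the position, so elements with equal keys keep their input order even though the overall order is descending. The name stable_desc is just illustrative.

```ruby
a = [{ foo: 1, bar: 1 }, { foo: 2, bar: 1 }, { foo: 3, bar: 5 }]

# Sort on [negated key, original index]; the index breaks ties deterministically.
stable_desc = a.each_with_index
               .sort_by { |h, i| [-h[:bar], i] }
               .map(&:first)   # drop the index again

p stable_desc
# [{:foo=>3, :bar=>5}, {:foo=>1, :bar=>1}, {:foo=>2, :bar=>1}] on Ruby versions before 3.4;
# newer versions print the hashes as {foo: 3, bar: 5}, and so on.
```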
What about:
a.sort {|x,y| y[:bar]<=>x[:bar]}
It works!!
irb
>> a = [
?> { :foo => 'foo', :bar => 2 },
?> { :foo => 'foo', :bar => 3 },
?> { :foo => 'foo', :bar => 5 },
?> ]
=> [{:bar=>2, :foo=>"foo"}, {:bar=>3, :foo=>"foo"}, {:bar=>5, :foo=>"foo"}]
>> a.sort {|x,y| y[:bar]<=>x[:bar]}
=> [{:bar=>5, :foo=>"foo"}, {:bar=>3, :foo=>"foo"}, {:bar=>2, :foo=>"foo"}]
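The block form only relies on <=>, so it should also work when the key is not a number. A small sketch with a hypothetical :name key holding strings (the data is made up for illustration), compared against sort_by followed by reverse:

```ruby
people = [
  { name: "bob" },
  { name: "carol" },
  { name: "alice" }
]

# Swap the operands of <=> to get descending order.
desc_by_block   = people.sort { |x, y| y[:name] <=> x[:name] }
desc_by_reverse = people.sort_by { |h| h[:name] }.reverse

p desc_by_block.map { |h| h[:name] }   # ["carol", "bob", "alice"]
p desc_by_block == desc_by_reverse     # true (all names are distinct here)
```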
For those folks who like to measure speed in IPS ;)
require 'benchmark/ips'
ary = []
1000.times {
  ary << {:bar => rand(1000)}
}
Benchmark.ips do |x|
  x.report("sort")               { ary.sort{ |a,b| b[:bar] <=> a[:bar] } }
  x.report("sort reverse")       { ary.sort{ |a,b| a[:bar] <=> b[:bar] }.reverse }
  x.report("sort_by -a[:bar]")   { ary.sort_by{ |a| -a[:bar] } }
  x.report("sort_by a[:bar]*-1") { ary.sort_by{ |a| a[:bar]*-1 } }
  x.report("sort_by.reverse!")   { ary.sort_by{ |a| a[:bar] }.reverse! }
  x.compare!
end
And results:
Warming up --------------------------------------
                sort    93.000  i/100ms
        sort reverse    91.000  i/100ms
    sort_by -a[:bar]   382.000  i/100ms
  sort_by a[:bar]*-1   398.000  i/100ms
    sort_by.reverse!   397.000  i/100ms
Calculating -------------------------------------
                sort    938.530  (± 1.8%) i/s -      4.743k in   5.055290s
        sort reverse    901.157  (± 6.1%) i/s -      4.550k in   5.075351s
    sort_by -a[:bar]      3.814k (± 4.4%) i/s -     19.100k in   5.019260s
  sort_by a[:bar]*-1      3.732k (± 4.3%) i/s -     18.706k in   5.021720s
    sort_by.reverse!      3.928k (± 3.6%) i/s -     19.850k in   5.060202s
Comparison:
    sort_by.reverse!:     3927.8 i/s
    sort_by -a[:bar]:     3813.9 i/s - same-ish: difference falls within error
  sort_by a[:bar]*-1:     3732.3 i/s - same-ish: difference falls within error
                sort:      938.5 i/s - 4.19x slower
        sort reverse:      901.2 i/s - 4.36x slower
Regarding the benchmark suite mentioned, these results also hold for sorted arrays.
sort_by/reverse it is:
# foo.rb
require 'benchmark'
NUM_RUNS = 1000
# arr = []
arr1 = 3000.times.map { { num: rand(1000) } }
arr2 = 3000.times.map { |n| { num: n } }.reverse
Benchmark.bm(20) do |x|
  { 'randomized' => arr1,
    'sorted' => arr2 }.each do |label, arr|
    puts '---------------------------------------------------'
    puts label
    x.report('sort_by / reverse') {
      NUM_RUNS.times { arr.sort_by { |h| h[:num] }.reverse }
    }
    x.report('sort_by -') {
      NUM_RUNS.times { arr.sort_by { |h| -h[:num] } }
    }
  end
end
And the results:
$: ruby foo.rb
                           user     system      total        real
---------------------------------------------------
randomized
sort_by / reverse      1.680000   0.010000   1.690000 (  1.682051)
sort_by -              1.830000   0.000000   1.830000 (  1.830359)
---------------------------------------------------
sorted
sort_by / reverse      0.400000   0.000000   0.400000 (  0.402990)
sort_by -              0.500000   0.000000   0.500000 (  0.499350)
A simple way to go from ascending to descending order, and vice versa:
STRINGS
str = ['ravi', 'aravind', 'joker', 'poker']
asc_string = str.sort # => ["aravind", "joker", "poker", "ravi"]
asc_string.reverse # => ["ravi", "poker", "joker", "aravind"]
DIGITS
digit = [234,45,1,5,78,45,34,9]
asc_digit = digit.sort # => [1, 5, 9, 34, 45, 45, 78, 234]
asc_digit.reverse # => [234, 78, 45, 45, 34, 9, 5, 1]
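The same descending order can also be produced in one step by swapping the operands of <=> in a sort block. A small sketch reusing the two arrays above (the desc_* names are illustrative):

```ruby
str   = ['ravi', 'aravind', 'joker', 'poker']
digit = [234, 45, 1, 5, 78, 45, 34, 9]

# b <=> a instead of a <=> b yields descending order directly.
desc_str   = str.sort { |a, b| b <=> a }
desc_digit = digit.sort { |a, b| b <=> a }

p desc_str                          # ["ravi", "poker", "joker", "aravind"]
p desc_digit                        # [234, 78, 45, 45, 34, 9, 5, 1]
p desc_str == str.sort.reverse      # true
p desc_digit == digit.sort.reverse  # true
```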
