Is sort in Ruby stable? - ruby

Is sort in Ruby stable? That is, for elements that are in a tie for sort, is the relative order among them preserved from the original order? For example, given:
a = [
{id: :a, int: 3},
{id: :b, int: 1},
{id: :c, int: 2},
{id: :d, int: 0},
{id: :e, int: 1},
{id: :f, int: 0},
{id: :g, int: 1},
{id: :h, int: 2},
]
is it guaranteed that we always get for
a.sort_by{|h| h[:int]}
the following
[
{id: :d, int: 0},
{id: :f, int: 0},
{id: :b, int: 1},
{id: :e, int: 1},
{id: :g, int: 1},
{id: :c, int: 2},
{id: :h, int: 2},
{id: :a, int: 3},
]
without any variation for the relative order among the elements with the :id value :d, :f, and among :b, :e, :g, and among :c, :h? If that is the case, where in the documentation is it described?
This question may or may not have connection with this question.

Both MRI's sort and sort_by are unstable. Some time ago there was a request to make them stable, but it was rejected. The reason: Ruby uses an in-place quicksort algorithm, which performs better if stability is not required. Note that you can still implement stable methods from unstable ones:
module Enumerable
def stable_sort
sort_by.with_index { |x, idx| [x, idx] }
end
def stable_sort_by
sort_by.with_index { |x, idx| [yield(x), idx] }
end
end
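As a usage sketch, the module above can be exercised on a small list where several elements tie. The word list is an assumption made up for illustration; the two methods are reproduced from the answer so the snippet runs on its own.

```ruby
# Reproduced from the answer above so this snippet is self-contained.
module Enumerable
  def stable_sort
    sort_by.with_index { |x, idx| [x, idx] }
  end

  def stable_sort_by
    sort_by.with_index { |x, idx| [yield(x), idx] }
  end
end

# Illustrative data: four words share length 4.
words = %w[pear fig apple kiwi plum date]

# Ties on length are broken by original position.
by_length = words.stable_sort_by(&:size)
p by_length
```

Because the index is part of every sort key, no two keys ever compare equal, so the underlying sort never has to decide a tie.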

No, Ruby's built-in sort is not guaranteed to be stable.
If you want stable sort, this should work. You probably want to create a method for it if you're going to use it often.
a.each_with_index.sort_by {|h, idx| [h[:int], idx] }.map(&:first)
Basically it keeps track of the original array index of each item, and uses it as a tie-breaker when h[:int] is the same.
More info, for the curious:
As far as I know, using the original array index as the tie-breaker is the only way to guarantee stability when using an unstable sort. The actual attributes (or other data) of the items will not tell you their original order.
Your example is somewhat contrived because the :id keys are sorted ascending in the original array. Suppose the original array were sorted descending by :id; you'd want the :id's in the result to be descending when tie-breaking, like so:
[
{:id=>:f, :int=>0},
{:id=>:d, :int=>0},
{:id=>:g, :int=>1},
{:id=>:e, :int=>1},
{:id=>:b, :int=>1},
{:id=>:h, :int=>2},
{:id=>:c, :int=>2},
{:id=>:a, :int=>3}
]
Using the original index will handle this too.
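To check that claim, here is a minimal sketch that applies the same index tie-breaker to an input ordered descending by :id. The array `a_desc` is an assumption constructed for illustration (the question's data in reverse :id order).

```ruby
# Hypothetical input: the question's hashes, ordered descending by :id.
a_desc = [
  {id: :h, int: 2}, {id: :g, int: 1}, {id: :f, int: 0}, {id: :e, int: 1},
  {id: :d, int: 0}, {id: :c, int: 2}, {id: :b, int: 1}, {id: :a, int: 3},
]

# Same technique as above: the original index breaks ties on :int.
sorted = a_desc.each_with_index
               .sort_by { |h, idx| [h[:int], idx] }
               .map(&:first)

ids = sorted.map { |h| h[:id] }
p ids
```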
Update:
Matz's own suggestion (see this page) is similar, and may be slightly more efficient than the above:
n = 0
ary.sort_by {|x| n+= 1; [x, n]}
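Note that the snippet above puts the element itself into the sort key, so it assumes the elements are mutually comparable. For the hashes in the question, a sketch of the same counter idea would use the :int value as the first key component instead:

```ruby
a = [
  {id: :a, int: 3}, {id: :b, int: 1}, {id: :c, int: 2}, {id: :d, int: 0},
  {id: :e, int: 1}, {id: :f, int: 0}, {id: :g, int: 1}, {id: :h, int: 2},
]

# The counter n records the order in which elements are visited,
# which is the original order, and serves as the tie-breaker.
n = 0
sorted = a.sort_by { |h| n += 1; [h[:int], n] }

ids = sorted.map { |h| h[:id] }
p ids
```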

For some implementations of Ruby, sort is stable, but you shouldn't depend upon it. Stability of Ruby's sort is implementation defined.
What the documentation says
The documentation says that you should not depend upon sort being stable:
The result is not guaranteed to be stable. When the comparison of two elements returns 0, the order of the elements is unpredictable.
Note that this does not say whether or not the sort is stable. It just says it is not guaranteed to be stable. Any given implementation of Ruby could have a stable sort and still be consistent with the documentation. It could also have an unstable sort, or change whether the sort is stable at any time.
What Ruby actually does
This test code prints true if Ruby's sort is stable, or false if it is not:
Foo = Struct.new(:value, :original_order) do
def <=>(foo)
value <=> foo.value
end
end
size = 1000
unsorted = size.times.map do |original_order|
value = rand(size / 10)
Foo.new(value, original_order)
end
sorted = unsorted.sort
stably_sorted = unsorted.sort_by do |foo|
[foo.value, foo.original_order]
end
p [RUBY_PLATFORM, RUBY_VERSION, RUBY_PATCHLEVEL, sorted == stably_sorted]
Here are the results for all of the Rubies I have installed on my Linux box:
["java", "1.8.7", 357, false]
["java", "1.9.3", 551, false]
["java", "2.3.3", 0, true]
["java", "2.5.7", 0, true]
["x86_64-linux", "1.8.7", 374, false]
["x86_64-linux", "1.8.7", 374, false]
["x86_64-linux", "1.8.7", 376, false]
["x86_64-linux", "1.9.3", 392, false]
["x86_64-linux", "1.9.3", 484, false]
["x86_64-linux", "1.9.3", 551, false]
["x86_64-linux", "2.0.0", 643, false]
["x86_64-linux", "2.0.0", 648, false]
["x86_64-linux", "2.1.0", 0, false]
["x86_64-linux", "2.1.10", 492, false]
["x86_64-linux", "2.1.1", 76, false]
["x86_64-linux", "2.1.2", 95, false]
["x86_64-linux", "2.1.3", 242, false]
["x86_64-linux", "2.1.4", 265, false]
["x86_64-linux", "2.1.5", 273, false]
["x86_64-linux", "2.1.6", 336, false]
["x86_64-linux", "2.1.7", 400, false]
["x86_64-linux", "2.1.8", 440, false]
["x86_64-linux", "2.1.9", 490, false]
["x86_64-linux", "2.2.0", 0, true]
["x86_64-linux", "2.2.1", 85, true]
["x86_64-linux", "2.2.2", 95, true]
["x86_64-linux", "2.2.3", 173, true]
["x86_64-linux", "2.2.4", 230, true]
["x86_64-linux", "2.2.5", 319, true]
["x86_64-linux", "2.2.6", 396, true]
["x86_64-linux", "2.3.0", 0, true]
["x86_64-linux", "2.3.1", 112, true]
["x86_64-linux", "2.3.2", 217, true]
["x86_64-linux", "2.3.3", 222, true]
["x86_64-linux", "2.4.0", 0, true]
["x86_64-linux", "2.4.0", -1, true]
["x86_64-linux", "2.4.0", -1, true]
["x86_64-linux", "2.4.0", -1, true]
["x86_64-linux", "2.4.0", -1, true]
["x86_64-linux", "2.4.1", 111, true]
["x86_64-linux", "2.4.2", 198, true]
["x86_64-linux", "2.4.5", 335, true]
["x86_64-linux", "2.4.9", 362, true]
["x86_64-linux", "2.5.0", 0, true]
["x86_64-linux", "2.5.3", 105, true]
["x86_64-linux", "2.5.7", 206, true]
["x86_64-linux", "2.6.0", 0, true]
["x86_64-linux", "2.6.2", 47, true]
["x86_64-linux", "2.6.3", 62, true]
["x86_64-linux", "2.6.4", 104, true]
["x86_64-linux", "2.6.5", 114, true]
["x86_64-linux", "2.6.6", 146, true]
["x86_64-linux", "2.7.0", 0, true]
["x86_64-linux", "2.7.1", 83, true]
["x86_64-linux", "2.7.2", 137, true]
["x86_64-linux", "3.0.0", 0, true]
["x86_64-linux", "3.0.0", -1, true]
["x86_64-linux", "3.0.1", 64, true]
["x86_64-linux", "3.0.2", 107, true]
["x86_64-linux", "3.0.3", 157, true]
["x86_64-linux", "3.1.0", 0, true]
["x86_64-linux", "3.1.1", 18, true]
["x86_64-linux", "3.1.2", 20, true]
We can see that JRuby reporting RUBY_VERSION 1.8.7 or 1.9.3 is unstable, while JRuby reporting 2.3.3 or 2.5.7 is stable. MRI before 2.2, on Linux, is unstable. MRI >= 2.2.0 is stable (again, on Linux).
The platform matters, though. Although the above result shows that sort is stable in MRI 2.4.1 on Linux, the same version is unstable on Windows:
["x64-mingw32", "2.4.1", 111, false]
Why is MRI's sort stable on Linux, but not on Windows?
Even within a single version of a Ruby implementation, the sort algorithm can change. MRI can use at least three different sorts. The sort routine is selected at compile time using a series of #ifdefs in util.c. It looks like MRI has the ability to use sorts from at least two different libraries. It also has its own implementation.
What should you do about it?
Since the sort may be stable but can't be guaranteed to be stable, do not write code which depends upon Ruby's sort being stable. That code could break when used on a different version, implementation, or platform.

Personally, I wouldn't count on this. How about doing something like this, which breaks ties on :id explicitly:
a.sort {|a, b| s1 = a[:int] <=> b[:int]; s1 != 0 ? s1 : a[:id] <=> b[:id] }
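The same two-level comparison can be sketched with sort_by and an array key. Keep in mind that this breaks ties by :id, which only reproduces the original order when the input is already ordered by :id, as it is in the question's data.

```ruby
a = [
  {id: :a, int: 3}, {id: :b, int: 1}, {id: :c, int: 2}, {id: :d, int: 0},
  {id: :e, int: 1}, {id: :f, int: 0}, {id: :g, int: 1}, {id: :h, int: 2},
]

# Arrays compare element by element, so :int decides first and :id
# is only consulted when the :int values are equal.
sorted = a.sort_by { |h| [h[:int], h[:id]] }

ids = sorted.map { |h| h[:id] }
p ids
```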

Related

Subtract Range of Numbers Algorithm

How would I accomplish the following: Take one array of ranges and subtract another array of ranges from it.
For example:
arr0 = [[0,50],[60,80],[100,150]] # 0-50, 60-80, etc.
arr1 = [[4,8],[15,20]] # 4-8, 15-20, etc.
# arr0 - arr1 magic
result = [[0,3],[9,14],[21,50],[60,80],[100,150]] # 0-3, 9-14, etc.
What's the cleanest and most efficient way to do this in Ruby?
This is a deliberately naïve solution. It's not efficient, but easy to comprehend and quite short.
Deconstruct arr0 into a list of numbers:
n1 = arr0.flat_map { |a, b| (a..b).to_a }
#=> [0, 1, ..., 49, 50, 60, 61, ..., 79, 80, 100, 101, ..., 149, 150]
Same for arr1:
n2 = arr1.flat_map { |a, b| (a..b).to_a }
#=> [4, 5, 6, 7, 8, 15, 16, 17, 18, 19, 20]
Then, subtract n2 from n1 and recombine consecutive numbers:
(n1 - n2).chunk_while { |a, b| a.succ == b }.map(&:minmax)
#=> [[0, 3], [9, 14], [21, 50], [60, 80], [100, 150]]
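If expanding every range into individual numbers is too costly, the subtraction can also be done on the endpoints directly. The following is only a sketch under the assumption that every range is an inclusive pair of integers [lo, hi]; the method name `subtract_ranges` is made up for illustration.

```ruby
# Subtract every range in `cuts` from every range in `ranges`.
# Assumes inclusive integer [lo, hi] pairs.
def subtract_ranges(ranges, cuts)
  cuts.reduce(ranges) do |remaining, (cut_lo, cut_hi)|
    remaining.flat_map do |lo, hi|
      # No overlap: keep the range unchanged.
      next [[lo, hi]] if cut_hi < lo || cut_lo > hi

      pieces = []
      pieces << [lo, cut_lo - 1] if cut_lo > lo  # part left of the cut
      pieces << [cut_hi + 1, hi] if cut_hi < hi  # part right of the cut
      pieces
    end
  end
end

arr0 = [[0, 50], [60, 80], [100, 150]]
arr1 = [[4, 8], [15, 20]]
result = subtract_ranges(arr0, arr1)
p result
```

Each cut splits an overlapping range into at most two pieces, so the work depends on the number of ranges rather than on how many integers they cover.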

Finding similar objects located in same index position of arrays in Ruby

I have the following hash:
hash = {"1"=>[ 5, 13, "B", 4, 10],
"2"=>[27, 19, "B", 18, 20],
"3"=>[45, 41, "B", 44, 31],
"4"=>[48, 51, "B", 58, 52],
"5"=>[70, 69, "B", 74, 73]}
Here is my code:
if hash.values.all? { |array| array[0] == "B" } ||
hash.values.all? { |array| array[1] == "B" } ||
hash.values.all? { |array| array[2] == "B" } ||
hash.values.all? { |array| array[3] == "B" } ||
hash.values.all? { |array| array[4] == "B" }
puts "Hello World"
end
My code checks whether the same element (here "B") appears at the same index position of every array and, if so, outputs the string "Hello World". (Since "B" is in the [2] position of each array, it will puts the string.) Is there a way to condense my current code without a chain of ||s connecting each index of the array?
Assuming all arrays are always of the same length, the following gives you the column indexes where all values are equal:
hash.values.transpose.each_with_index.map do |column, index|
index if column.all? {|x| x == column[0] }
end.compact
The result is [2] for your hash. So you know that for all arrays the index 2 has the same values.
You can print "Hello World" if the resulting array has at least one element.
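Put together, that could look like the sketch below. It uses `uniq.size == 1` as one possible way to test that a column holds a single repeated value; that choice, and the variable names, are assumptions rather than part of the answer above.

```ruby
hash = {"1"=>[ 5, 13, "B",  4, 10],
        "2"=>[27, 19, "B", 18, 20],
        "3"=>[45, 41, "B", 44, 31],
        "4"=>[48, 51, "B", 58, 52],
        "5"=>[70, 69, "B", 74, 73]}

columns = hash.values.transpose

# Indexes of the columns whose values are all identical.
matching_indexes = columns.each_index.select { |i| columns[i].uniq.size == 1 }

puts "Hello World" unless matching_indexes.empty?
```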
How does it work?
hash.values.transpose gives you all the arrays, but with transposed (all rows are now columns) values:
hash.values.transpose
=> [[5, 27, 45, 48, 70],
[13, 19, 41, 51, 69],
["B", "B", "B", "B", "B"],
[4, 18, 44, 58, 74],
[10, 20, 31, 52, 73]]
.each_with_index.map goes over every row of the transposed array while providing an inner array and its index.
We look at every inner array, yielding the column index only if all elements are equal using all?.
hash.values.transpose.each_with_index.map {|column, index| index if column.all? {|x| x == column[0] } }
=> [nil, nil, 2, nil, nil]
Finally, we compact the result to get rid of the nil values.
Edit: First, I used reduce to find the column with identical elements. @Nimir pointed out that I had re-implemented all?, so I edited my answer to use all?.
From @tessi's brilliant answer I thought of this way:
hash.values.transpose.each_with_index do |column, index|
puts "Index:#{index} Repeated value:#{column.first}" if column.all? {|x| x == column[0]}
end
#> Index:2 Repeated value:B
How?
Well, the transpose already solves the problem:
hash.values.transpose
=> [[5, 27, 45, 48, 70],
[13, 19, 41, 51, 69],
["B", "B", "B", "B", "B"],
[4, 18, 44, 58, 74],
[10, 20, 31, 52, 73]
]
We can do:
column.all? {|x| x == column[0]}
to find a column with identical items.
Assuming that all the values of the hash will be arrays of the same size, how about something like:
hash
=> {"1"=>[5, 13, "B", 4, 10], "2"=>[27, 19, "B", 18, 20], "3"=>[45, 41, "B", 44, 31], "4"=>[48, 51, "B", 58, 52], "5"=>[70, 69, "B", 74, 73]}
arr_of_arrs = hash.values
=> [[5, 13, "B", 4, 10], [27, 19, "B", 18, 20], [45, 41, "B", 44, 31], [48, 51, "B", 58, 52], [70, 69, "B", 74, 73]]
first_array = arr_of_arrs.shift
=> [5, 13, "B", 4, 10]
first_array.each_with_index.any? do |element, index|
arr_of_arrs.all? {|arr| arr[index] == element }
end
=> true
This is not really different from what you have now, as far as performance - in fact, it may be a bit slower. However, it allows for a dynamic number of incoming key/value pairs.
I ended up using the following:
fivebs = ["B","B","B","B","B"]
if hash.values.transpose.any? {|array| array == fivebs}
puts "Hello World"
end
If efficiency, rather than readability, is most important, I expect this decidedly un-Ruby-like and uninteresting solution probably would do well:
arr = hash.values
arr.first.size.times.any? { |i| arr.all? { |e| e[i] == ?B } }
#=> true
Only one intermediate array (arr) is constructed (e.g, no transposed array), and it quits if and when a match is found.
More Ruby-like is the solution I mentioned in a comment on your question:
hash.values.transpose.any? { |arr| arr.all? { |e| e == ?B } }
As you asked for an explanation of @Phrogz's solution to the earlier question, which is similar to this one, let me explain the above line of code, by stepping through it:
a = hash.values
#=> [[ 5, 13, "B", 4, 10],
# [27, 19, "B", 18, 20],
# [45, 41, "B", 44, 31],
# [48, 51, "B", 58, 52],
# [70, 69, "B", 74, 73]]
b = a.transpose
#=> [[ 5, 27, 45, 48, 70],
# [ 13, 19, 41, 51, 69],
# ["B", "B", "B", "B", "B"],
# [ 4, 18, 44, 58, 74],
# [ 10, 20, 31, 52, 73]]
In the last step:
b.any? { |arr| arr.all? { |e| e == ?B } }
#=> true
(where ?B is shorthand for the one-character string "B") an enumerator is created:
c = b.to_enum(:any?)
#=> #<Enumerator: [[ 5, 27, 45, 48, 70],
# [ 13, 19, 41, 51, 69],
# ["B", "B", "B", "B", "B"],
# [ 4, 18, 44, 58, 74],
# [ 10, 20, 31, 52, 73]]:any?>
When the enumerator (any enumerator) is acting on an array, the elements of the enumerator are passed into the block (and assigned to the block variable, here arr) by Array#each. The first element passed into the block is:
arr = [5, 27, 45, 48, 70]
and the following is executed:
arr.all? { |e| e == ?B }
#=> [5, 27, 45, 48, 70].all? { |e| e == ?B }
#=> false
Notice that false is returned to each right after:
5 == ?B
#=> false
is evaluated. Since false is returned, we move on to the second element of the enumerator:
[13, 19, 41, 51, 69].all? { |e| e == ?B }
#=> false
so we continue. But
["B", "B", "B", "B", "B"].all? { |e| e == ?B }
#=> true
so when true is returned to each, the latter returns true and we are finished.
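The short-circuiting described in this walkthrough can be observed by counting how often the inner block runs. This is only an illustrative sketch; the counter variable `comparisons` is an assumption added for the demonstration.

```ruby
b = [[  5,  27,  45,  48,  70],
     [ 13,  19,  41,  51,  69],
     ["B", "B", "B", "B", "B"],
     [  4,  18,  44,  58,  74],
     [ 10,  20,  31,  52,  73]]

comparisons = 0
result = b.any? do |arr|
  arr.all? do |e|
    comparisons += 1  # count every element comparison
    e == "B"
  end
end

p result
p comparisons
```

The first two rows are rejected after a single comparison each, the third row needs all five, and the last two rows are never examined.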

How to pick top 5 values from a hash?

I have a hash of ids and their scores, it's something like this:
@objects = {1=>57, 4=>12, 3=>9, 5=>3, 55=>47, 32=>39, 17=>27, 29=>97, 39=>58}
How can I pick the top five and drop the rest ?
I'm doing this:
@orderedObject = @objects.sort_by {|k,v| v}.reverse
=>[[29, 97], [39, 58], [1, 57], [55, 47], [32, 39], [17, 27], [4, 12], [3, 9], [5, 3]]
Then I do this:
I take only the keys of @orderedObject:
@keys = @orderedObject.map { |key, value| key }
which gives me:
=>[29, 39, 1, 55, 32, 17, 4, 3, 5]
All I need is [29, 39, 1, 55, 32], the first 5 elements, but I'm stuck and don't know how to do this.
You can do
@objects = {1=>57, 4=>12, 3=>9, 5=>3, 55=>47, 32=>39, 17=>27, 29=>97, 39=>58}
@objects.sort_by { |_, v| -v }[0..4].map(&:first)
# => [29, 39, 1, 55, 32]
@objects.sort_by { |_, v| -v }.first(5).map(&:first)
# => [29, 39, 1, 55, 32]
May I suggest this more verbose version, which requires Ruby > 1.9:
Hash[@objects.sort_by{|k,v| -v}.first(5)].keys
A variant of Prof. Arup's answer:
objects = {1=>57, 4=>12, 3=>9, 5=>3, 55=>47, 32=>39, 17=>27, 29=>97, 39=>58}
objects.sort_by { |k,v| -v }.first(5).to_h.keys #=> [29, 39, 1, 55, 32]
Now suppose 3=>9 were instead 3=>39, and you wanted the keys corresponding to the top 5 values. In this case that would be 6 keys, because 39 is the fifth-largest value and both 3=>39 and 32=>39 have it. You could first compute:
threshold = objects.values.sort.last(5).min #=> 39
If you wanted the keys whose values are at or above the threshold, ordered by descending value,
objects.select { |_,v| v >= threshold }.sort_by { |_,v| -v }.map(&:first)
#=> [29, 39, 1, 55, 3, 32]
If you don't care about the order,
objects.select { |_,v| v >= threshold }.keys #=> [1, 3, 55, 32, 29, 39]
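For the original question (exactly five keys, no special handling of ties), another option is Enumerable#max_by with a count argument, available in Ruby 2.2 and later. This is a sketch of an alternative, not something proposed in the answers above.

```ruby
objects = {1=>57, 4=>12, 3=>9, 5=>3, 55=>47, 32=>39, 17=>27, 29=>97, 39=>58}

# max_by(5) returns the five [key, value] pairs with the largest values,
# ordered from largest to smallest.
top_five_keys = objects.max_by(5) { |_, v| v }.map(&:first)
p top_five_keys
```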

Array select to get true and false arrays?

I know I can get this easily:
array = [45, 89, 23, 11, 102, 95]
lower_than_50 = array.select{ |n| n<50}
greater_than_50 = array.select{ |n| !(n<50)}
But is there a method (or an elegant manner) to get this by only running select once?
[lower_than_50, greater_than_50] = array.split_boolean{ |n| n<50}
over, under_or_equal = [45, 89, 23, 11, 102, 95].partition{|x| x>50 }
Or simply:
result = array.partition{|x| x>50 }
p result #=> [[89, 102, 95], [45, 23, 11]]
if you would rather have the result as one array with two sub-arrays.
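Applied to the condition and variable names from the question (n < 50 rather than x > 50), a sketch would be:

```ruby
array = [45, 89, 23, 11, 102, 95]

# partition returns two arrays: elements for which the block is truthy,
# then the rest. Note that the second array would also contain 50 itself.
lower_than_50, the_rest = array.partition { |n| n < 50 }

p lower_than_50
p the_rest
```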
Edit: As a bonus, here is how you would do it if you have more than two alternatives and want to split the numbers:
my_custom_grouping = -> x do
case x
when 1..50 then :small
when 51..100 then :large
else :unclassified
end
end
p [-1,2,40,70,120].group_by(&my_custom_grouping) #=> {:unclassified=>[-1, 120], :small=>[2, 40], :large=>[70]}
The answer above is spot on!
Here is a general solution for more than two partitions (for example: <20, <50, >=50):
arr = [45, 89, 23, 11, 102, 95]
arr.group_by { |i| i < 20 ? 'a' : i < 50 ? 'b' : 'c' }.sort.map(&:last)
=> [[11], [45, 23], [89, 102, 95]]
This can be very useful if you're grouping by chunks (or any mathematically computable index such as modulo):
arr.group_by { |i| i / 50 }.sort.map(&:last)
=> [[45, 23, 11], [89, 95], [102]]

Using Enumerable#zip on an Array of Arrays

I am trying to use Enumerable#zip on an array of arrays in order to group the elements of the first nested array with the corresponding elements of each subsequent nested array. This is my array:
roster = [["Number", "Name", "Position", "Points per Game"],
["12","Joe Schmo","Center",[14, 32, 7, 0, 23] ],
["9", "Ms. Buckets ", "Point Guard", [19, 0, 11, 22, 0] ],
["31", "Harvey Kay", "Shooting Guard", [0, 30, 16, 0, 25] ],
["7", "Sally Talls", "Power Forward", [18, 29, 26, 31, 19] ],
["22", "MK DiBoux", "Small Forward", [11, 0, 23, 17, 0] ]]
I want to group "Number" with "12", "9", "31", "7", and "22", and then do the same for "Name", "Position", etc. using zip. The following gives me the output I want:
roster[0].zip(roster[1], roster[2], roster[3], roster[4], roster[5])
How can I reformat this so that if I added players to my roster, they would be automatically included in the zip without me having to manually type in roster[6], roster[7], etc. I've tried using ranges in a number of ways but nothing seems to have worked yet.
First extract the head and tail of the list (header and rows, respectively) using a splat, then zip them together:
header, *rows = roster
header.zip(*rows)
This is the same as using transpose on the original roster:
header, *rows = roster
zipped = header.zip(*rows)
roster.transpose == zipped #=> true
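To illustrate that this keeps working as the roster grows, here is a sketch with a shortened roster (two columns only, an assumption made to keep the example small) where a player is appended afterwards.

```ruby
# Shortened, illustrative roster with just two columns.
roster = [["Number", "Name"],
          ["12", "Joe Schmo"],
          ["9", "Ms. Buckets"]]

roster << ["31", "Harvey Kay"]  # add a player later

# The splat picks up however many rows there are.
header, *rows = roster
zipped = header.zip(*rows)
p zipped
```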
:zip.to_proc[*roster]
a bit more flexible than transpose:
:zip.to_proc[*[(0..2), [:a, :b, :c]]] #=> [[0, :a], [1, :b], [2, :c]]
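Besides accepting a Range as the receiver, as in the example above, zip also differs from transpose when the rows have different lengths. The sketch below shows this with made-up data; whether padding with nil is desirable depends on the use case.

```ruby
# Illustrative ragged input: the second row is shorter than the first.
ragged = [[1, 2, 3], [:a, :b]]

first, *rest = ragged
zipped = first.zip(*rest)  # missing values are filled with nil

# transpose refuses rows of unequal length.
transpose_error =
  begin
    ragged.transpose
    nil
  rescue IndexError => e
    e.class
  end

p zipped
p transpose_error
```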
p roster.transpose()
roster[0].zip(*(roster[1..-1]))
Doesn't matter how many are in the roster array.
