How does <=> work for different sorting strategies? - ruby

I'm going through some tutorials on CodeAcademy, and came across this scenario:
books = ["Charlie and the Chocolate Factory", "War and Peace", "Utopia", "A Brief History of Time", "A Wrinkle in Time"]
# To sort our books in ascending order, in-place
books.sort! { |firstBook, secondBook| firstBook <=> secondBook }
# Sort your books in descending order, in-place below
# this line was initially left blank
books.sort! {|firstBook, secondBook| secondBook <=> firstBook}
Instead of using if/else blocks, I gave this a shot, and it worked, but I don't know why. I assume it doesn't matter which order you place the items in the check (i.e., a <=> b vs. b <=> a). Could someone explain what's happening here?

If you reverse the elements in <=> you reverse its value. If the elements are equal this operator returns 0, but if the first one is smaller it returns a negative value, if the first is greater it returns a positive value. Thus if temp = a <=> b then b <=> a is -temp. So you reverse the order of sorting if you write the arguments in reverse order.
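To make the sign flip concrete, here is a minimal sketch using two arbitrary strings (the variable names are only for illustration):

```ruby
# <=> returns a negative number, zero, or a positive number
# (or nil when the two operands cannot be compared).
a = "apple"
b = "banana"

forward  = a <=> b   # a sorts before b
backward = b <=> a   # same pair, operands swapped
same     = a <=> a   # equal operands

puts forward   # -1
puts backward  # 1
puts same      # 0
puts forward == -backward  # true
```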

Here's some simple visual ways of seeing what <=> does, and how reversing the order of the comparison variables affects the order of the output.
Starting with a basic Array:
foo = %w[a z b x]
We can do an ascending sort:
foo.sort { |i, j| i <=> j } # => ["a", "b", "x", "z"]
Or a descending sort by reversing the two variables being compared:
foo.sort { |i, j| j <=> i } # => ["z", "x", "b", "a"]
The <=> operator returns -1, 0 or 1, depending on whether the comparison is <, == or > respectively.
We can test that by negating the result of the comparison, which will reverse the order if the theory holds true.
foo.sort { |i, j| -(i <=> j) } # => ["z", "x", "b", "a"]
foo.sort { |i, j| -(j <=> i) } # => ["a", "b", "x", "z"]
By negating the result of the comparisons the order does reverse. But, for clarity in code, just reverse the order of the variables.
That all said, using sort, or its destructive sibling sort!, isn't always the fastest way to sort complex objects. Simple objects, like strings and characters, and numerics, sort extremely quickly because their classes implement the necessary methods to perform <=> tests quickly.
Some of the answers and comments mention sort_by, so let's go there.
Complex objects don't usually sort correctly, so we end up using getters/accessors to retrieve some value we want to compare against, and that action has a cost in CPU time. sort repeatedly compares the values so that retrieval would occur repeatedly, and add up as wasted time when sorting wasn't happening.
To fix that, a smart guy named Randal Schwartz, who's a major player in the Perl world, started using an algorithm that precomputes, once, the value to be used for sorting; that algorithm is commonly called a Schwartzian Transform as a result. That value and the actual object are bundled together in a small sub-array, and then sorted. Because the sort occurs against the precomputed value, it and its associated object are moved around in the ordering until the sort completes. At that point, the actual objects are retrieved and returned as the result of the method. Ruby implements that type of sort using sort_by.
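As a rough sketch of the idea (not Ruby's actual internals), the transform can be written by hand as decorate, sort, undecorate. The word list and the choice of length as the sort key are made up for illustration:

```ruby
# Hypothetical data: sort words by length, computing each length only once.
words = %w[banana fig apple pear]

# 1. Decorate: pair each object with its precomputed sort key.
decorated = words.map { |w| [w.length, w] }

# 2. Sort the pairs by the precomputed key only.
sorted = decorated.sort { |x, y| x.first <=> y.first }

# 3. Undecorate: throw the keys away and keep the objects.
result = sorted.map(&:last)

p result                               # => ["fig", "pear", "apple", "banana"]
p result == words.sort_by(&:length)    # => true
```

The lengths here are all distinct, so the result does not depend on how ties are broken.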
sort_by doesn't use <=> externally, so you can sort by simply telling it how to get at the value you want to compare against:
class Foo
  attr_reader :i, :c
  def initialize(i, c)
    @i = i
    @c = c
  end
end
Here's the array of objects. Note that they are in the order they were created, but not sorted:
foo = [[1, 'z'], [26, 'a'], [2, 'x'], [25, 'b'] ].map { |i, c| Foo.new(i, c) }
# => [#<Foo:0x007f97d1061d80 @c="z", @i=1>,
# #<Foo:0x007f97d1061d58 @c="a", @i=26>,
# #<Foo:0x007f97d1061d30 @c="x", @i=2>,
# #<Foo:0x007f97d1061ce0 @c="b", @i=25>]
Sorting them by the integer value:
foo.sort_by{ |f| f.i }
# => [#<Foo:0x007f97d1061d80 @c="z", @i=1>,
# #<Foo:0x007f97d1061d30 @c="x", @i=2>,
# #<Foo:0x007f97d1061ce0 @c="b", @i=25>,
# #<Foo:0x007f97d1061d58 @c="a", @i=26>]
Sorting them by the character value:
foo.sort_by{ |f| f.c }
# => [#<Foo:0x007f97d1061d58 @c="a", @i=26>,
# #<Foo:0x007f97d1061ce0 @c="b", @i=25>,
# #<Foo:0x007f97d1061d30 @c="x", @i=2>,
# #<Foo:0x007f97d1061d80 @c="z", @i=1>]
sort_by doesn't respond as well to using a negated value as sort and <=>, so, based on some benchmarks done a while back on Stack Overflow, we know that using reverse on the resulting value is the fastest way to switch the order from ascending to descending:
foo.sort_by{ |f| f.i }.reverse
# => [#<Foo:0x007f97d1061d58 @c="a", @i=26>,
# #<Foo:0x007f97d1061ce0 @c="b", @i=25>,
# #<Foo:0x007f97d1061d30 @c="x", @i=2>,
# #<Foo:0x007f97d1061d80 @c="z", @i=1>]
foo.sort_by{ |f| f.c }.reverse
# => [#<Foo:0x007f97d1061d80 @c="z", @i=1>,
# #<Foo:0x007f97d1061d30 @c="x", @i=2>,
# #<Foo:0x007f97d1061ce0 @c="b", @i=25>,
# #<Foo:0x007f97d1061d58 @c="a", @i=26>]
They're somewhat interchangeable, but you have to remember that sort_by does have overhead, which is apparent when you compare its times against sort times when running against simple objects. Use the right method at the right time and you can see dramatic speed-ups.
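One way to see where the savings come from, without timing anything, is to count how often the sort key is computed. This is only a sketch: the lambda expensive_key is a hypothetical stand-in for a costly accessor, and the numbers are arbitrary.

```ruby
numbers = [5, 3, 8, 1, 9, 2, 7, 4]

calls = 0
# Hypothetical "expensive" key; it just counts how often it is invoked.
expensive_key = lambda do |n|
  calls += 1
  n * n
end

# sort with a block computes the key twice for every comparison.
calls = 0
by_sort = numbers.sort { |a, b| expensive_key.call(a) <=> expensive_key.call(b) }
sort_calls = calls

# sort_by computes the key once per element.
calls = 0
by_sort_by = numbers.sort_by { |n| expensive_key.call(n) }
sort_by_calls = calls

puts "sort:    #{sort_calls} key computations"
puts "sort_by: #{sort_by_calls} key computations"
puts by_sort == by_sort_by   # true
```

For plain strings or integers there is no key to compute, so the extra bookkeeping in sort_by is pure overhead and a bare sort is the better fit.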

It's called the spaceship operator.
If you have something like this
my_array = ["b","c","a"]
my_array.sort! compares the elements of the array directly, since strings have a natural (alphabetical) ordering. Likewise, if you have an array of integers
my_array2 = [3,1,2]
my_array2.sort! will compare the elements and give the result [1,2,3].
But if you want to change how the comparison is made in an array of strings or complex objects, you specify it in a block using the <=> operator:
my_array3 = ["hello", "world how are" , "you"]
my_array3.sort! { |first_element, second_element| first_element <=> second_element }
so it will tell the sort method to compare like this:
Is first_element < second_element?
Is first_element == second_element?
Is first_element > second_element?
But if you take this statement,
my_array3.sort! { |first_element, second_element| second_element <=> first_element }
the comparison is made as follows:
Is second_element < first_element?
Is second_element == first_element?
Is second_element > first_element?
So it does make a difference if you swap the elements being compared.
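As an illustration of changing how the comparison is made, the sketch below sorts the same strings by length instead of alphabetically. Using length as the criterion is an assumption chosen for the example, not something from the question:

```ruby
words = ["hello", "world how are", "you"]

# Shortest first: compare the lengths, first against second.
ascending = words.sort { |first, second| first.length <=> second.length }

# Longest first: the same comparison with the operands swapped.
descending = words.sort { |first, second| second.length <=> first.length }

p ascending    # => ["you", "hello", "world how are"]
p descending   # => ["world how are", "hello", "you"]
```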

Related

How to sort hash in Ruby

I have the following hash:
scores = {
  "Charlie" => 0,
  "Delta" => 5,
  "Beta" => 2,
  "Alpha" => 0
}
The numbers listed are integers, and the teams are represented as strings.
How can I sort the list by score, and if there is a tie, list it alphabetically, and then output it?
I imagine the intended output to look like:
1. Delta, 5 pts
2. Beta, 2 pts
3. Alpha, 0 pts
4. Charlie, 0 pts
I sorted the hash, but am not sure how to sort by alphabetical order if there is a tie. The code I used is below:
scores = Hash[ scores.sort_by { |team_name, scores_array| scores_array.sum.to_s } ]
In Ruby, Arrays are lexicographically ordered. You can make use of that fact for sorting by multiple keys: just create an array of the keys you want to sort by:
scores.sort_by {|team, score| [-score, team] }.to_h
#=> {'Delta' => 5, 'Beta' => 2, 'Alpha' => 0, 'Charlie' => 0}
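To see why the array key works, here is a small sketch that compares two such keys directly and then prints the ranking in the format the question asked for:

```ruby
scores = { "Charlie" => 0, "Delta" => 5, "Beta" => 2, "Alpha" => 0 }

# Arrays compare element by element; the second element only
# matters when the first elements are equal.
p([-5, "Delta"] <=> [-2, "Beta"])    # => -1 (higher score sorts first)
p([0, "Alpha"] <=> [0, "Charlie"])   # => -1 (tie broken by name)

ranked = scores.sort_by { |team, score| [-score, team] }

ranked.each.with_index(1) do |(team, score), place|
  puts "#{place}. #{team}, #{score} pts"
end
# 1. Delta, 5 pts
# 2. Beta, 2 pts
# 3. Alpha, 0 pts
# 4. Charlie, 0 pts
```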
The general pattern for sorting a Hash into a sorted Array of key/value pairs looks like this.
sorted_array_of_tuples = hash.sort { |a,b| ... }
A Hash has the Enumerable mixin, which means it can use sort. Used on a Hash, sort compares tuples (two-element Arrays) of the key and value, so we can easily craft a block that compares both the key and the value.
Usually you then use the sorted Array.
sorted = hash.sort { ... }
sorted.each { |t|
puts "#{t[0]}: #{t[1]}"
}
But you can turn it back into a Hash with to_h. Since Hashes in Ruby remember the order their keys were inserted, the newly minted Hash created from the sorted Array will retain its sorting.
sorted_hash = hash.sort { |a,b| ... }.to_h
...but if more keys are added to sorted_hash they will not be sorted, they'll just go at the end. So you'll probably want to freeze the sorted hash to prevent modifications ruining the sorting.
sorted_hash.freeze
As for the sort block, in other languages an idiom of "compare by this, and if they're equal, compare by that" would look like this:
sorted_scores = scores.sort { |a,b|
# Compare values or compare keys
a[1] <=> b[1] || a[0] <=> b[0]
}
This takes advantage of the fact that <=> returns 0 when the operands are equal. In those languages 0 is false, so you can use logical operators to create a whole chain of primary, secondary, and tertiary comparisons.
But in Ruby 0 isn't false. Only false and nil are falsy. Usually this is a good thing, but here it means we need to be more specific.
sorted_scores = scores.sort { |a,b|
# Compare values
cmp = a[1] <=> b[1]
# Compare keys if the values were equal
cmp = a[0] <=> b[0] if cmp == 0
# Return the comparison
cmp
}
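A more compact variant of the same comparison, offered as an alternative sketch rather than as part of the answer above, uses nonzero?, which returns nil for 0 and the number itself otherwise. That restores the "fall through on a tie" chaining:

```ruby
scores = { "Charlie" => 0, "Delta" => 5, "Beta" => 2, "Alpha" => 0 }

sorted_scores = scores.sort { |a, b|
  # nonzero? turns a tie (0) into nil, so || moves on to the keys.
  (a[1] <=> b[1]).nonzero? || (a[0] <=> b[0])
}

p sorted_scores
# => [["Alpha", 0], ["Charlie", 0], ["Beta", 2], ["Delta", 5]]
```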
You can use <=> to compare and do it in a block where you first compare by value (score) and if those match, then compare by key (name).
scores = {
"Alpha" => 0,
"Beta" => 2,
"Charlie" => 0,
"Delta" => 5,
}
# convert to an array of arrays
# => [[Alpha, 0], [Beta, 2], [Charlie, 0], [Delta, 5]]
# then sort by value and, if needed, by key
scores = scores.to_a.sort! { |a,b|
cmp = a[1] <=> b[1] # sort by value (index = 1)
cmp = a[0] <=> b[0] if cmp == 0 # sort by key (index = 0) if values matched
cmp
}.to_h # convert back to a hash
puts scores
Or if you want to extract the comparison code into a method for reuse/clarity, you can have it call the method.
# Compare entries from the scores hash.
# Each entry has a name (key) and score (value) (e.g. ["Alpha", 0]).
# First compare by scores, then by names (if scores match).
def compare_score_entries(a, b)
cmp = a[1] <=> b[1]
cmp = a[0] <=> b[0] if cmp == 0
return cmp
end
scores = scores.sort(&method(:compare_score_entries)).to_h
You can sort it like this
scores = {
"Charlie" => 0,
"Delta" => 5,
"Beta" => 2,
"Alpha" => 0
}
puts scores
.map{|name, score| [-score, name]}
.sort
.zip((0...scores.size).to_a)
.map{|(score, name), i| "#{i + 1}. #{name}, #{-score} pts"}
Notice the negated score. It's a trick to sort integers in descending order.
You can try this
scores = {
'Charlie' => 0,
'Delta' => 5,
'Beta' => 2,
'Alpha' => 0
}
scores_sorted = scores.sort_by { |key, value| [-value, key] } # name breaks ties alphabetically
scores_sorted.each.with_index(1) do |value, index|
puts "#{index}. #{value[0]}, #{value[1]} pts"
end

ruby syntax code involving hashes

I was looking at code regarding how to return a mode from an array and I ran into this code:
def mode(array)
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
answer.select { |k,v| v == answer.values.max}.keys
end
I'm trying to conceptualize what the syntax means behind it as I am fairly new to Ruby and don't exactly understand how hashes are being used here. Any help would be greatly appreciated.
Line by line:
answer = array.inject ({}) { |k, v| k[v]=array.count(v);k}
This assembles a hash of counts. I would not have called the variable answer because it is not the answer, it is an intermediary step. The inject() method (also known as reduce()) allows you to iterate over a collection, keeping an accumulator (e.g. a running total or in this case a hash collecting counts). It needs a starting value of {} so that the hash exists when attempting to store a value. Given the array [1,2,2,2,3,4,5,6,6] the counts would look like this: {1=>1, 2=>3, 3=>1, 4=>1, 5=>1, 6=>2}.
answer.select { |k,v| v == answer.values.max}.keys
This selects all elements in the above hash whose value is equal to the maximum value, in other words the highest. Then it identifies the keys associated with the maximum values. Note that it will list multiple values if they share the maximum value.
An alternative:
If you didn't care about returning multiple, you could use group_by as follows:
array.group_by{|x|x}.values.max_by(&:size).first
or, in Ruby 2.2+:
array.group_by(&:itself).values.max_by(&:size).first
The inject method acts like an accumulator. Here is a simpler example:
sum = [1,2,3].inject(0) { |current_tally, new_value| current_tally + new_value }
The 0 is the starting point.
So after the first line, we have a hash that maps each number to the number of times it appears.
The mode calls for the most frequent element, and that is what the next line does: selects only those who are equal to the maximum.
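For comparison, the same two steps can be sketched with Enumerable#tally, assuming Ruby 2.7 or later. This is an alternative formulation, not the code from the question:

```ruby
def mode(array)
  counts = array.tally            # maps each element to how often it occurs
  highest = counts.values.max     # the largest count
  # Keep every element whose count equals the largest count.
  counts.select { |_value, count| count == highest }.keys
end

p mode([1, 2, 2, 2, 3, 4, 5, 6, 6])   # => [2]
p mode([1, 1, 2, 2, 3])               # => [1, 2]
```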
I believe your question has been answered, and @Mark mentioned different ways to do the calculations. I would like to just focus on other ways to improve the first line of code:
answer = array.inject ({}) { |k, v| k[v] = array.count(v); k }
First, let's create some data:
array = [1,2,1,4,3,2,1]
Use each_with_object instead of inject
My suspicion is that the code might be fairly old, as Enumerable#each_with_object, which was introduced in v. 1.9, is arguably a better choice here than Enumerable#inject (aka reduce). If we were to use each_with_object, the first line would be:
answer = array.each_with_object ({}) { |v,k| k[v] = array.count(v) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
each_with_object returns the object, a hash held by the block variable k.
As you see, each_with_object is very similar to inject, the only differences being:
it is not necessary to return k from the block to each_with_object, as it is with inject (the reason for that annoying ; k at the end of inject's block);
the block variable for the object (k) follows v with each_with_object, whereas it precedes v with inject; and
when not given a block, each_with_object returns an enumerator, meaning it can be chained to other methods (e.g., arr.each_with_object.with_index ...).
Don't get me wrong, inject remains an extremely powerful method, and in many situations it has no peer.
Two more improvements
In addition to replacing inject with each_with_object, let me make two other changes:
answer = array.uniq.each_with_object ({}) { |k,h| h[k] = array.count(k) }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
In the original expression, the object returned by inject (sometimes called the "memo") was represented by the block variable k, which I am using to represent a hash key ("k" for "key"). Similarly, as the object is a hash, I chose to use h for its block variable. Like many others, I prefer to keep the block variables short and use names that indicate object type (e.g., a for array, h for hash, s for string, sym for symbol, and so on).
Now suppose:
array = [1,1]
then inject would pass the first 1 into the block and compute k[1] = array.count(1) #=> 2, so the hash k returned to inject would be {1=>2}. It would then pass the second 1 into the block, again compute k[1] = array.count(1) #=> 2, overwriting 1=>2 in k with 1=>2; that is, not changing it at all. Doesn't it make more sense to just do this for the unique values of array? That's why I have: array.uniq....
Even better: use a counting hash
This is still quite inefficient--all those counts. Here's a way that reads better and is probably more efficient:
array.each_with_object(Hash.new(0)) { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Let's have a look at this in gory detail. Firstly, the docs for Hash#new read, "If obj is specified [i.e., Hash.new(obj)], this single object will be used for all default values." This means that if:
h = Hash.new('cat')
and h does not have a key dog, then:
h['dog'] #=> 'cat'
Important: The last expression is often misunderstood. It merely returns the default value. str = "It does *not* add the key-value pair 'dog'=>'cat' to the hash." Let me repeat that: puts str.
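A quick sketch that checks this behaviour: reading a missing key leaves the hash empty, while assigning (as += does) stores the key.

```ruby
h = Hash.new('cat')

looked_up = h['dog']
p looked_up        # => "cat"  (the default is returned)
p h.key?('dog')    # => false  (but nothing was stored)
p h.size           # => 0

counts = Hash.new(0)
counts[:a] += 1    # same as counts[:a] = counts[:a] + 1, an assignment
p counts.key?(:a)  # => true
p counts[:a]       # => 1
```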
Now let's see what's happening here:
enum = array.each_with_object(Hash.new(0))
#=> #<Enumerator: [1, 2, 1, 4, 3, 2, 1]:each_with_object({})>
We can see the contents of the enumerator by converting it to an array:
enum.to_a
#=> [[1, {}], [2, {}], [1, {}], [4, {}], [3, {}], [2, {}], [1, {}]]
These seven elements are passed into the block by the method each:
enum.each { |k,h| h[k] += 1 }
#=> {1=>3, 2=>2, 4=>1, 3=>1}
Pretty cool, eh?
We can simulate this using Enumerator#next. The first value of enum ([1, {}]) is passed to the block and assigned to the block variables:
k,h = enum.next
#=> [1, {}]
k #=> 1
h #=> {}
and we compute:
h[k] += 1
#=> h[k] = h[k] + 1 (what '+=' means)
# = 0 + 1 = 1 (h[k] on the right equals the default value
# of 0 since `h` has no key `k`)
so now:
h #=> {1=>1}
Next, each passes the second value of enum into the block and similar calculations are performed:
k,h = enum.next
#=> [2, {1=>1}]
k #=> 2
h #=> {1=>1}
h[k] += 1
#=> 1
h #=> {1=>1, 2=>1}
Things are a little different when the third element of enum is passed in, because h now has a key 1:
k,h = enum.next
#=> [1, {1=>1, 2=>1}]
k #=> 1
h #=> {1=>1, 2=>1}
h[k] += 1
#=> h[k] = h[k] + 1
#=> h[1] = h[1] + 1
#=> h[1] = 1 + 1 => 2
h #=> {1=>2, 2=>1}
The remaining calculations are performed similarly.

Having trouble with the sort method

I am building a histogram based on the number of words in a text file. I have an array of hashes whose keys are the words and whose values are the number of times the word appears per line. I need to use the sort method on this array of hashes to sort the values in order from the most frequent word to the least. This is what my sort line looks like:
twoOfArray.sort { |k, v| v <=> k }
twoOfArray.each { |key, value| puts "#{key} occurs #{value} times" "\n"}
Full code is here. If I use the sort! method, I get an undefined method error. Does anyone know why?
I would convert your data structure (an array of hashes) into just one large hash. If you want to sort the words, there's no reason to have them in separate hashes.
Then, if your hash is something like {'the' => 5, 'and' => 23, 'beer' => 2} you can sort via:
> h = {'the' => 5, 'and' => 23, 'beer' => 2}
> a = h.sort {|a, b| b[1] <=> a[1] } # sort converts a hash into an array of arrays.
> a
#=> [['and', 23], ['the', 5], ['beer', 2]]
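The conversion step itself could look like the sketch below. The per-line data is invented for illustration, since the asker's actual array is not shown:

```ruby
# Hypothetical per-line word counts standing in for the asker's array of hashes.
line_counts = [
  { 'the' => 2, 'and' => 1 },
  { 'the' => 3, 'beer' => 2 },
  { 'and' => 22 }
]

# Merge every per-line hash into one hash of totals.
totals = line_counts.each_with_object(Hash.new(0)) do |counts, sum|
  counts.each { |word, n| sum[word] += n }
end

# Sort by count, highest first, then print.
sorted = totals.sort { |a, b| b[1] <=> a[1] }
sorted.each { |word, n| puts "#{word} occurs #{n} times" }
# and occurs 23 times
# the occurs 5 times
# beer occurs 2 times
```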

Check to see if an array is already sorted?

I know how to put an array in order, but in this case I just want to see if it is in order. An array of strings would be the easiest, I imagine, and answers on that front are appreciated, but an answer that includes the ability to check for order based on some arbitrary parameter is optimal.
Here's an example dataset:
[["a", 3],["b",53],["c",2]]
Where the elements are themselves arrays containing several elements, the first of which is a string. I want to see if the elements are in alphabetical order based on this string.
It looks like a generic abstraction, let's open Enumerable:
module Enumerable
def sorted?
each_cons(2).all? { |a, b| (a <=> b) <= 0 }
end
end
[["a", 3], ["b", 53],["c", 2]].sorted? #=> true
Notice that we have to write (a <=> b) <= 0 instead of a <= b because there are classes that support <=> but not the comparison operators (e.g. Array), since they do not include the module Comparable.
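This can be checked directly with respond_to?, using two of the pairs from the question:

```ruby
pair_a = ["a", 3]
pair_b = ["b", 53]

p pair_a.respond_to?(:<=>)    # => true
p pair_a.respond_to?(:<=)     # => false (Array does not include Comparable)

# So the ordering test has to go through <=>:
p((pair_a <=> pair_b) <= 0)   # => true
```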
You also said you'd like to have the ability "to check for order based on some arbitrary parameter":
module Enumerable
def sorted_by?
each_cons(2).all? { |a, b| ((yield a) <=> (yield b)) <= 0 }
end
end
[["a", 3], ["b", 1], ["c", 2]].sorted_by? { |k, v| v } #=> false
Using lazy enumerables (Ruby >= 2.1), we can reuse Enumerable#sorted?:
module Enumerable
def sorted_by?(&block)
lazy.map(&block).sorted?
end
end
You can compare them two by two:
[["a", 3],["b",53],["c",2]].each_cons(2).all?{|p, n| (p <=> n) != 1} # => true
reduce can compare each element to the one before, and stop when it finds one out of order:
array.reduce{|prev,l| break unless l[0] >= prev[0]; l}
If it turns out the array isn't sorted, will your next action always be to sort it? For that use case (though of course depending on the number of times the array will already be sorted), you may not want to check whether it is sorted, but instead simply choose to always sort the array. Sorting an already sorted array is pretty efficient with many algorithms and merely checking whether an array is already sorted is not much less work, making checking + sorting more work than simply always sorting.
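def ascending?(array)
  yes = true
  # Return r so the next step compares adjacent elements.
  array.reduce { |l, r| break unless yes &= (l[0] <= r[0]); r }
  yes
end
def descending?(array)
  yes = true
  # Return r so the next step compares adjacent elements.
  array.reduce { |l, r| break unless yes &= (l[0] >= r[0]); r }
  yes
end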
Iterate over the objects and make sure each following element is >= the current element (or, equivalently, that each previous element is <= the current element).
For this to work efficiently you will want to sort during insertion.
If you are dealing with unique items, a SortedSet is also an option.
For clarification, if we patch array to allow for a sorted insertion, then we can keep the array in a sorted state:
class Array
  def add_sorted(o)
    size = self.size
    if size == 0
      self << o
    elsif self.last <= o # <= so a duplicate of the largest element is still appended
      self << o
    elsif self.first > o
      self.insert(0, o)
    else
      # This portion can be improved by using a binary search instead of linear
      self.each_with_index {|n, i| if n > o; self.insert(i, o); break; end}
    end
  end
end
a = []
12.times{a.add_sorted(Random.rand(10))}
p a # prints the 12 random values in ascending order
or to use the built in sort:
class Array
  def add_sorted2(o)
    self << o
    self.sort! # sort! re-sorts in place; plain sort would discard its result
  end
end
or, if you are dealing with unique items:
# On Ruby 3.0+, SortedSet is no longer part of set; use the sorted_set gem instead.
require "set"
b = SortedSet.new
12.times{b << Random.rand(10)}
p b # => #<SortedSet: {1, 3, 4, 5, 6, 7, 8, 9}>
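The comment in add_sorted mentions that the linear scan could be replaced by a binary search. A minimal sketch of that idea, assuming Ruby 2.3+ for Array#bsearch_index and using a standalone helper (insert_sorted is a hypothetical name) rather than patching Array:

```ruby
# Hypothetical helper: insert item into an already-sorted array,
# keeping it sorted. Not a core method.
def insert_sorted(sorted, item)
  # Index of the first element greater than item, found by binary search;
  # nil means no such element exists, so the item goes at the end.
  index = sorted.bsearch_index { |x| x > item } || sorted.size
  sorted.insert(index, item)
end

a = []
[5, 1, 4, 1, 9, 2].each { |n| insert_sorted(a, n) }
p a   # => [1, 1, 2, 4, 5, 9]
```

The search takes O(log n) comparisons, though insert still has to shift the elements after the insertion point.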
These are all way too hard. You don't have to sort, but you can use sort to check. Scrambled array below for demonstration purposes.
arr = [["b",3],["a",53],["c",2]]
arr.sort == arr # => false
p arr.sort # => [["a",53],["b",3],["c",2]]

ruby noob: are hashes speedy and optimal for storage or should I make a tuple?

This is a pretty simple problem I'm working on in Ruby, but I'm a total noob, so I want to learn the most correct solution. I have a list of data with a name and a value. I need to remember all those (obvious answer: hash). But I also need to remember the order of this data. So it looks like:
x=1
y=2
z=3
I was thinking about making an array of 1 element hashes:
[0 => {'x' => 1},
1 => {'y' => 2},
2 => {'z' => 3}]
Are hashes the best choice in this situation? Is there some reason they would be slow, or not optimal?
Use Ruby 1.9 or later. Hashes preserve insertion order there.
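A short sketch of what that ordering guarantee looks like in practice, using the names and values from the question:

```ruby
data = {}
data['x'] = 1
data['y'] = 2
data['z'] = 3

p data.keys   # => ["x", "y", "z"]

# Iteration follows insertion order, not alphabetical order.
data['a'] = 0
p data.keys   # => ["x", "y", "z", "a"]

data.each { |name, value| puts "#{name}=#{value}" }
```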
You could try OrderedHash from ActiveSupport or Dictionary using Ruby Facets.
Performance permitting, an associative array would work. An associative array is an array of arrays, with each sub-array containing two elements:
a = [
[:x, 1],
[:y, 2],
[:z, 3],
]
You can use assoc to find a sub-array by its first element:
p a.assoc(:x) # [:x, 1]
p a.assoc(:x).last # 1
Or rassoc to find a sub-array by its last element:
p a.rassoc(2) # [:y, 2]
p a.rassoc(2).first # :y
Whether or not this approach will work for you depends upon the size of the list and how often you have to search it. On my machine, finding the last element in a 1000 element list takes about 100 microseconds.
Another approach is to use a plain, unordered (in Ruby <= 1.8.7) hash:
h = {
:x => 1,
:y => 2,
:z => 3,
}
And order it at the time you do an operation where order matters:
sorted_h = h.sort_by do |key, value|
key.to_s
end.each do |key, value|
p [key, value]
end
# [:x, 1]
# [:y, 2]
# [:z, 3]
This approach is good for alphabetical or numerical ordering (for example). It's not so good for order of insertion.

Resources