Ruby Koans' assertion on test_hash_is_unordered - ruby

In Ruby 1.9 a Hash keeps its entries in insertion order.
Why does the Ruby Koans assertion in the test_hash_is_unordered method return true?
To me, the method's title is quite misleading... maybe it refers to the fact that Ruby treats two hashes as equal even when their keys were inserted in a different order.
But, theoretically, this kind of assertion:
hash1 = { :one => "uno", :two => "dos" }
hash2 = { :two => "dos", :one => "uno" }
assert_equal ___, hash1 == hash2
Should return false. Or not?

From the fine manual:
hsh == other_hash → true or false
Equality—Two hashes are equal if they each contain the same number of keys and if each key-value pair is equal to (according to Object#==) the corresponding elements in the other hash.
So two Hashes are considered equal if they have the same key/value pairs regardless of order.
The examples in the documentation even contain this:
h2 = { 7 => 35, "c" => 2, "a" => 1 }
h3 = { "a" => 1, "c" => 2, 7 => 35 }
h2 == h3 #=> true
Yes, the test_hash_is_unordered title is somewhat misleading: order itself isn't being tested, only the fact that order has no effect on equality.
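To make the distinction concrete, here is a minimal sketch using the two hashes from the question (the other variable names are only illustrative). It assumes Ruby 1.9 or later, where insertion order is preserved: equality ignores that order, while enumeration still exposes it.

```ruby
hash1 = { :one => "uno", :two => "dos" }
hash2 = { :two => "dos", :one => "uno" }

# Hash#== compares key/value pairs only, so insertion order is irrelevant.
hashes_equal = (hash1 == hash2)

# Insertion order is still observable when the hash is enumerated.
same_key_order  = (hash1.keys == hash2.keys)
same_pair_order = (hash1.to_a == hash2.to_a)

puts hashes_equal     # true
puts same_key_order   # false
puts same_pair_order  # false
```

So the koan's blank is filled with true, even though the two hashes would print their pairs in a different order.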

I think it's just a question of what 'unordered' means in such a context.
As a human, I would find it very difficult to compare two sets if they were not in order. The problem is that I can not easily match up identical elements and see if the sets are equivalent. Unless the sets happened to be listed in the same order, I would see them as unequal. This seems to be the conclusion that you came to also.
However, in the mathematical concept of a set the order of items is simply unimportant. There is no way to 'order' the items, so two sets are identical if they contain the same elements. The set items are unordered, but they are not 'out of order'; the concept of order does not apply.
I suppose that this is encapsulated entirely in the expression 'hash_is_unordered' but that was not immediately obvious to me, at least!

Related

Detecting if a key-value pair exists within a hash

I cannot find a way to determine if a key-value pair exists in a hash.
h4 = { "a" => 1, "d" => 2, "f" => 35 }
I can use Hash#has_value? and Hash#has_key? to find information about a key or a value individually, but how can I check if a pair exists?
Pseudo-code of what I'm after:
if h4.contains_pair?("a", 1)
Just use this:
h4['a'] == 1
It seems excessive to me, but you could monkey-patch Hash with a method like so:
class Hash
  def contains_pair?(key, value)
    key?(key) && self[key] == value
  end
end
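The key? guard in that method is what separates it from the bare h4['a'] == 1 comparison. A minimal sketch of the difference, written as a standalone helper instead of a monkey-patch (the name pair? and the sample hash are hypothetical, and the hash is assumed to have no default value):

```ruby
# Hypothetical helper, named pair? for illustration only.
def pair?(hash, key, value)
  hash.key?(key) && hash[key] == value
end

h = { "a" => 1, "b" => nil }

# A plain lookup returns nil for an absent key, so it cannot tell
# "key is missing" apart from "key is present with a nil value".
plain_lookup  = (h["missing"] == nil)     # true, although "missing" is not a key

checked_miss  = pair?(h, "missing", nil)  # false: the key is absent
checked_nil   = pair?(h, "b", nil)        # true: the key exists and maps to nil
checked_match = pair?(h, "a", 1)          # true
```

When nil is never a legitimate value, the shorter h4['a'] == 1 form is usually enough.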
I confess to starting down a road and then wondering where it might take me. This may not be the best way of determining if a key/value pair is present in a hash (how could one improve on @Jordan's answer?), but I learned something along the way.
Code
def pair_present?(h,k,v)
  Enumerable.instance_method(:include?).bind(h).call([k,v])
end
Examples
h = { "a"=>1, "d"=>2, "f"=>35 }
pair_present?(h,'a',1)
#=> true
pair_present?(h,'f',36)
#=> false
pair_present?(h,'hippopotamus',2)
#=> false
Discussion
We could of course convert the hash to an array and then apply Array#include?:
h.to_a.include?(['a', 1])
#=> true
but that has the downside of creating a temporary array. It would be nice if the class Hash had such an instance method, but it does not. One might think Hash#include? might be used for that, but it takes just one argument, a key.[1]
The method Enumerable#include? does what we want, and of course Hash includes the Enumerable module. We can invoke that method in various ways.
Bind Enumerable#include? to the hash and call it
This was of course my answer:
Enumerable.instance_method(:include?).bind(h).call([k,v])
Use the method Method#super_method, which was introduced in v2.2.
h.method(:include?).super_method.call(['a',1])
#=> true
h.method(:include?).super_method.call(['a',2])
#=> false
Note that:
h.method(:include?).super_method
#=> #<Method: Object(Enumerable)#include?>
Do the alias_method/remove_method merry-go-round
Hash.send(:alias_method, :temp, :include?)
Hash.send(:remove_method, :include?)
h.include?(['a',1])
#=> true
h.include?(['a',2])
#=> false
Hash.send(:alias_method, :include?, :temp)
Hash.send(:remove_method, :temp)
Convert the hash to an enumerator and invoke Enumerable#include?
h.to_enum.include?(['a',1])
#=> true
h.to_enum.include?(['a',2])
#=> false
This works because the class Enumerator also includes Enumerable.
[1] Hash#include? is the same as both Hash#key? and Hash#has_key?. It makes me wonder why include? isn't used for the present purpose, since determining if a hash has a given key is well-covered.
How about using Enumerable any?
h4 = { "a" => 1, "d" => 2, "f" => 35 }
h4.any? {|k,v| k == 'a' && v == 1 }
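For comparison, a short sketch of the any? scan next to the hash superset operator. This assumes Ruby 2.3 or later, where Hash#>= is available; the variable names are illustrative.

```ruby
h4 = { "a" => 1, "d" => 2, "f" => 35 }

# any? walks the pairs until the block returns a truthy value.
found_by_scan = h4.any? { |k, v| k == "a" && v == 1 }

# Hash#>= (Ruby 2.3+) is true when the argument is a subset of the receiver,
# so a one-pair hash works as a "contains this pair" test.
found_by_superset  = (h4 >= { "a" => 1 })
wrong_value        = (h4 >= { "a" => 2 })
several_pairs_once = (h4 >= { "a" => 1, "f" => 35 })
```

The superset form also checks several pairs in a single expression.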

What's a clean way to sort a hash in Ruby without returning an array of key-value pair arrays?

When I sort a hash in Ruby it returns an array of key-value pair arrays.
I would like it to return a hash.
What's a clean way to do this? inject?
Hashes aren't really sortable objects. Since Ruby 1.9, they maintain keys in the order in which they were added, which is convenient, but in terms of the data structure, order is not relevant.
You can test this by comparing { a: 1, b: 2 } == { b: 2, a: 1 } #=> true. The same is not true for arrays, in which the order is an important feature.
You'll find many methods in Ruby actually convert hashes to enumerables, which are closer to arrays in that they have a defined order. When returning a value from something like sort, you get an array.
You can easily convert it back into a hash using Hash[...]:
Hash[hash.sort(...)]
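As a minimal sketch of that round trip (the sample hash and variable names are made up for illustration), sorting by key and by value and rebuilding a hash each time:

```ruby
h = { b: 3, c: 1, a: 2 }

# sort yields an array of [key, value] pairs; Hash[...] rebuilds a hash
# whose insertion order is the sorted order.
by_key   = Hash[h.sort]
by_value = Hash[h.sort_by { |_key, value| value }]

# On Ruby 2.1+ Array#to_h reads left to right and gives the same result.
by_key_to_h = h.sort.to_h

p by_key    # {:a=>2, :b=>3, :c=>1} (newer Rubies print it as {a: 2, b: 3, c: 1})
p by_value  # keys in the order :c, :a, :b
```

The rebuilt hash is still equal to the original under ==; only its enumeration order differs.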
You can use Hash#[]
h = {a: 1, b:0, c: 3}
arr = h.sort {|(_,v),(_, v2) | v <=> v2 }
h2 = Hash[arr] #=> {:b=>0, :a=>1, :c=>3}
Note that this is possible because hashes are enumerated based on insertion order in Ruby. This is not normally the case for hashes in other languages.
As others have warned, it generally is not good practice to rely on the order of hash elements.
Assuming you want to sort on values, here's another way:
h = {a: 2, b: 1, c: 4, d: 0}
Hash[h.to_a.sort_by(&:last)]
# => {:d=>0, :b=>1, :a=>2, :c=>4}
If you want to have an associative array that is intrinsically sorted by its keys (instead of having to manually insert in the right order as in the other solutions proposed), you should have a look at the RBTree gem which implements a red-black tree.

Getting an array of hash values given specific keys

Given certain keys, I want to get an array of values from a hash (in the order I gave the keys). I had done this:
class Hash
  def values_for_keys(*keys_requested)
    result = []
    keys_requested.each do |key|
      result << self[key]
    end
    return result
  end
end
I modified the Hash class because I do plan to use it almost everywhere in my code.
But I don't really like the idea of modifying a core class. Is there a builtin solution instead? (couldn't find any, so I had to write this).
You should be able to use values_at:
values_at(key, ...) → array
Return an array containing the values associated with the given keys. Also see Hash.select.
h = { "cat" => "feline", "dog" => "canine", "cow" => "bovine" }
h.values_at("cow", "cat") #=> ["bovine", "feline"]
The documentation doesn't specifically say anything about the order of the returned array but:
The example implies that the array will match the key order.
The standard implementation does things in the right order.
There's no other sensible way for the method to behave.
For example:
>> h = { :a => 'a', :b => 'b', :c => 'c' }
=> {:a=>"a", :b=>"b", :c=>"c"}
>> h.values_at(:c, :a)
=> ["c", "a"]
I would suggest you do this:
your_hash.select{|key,value| given_keys.include?(key)}.values
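The two suggestions are not interchangeable when the order of the requested keys matters. A small sketch, reusing the example hash from above with an illustrative given_keys array:

```ruby
h = { :a => 'a', :b => 'b', :c => 'c' }
given_keys = [:c, :a]

# values_at returns values in the order the keys were requested.
in_requested_order = h.values_at(*given_keys)

# select walks the hash itself, so the result follows the hash's own
# (insertion) order, not the order of given_keys.
in_hash_order = h.select { |key, _value| given_keys.include?(key) }.values

# values_at yields nil for a key that is absent; select simply skips it.
with_missing = h.values_at(:c, :zzz)

p in_requested_order  # ["c", "a"]
p in_hash_order       # ["a", "c"]
p with_missing        # ["c", nil]
```

Since the question asks for the values in the order the keys were given, values_at is the closer match.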

Efficient way to verify a large hash in Ruby tests

What is an efficient way to test that a hash contains specific keys and values?
By efficient I mean the following items:
easy to read output when failing
easy to read source of test
shortest test to still be functional
Sometimes in Ruby I must create a large hash. I would like to learn an efficient way to test these hashes:
expected_hash = {
  :one => 'one',
  :two => 'two',
  :sub_hash1 => {
    :one => 'one',
    :two => 'two'
  },
  :sub_hash2 => {
    :one => 'one',
    :two => 'two'
  }
}
I can test this hash in several ways. The two ways I use the most are the whole hash at once, or a single item:
assert_equal expected_hash, my_hash
assert_equal 'one', my_hash[:one]
These work for small hashes like our example hash, but for a very large hash these methods break down. The whole hash test will display too much information on a failure. And the single item test would make my test code too large.
I was thinking an efficient way would be to break up the tests into many smaller tests that validated only part of the hash. Each of these smaller tests could use the whole hash style test. Unfortunately I don't know how to get Ruby to do this for items not in a sub-hash. The sub-hashes can be done like the following:
assert_equal expected_hash[:sub_hash1], my_hash[:sub_hash1]
assert_equal expected_hash[:sub_hash2], my_hash[:sub_hash2]
How do I test the remaining parts of the hash?
When testing, you need to use manageable chunks. As you found, testing against huge hashes makes it difficult to track what is happening.
Consider converting your hashes to arrays using to_a. Then you can easily use the set operators to compare arrays, and find out what is missing/changed.
For instance:
[1,2] - [2,3]
=> [1]
[2,3] - [1,2]
=> [3]
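Applied to hashes, the same idea might look like the sketch below. The shortened hashes and the flat helper are illustrative assumptions, not part of the question; the helper keeps only the top-level entries whose values are not sub-hashes, which is the "remaining part" the question asks about.

```ruby
expected_hash = { :one => 'one', :two => 'two', :sub_hash1 => { :one => 'one' } }
my_hash       = { :one => 'one', :two => 'TWO', :sub_hash1 => { :one => 'one' } }

# Hypothetical helper: drop entries whose values are themselves hashes.
flat = lambda { |hash| hash.reject { |_key, value| value.is_a?(Hash) } }

# Array difference on [key, value] pairs reports only what differs.
missing_or_changed = flat.call(expected_hash).to_a - flat.call(my_hash).to_a
unexpected         = flat.call(my_hash).to_a - flat.call(expected_hash).to_a

p missing_or_changed  # [[:two, "two"]]
p unexpected          # [[:two, "TWO"]]
```

Asserting that both arrays are empty keeps the failure output limited to the pairs that actually differ.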
You can use the hash method on hashes to compare two of them, or parts of them:
h = {:test => 'test'}
=> {:test=>"test"}
h1 = {:test => 'test2'}
=> {:test=>"test2"}
h.hash
=> -1058076452551767024
h1.hash
=> 1300393442551759555
h.hash == h1.hash
=> false

Convert array-of-hashes to a hash-of-hashes, indexed by an attribute of the hashes

I've got an array of hashes representing objects as a response to an API call. I need to pull data from some of the hashes, and one particular key serves as an id for the hash object. I would like to convert the array into a hash with the keys as the ids, and the values as the original hash with that id.
Here's what I'm talking about:
api_response = [
{ :id => 1, :foo => 'bar' },
{ :id => 2, :foo => 'another bar' },
# ..
]
ideal_response = {
1 => { :id => 1, :foo => 'bar' },
2 => { :id => 2, :foo => 'another bar' },
# ..
}
There are a few ways I could think of doing this:
Map the data to the ideal_response (below)
Use api_response.find { |x| x[:id] == i } for each record I need to access.
A method I'm unaware of, possibly involving a way of using map to build a hash, natively.
My method of mapping:
keys = api_response.map { |x| x[:id] }
mapped = Hash[*keys.zip(api_response).flatten]
I can't help but feel like there is a more performant, tidier way of doing this. Option 2 is very performant when there are a very minimal number of records that need to be accessed. Mapping excels here, but it starts to break down when there are a lot of records in the response. Thankfully, I don't expect there to be more than 50-100 records, so mapping is sufficient.
Is there a smarter, tidier, or more performant way of doing this in Ruby?
Ruby <= 2.0
> Hash[api_response.map { |r| [r[:id], r] }]
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
However, Hash::[] is pretty ugly and breaks the usual left-to-right OOP flow. That's why Facets proposed Enumerable#mash:
> require 'facets'
> api_response.mash { |r| [r[:id], r] }
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
A request to include this basic abstraction (converting enumerables to hashes) in Ruby was made long ago, alas, without luck.
Note that your use case is covered by Active Support: Enumerable#index_by
Ruby >= 2.1
[UPDATE] Still no love for Enumerable#mash, but now we have Array#to_h. It creates an intermediate array, but it's better than nothing:
> object = api_response.map { |r| [r[:id], r] }.to_h
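On newer Rubies the intermediate array can be avoided. A minimal sketch, assuming Ruby 2.6 or later, where to_h accepts a block that returns a [key, value] pair (the variable name indexed is illustrative):

```ruby
api_response = [
  { :id => 1, :foo => 'bar' },
  { :id => 2, :foo => 'another bar' }
]

# to_h with a block (Ruby 2.6+) builds the hash directly from each
# [key, value] pair the block returns.
indexed = api_response.to_h { |record| [record[:id], record] }

p indexed.keys      # [1, 2]
p indexed[2][:foo]  # "another bar"
```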
Something like:
ideal_response = api_response.group_by{|i| i[:id]}
#=> {1=>[{:id=>1, :foo=>"bar"}], 2=>[{:id=>2, :foo=>"another bar"}]}
It uses Enumerable's group_by, which works on collections, returning matches for whatever key value you want. Because it expects to find multiple occurrences of matching key-value hits it appends them to arrays, so you end up with a hash of arrays of hashes. You could peel back the internal arrays if you wanted but could run a risk of overwriting content if two of your hash IDs collided. group_by avoids that with the inner array.
Accessing a particular element is easy:
ideal_response[1][0] #=> {:id=>1, :foo=>"bar"}
ideal_response[1][0][:foo] #=> "bar"
The way you show at the end of the question is another valid way of doing it. Both are reasonably fast and elegant.
For this I'd probably just go:
ideal_response = api_response.each_with_object(Hash.new) { |o, h| h[o[:id]] = o }
Not super pretty with the multiple brackets in the block but it does the trick with just a single iteration of the api_response.
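The practical difference between group_by and a direct id-to-record hash shows up when ids collide. A small sketch with a deliberately duplicated id (the sample records are made up for illustration):

```ruby
records = [
  { :id => 1, :foo => 'bar' },
  { :id => 1, :foo => 'duplicate bar' },
  { :id => 2, :foo => 'another bar' }
]

# group_by keeps every record, collected in an array per id.
grouped = records.group_by { |record| record[:id] }

# Assigning into a hash keeps one record per id: a later record
# silently replaces an earlier one with the same id.
last_wins = records.each_with_object({}) { |record, hash| hash[record[:id]] = record }

p grouped[1].size    # 2
p last_wins[1][:foo] # "duplicate bar"
```

If the ids are guaranteed to be unique, the direct mapping gives the simpler structure to work with.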

Resources