The Ruby docs read as follows:
The eql? method returns true if obj and other refer to the same hash key.
So in order to use #eql? to compare two objects (or use objects as Hash keys), the object has to implement #hash in a meaningful manner.
How come the following happens?
class EqlTest
def hash
123
end
end
a = EqlTest.new
b = EqlTest.new
a.hash == b.hash # => true
a.eql? b # => false
I could of course implement EqlTest#eql? but shouldn't the implementation inherited from Object be something along the lines of hash == other.hash already?
Thanks for your hints!
This seems to be actually the other way around. eql? is expected to return true for objects returning the same hash value, but it is not defined to compare these values. You are simply expected to override both.
The eql? method returns true if obj and other refer to the same hash key. This is used by Hash to test members for equality. For any pair of objects where eql? returns true, the hash value of both objects must be equal. So any subclass that overrides eql? should also override hash appropriately.
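To make "override both" concrete, here is a minimal sketch based on the EqlTest class from the question. The `value` attribute is hypothetical; it is only there so the objects have some state to compare.

```ruby
# Sketch: EqlTest with both halves of the contract overridden.
# The `value` attribute is an assumption; the question's class has no state.
class EqlTest
  attr_reader :value

  def initialize(value)
    @value = value
  end

  # Equal objects must produce equal hash values.
  def hash
    value.hash
  end

  # The inherited eql? compares identity, so it must be overridden explicitly.
  def eql?(other)
    other.is_a?(EqlTest) && value.eql?(other.value)
  end
end

a = EqlTest.new(123)
b = EqlTest.new(123)

puts a.hash == b.hash             # true
puts a.eql?(b)                    # true
puts(({ a => :found })[b].inspect) # :found
```

With only #hash overridden, the last lookup would return nil, because the inherited eql? still reports the two objects as different keys.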
Related
In this presentation the speaker has created a value class.
In implementing it, he overrides #eql? and says that in Java development, the idiom is that whenever you override #eql? you must override #hash.
class Weight
# ...
def hash
pounds.hash
end
def eql?(other)
self.class == other.class &&
self.pounds == other.pounds
end
alias :== eql?
end
Firstly, what is the #hash method? I can see it returns an integer.
> 1.hash
=> -3708808305943022538
> 2.hash
=> 1196896681607723080
> 1.hash
=> -3708808305943022538
Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from. It's not defined on Numeric or Object. If I knew what this method did, I would probably understand why it needs to be overridden at the same time as #eql?.
So, why does #hash need to be overridden whenever eql? is overridden?
Firstly, what is the #hash method? I can see it returns an integer.
The #hash method is supposed to return a hash of the receiver. (The name of the method is a bit of a giveaway).
Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from.
There are dozens of questions of the type "Where does this method come from" on Stack Overflow, and the answer is always the same: the best way to find out where a method comes from is to simply ask it:
hash_method = 1.method(:hash)
hash_method.owner #=> Kernel
So, #hash is inherited from Kernel. Note however, that there is a bit of a peculiar relationship between Object and Kernel, in that some methods that are implemented in Kernel are documented in Object or vice versa. This probably has historic reasons, and is now an unfortunate fact of life in the Ruby community.
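The same lookup works for your own classes. The following is a small sketch; the Weight class here is a stripped-down stand-in with a placeholder #hash, not the one from the presentation.

```ruby
# Sketch: asking a method object which class or module defines it.
class Weight
  def hash
    42 # placeholder value, only here so the class defines its own #hash
  end
end

default_owner = Object.new.method(:hash).owner # whatever provides the default
custom_owner  = Weight.new.method(:hash).owner # Weight, since it overrides #hash

puts default_owner
puts custom_owner
```

Once a class overrides #hash, the owner reported for its instances is the class itself rather than the module that supplies the default.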
Unfortunately, for reasons I don't understand, the documentation for Object#hash was deleted in 2017 in a commit ironically titled "Add documents". It is, however, still available in Ruby 2.4 (bold emphasis mine):
hash → integer
Generates an Integer hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash.
The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key. […]
So, as you can see, there is a deep and important relationship between #eql? and #hash, and in fact the correct behavior of methods that use #eql? and #hash depends on the fact that this relationship is maintained.
So, we know that the method is called #hash and thus likely computes a hash. We know it is used together with eql?, and we know that it is used in particular by the Hash class.
What does it do, exactly? Well, we all know what a hash function is: it is a function that maps a larger, potentially infinite, input space into a smaller, finite, output space. In particular, in this case, the input space is the space of all Ruby objects, and the output space is the "fast integers" (i.e. the ones that used to be called Fixnum).
And we know how a hash table works: values are placed in buckets based on the hash of their keys. If I want to find a value, I only need to compute the hash of the key (which is fast) to know which bucket the value is in (in constant time), as opposed to e.g. an array of key-value pairs, where I need to compare the key against every key in the array (linear search) to find the value.
However, there is a problem: Since the output space of a hash is smaller than the input space, there are different objects which have the same hash value and thus end up in the same bucket. Thus, when two objects have different hash values, I know for a fact that they are different, but if they have the same hash value, then they could still be different, and I need to compare them for equality to be sure – and that's where the relationship between hash and equality comes from. Also note that when many keys end up in the same bucket, I will again have to compare the search key against every key in the bucket (linear search) to find the value.
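The bucket lookup described above can be sketched as a toy table. This is purely illustrative and assumes a fixed bucket count; the ToyTable class is hypothetical and is not how Ruby's Hash is actually implemented.

```ruby
# Toy hash table: #hash selects a bucket, eql? resolves collisions inside it.
class ToyTable
  def initialize(bucket_count = 8)
    @buckets = Array.new(bucket_count) { [] }
  end

  def []=(key, value)
    bucket = bucket_for(key)
    pair = bucket.find { |k, _| k.eql?(key) }
    if pair
      pair[1] = value        # key already present: overwrite
    else
      bucket << [key, value] # new key: append to the bucket
    end
  end

  def [](key)
    pair = bucket_for(key).find { |k, _| k.eql?(key) }
    pair && pair[1]
  end

  private

  # The modulo maps an arbitrary integer hash onto the bucket array.
  def bucket_for(key)
    @buckets[key.hash % @buckets.size]
  end
end

table = ToyTable.new
table["apple"] = 1
table["apple"] = 2 # a different String object, but same hash and eql?
table[:pear] = 3

puts table["apple"]        # 2
puts table[:pear]          # 3
puts table["plum"].inspect # nil
```

Note that the linear scan inside a bucket is exactly where eql? comes in: #hash narrows the search, eql? makes the final decision.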
From all this we can conclude the following properties of the #hash method:
It must return an Integer.
Not only that, it must return a "fast integer" (equivalent to the old Fixnums).
It must return the same integer for two objects that are considered equal.
It may return the same integer for two objects that are considered unequal.
However, it only should do so with low probability. (Otherwise, a Hash may degenerate into a linked list with highly degraded performance.)
It also should be hard to construct objects that are unequal but have the same hash value deliberately. (Otherwise, an attacker can force a Hash to degenerate into a linked list as a form of Degradation-of-Service attack.)
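To illustrate the "may collide, but should not" points, here is a sketch with a deliberately bad but legal #hash. The Point class is a made-up example, not something from the question.

```ruby
# Sketch: a constant #hash satisfies the contract but makes every key collide.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  def eql?(other)
    other.is_a?(Point) && x.eql?(other.x) && y.eql?(other.y)
  end
  alias == eql?

  # Legal: equal points share a hash. Poor: so do all unequal points.
  def hash
    0
  end
end

h = {}
h[Point.new(1, 2)] = :a
h[Point.new(1, 2)] = :b # equal key, overwrites the first entry
h[Point.new(3, 4)] = :c # same hash value, but eql? says it is a different key

puts h.size                     # 2
puts h[Point.new(1, 2)].inspect # :b
puts h[Point.new(3, 4)].inspect # :c
```

The Hash still behaves correctly, because eql? sorts out the collisions; it just has to compare against every colliding key, which is the degraded performance mentioned above.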
The #hash method returns a numeric hash value for the receiving object:
:symbol.hash # => 2507
Ruby Hashes are an implementation of the hash map data structure, and they use the value returned by #hash to determine if the same key is being referenced.
Hashes leverage the #eql? method in conjunction with #hash values to determine equality.
Given that these two methods work together to provide Hashes with information about equality, if you override #eql?, you need to also override #hash to keep your object's behavior consistent with other Ruby objects.
If you do NOT override it, this happens:
class Weight
attr_accessor :pounds
def eql?(other)
self.class == other.class && self.pounds == other.pounds
end
alias :== eql?
end
w1 = Weight.new
w2 = Weight.new
w1.pounds = 10
w2.pounds = 10
w1 == w2 # => true, these two objects should now be considered equal
weights_map = Hash.new
weights_map[w1] = '10 pounds'
weights_map[w2] = '10 pounds'
weights_map # => {#<Weight:0x007f942d0462f8 @pounds=10>=>"10 pounds", #<Weight:0x007f942d03c3c0 @pounds=10>=>"10 pounds"}
If w1 and w2 are considered equal, there should only be one key value pair in the hash. However, the Hash class is calling #hash which we did NOT override.
To fix this and truly make w1 and w2 equal, we override #hash:
class Weight
def hash
pounds.hash
end
end
weights_map = Hash.new
weights_map[w1] = '10 pounds'
weights_map[w2] = '10 pounds'
weights_map # => {#<Weight:0x007f942d0462f8 @pounds=10>=>"10 pounds"}
Now the Hash knows these objects are equal and therefore stores only one key-value pair.
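Hash is not the only consumer of this pair of methods. As a sketch, the following shows Array#uniq and Set treating equal weights as duplicates. The Weight class is rewritten here with a constructor so the example is self-contained; that constructor is an assumption, not part of the code above.

```ruby
require 'set'

# Variant of Weight with a constructor (an assumption made for brevity).
class Weight
  attr_reader :pounds

  def initialize(pounds)
    @pounds = pounds
  end

  def eql?(other)
    self.class == other.class && pounds == other.pounds
  end
  alias == eql?

  def hash
    pounds.hash
  end
end

weights = [Weight.new(10), Weight.new(10), Weight.new(12)]

puts weights.uniq.size     # 2
puts Set.new(weights).size # 2
```

Without the #hash override, both calls would report three distinct elements even though == says two of them are equal.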
Ruby API says:
The eql? method returns true if obj and other refer to the same hash key.
I changed the hash method for Object:
class Object
def hash
1
end
end
Object.new.hash == Object.new.hash
# => true
Object.new.eql? Object.new
# => false
I don't understand why the second statement returns false; according to Ruby Object API above, it should return true.
That's not what the docs say, and "the same hash key" isn't really relevant to the code you post.
hash creates a hash key, with the implication that a.eql?(b) means a.hash == b.hash. That's different than breaking hash and expecting an unmodified eql? to work the way you expect.
eql? must be overridden to provide the semantics you want, e.g., a custom class could override eql? to provide a domain-specific equivalency. The above hash contract implications would still need to be followed if you want other code to work appropriately.
(This is similar to the Java mantra "override hashCode if you override equals"; see, e.g., http://www.xyzws.com/javafaq/why-always-override-hashcode-if-overriding-equals/20.)
This is a documentation bug. You read it correctly, but the documentation is contradictory.
On the one hand, the documentation says:
The eql? method returns true if obj and other refer to the same hash key.
from which you can expect as you did in your question:
Object.new.eql? Object.new
# => true
On the other hand, it also says:
For objects of class Object, eql? is synonymous with ==.
where the definition of == is given as:
At the Object level, == returns true only if obj and other are the same object.
It logically follows that:
For objects of class Object, eql? returns true only if obj and other are the same object.
from which you should expect:
Object.new.eql? Object.new
# => false
So the documentation makes contradictory claims. You relied on one of them, and made an expectation, but looking at the actual result, the reality seems to support the second claim.
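The observed behaviour, that eql? on plain Object instances is an identity check, can be confirmed with a short sketch:

```ruby
a = Object.new
b = Object.new
same = a # second reference to the same object

puts a.eql?(same)            # true: both names refer to one object
puts a.eql?(b)               # false: two distinct objects
puts a == b                  # false: == agrees with eql? at the Object level
puts(({ a => 1 }).key?(b))   # false: b is a different hash key
```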
You're creating two new objects, they will never be the same.
a = Object.new
=> #<Object:0x007fd16b35c8b8>
b = Object.new
=> #<Object:0x007fd16b355540>
And I will refer you back to this SO question
I'm learning Ruby. I've got the O'Reilly book, "The Ruby Programming Language," which states unequivocally:
"Object class implements the hash method to simply return an object’s ID."
I've also seen this assertion in other books: http://my.safaribooksonline.com/book/web-development/ruby/9780321700308/create-classes-that-understand-equality/ch12lev1sec8
But when I run this code, the two lines do not generate the same number:
myObject = Object.new
puts myObject.hash
puts myObject.object_id
So what's the deal? I'm running Ruby 1.9.3.
The Object implementation hashes the object_id. The value isn't the object_id, but the object_id is the input to the hash function.
Via https://github.com/ruby/ruby/blob/trunk/object.c#L110
VALUE
rb_obj_hash(VALUE obj)
{
VALUE oid = rb_obj_id(obj);
st_index_t h = rb_hash_end(rb_hash_start(NUM2LONG(oid)));
return LONG2FIX(h);
}
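What can be relied on from Ruby code is sketched below: both values are stable integers for a given object, but nothing promises that they are the same number. How the hash is derived is an implementation detail and may differ between Ruby versions, so the last line deliberately has no expected output.

```ruby
obj = Object.new

puts obj.hash == obj.hash           # true: stable for the same object
puts obj.object_id == obj.object_id # true: stable as well
puts obj.hash.class                 # Integer
puts obj.hash == obj.object_id      # not guaranteed either way
```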
object_id → fixnum
Returns an integer identifier for obj. The same number will be returned on all calls to id for a given object, and no two active objects will share an id.
hash()
Generates a Fixnum hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash. The hash value is used by class Hash. Any hash value that exceeds the capacity of a Fixnum will be truncated before being used.
In summary: the integer identifier is not the generated hash.
http://ruby-doc.org/core-1.9.3/Object.html#method-i-hash
http://ruby-doc.org/core-1.9.3/Object.html#method-i-object_id
In Ruby, object_id is an instance method of Object.
hash is also an instance method of Object, but it is overridden in many subclasses,
like String.
There is no evidence that hash should return the same value as object_id. They were created for different purposes.
I have a class Foo with a few member variables. When all values in two instances of the class are equal I want the objects to be 'equal'. I'd then like these objects to be keys in my hash. When I currently try this, the hash treats each instance as unequal.
h = {}
f1 = Foo.new(a,b)
f2 = Foo.new(a,b)
f1 and f2 should be equal at this point.
h[f1] = 7
h[f2] = 8
puts h[f1]
should print 8
See http://ruby-doc.org/core/classes/Hash.html
Hash uses key.eql? to test keys for
equality. If you need to use instances
of your own classes as keys in a Hash,
it is recommended that you define both
the eql? and hash methods. The hash
method must have the property that
a.eql?(b) implies a.hash == b.hash.
The eql? method is easy to implement: return true if all member variables are the same. For the hash method, use [@data1, @data2].hash as Marc-Andre suggests in the comments.
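Put together, a sketch for the Foo class from the question might look like this. The attribute names `a` and `b` are assumptions, since the question only shows the constructor call.

```ruby
# Sketch: Foo usable as a Hash key. Attribute names a and b are assumed.
class Foo
  attr_reader :a, :b

  def initialize(a, b)
    @a = a
    @b = b
  end

  # Equal when the class matches and all member variables are equal.
  def eql?(other)
    other.class == self.class && a.eql?(other.a) && b.eql?(other.b)
  end
  alias == eql?

  # Delegate to Array#hash so that equal members give equal hash values.
  def hash
    [a, b].hash
  end
end

h = {}
f1 = Foo.new(1, "x")
f2 = Foo.new(1, "x")
h[f1] = 7
h[f2] = 8

puts h[f1]  # 8
puts h.size # 1
```

Building the hash from an array of the members keeps the two methods in step: any field that eql? compares should also feed into #hash.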
Add a method called 'hash' to your class:
class Foo
def hash
return whatever_munge_of_instance_variables_you_like
end
end
Provided eql? is overridden consistently as well, this will work the way you requested: two distinct objects with identical contents will no longer generate different hash keys.
I know eql? is used by Hashes to see if an object matches a key*, and you do
def ==(rb)
if you want to support the == operator, but there must be a good reason that Hashes don't use == instead. Why is that? When are you going to have definitions for == and eql? that are not equivalent (e.g. one is an alias to the other) ?
Similarly, why have to_ary in addition to to_a?
This question came up in response to an answer someone gave me on another question.
* Of course, a Hash also assumes eql? == true implies that the hash codes are equal.
Also, is it basically a terrible idea to override equal? ?
== checks if two values are equal, while eql? checks if they are equal AND of the same type.
irb(main):001:0> someint = 17
=> 17
irb(main):002:0> someint == 17
=> true
irb(main):003:0> someint.eql? 17
=> true
irb(main):004:0> someint.eql? 17.0
=> false
irb(main):005:0> someint == 17.0
=> true
irb(main):006:0>
As you can see above, eql? also tests whether both values are of the same type. Comparing against 17.0 returns false because someint is not a floating-point value.
I don't know the reasoning for this particular choice in ruby, but I'll just point out that equality is a difficult concept.
Common Lisp, for example has eq, eql, equal, equalp, and for that matter =
It can be very useful to be able to tell the difference between two references to the same object, two different objects of the same type with the same value, two objects with the same value but of different types, etc. How many variations make sense depends on what makes sense in the language.
If I recall correctly (I don't use Ruby), Ruby's predicates implement three of these cases:
== is equality of value
eql? is equality of value and type
equal? is true only for the same object
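The three predicates can be told apart with a short sketch. The strings are created with dup so that they are guaranteed to be separate objects regardless of any frozen-string-literal setting.

```ruby
s1 = "ruby".dup
s2 = "ruby".dup

puts s1 == s2      # true:  same value
puts s1.eql?(s2)   # true:  same value and same type
puts s1.equal?(s2) # false: two distinct objects
puts s1.equal?(s1) # true:  the very same object

puts 1 == 1.0      # true:  numerically equal
puts 1.eql?(1.0)   # false: Integer versus Float

# Hash lookup uses eql?, so 1 and 1.0 are different keys.
h = { 1 => :integer }
puts h[1.0].inspect # nil
```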
This mentions that to_a and to_ary (and to_s and to_str , and to_i and to_int) have different levels of strictness. For example,
17.to_s
makes sense,
17.to_str
doesn't.
It seems that there is no to_ary method for the Hash class (only to_a), but for the Array class, to_a and to_ary have different behavior:
to_a :
Returns self. If called on a subclass of Array, converts the receiver to an Array object.
to_ary :
Returns self.
The answers above more than cover eql?, but here is something on to_a and to_ary.
In Ruby's duck-typing scheme, objects can be converted two ways--loosely and firmly. Loose conversion is like saying:
Can foo represent itself as an array (to_a)? This is what to_a, to_s, to_i and the other single-letter ones are for. So a String can represent itself as an array, so it implements to_a. Firm conversion says something very different: is foo a string (to_str)? Note that this is not whether the class of foo is String, but whether foo and strings are interchangeable, that is, whether a foo could logically be used anywhere a string is expected. The example in my Ruby book is a Roman numeral class. We can use a Roman numeral anywhere we could use a positive integer, so Roman can implement to_int.
Classes that have an "are interchangeable" relationship need to implement firm conversion, while loose conversion is for almost all classes. Make sure not to use firm conversion where loose is right: code built into the interpreter will severely misunderstand you and you'll end up with bugs comparable to C++'s reinterpret_cast<>. Not good.
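As a rough sketch of the difference, the following shows core methods picking up the firm conversions (to_str, to_ary) on their own, while the loose one (to_s) has to be called explicitly. PathLike and PairLike are made-up classes for illustration.

```ruby
# Hypothetical class that claims to be interchangeable with a String.
class PathLike
  def initialize(path)
    @path = path
  end

  def to_str
    @path
  end
end

# Hypothetical class that claims to be interchangeable with an Array.
class PairLike
  def initialize(first, second)
    @first = first
    @second = second
  end

  def to_ary
    [@first, @second]
  end
end

joined = "dir/" + PathLike.new("file.txt") # String#+ calls to_str implicitly
x, y = PairLike.new(1, 2)                  # multiple assignment calls to_ary
combined = [0] + PairLike.new(3, 4)        # Array#+ calls to_ary implicitly

puts joined     # dir/file.txt
puts x + y      # 3
p combined      # [0, 3, 4]

# Integer only offers the loose conversion, so it is never used implicitly.
begin
  "count: " + 17
rescue TypeError => e
  puts e.class  # TypeError
end
puts "count: " + 17.to_s # count: 17
```

This is why to_str and to_ary should be reserved for objects that really can stand in for a String or an Array: core methods will call them without being asked.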