eql? behavior after redefining Object#hash - ruby

Ruby API says:
The eql? method returns true if obj and other refer to the same hash key.
I changed the hash method for Object:
class Object
def hash
1
end
end
Object.new.hash == Object.new.hash
# => true
Object.new.eql? Object.new
# => false
I don't understand why the second statement returns false; according to Ruby Object API above, it should return true.

That's not what the docs say, and "the same hash key" isn't really relevant to the code you post.
hash creates a hash key, with the implication that a.eql?(b) means a.hash == b.hash. That's different than breaking hash and expecting an unmodified eql? to work the way you expect.
eql? must be overridden to provide the semantics you want, e.g., a custom class could override eql? to provide a domain-specific equivalency. The above hash contract implications would still need to be followed if you want other code to work appropriately.
(This is similar to the Java mantra "override hashCode if you override equals, e.g., http://www.xyzws.com/javafaq/why-always-override-hashcode-if-overriding-equals/20.)

This is a documentation bug. You read it correctly, but the documentation is contradictory.
On the one hand, the documentation says:
The eql? method returns true if obj and other refer to the same hash key.
from which you can expect as you did in your question:
Object.new.eql? Object.new
# => true
On the other hand, it also says:
For objects of class Object, eql? is synonymous with ==.
where the definition of == is given as:
At the Object level, == returns true only if obj and other are the same object.
It logically follows that:
For objects of class Object, eql? returns true only if obj and other are the same object.
from which you should expect:
Object.new.eql? Object.new
# => false
So the documentation makes contradictory claims. You relied on one of them, and made an expectation, but looking at the actual result, the reality seems to support the second claim.

You're creating two new objects, they will never be the same.
a = Object.new
=> #<Object:0x007fd16b35c8b8>
b = Object.new
=> #<Object:0x007fd16b355540>
And I will refer you back to this SO question

Related

How exactly does "#eql?" rely on "#hash"?

The Ruby docs read as follows:
The eql? method returns true if obj and other refer to the same hash key.
So in order to use #eql? to compare two objects (or use objects as Hash keys), the object has to implement #hash in a meaningful manner.
How come the following happens?
class EqlTest
def hash
123
end
end
a = EqlTest.new
b = EqlTest.new
a.hash == b.hash # => true
a.eql? b # => false
I could of course implement EqlTest#eql? but shouldn't the implementation inherited from Object be something along the lines of hash == other.hash already?
Thanks for your hints!
This seems to be actually the other way around. eql? is expected to return true for objects returning the same hash value, but it is not defined to compare these values. You are simply expected to override both.
The eql? method returns true if obj and other refer to the same hash key. This is used by Hash to test members for equality. For any pair of objects where eql? returns true, the hash value of both objects must be equal. So any subclass that overrides eql? should also override hash appropriately.

Is the order of the equality operator important in Ruby?

I have used the bcrypt library in my Ruby program. I noticed that the order of the equality operator seems to be important. Depending on which variable is left or right of the '==' I get a different result.
Here is an example program:
require 'bcrypt'
my_pw = "pw1"
puts "This is my unhashed password: #{my_pw}"
hashed_pw = BCrypt::Password.create(my_pw)
puts "This is my hashed password: #{hashed_pw}"
20.times{print"-"}
puts
puts "my_pw == hashed_pw equals:"
if (my_pw == hashed_pw)
puts "TRUE"
else
puts "FALSE"
end
puts "hashed_pw == my_pw equals:"
if (hashed_pw == my_pw)
puts "TRUE"
else
puts "FALSE"
end
Regards
schande
Yes, there is a difference.
my_pw == hashed_pw calls the == method on the my_pw string and passes hashed_pw as an argument. That means you are using the String#== method. From the docs of String#==:
string == object → true or false
Returns true if object has the same length and content; as self; false otherwise
Whereas hashed_pw == my_pw calls the == method on an instance of BCrypt::Password and passes my_pw as an argument. From the docs of BCrypt::Password#==:
#==(secret) ⇒ Object
Compares a potential secret against the hash. Returns true if the secret is the original secret, false otherwise.
This doesn't really have anything to do with equality. This is just fundamentals of Object-Oriented Programming.
In OOP, all computation is done by objects sending messages to other objects. And one of the fundamental properties of OOP is that the receiver object and only the receiver object decides how to respond to this message. This is how encapsulation, data hiding, and abstraction are achieved in OOP.
So, if you send the message m to object a passing b as the argument, then a gets to decide how to interpret this message. If you send the message m to object b passing a as the argument, then it is b which gets to decide how to interpret this message. There is no built-in mechanism that would guarantee that a and b interpret this message the same. Only if the two objects decide to coordinate with each other, will the response actually be the same.
If you think about, it would be very weird if 2 - 3 and 3 - 2 had the same result.
That is exactly what is happening here: In the first example, you are sending the message == to my_pw, passing hashed_pw as the argument. my_pw is an instance of the String class, thus the == message will be dispatched to the String#== method. This method knows how to compare the receiver object to another String. It does, however, not know how to compare the receiver to a BCrypt::Password, which is what the class of hashed_pw is.
And if you think about it, that makes sense: BCrypt::Password is a third-party class from outside of Ruby, how could a built-in Ruby class know about something that didn't even exist at the time the String class was implemented?
In your second example, on the other hand, you are sending the message == to hashed_pw, passing my_pw as the argument. This message gets dispatched to the BCrypt::Password#== method, which does know how to compare the receiver with a String:
Method: BCrypt::Password#==
Defined in: lib/bcrypt/password.rb
#==(secret) ⇒ Object
Also known as: is_password?
Compares a potential secret against the hash. Returns true if the secret is the original secret, false otherwise.
Actually, the problem in this particular case is even more subtle than it may at first appear.
I wrote above that String#== doesn't know what to do with a BCrypt::Password as an argument, because it only knows how to compare Strings. Well, actually BCrypt::Password inherits from String, meaning that a BCrypt::Password IS-A String, so String#== should know how to handle it!
But think about what String#== does:
string == object → true or false
Returns true if object has the same length and content; as self; false otherwise […]
Think about this: "returns true if object has the same length and content". For a hash, this will practically never be true. self will be something like 'P#$$w0rd!' and object will be something like '$2y$12$bxWYpd83lWyIr.dF62eO7.gp4ldf2hMxDofXPwdDZsnc2bCE7hN9q', so clearly, they are neither the same length nor the same content. And object cannot possibly be the same content because the whole point of a cryptographically secure password hash is that you cannot reverse it. So, even if object somehow wanted to present itself as the original password, it couldn't do it.
The only way this would work, is if String and BCrypt::Password could somehow "work together" to figure out equality.
Now, if we look close at the documentation of String#==, there is actually a way to make this work:
If object is not an instance of String but responds to to_str, then the two strings are compared using object.==.
So, if the author of BCrypt::Password had made a different design decision, then it would work:
Do not let BCrypt::Password inherit from String.
Implement BCrypt::Password#to_str. This will actually allow BCrypt::Password to be used practically interchangeably with String because any method that accepts Strings should also accept objects that respond to to_str.
Now, as per the documentation of String#==, if you write my_pw == hashed_pw, the following happens:
String#== notices that hashed_pw is not a String.
String#== notices that hashed_pw does respond to to_str.
Therefore, it will send the message object == self (which in our case is equivalent to hashed_pw == my_pw), which means we are now in the second scenario from your question, which works just fine.
Here's an example of how that would work:
class Pwd
def initialize(s)
#s = s.downcase.freeze
end
def to_str
p __callee__
#s.dup.freeze
end
def ==(other)
p __callee__, other
#s == other.downcase
end
end
p = Pwd.new('HELLO')
s = 'hello'
p == s
# :==
# "hello"
#=> true
s == p
# :==
# "hello"
#=> true
As you can see, we are getting the results we are expecting, and Pwd#== gets called both times. Also, to_str never gets called, it only gets inspected by String#==.
So, it turns out that ironically, the problem is not so much that String#== doesn't know how to deal with BCrypt::Password objects, but actually the problem is that it does know how to deal with them as generic Strings. If they weren't Strings but merely responded to to_str, then String#== would actually know to ask them for help.
Numeric objects in Ruby have a whole coercion protocol to make sure arithmetic operations between different "number-like" operand types are supported even for third-party numerics libraries.
The expressions would be equivalent if both operands were, for instance, of type String. In your case, one operand is a String and the other one is a BCrypt::Password. Therefore my_pw == hashed_pw invokes the equality method defined in the String class, while hashed_pw == my_pw invokes the one defined in BCrypt::Password.
I have never worked with BCrypt::Password, but would expect that you get false for the former and true for the latter.
In Ruby you can override the equality method for a given class or instance:
class Test
define_method(:==) do |_other|
true
end
end
Test.new == 'foo' # => true
Test.new == nil # => true
Test.new == 42 # => true
'foo' == Test.new # => false
nil == Test.new # => false
42 == Test.new # => true
Generally speaking, it's considered bad practice to override equality without making it symetric, but you sometimes see it in the wild.

Ruby: Why does `#hash` need to overridden whenever `#eql?` is overridden?

In this presentation the speaker has created a value class.
In implementing it, he overrides #eql? and says that in Java development, the idiom is that whenever you override #eql? you must override #hash.
class Weight
# ...
def hash
pounds.hash
end
def eql?(other)
self.class == other.class &&
self.pounds == other.pounds
end
alias :== eql?
end
Firstly, what is the #hash method? I can see it returns an integer.
> 1.hash
=> -3708808305943022538
> 2.hash
=> 1196896681607723080
> 1.hash
=> -3708808305943022538
Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from. It's not defined on Numeric or Object. If I knew what this method did, I would probably understand why it needs to be overridden at the same time as #eql?.
So, why does #hash need to be overridden whenever eql? is overridden?
Firstly, what is the #hash method? I can see it returns an integer.
The #hash method is supposed to return a hash of the receiver. (The name of the method is a bit of a giveaway).
Using pry I can see that an integer responds to #hash but I cannot see where it inherits the method from.
There are dozens of questions of the type "Where does this method come from" on [so], and the answer is always the same: the best way to know where a method comes from, is to simply ask it:
hash_method = 1.method(:hash)
hash_method.owner #=> Kernel
So, #hash is inherited from Kernel. Note however, that there is a bit of a peculiar relationship between Object and Kernel, in that some methods that are implemented in Kernel are documented in Object or vice versa. This probably has historic reasons, and is now an unfortunate fact of life in the Ruby community.
Unfortunately, for reasons I don't understand, the documentation for Object#hash was deleted in 2017 in a commit ironically titled "Add documents". It is, however, still available in Ruby 2.4 (bold emphasis mine):
hash → integer
Generates an Integer hash value for this object. This function must have the property that a.eql?(b) implies a.hash == b.hash.
The hash value is used along with eql? by the Hash class to determine if two objects reference the same hash key. […]
So, as you can see, there is a deep and important relationship between #eql? and #hash, and in fact the correct behavior of methods that use #eql? and #hash depends on the fact that this relationship is maintained.
So, we know that the method is called #hash and thus likely computes a hash. We know it is used together with eql?, and we know that it is used in particular by the Hash class.
What does it do, exactly? Well, we all know what a hash function is: it is a function that maps a larger, potentially infinite, input space into a smaller, finite, output space. In particular, in this case, the input space is the space of all Ruby objects, and the output space is the "fast integers" (i.e. the ones that used to be called Fixnum).
And we know how a hash table works: values are placed in buckets based on the hash of their keys, if I want to find a value, then I only need to compute the hash of the key (which is fast) and know which bucket I find the value in (in constant time), as opposed to e.g. an array of key-value-pairs, where I need to compare the key against every key in the array (linear search) to find the value.
However, there is a problem: Since the output space of a hash is smaller than the input space, there are different objects which have the same hash value and thus end up in the same bucket. Thus, when two objects have different hash values, I know for a fact that they are different, but if they have the same hash value, then they could still be different, and I need to compare them for equality to be sure – and that's where the relationship between hash and equality comes from. Also note that when many keys and up in the same bucket, I will again have to compare the search key against every key in the bucket (linear search) to find the value.
From all this we can conclude the following properties of the #hash method:
It must return an Integer.
Not only that, it must return a "fast integer" (equivalent to the old Fixnums).
It must return the same integer for two objects that are considered equal.
It may return the same integer for two objects that are considered unequal.
However, it only should do so with low probability. (Otherwise, a Hash may degenerate into a linked list with highly degraded performance.)
It also should be hard to construct objects that are unequal but have the same hash value deliberately. (Otherwise, an attacker can force a Hash to degenerate into a linked list as a form of Degradation-of-Service attack.)
The #hash method returns a numeric hash value for the receiving object:
:symbol.hash # => 2507
Ruby Hashes are an implementation of the hash map data structure, and they use the value returned by #hash to determine if the same key is being referenced.
Hashes leverage the #eql? method in conjunction with #hash values to determine equality.
Given that these two methods work together to provide Hashes with information about equality, if you override #eql?, you need to also override #hash to keep your object's behavior consistent with other Ruby objects.
If you do NOT override it, this happens:
class Weight
attr_accessor :pounds
def eql?(other)
self.class == other.class && self.pounds == other.pounds
end
alias :== eql?
end
w1 = Weight.new
w2 = Weight.new
w1.pounds = 10
w2.pounds = 10
w1 == w2 # => true, these two objects should now be considered equal
weights_map = Hash.new
weights_map[w1] = '10 pounds'
weights_map[w2] = '10 pounds'
weights_map # => {#<Weight:0x007f942d0462f8 #pounds=10>=>"10 pounds", #<Weight:0x007f942d03c3c0 #pounds=10>=>"10 pounds"}
If w1 and w2 are considered equal, there should only be one key value pair in the hash. However, the Hash class is calling #hash which we did NOT override.
To fix this and truly make w1 and w2 equals, we override #hash to:
class Weight
def hash
pounds.hash
end
end
weights_map = Hash.new
weights_map[w1] = '10 pounds'
weights_map[w2] = '10 pounds'
weights_map # => {#<Weight:0x007f942d0462f8 #pounds=10>=>"10 pounds"}
Now hash knows these objects are equal and therefore stores only one key-value pair

Removing Identical Objects in Ruby?

I am writing a Ruby app at the moment which is going to search twitter for various things. One of the problems I am going to face is shared results between searches in close proximity to each other time-wise. The results are returned in an array of objects each of which is a single tweet. I know of the Array.uniq method in ruby which returns an array with all the duplicates removed.
My question is this. Does the uniq method remove duplicates in so far as these objects point to the same space in memory or that they contain identical information?
If the former, whats the best way of removing duplicates from an array based on their content?
Does the uniq method remove duplicates
in so far as these objects point to
the same space in memory or that they
contain identical information?
The method relies on the eql? method so it removes all the elements where a.eql?(b) returns true.
The exact behavior depends on the specific object you are dealing with.
Strings, for example, are considered equal if they contain the same text regardless they share the same memory allocation.
a = b = "foo"
c = "foo"
[a, b, c].uniq
# => ["foo"]
This is true for the most part of core objects but not for ruby objects.
class Foo
end
a = Foo.new
b = Foo.new
a.eql? b
# => false
Ruby encourages you to redefine the == operator depending on your class context.
In your specific case I would suggest to create an object representing a twitter result and implement your comparison logic so that Array.uniq will behave as you expect.
class Result
attr_accessor :text, :notes
def initialize(text = nil, notes = nil)
self.text = text
self.notes = notes
end
def ==(other)
other.class == self.class &&
other.text == self.text
end
alias :eql? :==
end
a = Result.new("first")
b = Result.new("first")
c = Result.new("third")
[a, b, c].uniq
# => [a, c]
For anyone else stumbling upon this question, it looks like things have changed a bit since this question was first asked and in newer Ruby versions (1.9.3 at least), Array.uniq assumes that your object also has a meaningful implementation of the #hash method, in addition to .eql? or ==.
uniq uses eql?, as documented in this thread.
See the official ruby documentation for the distinction between ==, equal?, and eql?.
I believe that Array.uniq detects duplicates via the objects' eql? or == methods, which means its comparing based on content, not location in memory (assuming the objects provide a meaningful implementation of eql? based on content).

Any good reason for Ruby to have == AND eql? ? (similarly with to_a and to_ary)

I know eql? is used by Hashes to see if an object matches a key*, and you do
def ==(rb)
if you want to support the == operator, but there must be a good reason that Hashes don't use == instead. Why is that? When are you going to have definitions for == and eql? that are not equivalent (e.g. one is an alias to the other) ?
Similarly, why have to_ary in addition to to_a?
This question came up in response to an answer someone gave me on another question.
* Of course, a Hash also assumes eql? == true implies that the hashes codes are equal.
Also, is it basically a terribly idea to override equal? ?
== checks if two values are equal, while eql? checks if they are equal AND of the same type.
irb(main):001:0> someint = 17
=> 17
irb(main):002:0> someint == 17
=> true
irb(main):003:0> someint.eql? 17
=> true
irb(main):004:0> someint.eql? 17.0
=> false
irb(main):005:0> someint == 17.0
=> true
irb(main):006:0>
as you can see above, eql? will also test if both values are the same type. In the case of comparing to 17.0, which equates to false, it is because someint was not a floating point value.
I don't know the reasoning for this particular choice in ruby, but I'll just point out that equality is a difficult concept.
Common Lisp, for example has eq, eql, equal, equalp, and for that matter =
It can be very useful to be able to tell the difference between two references to the same object, two different objects of the same type with the same value, two objects with the same value but of different types, etc. How many variations make sense depends on what makes sense in the language.
If I recall it correctly (I don't use ruby), rubys predicates are implementing three of these cases
== is equality of value
eql? is equality of value and type
equal? is true only for the same object
This mentions that to_a and to_ary (and to_s and to_str , and to_i and to_int) have different levels of strictness. For example,
17.to_s
makes sense,
17.to_str
doesn't.
It seems that there is no to_ary method for the Hash class (no to_a), but for the Array class, to_a and to_ary have different behavior :
to_a :
Returns self. If called on a subclass of Array, converts the receiver to an Array object.
to_ary :
Returns self.
The answers above more that answer about eql? but here is something on to_a and to_ary.
In Ruby's duck-typing scheme, objects can be converted two ways--loosely and firmly. Loose conversion is like saying:
Can foo represent itself as an array (to_a). This is what to_a, to_s, to_i and other single letter ones are for. So a String can represent itself as an array, so it implements to_a. Firm conversion says something very different: Is foo a string (to_ary). Note that this is not wheather the class of foo is String, but whether foo and strings are interchangeable--whether anywhere a string is expected a foo could be logically used. The example in my Ruby book is of a Roman numeral class. We can use a Roman numeral anywhere we could use a positive integer, so Roman can implement to_int.
Classes that have an are interchangeable relationship need to implement firm conversion, while loose is for almost all classes. Make sure not to use firm conversion where loose is right--code built into the interpreter will severely misunderstand you and you'll end up with bugs comparable to C++'s reinterpet_cast<>. Not good.

Resources