What's the meaning of an object's hash value? And in which case does two object has the same hash value?? Also it is said that Array|Hash can't be Hash keys, this has something to do with object's hash value, why?
For objects to be stored in hashmaps or hashsets the following must hold true:
If two objects are considered equal, their hash value must also be equal.
If two objects are not considered equal, their hash value should be likely to be different (the more often two different objects have the same hash value, the worse the performance of operations on the hashmap/set).
So if two objects have the same hash value there is a good chance (but no guarantee) that they are equal.
What exactly "equal" means in the above is up to the implementor of the hash method. However you should always implement eql? to use the same definition of equality as hash.
For classes that don't override hash (i.e. classes using Object's hash implementation) hash equality is defined in terms of object identity. I.e. two objects are considered equal if and only if the reside at the same location in memory.
In ruby up to 1.8.6 Array and Hash did not override hash. So if you used arrays (or hashes) as hash keys, you could only retrieve the value for a key, if you used the exact same array as a key for retrieval (not an array with the same contents).
In ruby 1.8.7+ Array#hash and Hash#hash (as well as their eql? methods) are defined so that they are equal when their elements are equal.
A hash value has no inherent meaning, but it is a way of representing that object such that it can be differentiated from other objects of the same type. When you create an object, it needs to implement hash such that if two objects have the same hash value, they will also be equal. What it means for two objects to be equal depends on the object; if you define, say, a Person object, you might want to say that two instances of Person are equal if they have the same name, id number, and birthdate. Or whatever criteria you choose.
Using an array or a hash as a hash key will now work since both do implement hash (such that the hash value is based on their contents). However, you can run into trouble when using a modifiable object such as an array as a key if there's any chance you might modify it. For example, if you have a variable of type Array, and you use it as a key to put something into a hash, and then you add something to the array, and try to use that variable as the key to get the something back out of the hash, it won't work (as the array's hash value has changed). The solution to this issue is to call Rehash on your hash after you modify the array.
What you're looking for is the concept of hashing.
It's not just for objects, is a broader concept.
Let's start with basics:
There are 3 things to be considered before understanding Object.hash.
memory address = Address of an Object in RAM.
value = value of an Object.
Type = Type of an Object.
Now let's understand different Object comparison operators.
"eql?" checks if the value of two operands are equal.
"==" checks if the value and type of two operands are the same.
"equal?" checks is this the exact same object?
Example 1: If address of two Object is same then they are pointing to same memory location whose value and Type is same as well.
a="5"
b=a
a.object_id => 194639
b.object_id => 194639
a.eql?(b) => true
a==b => true
a.equal?(b) => true
Example 2: Below example demonstrates hash of string and integer "5" is different.
> "5".eql?(5) => false
> 5.eql?(5) => true
> "5".eql?("5") => true
> 5.hash => -3271834327245180286
> "5".hash => -3126022673147098896
Conclusion:
If value and type of object is same then it will have same hash.
Related
In Ruby, the to_s on an object includes an encoding of the object's id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 #num_sides=4, #side_length=4>
In the documentation it says
Returns a string representing obj. The default to_s prints the object’s class and an encoding of the object id.
https://apidock.com/ruby/Object/to_s
In the example above, the encoding of the object id is 0x00007fac5eb6afc8.
In How does object_id assignment work? they explain
In MRI the object_id of an object is the same as the VALUE that represents the object on the C level.
So I compared to the object_id and it is not the same as the encoding of the object id.
[2] pry(main)> shape = Shape.new(4,4)
=> #<Shape:0x00007fac5eb6afc8 #num_sides=4, #side_length=4>
[3] pry(main)> shape.object_id
=> 70189150066660
What exactly is the encoding of the object id? It does not appear to be the object_id.
Think of the object_id, or __id__ as the "pointer" for the object. It is not technically a pointer, but does contain a unique value that can be used to retrieve the internal C VALUE.
There are patterns to the value it has for some data types, as you can see with its hexadecimal representation with to_s. I am will not go into all the details, as there are already numerous answers on SO explaining, and already linked from comments, but integers (up to a FIXNUM_MAX, have predictable values, and special constants like true, false, and nil will always have the same object_id in every run.
To put simply, it is nothing more than a number, and shown as a hexadecimal (base 16) value, not any actual "encoding" or cypher.
Going to expand upon this a bit more in light of your latest edits to the question. As you posted, the hexadecimal number you see in to_s is the value of the internal C VALUE of the object. VALUE is a C data type (unsigned, pointer size number) that every Ruby object is represented as in C code. As #Stefan pointed out in a comment, for non-integer types (I speak only for MRI version), it is twice the value of the object_id. Not that you probably care, but you can shift the bits of an integer to predict the value for those.
Therefore, using you example.
A value of 0x00007fac5eb6afc8 is simple hexadecimal notation for a number. It uses a base 16 counting system as opposed to the base 10 decimal system we are more used to in everyday life. It is simply a different way of looking at the same number.
So, using that logic.
a = 0x00007fac5eb6afc8
#=> 140378300133320 # Decimal representation
a /= 2 # Remember, non-integers are half of this value
#=> 70189150066660 # Your object_id
The best answer you can get is: You don't know, and you shouldn't need to.
Ruby guarantees exactly three things about object IDs:
An object has the same ID during its lifetime.
No two objects have the same ID at the same time.
IDs are integers.
In particular, this means that you cannot rely on a specific object having a specific ID (for example, nil having ID 8). It also means that IDs can be re-used. You should think of it as nothing but opaque identifier.
And, as you quoted, the default Object#to_s uses "some" encoding of the ID.
And that is all you know, and all you should ever rely on. In particular, you should never try to parse IDs or Object#to_s.
So, the ID part of Object#to_s is "some unspecified encoding" of the ID, which itself is "some opaque identifier".
Everything else is deliberately left unspecified, so that different implementations can make different choices that make sense for their specific needs. For example, it would be stupid to tie object IDs to memory addresses, because implementations like JRuby, Opal, IronPython, MagLev, and Topaz run on platforms where the concept of "memory address" doesn't even exist! And Rubinius uses a moving garbage collector, where objects can move around in memory and thus their address changes.
I use Array.wrap(x) all the time in order to ensure that Array methods actually exist on an object before calling them.
What is the best way to similarly ensure a Hash?
Example:
def ensure_hash(x)
# TODO: this is what I'm looking for
end
values = [nil,1,[],{},'',:a,1.0]
values.all?{|x| ensure_hash(x).respond_to?(:keys) } # true
The best I've been able to come up with so far is:
Hash::try_convert(x) || {}
However, I would prefer something more elegant.
tl; dr: In an app with proper error handling, there is no "easy, care-free" way to handle something that may or may not be hashy.
From a conceptual standpoint, the answer is no. There is no similar solution as Array.wrap(x) for hashes.
An array is a collection of values. Single values can be stored outside of arrays (e.g. x = 42) , so it's a straight-forward task to wrap a value in an array (a = [42]).
A hash is a collection of key-value pairs. In ruby, single key-value pairs can't exist outside of a hash. The only way to express a key-value pair is with a hash: h = { v: 42 }
Of course, there are a thousand ways to express a key-value pair as a single value. You could use an array [k, v] or a delimited string `"k:v" or some more obscure method.
But at that point, you're no longer wrapping, you're parsing. Parsing relies on properly formatted data and has multiple points of failure. No matter how you look at it, if you find yourself in a situation where you may or may not have a hash, that means you need to write a proper chunk of code for data validation and parsing (or refactor your upstream code so that you can always expect a hash).
How does Ruby look for keys in a hash? I thought that as soon as it finds a key inside a hash, it returns its value without evaluating the other key/value pairs? But I guess I am wrong.
Eg,
test = {"a" => 10, "b" => 20, "c" => 30, "d" => 1/0}
now if i do test["a"], it returns error because of d's infinite value, if i remove "d", it works fine(which means it checks all the key/value pairs even if it finds a match in the first key). So if i search for a key in a really large hash, does Ruby evaluate every key/value for validity before returning the value for that particular hash? If that is the case, is there a way to break out of the hash as soon as it finds the key?
UPDATE
Just to clarify, I am trying to understand how it works in Ruby. So, for eg, if i have a hash with 500 key/value pairs(all valid not like 1/0), and lets say "a" is the first key. So if i do test["a"] on that large hash, does Ruby load all the key/value pairs in memory under the hood or just break out after it finds the key "a"?
The error you are getting occurs when ruby is creating the hash, not while accessing it - inserting the values into the hash clearly requires evaluating them.
There is no "loading" going on when fetching a value from a hash: the entirety of the hash is always in memory. A full explanation of hash tables is a bit out of scope but in a nutshell a hash works by hashing the key from which ruby derives which of the hashes buckets should contain the value. That bucket is then searched and the value is returned if the key is found.
I need to assign the numbers 0-100000 to a hash without giving a key.
Ruby uses Murmur as hash function. How can I add a value without having a key, like in C, letting it handle collision and other things. Is it possible? Can I give just the value to hash and let it evaluate the key, then insert to itself?
In a normal hashing operation, we have a hash function, and a table. We use value as the argument of the hash function, then we get a key in return. The value is inserted to the key location in the table (if a collision happens, double hashing or something else used).
Is it possible to do the same type of hashing in Ruby? Or am I stuck with default ways? Can I just throw the value into a function, then it evaluates the key, and inserts the value to the hash table or not?
Just store into the hash using the calculated hash of the key, rather than the key itself:
hash[hash_func(key)] = value
That is, instead of mapping key -> value directly, this maps hash_func(key) -> value. The implementation may pass your hashed key value through another hash function internally, but you needn't care about that.
However, in comments it now comes to light that you want to apply the hash function to the value, not any other key. In that case, just use a set and be done with it. Then, all you need to do is add values to the set:
s = Set.new
s.add(value)
There's no need to calculate the hash of anything; Set will take care of it for you.
In short, this seems to be a case of the XY Problem. You needed to store a set of values in a data structure (and presumably be able to check if those values were stored in an efficient manner). Instead of asking about this, you asked about hash functions and tables. If you had asked about what you really needed, instead of asking about something else that you thought you could use to solve the original problem, you would have had a useful answer much more quickly.
The common solution is to simply use the value as a key. Hence:
value = "xxx"
hash[value] = 1
This way, you clearly document that the actual values (all 1) of this particular hash are of no use, and you will get de-duplicated values. Hash will do the usual hashing internally, you don't need to worry about it at all.
I use 1 as value here, but the actual value is completely irrelevant. I don't use nil as that is the default return value of hash[nonexistant_value].
If your values are more complex, check out http://docs.ruby-lang.org/en/2.0.0/Hash.html for specifics about them.
In Ruby, there is a built-in class called Hash. According to the docs:
A Hash is a dictionary-like collection of unique keys and their values. Also called associative arrays, they are similar to Arrays, but where an Array uses integers as its index, a Hash allows you to use any object type.
...
A Hash can be easily created by using its implicit form:
grades = { "Jane Doe" => 10, "Jim Doe" => 6 }
So basically, they're associative arrays. If that's the case, then why are they called hashes? In Computer Science, isn't the term "hash" usually used to refer to a number or hexadecimal string of some kind generated by running some data through a hash function? In fact, Ruby objects even have a method called hash which "generates a Fixnum hash value for this object".
I'm aware that Hash in Ruby is implemented as a hash table, but given what I've said above that doesn't seem like enough justification for using the name Hash over something like Table or Map, especially given that Ruby doesn't seem to have any other built-in implementations of associative arrays. Why was this name chosen?
Ruby takes a large amount of inspiration from Perl, and Perl calls this a hash.
UPDATE:
Confirmed by Yukihiro Matsumoto, creator of Ruby: https://twitter.com/yukihiro_matz/status/547516495249428480