I need to assign the numbers 0-100000 to a hash without giving a key.
Ruby uses Murmur as hash function. How can I add a value without having a key, like in C, letting it handle collision and other things. Is it possible? Can I give just the value to hash and let it evaluate the key, then insert to itself?
In a normal hashing operation, we have a hash function, and a table. We use value as the argument of the hash function, then we get a key in return. The value is inserted to the key location in the table (if a collision happens, double hashing or something else used).
Is it possible to do the same type of hashing in Ruby? Or am I stuck with default ways? Can I just throw the value into a function, then it evaluates the key, and inserts the value to the hash table or not?
Just store into the hash using the calculated hash of the key, rather than the key itself:
hash[hash_func(key)] = value
That is, instead of mapping key -> value directly, this maps hash_func(key) -> value. The implementation may pass your hashed key value through another hash function internally, but you needn't care about that.
However, in comments it now comes to light that you want to apply the hash function to the value, not any other key. In that case, just use a set and be done with it. Then, all you need to do is add values to the set:
s = Set.new
s.add(value)
There's no need to calculate the hash of anything; Set will take care of it for you.
In short, this seems to be a case of the XY Problem. You needed to store a set of values in a data structure (and presumably be able to check if those values were stored in an efficient manner). Instead of asking about this, you asked about hash functions and tables. If you had asked about what you really needed, instead of asking about something else that you thought you could use to solve the original problem, you would have had a useful answer much more quickly.
The common solution is to simply use the value as a key. Hence:
value = "xxx"
hash[value] = 1
This way, you clearly document that the actual values (all 1) of this particular hash are of no use, and you will get de-duplicated values. Hash will do the usual hashing internally, you don't need to worry about it at all.
I use 1 as value here, but the actual value is completely irrelevant. I don't use nil as that is the default return value of hash[nonexistant_value].
If your values are more complex, check out http://docs.ruby-lang.org/en/2.0.0/Hash.html for specifics about them.
Related
I use Array.wrap(x) all the time in order to ensure that Array methods actually exist on an object before calling them.
What is the best way to similarly ensure a Hash?
Example:
def ensure_hash(x)
# TODO: this is what I'm looking for
end
values = [nil,1,[],{},'',:a,1.0]
values.all?{|x| ensure_hash(x).respond_to?(:keys) } # true
The best I've been able to come up with so far is:
Hash::try_convert(x) || {}
However, I would prefer something more elegant.
tl; dr: In an app with proper error handling, there is no "easy, care-free" way to handle something that may or may not be hashy.
From a conceptual standpoint, the answer is no. There is no similar solution as Array.wrap(x) for hashes.
An array is a collection of values. Single values can be stored outside of arrays (e.g. x = 42) , so it's a straight-forward task to wrap a value in an array (a = [42]).
A hash is a collection of key-value pairs. In ruby, single key-value pairs can't exist outside of a hash. The only way to express a key-value pair is with a hash: h = { v: 42 }
Of course, there are a thousand ways to express a key-value pair as a single value. You could use an array [k, v] or a delimited string `"k:v" or some more obscure method.
But at that point, you're no longer wrapping, you're parsing. Parsing relies on properly formatted data and has multiple points of failure. No matter how you look at it, if you find yourself in a situation where you may or may not have a hash, that means you need to write a proper chunk of code for data validation and parsing (or refactor your upstream code so that you can always expect a hash).
Let's put it via an intuitive example.
I don't want others to modify my source code, so I put a statement in my code:
if( hash_value_of(this_file) != "A_PRE-DEFINED_HASH_VALUE" )
output("Aha! You modified my file!")
So in this case, the pre-defined hash value will affect the actual hash value of the source file at the output stage. It's like a strange loop so that I have to find a way to calculate a hash value beforehand that exactly matches the output.
It is of note that actually I don't care if this method can protect my source file at all. It is just an example. What of concern is how to calculate such a hash value beforehand.
Is there any algorithm matches the need? I am not expecting to get answers like "why do you even think about it?", "what's the usage?". It's only an algorithm discussion. Thanks for any contribution!
I was going to comment on the original question but I don't have the reputation to do so yet....
I too was wondering how to easily update all the values in a hash, or if there was some kind of equivalent .map! method for hashes. Someone put up this elegant solution:
hash.update(hash){|key,v1| expresion}
on this question:
Ruby: What is the easiest method to update Hash values?
My questions is how does the block know to iterate over each element in the hash? For example, I'd have to call .each on a hash to access each element normally so why isn't it something like:
hash.update(hash.each) do |key ,value|
value+=1
end
In the block with {|key, value| expression} I am accessing each individual hash element yet I don't have to explicitly tell the system this? Why not? Thank you very much.
Hash#update is an alias for Hash#merge! which is more descriptive.
When calling the method with a block, the following happens (excerpt from the docs):
If [a] block is specified, [...] the value of each duplicate key is
determined by calling the block with the key [...]
So, the above code works like this:
The hash is merged with itself, and for each duplicate key the block is called. As we merge the hash with itself, every newly added key is a duplicate and therefore the block is invoked. The result is that every value in the hash gets replaced by expresion.
Hash#update takes a hash as the first parameter, and an optional block as the second parameter. If the second parameter is left out, the method will internally loop on each key-value pair in the supplied hash and use them to merge into the original hash.
If the block (second parameter) is supplied, the method does exactly the same thing. It loops over each key-value in the supplied hash and merges it in. The only difference is where a collision is found (the original hash already has an entry for a specific key). In this case the block is called to help resolve the conflict.
Based on this understanding, simply passing the hash into itself will cause it to loop over every key-value because that's how update always works. Calling .each would be redundant.
To see this more clearly, take a look at the source code for the #update method, and note the internal call to rb_hash_foreach in either logic branch.
Is there a better, faster, and more direct way to access the symbol key of :a2 in this array of one hash [{:a2=>nil}]?
I have tried #new_array.first.first.first and #new_array.first.keys.first.
I would use #new_array.first.keys.first, but this is assuming you always want the first key of the first hash.
What's the meaning of an object's hash value? And in which case does two object has the same hash value?? Also it is said that Array|Hash can't be Hash keys, this has something to do with object's hash value, why?
For objects to be stored in hashmaps or hashsets the following must hold true:
If two objects are considered equal, their hash value must also be equal.
If two objects are not considered equal, their hash value should be likely to be different (the more often two different objects have the same hash value, the worse the performance of operations on the hashmap/set).
So if two objects have the same hash value there is a good chance (but no guarantee) that they are equal.
What exactly "equal" means in the above is up to the implementor of the hash method. However you should always implement eql? to use the same definition of equality as hash.
For classes that don't override hash (i.e. classes using Object's hash implementation) hash equality is defined in terms of object identity. I.e. two objects are considered equal if and only if the reside at the same location in memory.
In ruby up to 1.8.6 Array and Hash did not override hash. So if you used arrays (or hashes) as hash keys, you could only retrieve the value for a key, if you used the exact same array as a key for retrieval (not an array with the same contents).
In ruby 1.8.7+ Array#hash and Hash#hash (as well as their eql? methods) are defined so that they are equal when their elements are equal.
A hash value has no inherent meaning, but it is a way of representing that object such that it can be differentiated from other objects of the same type. When you create an object, it needs to implement hash such that if two objects have the same hash value, they will also be equal. What it means for two objects to be equal depends on the object; if you define, say, a Person object, you might want to say that two instances of Person are equal if they have the same name, id number, and birthdate. Or whatever criteria you choose.
Using an array or a hash as a hash key will now work since both do implement hash (such that the hash value is based on their contents). However, you can run into trouble when using a modifiable object such as an array as a key if there's any chance you might modify it. For example, if you have a variable of type Array, and you use it as a key to put something into a hash, and then you add something to the array, and try to use that variable as the key to get the something back out of the hash, it won't work (as the array's hash value has changed). The solution to this issue is to call Rehash on your hash after you modify the array.
What you're looking for is the concept of hashing.
It's not just for objects, is a broader concept.
Let's start with basics:
There are 3 things to be considered before understanding Object.hash.
memory address = Address of an Object in RAM.
value = value of an Object.
Type = Type of an Object.
Now let's understand different Object comparison operators.
"eql?" checks if the value of two operands are equal.
"==" checks if the value and type of two operands are the same.
"equal?" checks is this the exact same object?
Example 1: If address of two Object is same then they are pointing to same memory location whose value and Type is same as well.
a="5"
b=a
a.object_id => 194639
b.object_id => 194639
a.eql?(b) => true
a==b => true
a.equal?(b) => true
Example 2: Below example demonstrates hash of string and integer "5" is different.
> "5".eql?(5) => false
> 5.eql?(5) => true
> "5".eql?("5") => true
> 5.hash => -3271834327245180286
> "5".hash => -3126022673147098896
Conclusion:
If value and type of object is same then it will have same hash.