How does Ruby look for keys in a hash? I thought that as soon as it finds a key inside a hash, it returns its value without evaluating the other key/value pairs. But I guess I am wrong.
For example:
test = {"a" => 10, "b" => 20, "c" => 30, "d" => 1/0}
Now if I do test["a"], it raises an error because of d's value (1/0). If I remove "d", it works fine, which suggests it checks all the key/value pairs even if it finds a match on the first key. So if I search for a key in a really large hash, does Ruby evaluate every key/value pair for validity before returning the value for that particular key? If that is the case, is there a way to break out of the hash as soon as it finds the key?
UPDATE
Just to clarify, I am trying to understand how this works in Ruby. For example, say I have a hash with 500 key/value pairs (all valid, unlike 1/0), and "a" is the first key. If I do test["a"] on that large hash, does Ruby load all the key/value pairs into memory under the hood, or does it just break out after it finds the key "a"?
The error you are getting occurs when Ruby is creating the hash, not while accessing it: inserting the values into the hash requires evaluating them first.
There is no "loading" going on when fetching a value from a hash: the entirety of the hash is always in memory. A full explanation of hash tables is a bit out of scope, but in a nutshell a hash works by hashing the key, and from that hash value Ruby derives which of the hash's buckets should contain the value. That bucket is then searched and the value is returned if the key is found.
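A minimal sketch of the point above, reusing the keys from the question. It shows that the ZeroDivisionError is raised while the hash literal is being built, and that a lookup on an existing hash evaluates nothing else. The variable names build_error and value are only illustrative.

```ruby
# The whole literal is evaluated before any lookup can happen,
# so 1 / 0 raises while the hash is being constructed.
build_error =
  begin
    broken = { "a" => 10, "b" => 20, "c" => 30, "d" => 1 / 0 }
    nil
  rescue ZeroDivisionError => e
    e.class
  end

# Once a hash exists, a lookup only hashes the key and finds its entry;
# none of the other stored values are touched.
test = { "a" => 10, "b" => 20, "c" => 30 }
value = test["a"]

puts build_error # ZeroDivisionError
puts value       # 10
```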
I use Array.wrap(x) all the time in order to ensure that Array methods actually exist on an object before calling them.
What is the best way to similarly ensure a Hash?
Example:
def ensure_hash(x)
# TODO: this is what I'm looking for
end
values = [nil,1,[],{},'',:a,1.0]
values.all?{|x| ensure_hash(x).respond_to?(:keys) } # true
The best I've been able to come up with so far is:
Hash::try_convert(x) || {}
However, I would prefer something more elegant.
tl;dr: In an app with proper error handling, there is no "easy, care-free" way to handle something that may or may not be hashy.
From a conceptual standpoint, the answer is no. There is no equivalent of Array.wrap(x) for hashes.
An array is a collection of values. Single values can be stored outside of arrays (e.g. x = 42), so it's a straightforward task to wrap a value in an array (a = [42]).
A hash is a collection of key-value pairs. In Ruby, single key-value pairs can't exist outside of a hash. The only way to express a key-value pair is with a hash: h = { v: 42 }
Of course, there are a thousand ways to express a key-value pair as a single value. You could use an array [k, v], a delimited string "k:v", or some more obscure method.
But at that point, you're no longer wrapping, you're parsing. Parsing relies on properly formatted data and has multiple points of failure. No matter how you look at it, if you find yourself in a situation where you may or may not have a hash, that means you need to write a proper chunk of code for data validation and parsing (or refactor your upstream code so that you can always expect a hash).
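For reference, here is the question's own candidate written out as a runnable sketch. ensure_hash is the hypothetical helper name from the question, not a library method. It satisfies the respond_to?(:keys) check, but it silently throws away anything that does not convert via to_hash, which is exactly the lack of validation the answer warns about.

```ruby
# Hypothetical helper from the question: fall back to an empty hash
# when the argument cannot be implicitly converted with to_hash.
def ensure_hash(x)
  Hash.try_convert(x) || {}
end

values = [nil, 1, [], {}, '', :a, 1.0]
all_hashy = values.all? { |x| ensure_hash(x).respond_to?(:keys) }

puts all_hashy                   # true
puts ensure_hash({ a: 1 }).inspect # the hash is passed through unchanged
puts ensure_hash(1).inspect        # {} (the 1 is silently discarded)
```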
I am pretty new to Ruby and currently discovering its differences from Java. Consider the following code snippet:
file = File.new('test.json', 'w')
hash = {}
hash['1234'] = 'onetwothreefour_str'
hash[1234] = 'onetwothreefour_num'
puts hash.to_json
file.write(hash.to_json)
file.close
str = File.read('test.json')
puts str
puts JSON.parse(str)
it outputs
{"1234":"onetwothreefour_str","1234":"onetwothreefour_num"}
{"1234":"onetwothreefour_str","1234":"onetwothreefour_num"}
{"1234"=>"onetwothreefour_num"}
So, after deserialization we have one less entry in the hash.
Now, the question: is this normal behaviour? I think it is perfectly legal to store keys of different types in a hash. If so, shouldn't to_json write the keys to the file as '1234' and 1234?
Just to be clear: I understand that it's better to have keys of the same type. I just noticed that after restoring, my object has its keys as strings instead of numbers.
Yes, Ruby hashes can have keys of any type.
The JSON spec, on the other hand, dictates that object keys must be strings; no other type is allowed.
So that explains the output you observe: upon serializing, the integer key is turned into a string, making it a duplicate of another key. When reading it back, duplicate keys are dropped (last one wins, IIRC). I'm pretty sure you would get the same behaviour if you tried to use that JSON from JavaScript.
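A small sketch of the key conversion on its own, using a hash with a single integer key so that no duplicate is involved. The variable names original, json and restored are only illustrative.

```ruby
require 'json'

original = { 1234 => 'onetwothreefour_num' }

# Serializing turns the Integer key into a JSON string key.
json = original.to_json

# Parsing gives back a String key; the original type is not recoverable.
restored = JSON.parse(json)

puts json                      # {"1234":"onetwothreefour_num"}
puts restored.keys.first.class # String
```

If the key types must survive a round trip, they have to be encoded in the data explicitly, since JSON itself has no way to record them.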
I was going to comment on the original question but I don't have the reputation to do so yet....
I too was wondering how to easily update all the values in a hash, or if there was some kind of equivalent .map! method for hashes. Someone put up this elegant solution:
hash.update(hash){|key,v1| expression}
on this question:
Ruby: What is the easiest method to update Hash values?
My question is: how does the block know to iterate over each element in the hash? For example, I'd normally have to call .each on a hash to access each element, so why isn't it something like:
hash.update(hash.each) do |key ,value|
value+=1
end
In the block with {|key, value| expression} I am accessing each individual hash element yet I don't have to explicitly tell the system this? Why not? Thank you very much.
Hash#update is an alias for Hash#merge! which is more descriptive.
When calling the method with a block, the following happens (excerpt from the docs):
If [a] block is specified, [...] the value of each duplicate key is
determined by calling the block with the key [...]
So, the above code works like this:
The hash is merged with itself, and for each duplicate key the block is called. As we merge the hash with itself, every key being merged in is a duplicate, and therefore the block is invoked for each one. The result is that every value in the hash gets replaced by the result of the expression in the block.
Hash#update takes a hash as the first parameter, and an optional block as the second parameter. If the second parameter is left out, the method will internally loop on each key-value pair in the supplied hash and use them to merge into the original hash.
If the block (second parameter) is supplied, the method does exactly the same thing. It loops over each key-value in the supplied hash and merges it in. The only difference is where a collision is found (the original hash already has an entry for a specific key). In this case the block is called to help resolve the conflict.
Based on this understanding, simply passing the hash into itself will cause it to loop over every key-value because that's how update always works. Calling .each would be redundant.
To see this more clearly, take a look at the source code for the #update method, and note the internal call to rb_hash_foreach in either logic branch.
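A short sketch of the behaviour described above, with an illustrative three-entry hash. The block receives the key, the existing value and the incoming value; because the hash is merged into itself, the last two are the same object. transform_values! (available since Ruby 2.4) is shown only as a more direct alternative, not as what the original answers used.

```ruby
hash = { a: 1, b: 2, c: 3 }

# Every key collides with itself, so the block runs once per pair
# and its return value becomes the new value for that key.
hash.update(hash) { |_key, old_value, _new_value| old_value + 1 }
after_update = hash.dup

# A more direct way to rewrite every value in place (Ruby 2.4+).
hash.transform_values! { |value| value * 10 }

puts after_update.inspect
puts hash.inspect
```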
What's the meaning of an object's hash value? And in which cases do two objects have the same hash value? Also, it is said that an Array or a Hash can't be used as a Hash key, and that this has something to do with the object's hash value. Why?
For objects to be stored in hashmaps or hashsets the following must hold true:
If two objects are considered equal, their hash value must also be equal.
If two objects are not considered equal, their hash value should be likely to be different (the more often two different objects have the same hash value, the worse the performance of operations on the hashmap/set).
So if two objects have the same hash value there is a good chance (but no guarantee) that they are equal.
What exactly "equal" means in the above is up to the implementor of the hash method. However you should always implement eql? to use the same definition of equality as hash.
For classes that don't override hash (i.e. classes using Object's hash implementation), hash equality is defined in terms of object identity. That is, two objects are considered equal if and only if they reside at the same location in memory.
In Ruby up to 1.8.6, Array and Hash did not override hash. So if you used arrays (or hashes) as hash keys, you could only retrieve the value for a key if you used the exact same array as the key for retrieval (not an array with the same contents).
In Ruby 1.8.7+, Array#hash and Hash#hash (as well as their eql? methods) are defined so that they are equal when their elements are equal.
A hash value has no inherent meaning, but it is a way of representing an object such that it can be differentiated from other objects of the same type. When you create a class, it needs to implement hash such that if two objects are equal, they also have the same hash value (the reverse is not guaranteed, since two unequal objects can occasionally share a hash value). What it means for two objects to be equal depends on the object; if you define, say, a Person class, you might want to say that two instances of Person are equal if they have the same name, id number, and birthdate. Or whatever criteria you choose.
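A minimal sketch of the Person idea, assuming equality is based on name and id number only. The class and its attributes are hypothetical. The key point is that hash and eql? are built from the same fields, so two equal instances land on the same hash entry.

```ruby
# Hypothetical class: two people are equal when name and id_number match.
class Person
  attr_reader :name, :id_number

  def initialize(name, id_number)
    @name = name
    @id_number = id_number
  end

  # Hash lookups use eql? to confirm a match after comparing hash values.
  def eql?(other)
    other.is_a?(Person) && name == other.name && id_number == other.id_number
  end
  alias_method :==, :eql?

  # Derive the hash value from the same fields that define equality.
  def hash
    [name, id_number].hash
  end
end

ages = { Person.new('Ann', 1) => 30 }

puts ages[Person.new('Ann', 1)].inspect # 30, found through an equal but distinct object
puts ages[Person.new('Bob', 2)].inspect # nil
```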
Using an array or a hash as a hash key will now work, since both implement hash (such that the hash value is based on their contents). However, you can run into trouble when using a modifiable object such as an array as a key if there's any chance you might modify it. For example, if you have a variable holding an Array, use it as a key to put something into a hash, then add something to the array, and then try to use that variable as the key to get the something back out of the hash, it won't work (as the array's hash value has changed). The solution to this issue is to call rehash on your hash after you modify the array.
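A short sketch of the mutable-key problem and the rehash fix, with illustrative variable names. The lookup right after the mutation is expected to miss, because the hash still indexes the entry under the array's old hash value; Hash#rehash rebuilds the index from the keys' current hash values.

```ruby
key = [1, 2]
lookup = { key => 'stored' }

# Mutating the key changes its hash value, so the entry is now
# filed under a stale hash value and is very likely not found.
key << 3
before_rehash = lookup[key]

# rehash re-indexes every entry using the keys' current hash values.
lookup.rehash
after_rehash = lookup[key]

puts before_rehash.inspect
puts after_rehash.inspect
```

Freezing keys, or not mutating objects after using them as keys, avoids the problem entirely.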
What you're looking for is the concept of hashing.
It's not just for objects; it is a broader concept.
Let's start with basics:
There are 3 things to be considered before understanding Object.hash.
memory address = Address of an Object in RAM.
value = value of an Object.
Type = Type of an Object.
Now let's understand different Object comparison operators.
"==" checks whether two operands have the same value; numeric types are converted first, so 1 == 1.0 is true.
"eql?" checks whether two operands have the same value and the same type, so 1.eql?(1.0) is false.
"equal?" checks whether two operands are the exact same object.
Example 1: If two variables hold the same address, they point to the same memory location, so their value and type are the same as well.
a="5"
b=a
a.object_id => 194639
b.object_id => 194639
a.eql?(b) => true
a==b => true
a.equal?(b) => true
Example 2: The example below shows that the string "5" and the integer 5 are not eql? and have different hash values (the exact numbers vary between Ruby processes).
> "5".eql?(5) => false
> 5.eql?(5) => true
> "5".eql?("5") => true
> 5.hash => -3271834327245180286
> "5".hash => -3126022673147098896
Conclusion:
For built-in types such as String and Integer, if two objects have the same value and the same type, they will have the same hash.
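The three comparisons can be checked side by side with a short sketch. The strings are created with dup so that a and b are guaranteed to be two distinct objects with the same contents; the variable names are only illustrative.

```ruby
a = '5'.dup
b = '5'.dup

same_value    = (a == b)           # true: same contents
same_eql      = a.eql?(b)          # true: same contents and same class
same_object   = a.equal?(b)        # false: two separate objects
same_hash     = (a.hash == b.hash) # true: eql? objects share a hash value

int_vs_float_eq  = (1 == 1.0)      # true: numeric conversion applies
int_vs_float_eql = 1.eql?(1.0)     # false: Integer versus Float

# Hash lookups rely on hash and eql?, so 1.0 does not find the key 1.
found = { 1 => :int }[1.0]

puts [same_value, same_eql, same_object, same_hash].inspect
puts [int_vs_float_eq, int_vs_float_eql, found].inspect
```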
Here is a clever trick to enable hash autovivification in ruby (taken from facets):
# File lib/core/facets/hash/autonew.rb, line 19
def self.autonew(*args)
leet = lambda { |hsh, key| hsh[key] = new( &leet ) }
new(*args,&leet)
end
Although it works (of course), I find it really frustrating that I can't figure out how this two liner does what it does.
leet is put as a default value. So that then just accessing h['new_key'] somehow brings it up and creates 'new_key' => {}
Now, I'd expect h['new_key'] returning default value object as opposed to evaluating it. That is, 'new_key' => {} is not automatically created. So how does leet actually get called? Especially with two parameters?
The standard new method for Hash accepts a block. This block is called in the event of trying to access a key in the Hash which does not exist. The block is passed the Hash itself and the key that was requested (the two parameters) and should return the value that should be returned for the requested key.
You will notice that the leet lambda does 2 things. It returns a new Hash with leet itself as the block for handling defaults. This is the behaviour which allows autonew to work for Hashes of arbitrary depth. It also assigns this new Hash to hsh[key] so that next time you request the same key you will get the existing Hash rather than a new one being created.
It's also worth noting that this code can be made into a one-liner as follows:
def self.autonew(*args)
new(*args){|hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }
end
The call to Hash#default_proc returns the proc that was used to create the parent, so we have a nice recursive setup here.
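A standalone sketch of the same idea, written as a plain method instead of a Hash class method so it can be run without facets. autonew_hash is a hypothetical name. It also contrasts the block form with a plain default object, which is returned on a miss but never stored.

```ruby
# Hypothetical helper: each missing key gets a new hash that inherits
# the same default block, so nesting works to any depth.
def autonew_hash
  Hash.new { |hsh, key| hsh[key] = Hash.new(&hsh.default_proc) }
end

# A default *object* is handed back on a miss, but nothing is stored.
plain = Hash.new({})
plain[:missing]
plain_stored = plain.key?(:missing)

# A default *block* runs on a miss and can store the new value itself.
tree = autonew_hash
tree[:a][:b][:c] = 1

puts plain_stored # false
puts tree.inspect
```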
I talk about a similar case to this on my blog.
Alternatively, you might consider my xkeys gem. It's a module that you can use to extend arrays or hashes to facilitate nested access.
If you look for something that doesn't exist yet, you get a nil value (or another value or an exception if you prefer) without creating anything by looking. It can also append to the end of arrays.
You can opt to autovivify either hashes or arrays for integer keys (but just once for the entire structure).