I was going through the Ruby Koans tutorial series when I came upon this in about_hashes.rb:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno", "dos"], hash[:one]
assert_equal ["uno", "dos"], hash[:two]
assert_equal ["uno", "dos"], hash[:three]
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
The values in the assert_equal calls are what the tutorial actually expected. But I couldn't understand why using the << operator behaves differently from using the = operator.
My expectation was that:
hash[:one] would be ["uno"]
hash[:two] would be ["dos"]
hash[:three] would be []
Can someone please explain why my expectation was wrong?
You have mixed up the way this works a bit. First off, a Hash doesn't have a << method, that method in your example exists on the array.
The reason your code is not erroring is because you are passing a default value to your hash via the constructor. http://ruby-doc.org/core-1.9.3/Hash.html#method-c-new
hash = Hash.new([])
This means that if a key does not exist, then it will return an array. If you run the following code:
hash = {}
hash[:one] << "uno"
Then you will get an undefined method error.
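As a minimal sketch of that failure (the variable names here are only illustrative), the lookup on a plain hash returns nil, and nil has no << method:

```ruby
hash = {}                  # plain hash: a missing key returns nil

begin
  hash[:one] << "uno"      # effectively nil << "uno"
rescue NoMethodError => e
  error_class = e.class    # remember which error was raised
end

puts error_class           # NoMethodError
puts hash.empty?           # true, nothing was ever stored
```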
So in your example, what is actually happening is:
hash = Hash.new([])
hash[:one] << "uno" #hash[:one] does not exist so return an array and push "uno"
hash[:two] << "dos" #hash[:two] does not exist, so return the array ["uno"] and push "dos"
The reason it does not return an array with one element each time as you may expect, is because it stores a reference to the value that you pass through to the constructor. Meaning that each time an element is pushed, it modifies the initial array.
When you're doing hash = Hash.new([]) you are creating a Hash whose default value is the exact same Array instance for all keys. So whenever you are accessing a key that doesn't exist, you get back the very same Array.
h = Hash.new([])
h[:foo].object_id # => 12215540
h[:bar].object_id # => 12215540
If you want one array per key, you have to use the block syntax of Hash.new:
h = Hash.new { |h, k| h[k] = [] }
h[:foo].object_id # => 7791280
h[:bar].object_id # => 7790760
Edit: Also see what Gazler has to say with regard to the #<< method and on what object you are actually calling it.
I am new to Ruby and working through Ruby Koans. In the about_hashes.rb file, there is an example of assigning a default value to a hash.
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
puts hash[:one] # this is ["uno", "dos"]
Here hash[:one], hash[:two], and any other key such as hash[:three] (a non-existent key) all have the value ["uno", "dos"]. I did not understand how "<<" is used here. Also, when I tried extracting or printing the keys and values of the hash, they were empty.
puts (hash.values.size) # size is 0 here
puts (hash.keys.size) # size is 0
puts hash.values # nothing gets printed
puts hash.keys #nothing gets printed.
So what is happening here? Where are the values being stored, if they are not stored in the hash as keys or values?
In the next example, the Hash is defined as:
hash = Hash.new {|hash, key| hash[key] = [] }
hash[:one] << "uno"
hash[:two] << "dos"
puts hash[:one] #this is "uno"
puts hash[:two] #this is "dos"
puts hash[:three] # this is undefined.
I guess in the second example, hash is initializing all the keys with a blank array, so "uno" is appended to the empty array when the "<<" operator is used? I am confused about both examples and don't know what is happening in either of them. I couldn't find much information on this on Google either. If somebody can explain these 2 examples, it would be helpful. Thanks in advance.
hash = Hash.new([])
This statement creates an empty hash that has a default value of an empty array. If hash does not have a key k, hash[k] returns the default value, []. This is important: simply returning the default value does not modify the hash.
When you write:
hash[:one] << "uno" #=> ["uno"]
before hash has a key :one, hash[:one] is replaced by the default value, so we have:
[] << "uno" #=> ["uno"]
which explains why hash is not changed:
hash #=> {}
Now write:
hash[:two] << "dos" #=> ["uno", "dos"]
hash #=> {}
To see why we get this result, let's re-write the above as follows:
hash = Hash.new([]) #=> {}
hash.default #=> []
hash.default.object_id #=> 70209931469620
a = (hash[:one] << "uno") #=> ["uno"]
a.object_id #=> 70209931469620
hash.default #=> ["uno"]
b = (hash[:two] << "dos") #=> ["uno", "dos"]
b.object_id #=> 70209931469620
hash.default #=> ["uno", "dos"]
So you see that the default array whose object_id is 70209931469620 is the single array to which "uno" and "dos" are appended.
If we had instead written:
hash[:one] = hash[:one] << "uno"
#=> hash[:one] = [] << "uno" => ["uno"]
hash #=> { :one=>["uno"] }
we get what we were hoping for, but not so fast:
hash[:two] = hash[:two] << "dos"
#=> ["uno", "dos"]
hash
#=> {:one=>["uno", "dos"], :two=>["uno", "dos"]}
which is still not what we want because both keys have values that are the same array.
hash = Hash.new {|hash, key| hash[key] = [] }
This statement causes the block to be executed when hash does not have a key key. This does change the hash, by adding a key-value pair (see note 1).
So now:
hash[:one] << "uno"
causes the block:
{ |h,k| h[k] = [] }
to make the assignment:
hash[:one] = []
after which:
hash[:one] << "uno"
appends "uno" to an empty array that is the value for the key :one, which we can verify:
hash #=> { :one=>["uno"] }
This has the same effect as writing:
hash[:one] = (hash[:one] || []) << "uno"
(the expanded version of (hash[:one] ||= []) << "uno") when there is no default value.
Similarly,
hash[:two] << "dos" #=> ["dos"]
hash #=> {:one=>["uno"], :two=>["dos"]}
which is usually the result we want.
1 The block need not change the hash, however. The block can contain any code, including, for example, { puts "Have a nice day" }.
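To illustrate the note, here is a small sketch with a hypothetical hash named lookups whose block computes a return value but never assigns it, so the hash stays empty:

```ruby
# Hypothetical example: the block returns a value but does not store it.
lookups = Hash.new { |_hash, key| "no entry for #{key}" }

missing     = lookups[:color]   # the block runs and its result is returned
still_empty = lookups.empty?    # nothing was assigned inside the block

puts missing       # no entry for color
puts still_empty   # true
```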
hash = Hash.new(INITIAL_VALUE) is the syntax for creating a hash that has a default value. The default value is a "property" of the whole hash itself; it is returned when a non-existent key is accessed.
So, in your first example:
hash = Hash.new([]) # return a reference to an empty array for unknown keys
is the same as:
initial = []
hash = Hash.new(initial)
hence, when you call:
hash[:one] << "uno"
you in fact call hash[:one], that returns initial, and then call #<< method on it. In other words, these subsequent calls are the same as:
initial << "uno"
initial << "dos"
I guess it is now clear why all of them share the same value. And the hash itself is still empty (on every call above, initial is what gets used). Look:
hash.merge!({ one: "uno", three: "tres" })
hash[:one] # defined, since merge above
#⇒ "uno"
hash[:two] # undefined, initial will be returned
#⇒ []
hash[:two] << 'dos'
hash[:two] # still undefined; returns initial, which now contains "dos"
#⇒ ["dos"]
hash[:two] = 'zwei' # now defined
#⇒ "zwei"
hash[:four] # undefined, initial
#⇒ ["dos"]
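One way to check whether a key is really stored or whether the default is merely being returned is to use Hash#key? and Hash#fetch, both of which ignore the default value. A short sketch, with :absent as an arbitrary placeholder symbol:

```ruby
initial = []
hash = Hash.new(initial)
hash[:two] << "dos"                    # mutates initial, stores nothing

stored  = hash.key?(:two)              # false: :two was never assigned
fetched = hash.fetch(:two, :absent)    # fetch bypasses the default value
shared  = hash[:two].equal?(initial)   # true: the lookup returns initial itself

puts stored    # false
puts fetched   # absent
puts shared    # true
```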
Hope it helps.
So let's say I do the following:
lph = Hash.new([]) #=> {}
lph["passed"] << "LCEOT" #=> ["LCEOT"]
lph #=> {} <-- Expected that to have been {"passed" => ["LCEOT"]}
lph["passed"] #=> ["LCEOT"]
lph["passed"] = lph["passed"] << "HJKL"
lph #=> {"passed"=>["LCEOT", "HJKL"]}
I'm surprised by this. A couple questions:
Why does it not get set until I push the second string on to the array? What is happening in the background?
What is the more idiomatic Ruby way to say the following? I have a hash, a key, and a value that I want to end up in the array associated with the key. How do I push the value into the array associated with the key the first time the key is used? In all future uses of the key, I just want to add to the array.
Read the Ruby Hash.new documentation carefully - "if this hash is subsequently accessed by a key that doesn’t correspond to a hash entry, the value returned depends on the style of new used to create the hash".
new(obj) → new_hash
...If obj is specified, this single object will be used for all default values.
In your example you attempt to push something onto the value associated with a key which does not exist, so you end up mutating the same anonymous array you used to construct the hash initially.
the_array = []
h = Hash.new(the_array)
h['foo'] << 1 # => [1]
# Since the key 'foo' was not found
# ... the default value (the_array) is returned
# ... and 1 is pushed onto it (hence [1]).
the_array # => [1]
h # {} since the key 'foo' still has no value.
You probably want to use the block form:
new { |hash, key| block } → new_hash
...If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
For example:
h = Hash.new { |hash, key| hash[key] = [] } # Assign a new array as default for missing keys.
h['foo'] << 1 # => [1]
h['foo'] << 2 # => [1, 2]
h['bar'] << 3 # => [3]
h # => { 'foo' => [1, 2], 'bar' => [3] }
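As a usage sketch of the block form, here is a grouping of words by first letter. The word list and the name by_letter are made up for illustration:

```ruby
words = %w[apple avocado banana blueberry cherry]

# Each missing key gets its own fresh array, stored in the hash.
by_letter = Hash.new { |hash, key| hash[key] = [] }
words.each { |word| by_letter[word[0]] << word }

p by_letter["a"]     # ["apple", "avocado"]
p by_letter["b"]     # ["banana", "blueberry"]
p by_letter.keys     # ["a", "b", "c"]
```

Note that merely reading a missing key with this block form also stores an empty array under that key.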
Why does it not get set until I push the second string on to the array?
In short: because you don't set anything in the hash until the point where you also add the second string to the array.
What is happening in the background?
To see what's happening in the background, let's take this one line at a time:
lph = Hash.new([]) #=> {}
This creates an empty hash, configured to return the [] object whenever a non-existing key is accessed.
lph["passed"] << "LCEOT" #=> ["LCEOT"]
This can be written as
value = lph["passed"] #=> []
value << "LCEOT" #=> ["LCEOT"]
We see that lph["passed"] returns [] as expected, and we then proceed to append "LCEOT" to [].
lph #=> {}
lph is still an empty Hash. At no point have we added anything to the Hash. We have added something to its default value, but that doesn't change lph itself.
lph["passed"] #=> ["LCEOT"]
This is where it gets interesting. Remember above when we did value << "LCEOT"? That actually changed the default value that lph returns when a key isn't found. The default value is no longer [], but has become ["LCEOT"]. That new default value is returned here.
lph["passed"] = lph["passed"] << "HJKL"
This is our first change to lph. And what we actually assign to lph["passed"] is the default value (because "passed" is still a non-existing key in lph) with "HJKL" appended. Before this, the default value was ["LCEOT"], after this it is ["LCEOT", "HJKL"].
In other words lph["passed"] << "HJKL" returns ["LCEOT", "HJKL"] which is then assigned to lph["passed"].
What is the more idiomatic Ruby way
Using <<=:
>> lph = Hash.new { [] }
=> {}
>> lph["passed"] <<= "LCEOT"
=> ["LCEOT"]
>> lph
=> {"passed"=>["LCEOT"]}
Also note the change in how the Hash is initialized, using a block instead of a literal array. This ensures a new, blank array is created and returned whenever a missing key is accessed, as opposed to the same array being used every time.
There's a nice idiom for adding to lists stored in a hash table:
(hash[key] ||= []) << new_value
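A short sketch of the idiom on a plain hash with no default configured (index and its keys are illustrative names):

```ruby
index = {}                            # plain hash, no default value or block

(index[:fruit] ||= []) << "apple"     # first use: assigns [] and appends to it
(index[:fruit] ||= []) << "banana"    # later uses: appends to the stored array
(index[:veg]   ||= []) << "leek"

p index[:fruit]   # ["apple", "banana"]
p index[:veg]     # ["leek"]
```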
Now, suppose I write a derivative hash class, like the ones found in Hashie, which does a deep-convert of any hash I store in it. Then what I store will not be the same object I passed to the = operator; Hash may be converted to Mash or Clash, and arrays may be copied.
Here's the problem. Ruby apparently returns, from the var= method, the value passed in, not the value that's stored. It doesn't matter what the var= method returns. The code below demonstrates this:
class C
attr_reader :foo
def foo=(value)
@foo = value.is_a?(Array) ? value.clone : value
end
end
c=C.new
puts "assignment: #{(c.foo ||= []) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo ||= []) << 6}"
puts "c.foo is #{c.foo}"
output is
assignment: [5]
c.foo is []
assignment: [6]
c.foo is [6]
When I posted this as a bug to Hashie, Danielle Sucher explained what was happening and pointed out that "foo.send :bar=, 1" returns the value returned by the bar= method. (Hat tip for the research!) So I guess I could do:
c=C.new
puts "clunky assignment: #{(c.foo || c.send(:foo=, [])) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo || c.send(:foo=, [])) << 6}"
puts "c.foo is #{c.foo}"
which prints
clunky assignment: [5]
c.foo is [5]
assignment: [5, 6]
c.foo is [5, 6]
Is there any more elegant way to do this?
Assignments evaluate to the value that is being assigned. Period.
In some other languages, assignments are statements, so they don't evaluate to anything. Those are really the only two sensible choices. Either don't evaluate to anything, or evaluate to the value being assigned. Everything else would be too surprising.
Since Ruby doesn't have statements, there is really only one choice.
The only "workaround" for this is: don't use assignment.
c.foo ||= []
c.foo << 5
Using two lines of code isn't the end of the world, and it's easier on the eyes.
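Here is a sketch of the two-line approach against a copying setter. CopyingHolder is a hypothetical stand-in for the class C from the question:

```ruby
# Hypothetical stand-in for class C: the setter stores a copy of any Array.
class CopyingHolder
  attr_reader :foo

  def foo=(value)
    @foo = value.is_a?(Array) ? value.clone : value
  end
end

holder = CopyingHolder.new
holder.foo ||= []     # first line: make sure something is stored
holder.foo << 5       # second line: append to the stored object via the reader
holder.foo ||= []     # no-op now, because foo is already set
holder.foo << 6

p holder.foo          # [5, 6]
```

Because the append always goes through the reader, it acts on the object that was actually stored rather than on the value passed to the setter.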
The prettiest way to do this is to use a default value for the hash:
# h = Hash.new { [] }
h = Hash.new { |h,k| h[k] = [] }
But beware that you can't use Hash.new([]) and then << because of the way Ruby stores variables:
h = Hash.new([])
h[:a] # => []
h[:b] # => []
h[:a] << 10
h[:b] # => [10] O.o
This happens because Ruby variables hold references to objects: since we created only one array instance and set it as the default value, it is shared between all hash entries (unless it is overwritten, e.g. by h[:a] += [10]).
This is solved by using the constructor with a block, Hash.new { |h, k| h[k] = [] }. With this, the block is called each time a missing key is accessed, so each key gets its own array.
EDIT: Fixed the error that @Uri Agassi pointed out.
I'm going through Ruby Koans, and I hit #41 which I believe is this:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno","dos"], hash[:one]
assert_equal ["uno","dos"], hash[:two]
assert_equal ["uno","dos"], hash[:three]
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
I could not understand the behavior, so I Googled it and found "Strange ruby behavior when using Hash default value, e.g. Hash.new([])", which answered the question nicely.
So I understand how that works. My question is: why does a default value such as an integer that gets incremented not get changed during use? For example:
puts "Text please: "
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }
This takes user input and counts the number of times each word is used. It works because the default value of 0 is always used.
I have a feeling it has to do with the << operator but I'd love an explanation.
The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer decided to mutate one but not the other.
The question is not whether Arrays are mutable, the question is whether you mutate it.
You can get both the behaviors you see above, just by using Arrays. Observe:
One default Array with mutation
hsh = Hash.new([])
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
One default Array without mutation
hsh = Hash.new([])
hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']
hsh[:nonexistent]
# => []
# We didn't mutate the default value, it is still an empty array
hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.
A new, different Array every time with mutation
hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
This is because an Array in Ruby is a mutable object, so you can change its internal state, but a Fixnum isn't mutable. So when you increment a value using +=, internally this happens (assume that i is our reference to a Fixnum object):
get the object referenced by i
get its internal value (let's name it raw_tmp)
create a new object whose internal value is raw_tmp + 1
assign a reference to the created object to i
So as you can see, we created a new object, and i now references something different than at the beginning.
On the other hand, when we use Array#<< it works this way:
get the object referenced by arr
append the given element to its internal state
So as you can see it is much simpler, but it can cause some bugs. One of them you have in your question; another is a thread race when both threads try to append 2 or more elements simultaneously. Sometimes you can end up with only some of them and with garbage in memory. When you use += on arrays too, you get rid of both of these problems (or at least minimise their impact).
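The difference between the two operations can be sketched with plain arrays and an identity check. The names a and b are illustrative; Object#equal? tests whether two references point to the same object:

```ruby
a = []
b = a                            # a and b reference the same Array

a << 1                           # mutation: changes the shared object in place
same_after_push = a.equal?(b)    # true

a += [2]                         # reassignment: a now references a new Array
same_after_plus = a.equal?(b)    # false

p same_after_push   # true
p same_after_plus   # false
p a                 # [1, 2]
p b                 # [1]
```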
From the doc, setting a default value has the following behaviour:
Returns the default value, the value that would be returned by hsh if key did not exist in hsh. See also Hash::new and Hash#default=.
Therefore, every time frequencies[word] is not set, the lookup returns 0, and += then assigns 0 + 1 to that individual key.
The reason for the discrepancy between the two code blocks is that arrays are mutable in Ruby, while integers are not.
I've been doing Koans which is great, and as I go along I find no major trouble, but I stumbled upon this, and can't make any sense out of it:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno", "dos"], hash[:one] # But I only put "uno" for this key!
assert_equal ["uno", "dos"], hash[:two] # But I only put "dos" for this key!
assert_equal ["uno", "dos"], hash[:three] # I didn't shove anything for :three!
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
All the tests are passing (I just looked at the error which helped me guess the right assertions to write).
The last assert, ok, they both were not initialized so their values have got to have the same object ID since they both take the default.
I don't understand why the default value was altered, I'm not even entirely sure that's what happened.
I tried it out in IRB, thinking maybe some tampering on Hash/Array was done to make me crazy, but I get the same result.
I first thought hash[:one] << "uno" would cause hash to become { one: ["uno"] }, but it remains { }.
Although I'm guessing << only calls push, and new keys are only added when you use the = sign
Please tell me what I missed.
EDIT: I'm using Ruby 1.9.3
When you use the default argument for a Hash, the same object is used for all keys that have not been explicitly set. This means that only one array is being used here, the one you passed into Hash.new. See below for evidence of that.
>> h = Hash.new([])
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:bar, :baz]
>> h[:foo].object_id
=> 2177177000
>> h[:bar].object_id
=> 2177177000
The weird thing is that as you found, if you inspect the hash, you'll find that it is empty! This is because only the default object has been modified, no keys have yet been assigned.
Fortunately, there is another way to do default values for hashes. You can provide a default block instead:
>> h = Hash.new { |h,k| h[k] = [] }
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:baz]
>> h[:foo].object_id
=> 2176949560
>> h[:bar].object_id
=> 2176921940
When you use this approach, the block gets executed every time an unassigned key is used, and it is provided the hash itself and the key as an argument. By assigning the default value within the block, you can be sure that a new object will get created for each distinct key, and that the assignment will happen automatically. This is the idiomatic way of creating a "Hash of Arrays" in Ruby, and is generally safer to use than the default argument approach.
That said, if you're working with immutable values (like numbers), doing something like Hash.new(0) is safe, as you'll only change those values by re-assignment. But because I prefer to keep fewer concepts in my head, I pretty much use the block form exclusively.
When you do
h = Hash.new(0)
h[:foo] += 1
you are directly modifying h. h[:foo] += 1 is the same as h[:foo] = h[:foo]+1. h[:foo] is being assigned 0+1.
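A small sketch of the counting case, using a made-up word list, shows that += stores a value under each key while the default itself stays 0:

```ruby
counts = Hash.new(0)
%w[a b a].each { |word| counts[word] += 1 }   # counts[word] = counts[word] + 1

p counts["a"]           # 2
p counts["b"]           # 1
p counts.default        # 0, the default was never changed
p counts["zzz"]         # 0, returned for a missing key
p counts.key?("zzz")    # false, reading a missing key stores nothing
```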
When you do
h = Hash.new([])
h[:foo] << :bar
you are modifying h[:foo] which is [], which is the default value of h but is not a value to any key of h. After that [] becomes [:bar], the default value of h becomes [:bar], but that is not the value for h[:foo].