Ruby hash default value behavior - ruby

I'm going through Ruby Koans, and I hit #41 which I believe is this:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno","dos"], hash[:one]
assert_equal ["uno","dos"], hash[:two]
assert_equal ["uno","dos"], hash[:three]
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
I could not understand the behavior, so I Googled it and found "Strange ruby behavior when using Hash default value, e.g. Hash.new([])", which answered the question nicely.
So I understand how that works. My question is: why does a default value such as an integer that gets incremented not get changed during use? For example:
puts "Text please: "
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }
This takes user input and counts the number of times each word is used. It works because the default value of 0 is always used.
I have a feeling it has to do with the << operator, but I'd love an explanation.

The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer, decided to mutate one but not the other.
The question is not whether Arrays are mutable; the question is whether you mutate them.
You can get both the behaviors you see above, just by using Arrays. Observe:
One default Array with mutation
hsh = Hash.new([])
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
One default Array without mutation
hsh = Hash.new([])
hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']
hsh[:nonexistent]
# => []
# We didn't mutate the default value, it is still an empty array
hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.
A new, different Array every time with mutation
hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
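One way to make the "did I mutate it?" distinction visible is to freeze the default array, so that any in-place mutation raises instead of silently changing the shared default. This is only a sketch, not something the answer above proposes; the variable names are illustrative, and it rescues RuntimeError because FrozenError (Ruby 2.5+) is a subclass of it.

```ruby
hsh = Hash.new([].freeze)

mutation_error = nil
begin
  hsh[:one] << 'one'          # tries to mutate the shared, frozen default
rescue RuntimeError => e      # FrozenError on Ruby >= 2.5
  mutation_error = e
end

hsh[:one] += ['one']          # builds a new array and assigns it: no mutation
hsh[:nonexistent]             # still the untouched empty default

puts mutation_error.class
p hsh
```

With a frozen default, only the non-mutating `+=` style works, which is exactly the style that leaves the default value alone.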

This is because an Array in Ruby is a mutable object, so you can change its internal state, but a Fixnum isn't mutable. So when you increment a value using +=, internally this happens (assume that i is our reference to a Fixnum object):
get the object referenced by i
get its internal value (let's name it raw_tmp)
create a new object whose internal value is raw_tmp + 1
assign a reference to the created object to i
So as you can see, we created a new object, and i now references something different than at the beginning.
On the other hand, when we use Array#<<, it works this way:
get the object referenced by arr
append the given element to its internal state
So as you can see, it is much simpler, but it can cause some bugs. One of them is in your question; another is a thread race when both threads try to append 2 or more elements simultaneously. Sometimes you can end up with only some of them and with garbage in memory. When you use += on arrays too, you get rid of both of these problems (or at least minimise the impact).
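The two sequences of steps above can be observed from plain Ruby by keeping a second reference to the original object. A small sketch; the variable names are made up for illustration:

```ruby
# Integer with +=: the variable is rebound to a different object.
i = 0
id_before = i.object_id
i += 1
integer_rebound = (i.object_id != id_before)

# Array with <<: the same object is changed in place,
# so every other reference to it sees the change.
arr = []
other_ref = arr
arr << 1
same_object_after_push = arr.equal?(other_ref)

# Array with +=: a new array is built and the variable is rebound;
# the old object (still held by old_ref) is untouched.
arr2 = []
old_ref = arr2
arr2 += [1]

p integer_rebound, same_object_after_push, other_ref, old_ref, arr2
```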

From the doc, setting a default value has the following behaviour:
Returns the default value, the value that would be returned by hsh if key did not exist in hsh. See also Hash::new and Hash#default=.
Therefore, every time frequencies[word] is not set, the lookup for that key returns 0, and += then stores 0 + 1 under that key.
The reason for the discrepancy between the two code blocks is that arrays are mutable in Ruby, while integers are not.
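To check this without typing input, here is a sketch of the same word count with a hard-coded string (the sample text is an assumption, not from the question). It also shows that the default itself is never changed by +=:

```ruby
text = "the quick fox jumps over the lazy dog the end"

frequencies = Hash.new(0)
text.split.each { |word| frequencies[word] += 1 }

the_count     = frequencies["the"]      # counted three times
default_value = frequencies.default     # still the original default
missing_value = frequencies["missing"]  # falls back to the default
missing_saved = frequencies.key?("missing")

p the_count, default_value, missing_value, missing_saved
```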

Related

Working with Hashes that have a default value

I am learning to code with Ruby. I am learning about hashes and I don't understand this code: count = Hash.new(0). It says that the 0 is a default value, but when I run it in irb it gives me an empty hash {}. If 0 is a default value, why can't I see something like count = {0=>0}? Or is the zero an accumulator that doesn't go to the keys or values? Thanks
0 will be the fallback if you try to access a key in the hash that doesn't exist.
For example:
count = Hash.new -> count['key'] => nil
vs
count = Hash.new(0) -> count['key'] => 0
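A short sketch of why irb prints {}: the default lives beside the hash's entries, not inside them. The key name is arbitrary.

```ruby
count = Hash.new(0)

is_empty   = count.empty?                 # no entries were created
looked_up  = count['key']                 # lookup falls back to the default
has_key    = count.key?('key')            # the lookup did not store anything
via_fetch  = count.fetch('key', :absent)  # fetch ignores the hash default
configured = count.default                # where the 0 actually lives

p is_empty, looked_up, has_key, via_fetch, configured
```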
To expand on the answer from @jeremy-ramos and the comment from @mu-is-too-short.
There are two common gotchas with defaulting hash values in this way.
1. Accidentally shared references.
Ruby uses the exact same object in memory that you pass in as the default value for every missed key.
For an immutable object (like 0), there is no problem. However you might want to write code like:
hash = Hash.new([])
hash[key] << value
or
hash = Hash.new({})
hash[key][second_key] = value
This will not do what you'd expect. Instead of hash[unknown_key] returning a new, empty array or hash it will return the exact same array/hash object for every key.
so doing:
hash = Hash.new([])
hash[key1] << value1
hash[key2] << value2
results in a hash where lookups of key1 and key2 both return the same array object containing [value1, value2] (the hash itself stays empty, because nothing was ever assigned to it).
See related question here
Solution
To solve this you can create a hash with a default block argument instead (which is called whenever a missing key is accessed and lets you assign a value to the missed key)
hash = Hash.new{|h, key| h[key] = [] }
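Putting the problem and the solution side by side may help. This is a sketch with made-up keys and values:

```ruby
# Default value: one array shared by every missing key.
shared = Hash.new([])
shared[:a] << 1
shared[:b] << 2
shared_same_object = shared[:a].equal?(shared[:b])
shared_keys = shared.keys            # nothing was stored

# Default block that assigns: one array per key, stored on first access.
separate = Hash.new { |h, key| h[key] = [] }
separate[:a] << 1
separate[:b] << 2

p shared_same_object, shared_keys, separate
```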
2. Assignment of missed keys with default values
When you access a missing key that returns the default value, you might expect that the hash will now contain that key with the value returned. It does not. Ruby does not modify the hash, it simply returns the default value. So, for example:
hash = Hash.new(0) #$> {}
hash.keys.empty? #$> true
hash[:foo] #$> 0
hash[:foo] == 0 #$> true
hash #$> {}
hash.keys.empty? #$> true
Solution
This confusion is also addressed by the block approach, where the key's value can be set explicitly.
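A sketch of the block approach for this second gotcha. Note the trade-off it shows: once the block assigns, merely reading a missing key adds it to the hash, while key? does not trigger the block.

```ruby
hash = Hash.new { |h, key| h[key] = 0 }

keys_before = hash.keys.dup   # nothing stored yet
value       = hash[:foo]      # block runs and stores 0 under :foo
keys_after  = hash.keys.dup   # :foo is now a real key
bar_present = hash.key?(:bar) # key? does not call the default block

p keys_before, value, keys_after, bar_present
```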
The Hash.new docs are not very clear on this. I hope that the example below clarifies the difference and one of the frequent uses of Hash.new(0).
The first chunk of code uses Hash.new(0). The hash has a default value of 0, and when new keys are encountered, their value is 0. This method can be used to count the characters in the array.
The second chunk of code fails, because the default value for the key (when not assigned) is nil. This value cannot be used in addition (when counting), and generates an error.
count = Hash.new(0)
puts "count=#{count}"
# count={}
%w[a b b c c c].each do |char|
count[char] += 1
end
puts "count=#{count}"
# count={"a"=>1, "b"=>2, "c"=>3}
count = Hash.new
puts "count=#{count}"
%w[a b b c c c].each do |char|
count[char] += 1
# Fails: in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)
end
puts "count=#{count}"
SEE ALSO:
What's the difference between "Hash.new(0)" and "{}"
TL;DR: When you initialize a hash using Hash.new, you can set up a default value or a default proc (the value that is returned if a given key does not exist).
To understand this magic, you first need to know that Ruby hashes have default values. To access the default value, you can use the Hash#default method.
This default value is, by default :), nil
hash = {}
hash.default # => nil
hash[:key] # => nil
You can set default value with Hash#default=
hash = {}
hash.default = :some_value
hash[:key] # => :some_value
Very important note: it is dangerous to use a mutable object as the default because of side effects like this:
hash = {}
hash.default = []
hash[:key] # => []
hash[:other_key] << :some_item # will mutate default value
hash[:key] # => [:some_item]
hash.default # => [:some_item]
hash # => {}
To avoid this you can use Hash#default_proc and Hash#default_proc= methods
hash = {}
hash.default_proc # => nil
hash.default_proc = proc { [] }
hash[:key] # => []
hash[:other_key] << :some_item # will not mutate default value
hash[:other_key] # => [] # because this key was never stored
hash[:other_key] = [:symbol]
hash[:other_key] << :some_item
hash[:other_key] # => [:symbol, :some_item]
hash[:key] # => [] # still empty array as default
Setting default cancels default_proc and vice versa
hash = {}
hash.default = :default
hash.default_proc = proc { :default_proc }
hash[:key] # => :default_proc
hash.default = :default
hash[:key] # => :default
hash.default_proc # => nil
Going back to Hash.new
When you pass an argument to this method, you initialize the default value
hash = Hash.new(0)
hash.default # => 0
hash.default_proc # => nil
When you pass a block to this method, you initialize the default proc
hash = Hash.new { 0 }
hash.default # => nil
hash[:key] # => 0
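The default proc is called with the hash and the missing key, so it can also store what it computes. A minimal sketch; the `squares` hash, the squaring logic and the `calls` counter are illustrative only:

```ruby
calls = 0
squares = Hash.new do |hash, key|
  calls += 1                 # count how often the default proc runs
  hash[key] = key * key      # store the computed value under the key
end

first  = squares[4]          # proc runs, stores and returns the value
second = squares[4]          # key exists now, proc is not called again

p first, second, calls, squares
```

Because the value is stored on the first miss, the proc behaves like a simple cache for that key.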

Iterating through hash of hashes converts elements into Array

I'm working on processing a response of a Webhook from GitHub. The response contains a hash web_hook_response that looks like this
{:commits=>
{:modified=>
["public/en/landing_pages/dc.json",
"draft/en/landing_pages/careers/ac123.json"]
}
}
Now I have a function that processes this hash.
modified_or_deleted_files = []
web_hook_response[:commits].map do |commit|
modified_or_deleted_files << commit[:removed] << commit[:modified]
end
I get this error
TypeError: no implicit conversion of Symbol into Integer
I tried to find out the value of commit when it's inside the map block and this is what got printed
[:modified,
["public/en/landing_pages/dc.json",
"draft/en/landing_pages/careers/ac123.json"]]
Why is the modified hash converting into an array of a symbol and an array inside the map block? I can't explain why this is happening. Can anyone explain why this is happening?
Data
This is the hash you are given (simplified slightly).
web_hook_response = {
:commits => { :modified => ["public", "draft"] }
}
It has one key-value pair, the key being :commits and the value being the hash
{ :modified => ["public", "draft"] }
which itself has one key (:modified) and one value (["public", "draft"]).
Error
Try this (with my definition of web_hook_response):
web_hook_response[:commits].map do |commit|
puts "commit = #{commit}"
modified_or_deleted_files << commit[:removed] << commit[:modified] # line 397
end
# commit = [:modified, ["public", "draft"]]
# TypeError: no implicit conversion of Symbol into Integer
from (irb):397:in `[]'
from (irb):397:in `block in irb_binding'
from (irb):395:in `each'
from (irb):395:in `map'
Note that commit equals a key-value pair from the hash web_hook_response[:commits]. As you see, an attempt is made to compute
commit[:removed]
#=> [:modified, ["public", "draft"]][:removed]
which is the syntactic sugar form of the conventional expression
[:modified, ["public", "draft"]].[](:removed)
Since [:modified, ["public", "draft"]] is an array, Array#[] is an instance method of the class Array. (Yes, it's a funny name for a method, but that's what it is.) As explained in its doc, the method's argument must be an integer, namely, the index of an element of the array that is to be returned. Therefore, when Ruby discovers that the argument is a symbol (:removed), she raises the exception, "no implicit conversion of Symbol into Integer".
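The same failure can be reproduced in isolation, without the webhook data. A sketch that indexes the pair correctly and then captures the exception from the symbol lookup:

```ruby
pair = [:modified, ["public", "draft"]]

key   = pair[0]     # integer index: the key of the pair
files = pair[1]     # integer index: the value of the pair

error = nil
begin
  pair[:removed]    # Array#[] cannot convert a Symbol to an index
rescue TypeError => e
  error = e
end

p key, files
puts error.message
```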
Computing modified_or_deleted_files
Given the keys :commits and :modified we may extract the hash
h = web_hook_response[:commits]
#=> { :modified=>["public", "draft"] }
and from that extract the array
a = h[:modified]
#=> ["public", "draft"]
We would normally chain these two operations to obtain the array in one statement.
web_hook_response[:commits][:modified]
#=> ["public", "draft"]
It appears you wish to simply set the value of the variable modified_or_deleted_files to this array, so simply write the following.
modified_or_deleted_files = web_hook_response[:commits][:modified]
#=> ["public", "draft"]
web_hook_response[:commits] is a hash, not an array, so when you call map on it, the block parameter commit gets key-value pairs, which are arrays of size 2.
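If iterating really is what you want, the two-element array can be split by giving the block two parameters. A sketch using the simplified data from above; `change_type`, `files` and `seen` are names chosen for illustration:

```ruby
web_hook_response = {
  :commits => { :modified => ["public", "draft"] }
}

seen = []
web_hook_response[:commits].each do |change_type, files|
  # change_type is the key (:modified), files is the array stored under it
  seen << [change_type, files.size]
end

p seen
```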
I think what you need is to concatenate 2 arrays. You can do that with:
modified_or_deleted_files = web_hook_response[:commits].slice(:modified, :removed).values.flatten
Your :commits is a hash. When iterating through hashes, you usually use two block arguments, one for the key and one for the value, for example:
{ :foo => 'bar' }.each do |key, value|
puts "#{key} = #{value}"
end
# outputs:
# foo = bar
When you only use one block argument you will get a key-value pair in an array:
{ :foo => 'bar' }.each do |pair|
puts pair.inspect
end
# outputs:
# [:foo, "bar"]
In your example you could just do:
commits = web_hook_response[:commits]
modified_or_deleted_files = Array(commits[:removed]) + Array(commits[:modified])
(The Array(...) is used to avoid an error if commits[:removed] or commits[:modified] is nil. Array(nil) returns an empty array, Array(an_array) returns the array)
Or if you want to get fancy with the enumerators, iterators and such:
modified_or_deleted_files = web_hook_response[:commits].
values_at(:modified, :removed).
compact.
reduce(:+)
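The two variants above agree when at least one of the keys is present, but differ when both are missing: reduce(:+) on an empty array returns nil. A sketch comparing them; passing [] as the initial value to reduce is an adjustment added here, not part of the answer:

```ruby
commits = { :modified => ["public", "draft"] }   # no :removed key

via_array  = Array(commits[:removed]) + Array(commits[:modified])
via_reduce = commits.values_at(:modified, :removed).compact.reduce(:+)

empty = {}                                        # neither key present
empty_reduce = empty.values_at(:modified, :removed).compact.reduce(:+)
empty_safe   = empty.values_at(:modified, :removed).compact.reduce([], :+)

p via_array, via_reduce, empty_reduce, empty_safe
```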

Ruby Hash Interaction With Pushing Onto Array

So let's say I do the following:
lph = Hash.new([]) #=> {}
lph["passed"] << "LCEOT" #=> ["LCEOT"]
lph #=> {} <-- Expected that to have been {"passed" => ["LCEOT"]}
lph["passed"] #=> ["LCEOT"]
lph["passed"] = lph["passed"] << "HJKL"
lph #=> {"passed"=>["LCEOT", "HJKL"]}
I'm surprised by this. A couple questions:
Why does it not get set until I push the second string on to the array? What is happening in the background?
What is the more idiomatic Ruby way to essentially say: I have a hash, a key, and a value I want to end up in the array associated with the key. How do I push the value into the array associated with a key the first time? In all future uses of the key, I just want to add to the array.
Read the Ruby Hash.new documentation carefully - "if this hash is subsequently accessed by a key that doesn’t correspond to a hash entry, the value returned depends on the style of new used to create the hash".
new(obj) → new_hash
...If obj is specified, this single object will be used for all default values.
In your example you attempt to push something onto the value associated with a key which does not exist, so you end up mutating the same anonymous array you used to construct the hash initially.
the_array = []
h = Hash.new(the_array)
h['foo'] << 1 # => [1]
# Since the key 'foo' was not found
# ... the default value (the_array) is returned
# ... and 1 is pushed onto it (hence [1]).
the_array # => [1]
h # {} since the key 'foo' still has no value.
You probably want to use the block form:
new { |hash, key| block } → new_hash
...If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
For example:
h = Hash.new { |hash, key| hash[key] = [] } # Assign a new array as default for missing keys.
h['foo'] << 1 # => [1]
h['foo'] << 2 # => [1, 2]
h['bar'] << 3 # => [3]
h # => { 'foo' => [1, 2], 'bar' => [3] }
Why does it not get set until I push the second string on to the array?
In short: because you don't set anything in the hash until the point where you also add the second string to the array.
What is happening in the background?
To see what's happening in the background, let's take this one line at a time:
lph = Hash.new([]) #=> {}
This creates an empty hash, configured to return the [] object whenever a non-existing key is accessed.
lph["passed"] << "LCEOT" #=> ["LCEOT"]
This can be written as
value = lph["passed"] #=> []
value << "LCEOT" #=> ["LCEOT"]
We see that lph["passed"] returns [] as expected, and we then proceed to append "LCEOT" to [].
lph #=> {}
lph is still an empty Hash. At no point have we added anything to the Hash. We have added something to its default value, but that doesn't change lph itself.
lph["passed"] #=> ["LCEOT"]
This is where it gets interesting. Remember above when we did value << "LCEOT". That actually changed the default value that lph returns when a key isn't found. The default value is no longer [], but has become ["LCEOT"]. That new default value is returned here.
lph["passed"] = lph["passed"] << "HJKL"
This is our first change to lph. And what we actually assign to lph["passed"] is the default value (because "passed" is still a non-existing key in lph) with "HJKL" appended. Before this, the default value was ["LCEOT"], after this it is ["LCEOT", "HJKL"].
In other words lph["passed"] << "HJKL" returns ["LCEOT", "HJKL"] which is then assigned to lph["passed"].
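The walkthrough can be checked with Hash#default and equal?. This sketch also shows a consequence worth noticing: after the assignment, the array stored under "passed" is the very same object as the default, so other missing keys still see its contents. The "other" key is made up for illustration.

```ruby
lph = Hash.new([])

lph["passed"] << "LCEOT"
default_after_push = lph.default.dup   # the default itself was mutated
empty_after_push   = lph.empty?        # nothing stored in the hash yet

lph["passed"] = lph["passed"] << "HJKL"
stored_is_default = lph["passed"].equal?(lph.default)
other_lookup      = lph["other"]       # missing key, returns the default

p default_after_push, empty_after_push, stored_is_default, other_lookup
```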
What is the more idiomatic Ruby way
Using <<=:
>> lph = Hash.new { [] }
=> {}
>> lph["passed"] <<= "LCEOT"
=> ["LCEOT"]
>> lph
=> {"passed"=>["LCEOT"]}
Also note the change in how the Hash is initialized, using a block instead of a verbatim array. This ensures a new, blank array is created and returned whenever a new key is accessed, as opposed to the same array being used every time.
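For reference, a <<= b is shorthand for a = a << b, which is why the key ends up stored. A sketch extending the session above with a second push and a key that is only read (the "failed" key is hypothetical):

```ruby
lph = Hash.new { [] }

lph["passed"] <<= "LCEOT"   # lph["passed"] = lph["passed"] << "LCEOT"
lph["passed"] <<= "HJKL"    # key exists now, so this appends to the stored array

read_only = lph["failed"]   # fresh array from the block, not stored
failed_stored = lph.key?("failed")

p lph, read_only, failed_stored
```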

How to make Ruby var= return value assigned, not value passed in?

There's a nice idiom for adding to lists stored in a hash table:
(hash[key] ||= []) << new_value
Now, suppose I write a derivative hash class, like the ones found in Hashie, which does a deep-convert of any hash I store in it. Then what I store will not be the same object I passed to the = operator; Hash may be converted to Mash or Clash, and arrays may be copied.
Here's the problem. Ruby apparently returns, from the var= method, the value passed in, not the value that's stored. It doesn't matter what the var= method returns. The code below demonstrates this:
class C
attr_reader :foo
def foo=(value)
@foo = value.is_a?(Array) ? value.clone : value
end
end
c=C.new
puts "assignment: #{(c.foo ||= []) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo ||= []) << 6}"
puts "c.foo is #{c.foo}"
output is
assignment: [5]
c.foo is []
assignment: [6]
c.foo is [6]
When I posted this as a bug to Hashie, Danielle Sucher explained what was happening and pointed out that "foo.send :bar=, 1" returns the value returned by the bar= method. (Hat tip for the research!) So I guess I could do:
c=C.new
puts "clunky assignment: #{(c.foo || c.send(:foo=, [])) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo || c.send(:foo=, [])) << 6}"
puts "c.foo is #{c.foo}"
which prints
clunky assignment: [5]
c.foo is [5]
assignment: [5, 6]
c.foo is [5, 6]
Is there any more elegant way to do this?
Assignments evaluate to the value that is being assigned. Period.
In some other languages, assignments are statements, so they don't evaluate to anything. Those are really the only two sensible choices. Either don't evaluate to anything, or evaluate to the value being assigned. Everything else would be too surprising.
Since Ruby doesn't have statements, there is really only one choice.
The only "workaround" for this is: don't use assignment.
c.foo ||= []
c.foo << 5
Using two lines of code isn't the end of the world, and it's easier on the eyes.
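The rule can be seen directly with a setter that deliberately returns something else. A sketch; the `Box` class and its `items=` setter are hypothetical and exist only for this demonstration:

```ruby
class Box
  attr_reader :items

  def items=(value)
    @items = value.dup        # store a copy, like a deep-converting hash would
    :ignored_return_value     # ignored by assignment syntax
  end
end

box = Box.new
result = (box.items = [1, 2])              # evaluates to the right-hand side
same_object = result.equal?(box.items)     # the stored copy is a different object
sent = box.send(:items=, [3])              # send returns the method's own return value

p result, same_object, sent, box.items
```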
The prettiest way to do this is to use a default value for the hash:
# h = Hash.new { [] }
h = Hash.new { |h,k| h[k] = [] }
But beware that you can't use Hash.new([]) and then << because of the way Ruby stores variables:
h = Hash.new([])
h[:a] # => []
h[:b] # => []
h[:a] << 10
h[:b] # => [10] O.o
This happens because Ruby stores variables by reference: we created only one array instance and set it as the default value, so it is shared between all hash cells (unless it is overwritten, e.g. by h[:a] += [10]).
It is solved by using the constructor with a block (doc): Hash.new { |h, k| h[k] = [] }. With this, the block is called each time a new key is accessed, and each value is a different array.
EDIT: Fixed the error that @Uri Agassi is writing about.

Hash.new([]) does not behave as expected [duplicate]

Possible Duplicate:
Ruby method Array#<< not updating the array in hash
Strange ruby behavior when using Hash.new([])
I've been doing Koans which is great, and as I go along I find no major trouble, but I stumbled upon this, and can't make any sense out of it:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno", "dos"], hash[:one] # But I only put "uno" for this key!
assert_equal ["uno", "dos"], hash[:two] # But I only put "dos" for this key!
assert_equal ["uno", "dos"], hash[:three] # I didn't shove anything for :three!
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
All the tests are passing (I just looked at the error which helped me guess the right assertions to write).
The last assert, ok, they both were not initialized so their values have got to have the same object ID since they both take the default.
I don't understand why the default value was altered, I'm not even entirely sure that's what happened.
I tried it out in IRB, thinking maybe some tampering on Hash/Array was done to make me crazy, but I get the same result.
I first thought hash[:one] << "uno" would cause hash to become { one: ["uno"] }, but it remains { }.
Although I'm guessing << only calls push, and new keys are only added when you use the = sign.
Please tell me what I missed.
EDIT: I'm using Ruby 1.9.3
When you use the default argument for a Hash, the same object is used for all keys that have not been explicitly set. This means that only one array is being used here, the one you passed into Hash.new. See below for evidence of that.
>> h = Hash.new([])
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:bar, :baz]
>> h[:foo].object_id
=> 2177177000
>> h[:bar].object_id
=> 2177177000
The weird thing is that as you found, if you inspect the hash, you'll find that it is empty! This is because only the default object has been modified, no keys have yet been assigned.
Fortunately, there is another way to do default values for hashes. You can provide a default block instead:
>> h = Hash.new { |h,k| h[k] = [] }
=> {}
>> h[:foo] << :bar
=> [:bar]
>> h[:bar] << :baz
=> [:baz]
>> h[:foo].object_id
=> 2176949560
>> h[:bar].object_id
=> 2176921940
When you use this approach, the block gets executed every time an unassigned key is used, and it is provided the hash itself and the key as an argument. By assigning the default value within the block, you can be sure that a new object will get created for each distinct key, and that the assignment will happen automatically. This is the idiomatic way of creating a "Hash of Arrays" in Ruby, and is generally safer to use than the default argument approach.
That said, if you're working with immutable values (like numbers), doing something like Hash.new(0) is safe, as you'll only change those values by re-assignment. But because I prefer to keep fewer concepts in my head, I pretty much use the block form exclusively.
When you do
h = Hash.new(0)
h[:foo] += 1
you are directly modifying h. h[:foo] += 1 is the same as h[:foo] = h[:foo]+1. h[:foo] is being assigned 0+1.
When you do
h = Hash.new([])
h[:foo] << :bar
you are modifying the object returned by h[:foo], which is [], the default value of h, not a value stored under any key of h. After that, [] becomes [:bar], so the default value of h is now [:bar], but no :foo key has been stored in h.
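Both cases can be compared by inspecting the hash and its default afterwards. A sketch with two separate hashes (`counts` and `lists` are illustrative names):

```ruby
counts = Hash.new(0)
counts[:foo] += 1            # counts[:foo] = counts[:foo] + 1
counts_default = counts.default

lists = Hash.new([])
lists[:foo] << :bar          # mutates the default, assigns nothing
lists_default = lists.default

p counts, counts_default     # hash changed, default untouched
p lists, lists_default       # hash untouched, default changed
```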

Resources