Ruby Hash Interaction With Pushing Onto Array - ruby

So let's say I do the following:
lph = Hash.new([]) #=> {}
lph["passed"] << "LCEOT" #=> ["LCEOT"]
lph #=> {} <-- Expected that to have been {"passed" => ["LCEOT"]}
lph["passed"] #=> ["LCEOT"]
lph["passed"] = lph["passed"] << "HJKL"
lph #=> {"passed"=>["LCEOT", "HJKL"]}
I'm surprised by this. A couple questions:
Why does it not get set until I push the second string on to the array? What is happening in the background?
What is the more idiomatic ruby way to essentially say. I have a hash, a key, and a value I want to to end up in the array associated with the key. How do I push the value in an array associated with a key into a hash the first time. In all future uses of the key, I just want to addd to the array.

Read the Ruby Hash.new documentation carefully - "if this hash is subsequently accessed by a key that doesn’t correspond to a hash entry, the value returned depends on the style of new used to create the hash".
new(obj) → new_hash
...If obj is specified, this single object will be used for all default values.
In your example you attempt to push something onto the value associated with a key which does not exist, so you end up mutating the same anonymous array you used to construct the hash initially.
the_array = []
h = Hash.new(the_array)
h['foo'] << 1 # => [1]
# Since the key 'foo' was not found
# ... the default value (the_array) is returned
# ... and 1 is pushed onto it (hence [1]).
the_array # => [1]
h # {} since the key 'foo' still has no value.
You probably want to use the block form:
new { |hash, key| block } → new_hash
...If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
For example:
h = Hash.new { |hash, key| hash[key] = [] } # Assign a new array as default for missing keys.
h['foo'] << 1 # => [1]
h['foo'] << 2 # => [1, 2]
h['bar'] << 3 # => [3]
h # => { 'foo' => [1, 2], 'bar' => [3] }

Why does it not get set until I push the second string on to the array?
In short; because you don't set anything in the hash until the point, where you also add the second string to the array.
What is happening in the background?
To see what's happening in the background, let's take this one line at a time:
lph = Hash.new([]) #=> {}
This creates an empty hash, configured to return the [] object whenever a non-existing key is accessed.
lph["passed"] << "LCEOT" #=> ["LCEOT"]
This can be written as
value = lph["passed"] #=> []
value << "LCEOT" #=> ["LCEOT"]
We see that lph["passed"] returns [] as expected, and we then proceed to append "LCEOT" to [].
lph #=> {}
lph is still an empty Hash. At no point have we added anything to the Hash. We have added something to its default value, but that doesn't change lph itself.
lph["passed"] #=> ["LCEOT"]
This is where it gets interesting. Remember above when we did value << ["LCEOT"]. That actually changed the default value that lph returns when a key isn't found. The default value is no longer [], but has become ["LCEOT"]. That new default value is returned here.
lph["passed"] = lph["passed"] << "HJKL"
This is our first change to lph. And what we actually assign to lph["passed"] is the default value (because "passed" is still a non-existing key in lph) with "HJKL" appended. Before this, the default value was ["LCEOT"], after this it is ["LCEOT", "HJKL"].
In other words lph["passed"] << "HJKL" returns ["LCEOT", "HJKL"] which is then assigned to lph["passed"].
What is the more idiomatic Ruby way
Using <<=:
>> lph = Hash.new { [] }
=> {}
>> lph["passed"] <<= "LCEOT"
=> ["LCEOT"]
>> lph
=> {"passed"=>["LCEOT"]}
Also note the change in how the Hash is initialized, using a block instead of a verbatim array. This ensures a new, blank array is created and returned whenever a new key is accessed, as opposed to the same array being used every time.

Related

Working with Hashes that have a default value

Am learning to code with ruby. I am learning about hashes and i dont understand this code: count = Hash.new(0). It says that the 0 is a default value, but when i run it on irb it gives me an empty hash {}. If 0 is a default value why can't i see something like count ={0=>0}. Or is the zero an accumulator but doesn't go to the keys or values? Thanks
0 will be the fallback if you try to access a key in the hash that doesn't exist
For example:
count = Hash.new -> count['key'] => nil
vs
count = Hash.new(0) -> count['key'] => 0
To expand on the answer from #jeremy-ramos and comment from #mu-is-too-short.
There are two common gotcha's with defaulting hash values in this way.
1. Accidentally shared references.
Ruby uses the exact same object in memory that you pass in as the default value for every missed key.
For an immutable object (like 0), there is no problem. However you might want to write code like:
hash = Hash.new([])
hash[key] << value
or
hash = Hash.new({})
hash[key][second_key] = value
This will not do what you'd expect. Instead of hash[unknown_key] returning a new, empty array or hash it will return the exact same array/hash object for every key.
so doing:
hash = Hash.new([])
hash[key1] << value1
hash[key2] << value2
results in a hash where key1 and key2 both point to the same array object containing [value1, value2]
See related question here
Solution
To solve this you can create a hash with a default block argument instead (which is called whenever a missing key is accessed and lets you assign a value to the missed key)
hash = Hash.new{|h, key| h[key] = [] }
2. Assignment of missed keys with default values
When you access a missing key that returns the default value, you might expect that the hash will now contain that key with the value returned. It does not. Ruby does not modify the hash, it simply returns the default value. So, for example:
hash = Hash.new(0) #$> {}
hash.keys.empty? #$> true
hash[:foo] #$> 0
hash[:foo] == 0 #$> true
hash #$> {}
hash.keys.empty? #$> true
Solution
This confusion is also addressed using the block approach, where they keys value can be explicitly set.
The Hash.new docs are not very clear on this. I hope that the example below clarifies the difference and one of the frequent uses of Hash.new(0).
The first chunk of code uses Hash.new(0). The hash has a default value of 0, and when new keys are encountered, their value is 0. This method can be used to count the characters in the array.
The second chunk of code fails, because the default value for the key (when not assigned) is nil. This value cannot be used in addition (when counting), and generates an error.
count = Hash.new(0)
puts "count=#{count}"
# count={}
%w[a b b c c c].each do |char|
count[char] += 1
end
puts "count=#{count}"
# count={"a"=>1, "b"=>2, "c"=>3}
count = Hash.new
puts "count=#{count}"
%w[a b b c c c].each do |char|
count[char] += 1
# Fails: in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)
end
puts "count=#{count}"
SEE ALSO:
What's the difference between "Hash.new(0)" and "{}"
TL;DR When you initialize hash using Hash.new you can setup default value or default proc (the value that would be returned if given key does not exist)
Regarding the question to understand this magic firstly you need to know that Ruby hashes have default values. To access default value you can use Hash#default method
This default value by default :) is nil
hash = {}
hash.default # => nil
hash[:key] # => nil
You can set default value with Hash#default=
hash = {}
hash.default = :some_value
hash[:key] # => :some_value
Very important note: it is dangerous to use mutable object as default because of side effect like this:
hash = {}
hash.default = []
hash[:key] # => []
hash[:other_key] << :some_item # will mutate default value
hash[:key] # => [:some_value]
hash.default # => [:some_value]
hash # => {}
To avoid this you can use Hash#default_proc and Hash#default_proc= methods
hash = {}
hash.default_proc # => nil
hash.default_proc = proc { [] }
hash[:key] # => []
hash[:other_key] << :some_item # will not mutate default value
hash[:other_key] # => [] # because there is no this key
hash[:other_key] = [:symbol]
hash[:other_key] << :some_item
hash[:other_key] # => [:symbol, :some_item]
hash[:key] # => [] # still empty array as default
Setting default cancels default_proc and vice versa
hash = {}
hash.default = :default
hash.default_proc = proc { :default_proc }
hash[:key] # => :default_proc
hash.default = :default
hash[:key] # => :default
hash.default_proc # => nil
Going back to Hash.new
When you pass argument to this method, you initialize default value
hash = Hash.new(0)
hash.default # => 0
hash.default_proc # => nil
When you pass block to this method, you initialize default proc
hash = Hash.new { 0 }
hash.default # => nil
hash[:key] # => 0

How do I add an object to an array, where the array is a value to a key in a hash?

So basically my code is as follows
anagrams = Hash.new([])
self.downcase.scan(/\b[a-z]+/i).each do |key|
anagrams[key.downcase.chars.sort] = #push key into array
end
so basically the hash would look like this
anagrams = { "abcdef" => ["fdebca", "edfcba"], "jklm" => ["jkl"]}
Basically what I don't understand is how to push "key" (which is obviously a string) as the value to "eyk"
I've been searching for awhile including documentation and other stackflow questions and this was my best guess
anagrams[key.downcase.chars.sort].push(key)
Your guess:
anagrams[key.downcase.chars.sort].push(key)
is right. The problem is your hash's default value:
anagrams = Hash.new([])
A default value doesn't automatically create an entry in the hash when you reference it, it just returns the value. That means that you can do this:
h = Hash.new([])
h[:k].push(6)
without changing h at all. The h[:k] gives you the default value ([]) but it doesn't add :k as a key. Also note that the same default value is used every time you try to access a key that isn't in the hash so this:
h = Hash.new([])
a = h[:k].push(6)
b = h[:x].push(11)
will leave you with [6,11] in both a and b but nothing in h.
If you want to automatically add defaults when you access them, you'll need to use a default_proc, not a simple default:
anagrams = Hash.new { |h, k] h[k] = [ ] }
That will create the entries when you access a non-existent key and give each one a different empty array.
It's not entirely clear what your method is supposed to do, but I think the problem is that you don't have an array to push a value onto.
In Ruby you can pass a block to Hash.new that tells it what to do when you try to access a key that doesn't exist. This is a handy way to automatically initialize values as empty arrays. For example:
hsh = Hash.new {|hsh, key| hsh[key] = [] }
hsh[:foo] << "bar"
p hsh # => { :foo => [ "bar" ] }
In your method (which I assume you're adding to the String class), you would use it like this:
class String
def my_method
anagrams = Hash.new {|hsh, key| hsh[key] = [] }
downcase.scan(/\b[a-z]+/i).each_with_object(anagrams) do |key|
anagrams[key.downcase.chars.sort.join] << key
end
end
end

Difference between Ruby's .push and << [duplicate]

This question already has answers here:
Ruby - Difference between Array#<< and Array#push
(5 answers)
Closed 8 years ago.
Here's an example with push:
#connections = Hash.new []
#connections[1] = #connections[1].push(2)
puts #connections # => {1=>[2]}
Here's an example with <<
#connections = Hash.new []
#connections[1] << 2
puts #connections # => {}
For some reason the output (#connections) is different, but why? I'm guessing it has something to do with Ruby object model?
Perhaps the new hash object [] is being create each time, but not saved? But why?
The difference in your code isn't about << vs. push, it's about the fact that you re-assign in one case and don't in the other. The following two pieces of code are equivalent:
#connections = Hash.new []
#connections[1] = #connections[1].push(2)
puts #connections # => {1=>[2]}
#connections = Hash.new []
#connections[1] = (#connections[1] << 2)
puts #connections # => {1=>[2]}
As are these two:
#connections = Hash.new []
#connections[1].push(2)
puts #connections # => {}
#connections = Hash.new []
#connections[1] << 2
puts #connections # => {}
The reason that re-assignment makes a difference here is that accessing a default value, does not automatically add an entry for it to the hash. That is if you have h = Hash.new(0) and then you do p h[0], you'll print 0, but the value of h will still be {} (not {0 => 0}) because the 0 is not added to the hash. If you do h[0] += 1, this will call the []= method on the hash and actually add an entry for 0 to it, so h becomes {0 => 1}.
So when you do #connections[1] << 2 in your code, you get the default array and perform << on it, but you don't store anything in #connections, so it stays {}. When you do #connections[i] = #connections[i].push(2) or #connections[i] = (#connections[i] << 2), you're calling []=, so the entry gets added to the hash.
However you should note that the hash will return a reference to the same array each time, so even if you do add the entry to the hash, it will likely still not behave as you expect once you add more than one entry (since all entries refer to the same array):
#connections = Hash.new []
#connections[1] = #connections[1].push(2)
#connections[2] = #connections[2].push(42)
puts #connections # => {1 => [2, 42], 2 => [2, 42]}
What you really want is a hash that returns a reference to a new array each time that a new key is accessed and that automatically adds an entry for the new array when that happens. To do that you can use the block form of Hash.new like this:
#connections = Hash.new do |h, k|
h[k] = []
end
#connections[1].push(2)
#connections[2].push(42)
puts #connections # => {1 => [2], 2 => [42]}
Note that when you write
h = Hash.new |this_hash, non_existent_key| { this_hash[non_existent_key] = [] }
...Ruby will execute the block whenever you try to lookup a key that doesn't exist, and then return the block's return value. A block is like a def in that all variables inside it(including the parameter variables) are created anew every time the block is called. In addition, note that [] is an Array constructor, and each time it is called, it creates a new array.
A block returns the result of the last statement that was executed in the block, which is the assignment statement:
this_hash[non_existent_key] = []
And an assignment statement returns the right hand side, which will be a reference to the same Array that was assigned to the key in the hash, so any changes to the returned Array will change the Array in the hash.
On the other hand, when you write:
Hash.new([])
The [] constructor creates a new, empty Array; and that Array becomes the argument for Hash.new(). There is no block for ruby to call every time you look up a non existent key, so ruby just returns that one Array as the value for ALL non-existent keys--and very importantly nothing is done to the hash.

How to make Ruby var= return value assigned, not value passed in?

There's a nice idiom for adding to lists stored in a hash table:
(hash[key] ||= []) << new_value
Now, suppose I write a derivative hash class, like the ones found in Hashie, which does a deep-convert of any hash I store in it. Then what I store will not be the same object I passed to the = operator; Hash may be converted to Mash or Clash, and arrays may be copied.
Here's the problem. Ruby apparently returns, from the var= method, the value passed in, not the value that's stored. It doesn't matter what the var= method returns. The code below demonstrates this:
class C
attr_reader :foo
def foo=(value)
#foo = (value.is_a? Array) ? (value.clone) : value
end
end
c=C.new
puts "assignment: #{(c.foo ||= []) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo ||= []) << 6}"
puts "c.foo is #{c.foo}"
output is
assignment: [5]
c.foo is []
assignment: [6]
c.foo is [6]
When I posted this as a bug to Hashie, Danielle Sucher explained what was happening and pointed out that "foo.send :bar=, 1" returns the value returned by the bar= method. (Hat tip for the research!) So I guess I could do:
c=C.new
puts "clunky assignment: #{(c.foo || c.send(:foo=, [])) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo || c.send(:foo=, [])) << 6}"
puts "c.foo is #{c.foo}"
which prints
clunky assignment: [5]
c.foo is [5]
assignment: [5, 6]
c.foo is [5, 6]
Is there any more elegant way to do this?
Assignments evaluate to the value that is being assigned. Period.
In some other languages, assignments are statements, so they don't evaluate to anything. Those are really the only two sensible choices. Either don't evaluate to anything, or evaluate to the value being assigned. Everything else would be too surprising.
Since Ruby doesn't have statements, there is really only one choice.
The only "workaround" for this is: don't use assignment.
c.foo ||= []
c.foo << 5
Using two lines of code isn't the end of the world, and it's easier on the eyes.
The prettiest way to do this is to use default value for hash:
# h = Hash.new { [] }
h = Hash.new { |h,k| h[k] = [] }
But be ware that you cant use Hash.new([]) and then << because of way how Ruby store variables:
h = Hash.new([])
h[:a] # => []
h[:b] # => []
h[:a] << 10
h[:b] # => [10] O.o
it's caused by that Ruby store variables by reference, so as we created only one array instance, ad set it as default value then it will be shared between all hash cells (unless it will be overwrite, i.e. by h[:a] += [10]).
It is solved by using constructor with block (doc) Hash.new { [] }. With this each time when new key is created block is called and each value is different array.
EDIT: Fixed error that #Uri Agassi is writing about.

Ruby hash default value behavior

I'm going through Ruby Koans, and I hit #41 which I believe is this:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno","dos"], hash[:one]
assert_equal ["uno","dos"], hash[:two]
assert_equal ["uno","dos"], hash[:three]
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
It could not understand the behavior so I Googled it and found Strange ruby behavior when using Hash default value, e.g. Hash.new([]) that answered the question nicely.
So I understand how that works, my question is, why does a default value such as an integer that gets incremented not get changed during use? For example:
puts "Text please: "
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }
This will take user input and count the number of times each word is used, it works because the default value of 0 is always used.
I have a feeling it has to do with the << operator but I'd love an explanation.
The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer decided to mutate one but not the other.
The question is not whether Arrays are mutable, the question is whether you mutate it.
You can get both the behaviors you see above, just by using Arrays. Observe:
One default Array with mutation
hsh = Hash.new([])
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
One default Array without mutation
hsh = Hash.new([])
hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']
hsh[:nonexistant]
# => []
# We didn't mutate the default value, it is still an empty array
hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.
A new, different Array every time with mutation
hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
It is because Array in Ruby is mutable object, so you can change it internal state, but Fixnum isn't mutable. So when you increment value using += internally it get that (assume that i is our reference to Fixnum object):
get object referenced by i
get it internal value (lets name it raw_tmp)
create new object that internal value is raw_tmp + 1
assign reference to created object to i
So as you can see, we created new object, and i reference now to something different than at the beginning.
In the other hand, when we use Array#<< it works that way:
get object referenced by arr
to it's internal state append given element
So as you can see it is much simpler, but it can cause some bugs. One of them you have in your question, another one is thread race when booth are trying simultaneously append 2 or more elements. Sometimes you can end with only some of them and with thrashes in memory, when you use += on arrays too, you will get rid of both of these problems (or at least minimise impact).
From the doc, setting a default value has the following behaviour:
Returns the default value, the value that would be returned by hsh if key did not exist in hsh. See also Hash::new and Hash#default=.
Therefore, every time frequencies[word] is not set, the value for that individual key is set to 0.
The reason for the discrepancy between the two code blocks is that arrays are mutable in Ruby, while integers are not.

Resources