Difference between Ruby's .push and << [duplicate] - ruby

Here's an example with push:
@connections = Hash.new []
@connections[1] = @connections[1].push(2)
puts @connections # => {1=>[2]}
Here's an example with <<:
@connections = Hash.new []
@connections[1] << 2
puts @connections # => {}
For some reason the output (@connections) is different, but why? I'm guessing it has something to do with Ruby's object model?
Perhaps a new array object [] is being created each time, but not saved? But why?

The difference in your code isn't about << vs. push, it's about the fact that you re-assign in one case and don't in the other. The following two pieces of code are equivalent:
@connections = Hash.new []
@connections[1] = @connections[1].push(2)
puts @connections # => {1=>[2]}
@connections = Hash.new []
@connections[1] = (@connections[1] << 2)
puts @connections # => {1=>[2]}
As are these two:
@connections = Hash.new []
@connections[1].push(2)
puts @connections # => {}
@connections = Hash.new []
@connections[1] << 2
puts @connections # => {}
The reason that re-assignment makes a difference here is that accessing a default value does not automatically add an entry for it to the hash. That is, if you have h = Hash.new(0) and then you do p h[0], you'll print 0, but the value of h will still be {} (not {0 => 0}) because the 0 is not added to the hash. If you do h[0] += 1, this will call the []= method on the hash and actually add an entry for 0 to it, so h becomes {0 => 1}.
So when you do @connections[1] << 2 in your code, you get the default array and perform << on it, but you don't store anything in @connections, so it stays {}. When you do @connections[1] = @connections[1].push(2) or @connections[1] = (@connections[1] << 2), you're calling []=, so the entry gets added to the hash.
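A minimal sketch of the point above, in case it helps to see it run. The helper name read_stores_key? is made up for illustration; only Hash.new, Hash#[] and Hash#key? are real Ruby API.

```ruby
# Hypothetical helper (not part of Ruby): does a plain read store the key?
def read_stores_key?(hash, key)
  hash[key]        # plain read: returns the default, never calls []=
  hash.key?(key)
end

counts = Hash.new(0)
p read_stores_key?(counts, :a)  # => false (the read left the hash empty)

counts[:a] += 1                 # expands to counts[:a] = counts[:a] + 1
p counts.key?(:a)               # => true  (the []= call stored the key)
p counts[:a]                    # => 1
```

The read and the write are separate method calls, which is why only the second one changes the hash.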
However, you should note that the hash will return a reference to the same array each time, so even if you do add the entry to the hash, it will likely still not behave as you expect once you add more than one entry (since all entries refer to the same array):
@connections = Hash.new []
@connections[1] = @connections[1].push(2)
@connections[2] = @connections[2].push(42)
puts @connections # => {1 => [2, 42], 2 => [2, 42]}
What you really want is a hash that returns a reference to a new array each time a new key is accessed and that automatically adds an entry for the new array when that happens. To do that, you can use the block form of Hash.new like this:
@connections = Hash.new do |h, k|
  h[k] = []
end
@connections[1].push(2)
@connections[2].push(42)
puts @connections # => {1 => [2], 2 => [42]}
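To check that the block form really hands out separate arrays, something like the following should work. The helper name new_list_hash is an assumption for this sketch; Object#equal? compares object identity.

```ruby
# Hypothetical helper: a hash whose missing keys each get their own array.
def new_list_hash
  Hash.new { |hash, key| hash[key] = [] }
end

connections = new_list_hash
connections[1] << 2
connections[2] << 42

p connections.keys                       # => [1, 2]
p connections[1]                         # => [2]
p connections[2]                         # => [42]
p connections[1].equal?(connections[2])  # => false (two distinct arrays)
```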

Note that when you write
h = Hash.new { |this_hash, non_existent_key| this_hash[non_existent_key] = [] }
...Ruby will execute the block whenever you try to look up a key that doesn't exist, and then return the block's return value. A block is like a def in that all variables inside it (including the parameter variables) are created anew every time the block is called. In addition, note that [] is an Array constructor, and each time it is called, it creates a new array.
A block returns the result of the last statement that was executed in the block, which is the assignment statement:
this_hash[non_existent_key] = []
And an assignment statement returns the right hand side, which will be a reference to the same Array that was assigned to the key in the hash, so any changes to the returned Array will change the Array in the hash.
On the other hand, when you write:
Hash.new([])
The [] constructor creates a new, empty Array, and that Array becomes the argument for Hash.new(). There is no block for Ruby to call every time you look up a non-existent key, so Ruby just returns that one Array as the value for ALL non-existent keys, and, very importantly, nothing is done to the hash.
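The contrast between the two forms can be sketched roughly like this. The helper same_default_object? is invented for the example.

```ruby
# Hypothetical helper: do two missing keys yield the very same object?
def same_default_object?(hash, key_a, key_b)
  hash[key_a].equal?(hash[key_b])
end

with_value = Hash.new([])                    # one array, created once
with_block = Hash.new { |h, k| h[k] = [] }   # block runs per missing key

p same_default_object?(with_value, :a, :b)   # => true
p same_default_object?(with_block, :a, :b)   # => false
p with_value.size                            # => 0 (nothing was stored)
p with_block.size                            # => 2 (the block stored both keys)
```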

Related

Working with Hashes that have a default value

I am learning to code with Ruby. I am learning about hashes and I don't understand this code: count = Hash.new(0). It says that the 0 is a default value, but when I run it in irb it gives me an empty hash {}. If 0 is a default value, why can't I see something like count = {0=>0}? Or is the zero an accumulator that doesn't go to the keys or values? Thanks
0 will be the fallback if you try to access a key in the hash that doesn't exist.
For example:
count = Hash.new
count['key'] # => nil
vs
count = Hash.new(0)
count['key'] # => 0
To expand on the answer from @jeremy-ramos and the comment from @mu-is-too-short:
There are two common gotchas with defaulting hash values in this way.
1. Accidentally shared references.
Ruby uses the exact same object in memory that you pass in as the default value for every missed key.
For an immutable object (like 0), there is no problem. However you might want to write code like:
hash = Hash.new([])
hash[key] << value
or
hash = Hash.new({})
hash[key][second_key] = value
This will not do what you'd expect. Instead of hash[unknown_key] returning a new, empty array or hash, it will return the exact same array/hash object for every key.
So doing:
hash = Hash.new([])
hash[key1] << value1
hash[key2] << value2
results in hash[key1] and hash[key2] both returning the same array object containing [value1, value2], while the hash itself still has no keys.
Solution
To solve this you can create a hash with a default block argument instead (which is called whenever a missing key is accessed and lets you assign a value to the missed key)
hash = Hash.new{|h, key| h[key] = [] }
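The same fix applies to the Hash.new({}) case mentioned earlier. A rough sketch, assuming a made-up helper new_nested_hash and example keys:

```ruby
# Hypothetical helper: missing keys each get their own inner hash.
def new_nested_hash
  Hash.new { |hash, key| hash[key] = {} }
end

settings = new_nested_hash
settings[:db][:host] = "localhost"
settings[:cache][:ttl] = 60

p settings.keys                             # => [:db, :cache]
p settings[:db].equal?(settings[:cache])    # => false
```

Note that only the outer hash has a default here; the inner hashes are plain {} and return nil for missing keys.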
2. Assignment of missed keys with default values
When you access a missing key that returns the default value, you might expect that the hash will now contain that key with the value returned. It does not. Ruby does not modify the hash, it simply returns the default value. So, for example:
hash = Hash.new(0) # => {}
hash.keys.empty? # => true
hash[:foo] # => 0
hash[:foo] == 0 # => true
hash # => {}
hash.keys.empty? # => true
Solution
This confusion is also addressed by the block approach, where the key's value can be explicitly set.
The Hash.new docs are not very clear on this. I hope that the example below clarifies the difference and one of the frequent uses of Hash.new(0).
The first chunk of code uses Hash.new(0). The hash has a default value of 0, and when new keys are encountered, their value is 0. This method can be used to count the characters in the array.
The second chunk of code fails, because the default value for the key (when not assigned) is nil. This value cannot be used in addition (when counting), and generates an error.
count = Hash.new(0)
puts "count=#{count}"
# count={}
%w[a b b c c c].each do |char|
  count[char] += 1
end
puts "count=#{count}"
# count={"a"=>1, "b"=>2, "c"=>3}
count = Hash.new
puts "count=#{count}"
%w[a b b c c c].each do |char|
  count[char] += 1
  # Fails: in `block in <main>': undefined method `+' for nil:NilClass (NoMethodError)
end
puts "count=#{count}"
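If you are handed a hash without a default (so the second chunk's nil problem applies), one possible workaround is Hash#fetch with a fallback argument. This is just a sketch; count_items is a made-up name.

```ruby
# Hypothetical helper: counts items without relying on a hash default.
def count_items(items)
  counts = {}
  items.each do |item|
    # fetch(item, 0) returns 0 when the key is missing, so + never sees nil
    counts[item] = counts.fetch(item, 0) + 1
  end
  counts
end

result = count_items(%w[a b b c c c])
p result["a"]  # => 1
p result["b"]  # => 2
p result["c"]  # => 3
```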
SEE ALSO:
What's the difference between "Hash.new(0)" and "{}"
TL;DR: When you initialize a hash using Hash.new, you can set up a default value or a default proc (the value that will be returned if a given key does not exist).
To understand this magic, you first need to know that Ruby hashes have default values. To access the default value, you can use the Hash#default method.
This default value is, by default :), nil:
hash = {}
hash.default # => nil
hash[:key] # => nil
You can set default value with Hash#default=
hash = {}
hash.default = :some_value
hash[:key] # => :some_value
Very important note: it is dangerous to use a mutable object as the default because of side effects like this:
hash = {}
hash.default = []
hash[:key] # => []
hash[:other_key] << :some_item # will mutate default value
hash[:key] # => [:some_item]
hash.default # => [:some_item]
hash # => {}
To avoid this you can use Hash#default_proc and Hash#default_proc= methods
hash = {}
hash.default_proc # => nil
hash.default_proc = proc { [] }
hash[:key] # => []
hash[:other_key] << :some_item # will not mutate default value
hash[:other_key] # => [] # because this key still does not exist
hash[:other_key] = [:symbol]
hash[:other_key] << :some_item
hash[:other_key] # => [:symbol, :some_item]
hash[:key] # => [] # still empty array as default
Setting default cancels default_proc and vice versa
hash = {}
hash.default = :default
hash.default_proc = proc { :default_proc }
hash[:key] # => :default_proc
hash.default = :default
hash[:key] # => :default
hash.default_proc # => nil
Going back to Hash.new:
When you pass an argument to this method, you initialize the default value:
hash = Hash.new(0)
hash.default # => 0
hash.default_proc # => nil
When you pass a block to this method, you initialize the default proc:
hash = Hash.new { 0 }
hash.default # => nil
hash[:key] # => 0
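One thing worth checking for yourself: a default proc only stores the key if its body assigns it. A small sketch (the helper keys_after_read is hypothetical):

```ruby
# Hypothetical helper: reads a key, then reports which keys the hash holds.
def keys_after_read(hash, key)
  hash[key]
  hash.keys
end

plain   = Hash.new { 0 }                          # block only returns a value
storing = Hash.new { |hash, key| hash[key] = 0 }  # block also stores it

p plain.default_proc.class        # => Proc
p keys_after_read(plain, :key)    # => []
p keys_after_read(storing, :key)  # => [:key]
```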

Why do default Arrays in Hash.new require key value specification? [duplicate]

If...
variable = Hash.new(0)
...will default to new values being the integer zero without having to specify the associated key, why do I have to use a block and specify the associated key for the new values to default to an array, like so...
variable = Hash.new { |h, k| h[k] = [] }
I read ruby-doc.org but can't seem to find an answer. Perhaps its "under the hood" and I can't see/comprehend it.
For context, the question came up when I couldn't reconcile why the first method didn't work and the second method did:
def find_duplicates1(array)
  indices = Hash.new([])
  array.each_with_index { |ele, i| indices[ele] << i }
  indices.select { |ele, indices| indices.length > 1 }
end
def find_duplicates2(array)
  indices = Hash.new { |h, k| h[k] = [] }
  array.each_with_index { |ele, i| indices[ele] << i }
  indices.select { |ele, indices| indices.length > 1 }
end
Because indices = Hash.new([]) means that when you look up an unknown key, the [] will be returned. But that empty default array will not be assigned to the formerly unknown key.
Here is an example:
indices = Hash.new([])
indices[:foo] << :bar
indices
#=> {}
But even worse, because we added a value to the default array, that array is no longer empty and will be returned as the changed default value for all other unknown keys too:
indices[:baz]
#=> [:bar]
Whereas indices = Hash.new { |h, k| h[k] = [] } means that the block will run for all unknown keys and within the block, a new empty array is initialized and that new array is actually assigned to the former unknown key.
indices = Hash.new { |h, k| h[k] = [] }
indices[:foo] << :bar
indices
#=> {:foo=>[:bar]}
indices[:bar]
#=> []
Btw, you might be interested in the Enumerable#tally method (available since Ruby 2.7). If you only need the duplicated elements rather than their indices, your method can be simplified to:
def find_duplicates(array)
  array.tally.select { |k, v| v > 1 }.keys
end
It's because the default (whatever object it is) is used as the default. That object will be presented for EVERY missing key. They're all pointing to the same object.
For immutable objects (like the integer 0) it doesn't matter because if you replace 0 with 1 for a given key, then the key is pointing to a new object (the integer 1).
But if it's an array object and you "mutate" (change) it like array << "added" then that object... now with added "added", is the default for all future new keys and is likely the object that all existing keys are pointing to. All keys point to the single array object that looks like: ["added"]
By using a block, you are defaulting a NEW array object to the key. If you change the array object by adding an element, the other keys' objects are unchanged (they're different objects).
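The difference between replacing and mutating can be sketched like this. The helper names bump and append are invented for the example.

```ruby
# Hypothetical helper: replaces the value under key with a new Integer.
def bump(counts, key)
  counts[key] += 1      # counts[key] = counts[key] + 1, so []= is called
  counts
end

# Hypothetical helper: mutates whatever object the lookup returns.
def append(lists, key, item)
  lists[key] << item    # no []= call; for a missing key this edits the default
  lists
end

counts = bump(Hash.new(0), :a)
p counts.keys      # => [:a]
p counts.default   # => 0 (the default is untouched)

lists = append(Hash.new([]), :a, "added")
p lists.keys       # => []
p lists.default    # => ["added"] (the shared default was changed)
```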

How do I add an object to an array, where the array is a value to a key in a hash?

So basically my code is as follows
anagrams = Hash.new([])
self.downcase.scan(/\b[a-z]+/i).each do |key|
anagrams[key.downcase.chars.sort] = #push key into array
end
so basically the hash would look like this
anagrams = { "abcdef" => ["fdebca", "edfcba"], "jklm" => ["jkl"]}
Basically what I don't understand is how to push "key" (which is obviously a string) as the value to "eyk"
I've been searching for awhile including documentation and other stackflow questions and this was my best guess
anagrams[key.downcase.chars.sort].push(key)
Your guess:
anagrams[key.downcase.chars.sort].push(key)
is right. The problem is your hash's default value:
anagrams = Hash.new([])
A default value doesn't automatically create an entry in the hash when you reference it, it just returns the value. That means that you can do this:
h = Hash.new([])
h[:k].push(6)
without changing h at all. The h[:k] gives you the default value ([]) but it doesn't add :k as a key. Also note that the same default value is used every time you try to access a key that isn't in the hash so this:
h = Hash.new([])
a = h[:k].push(6)
b = h[:x].push(11)
will leave you with [6,11] in both a and b but nothing in h.
If you want to automatically add defaults when you access them, you'll need to use a default_proc, not a simple default:
anagrams = Hash.new { |h, k| h[k] = [] }
That will create the entries when you access a non-existent key and give each one a different empty array.
It's not entirely clear what your method is supposed to do, but I think the problem is that you don't have an array to push a value onto.
In Ruby you can pass a block to Hash.new that tells it what to do when you try to access a key that doesn't exist. This is a handy way to automatically initialize values as empty arrays. For example:
hsh = Hash.new {|hsh, key| hsh[key] = [] }
hsh[:foo] << "bar"
p hsh # => { :foo => [ "bar" ] }
In your method (which I assume you're adding to the String class), you would use it like this:
class String
  def my_method
    anagrams = Hash.new { |hsh, key| hsh[key] = [] }
    # each_with_object yields each word plus the hash, and returns the hash
    downcase.scan(/\b[a-z]+/i).each_with_object(anagrams) do |word, groups|
      groups[word.chars.sort.join] << word
    end
  end
end
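As an alternative that avoids hash defaults altogether, Enumerable#group_by can build the same kind of structure. This is only a sketch; the method name anagram_groups and the sample sentence are made up.

```ruby
# Hypothetical helper: groups the words of a text by their sorted letters.
def anagram_groups(text)
  text.downcase.scan(/[a-z]+/).group_by { |word| word.chars.sort.join }
end

groups = anagram_groups("Listen silent enlist tinsel banana")
p groups["eilnst"]  # => ["listen", "silent", "enlist", "tinsel"]
p groups["aaabnn"]  # => ["banana"]
```

group_by returns an ordinary hash with no default, so looking up a letter combination that never occurred gives nil rather than an empty array.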

Ruby Hash Interaction With Pushing Onto Array

So let's say I do the following:
lph = Hash.new([]) #=> {}
lph["passed"] << "LCEOT" #=> ["LCEOT"]
lph #=> {} <-- Expected that to have been {"passed" => ["LCEOT"]}
lph["passed"] #=> ["LCEOT"]
lph["passed"] = lph["passed"] << "HJKL"
lph #=> {"passed"=>["LCEOT", "HJKL"]}
I'm surprised by this. A couple questions:
Why does it not get set until I push the second string on to the array? What is happening in the background?
What is the more idiomatic Ruby way to essentially say: I have a hash, a key, and a value I want to end up in the array associated with the key? How do I push the value into an array associated with a key in a hash the first time? In all future uses of the key, I just want to add to the array.
Read the Ruby Hash.new documentation carefully - "if this hash is subsequently accessed by a key that doesn’t correspond to a hash entry, the value returned depends on the style of new used to create the hash".
new(obj) → new_hash
...If obj is specified, this single object will be used for all default values.
In your example you attempt to push something onto the value associated with a key which does not exist, so you end up mutating the same anonymous array you used to construct the hash initially.
the_array = []
h = Hash.new(the_array)
h['foo'] << 1 # => [1]
# Since the key 'foo' was not found
# ... the default value (the_array) is returned
# ... and 1 is pushed onto it (hence [1]).
the_array # => [1]
h # {} since the key 'foo' still has no value.
You probably want to use the block form:
new { |hash, key| block } → new_hash
...If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
For example:
h = Hash.new { |hash, key| hash[key] = [] } # Assign a new array as default for missing keys.
h['foo'] << 1 # => [1]
h['foo'] << 2 # => [1, 2]
h['bar'] << 3 # => [3]
h # => { 'foo' => [1, 2], 'bar' => [3] }
Why does it not get set until I push the second string on to the array?
In short: because you don't set anything in the hash until the point where you also add the second string to the array.
What is happening in the background?
To see what's happening in the background, let's take this one line at a time:
lph = Hash.new([]) #=> {}
This creates an empty hash, configured to return the [] object whenever a non-existing key is accessed.
lph["passed"] << "LCEOT" #=> ["LCEOT"]
This can be written as
value = lph["passed"] #=> []
value << "LCEOT" #=> ["LCEOT"]
We see that lph["passed"] returns [] as expected, and we then proceed to append "LCEOT" to [].
lph #=> {}
lph is still an empty Hash. At no point have we added anything to the Hash. We have added something to its default value, but that doesn't change lph itself.
lph["passed"] #=> ["LCEOT"]
This is where it gets interesting. Remember above when we did value << "LCEOT"? That actually changed the default value that lph returns when a key isn't found. The default value is no longer [], but has become ["LCEOT"]. That new default value is returned here.
lph["passed"] = lph["passed"] << "HJKL"
This is our first change to lph. And what we actually assign to lph["passed"] is the default value (because "passed" is still a non-existing key in lph) with "HJKL" appended. Before this, the default value was ["LCEOT"], after this it is ["LCEOT", "HJKL"].
In other words lph["passed"] << "HJKL" returns ["LCEOT", "HJKL"] which is then assigned to lph["passed"].
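The walkthrough above can be replayed with Hash#default to watch the default object change. A sketch of that, using the same values as the question:

```ruby
lph = Hash.new([])

lph["passed"] << "LCEOT"
p lph.default          # => ["LCEOT"] (the default itself was mutated)
p lph.key?("passed")   # => false     (nothing stored yet)

lph["passed"] = lph["passed"] << "HJKL"
p lph.default                        # => ["LCEOT", "HJKL"]
p lph["passed"].equal?(lph.default)  # => true (the stored value IS the default object)
p lph["anything else"]               # => ["LCEOT", "HJKL"]
```

The last two lines show the remaining trap: the stored entry and the default are one and the same array, so later pushes through either will show up in both.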
What is the more idiomatic Ruby way
Using <<=:
>> lph = Hash.new { [] }
=> {}
>> lph["passed"] <<= "LCEOT"
=> ["LCEOT"]
>> lph
=> {"passed"=>["LCEOT"]}
Also note the change in how the Hash is initialized, using a block instead of a literal array. This ensures a new, blank array is created and returned whenever a new key is accessed, as opposed to the same array being used every time.
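For what it's worth, hash[key] <<= item is shorthand for hash[key] = hash[key] << item, which is why the entry gets stored. A small sketch (append_with_reassign is a hypothetical helper name):

```ruby
# Hypothetical helper: appends item under key via <<=.
def append_with_reassign(hash, key, item)
  hash[key] <<= item   # same as hash[key] = hash[key] << item
  hash
end

lph = append_with_reassign(Hash.new { [] }, "passed", "LCEOT")
append_with_reassign(lph, "failed", "HJKL")

p lph.keys                             # => ["passed", "failed"]
p lph["passed"]                        # => ["LCEOT"]
p lph["passed"].equal?(lph["failed"])  # => false
p lph.key?("missing")                  # => false
```

With Hash.new { [] } a plain lph[key] << item would still store nothing, because this block only returns a fresh array and never assigns it; the storing is done entirely by the <<= re-assignment.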

How to make Ruby var= return value assigned, not value passed in?

There's a nice idiom for adding to lists stored in a hash table:
(hash[key] ||= []) << new_value
Now, suppose I write a derivative hash class, like the ones found in Hashie, which does a deep-convert of any hash I store in it. Then what I store will not be the same object I passed to the = operator; Hash may be converted to Mash or Clash, and arrays may be copied.
Here's the problem. Ruby apparently returns, from the var= method, the value passed in, not the value that's stored. It doesn't matter what the var= method returns. The code below demonstrates this:
class C
  attr_reader :foo
  def foo=(value)
    @foo = (value.is_a? Array) ? (value.clone) : value
  end
end
c=C.new
puts "assignment: #{(c.foo ||= []) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo ||= []) << 6}"
puts "c.foo is #{c.foo}"
output is
assignment: [5]
c.foo is []
assignment: [6]
c.foo is [6]
When I posted this as a bug to Hashie, Danielle Sucher explained what was happening and pointed out that "foo.send :bar=, 1" returns the value returned by the bar= method. (Hat tip for the research!) So I guess I could do:
c=C.new
puts "clunky assignment: #{(c.foo || c.send(:foo=, [])) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo || c.send(:foo=, [])) << 6}"
puts "c.foo is #{c.foo}"
which prints
clunky assignment: [5]
c.foo is [5]
assignment: [5, 6]
c.foo is [5, 6]
Is there any more elegant way to do this?
Assignments evaluate to the value that is being assigned. Period.
In some other languages, assignments are statements, so they don't evaluate to anything. Those are really the only two sensible choices. Either don't evaluate to anything, or evaluate to the value being assigned. Everything else would be too surprising.
Since Ruby doesn't have statements, there is really only one choice.
The only "workaround" for this is: don't use assignment.
c.foo ||= []
c.foo << 5
Using two lines of code isn't the end of the world, and it's easier on the eyes.
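To see the rule in isolation, here is a sketch with a setter that stores a copy and returns something unrelated. The class Box and the symbol it returns are made up for illustration.

```ruby
# Hypothetical class: the setter stores a copy and returns an unrelated value.
class Box
  attr_reader :items

  def items=(value)
    @items = value.dup
    :return_value_of_setter
  end
end

box = Box.new
passed = []

result = (box.items = passed)   # assignment syntax
p result.equal?(passed)         # => true  (evaluates to the right-hand side)
p result.equal?(box.items)      # => false (the stored copy is a different object)

p box.public_send(:items=, [])  # => :return_value_of_setter
```

Calling the setter through public_send (or send) is an ordinary method call, so it yields the method's own return value instead.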
The prettiest way to do this is to use a default value for the hash:
# h = Hash.new { [] }
h = Hash.new { |h,k| h[k] = [] }
But beware that you can't use Hash.new([]) and then << because of the way Ruby stores variables:
h = Hash.new([])
h[:a] # => []
h[:b] # => []
h[:a] << 10
h[:b] # => [10] O.o
This happens because Ruby stores variables by reference: since we created only one array instance and set it as the default value, it is shared between all hash cells (unless it is overwritten, e.g. by h[:a] += [10]).
It is solved by using the constructor with a block (doc): Hash.new { |h, k| h[k] = [] }. With this, the block is called each time a new key is accessed and stores a new array under that key, so each value is a different array.
EDIT: Fixed the error that @Uri Agassi is writing about.
