Ruby Hash initialised with each_with_object behaving weirdly - ruby

Initializing Ruby Hash like:
keys = [0, 1, 2]
hash = Hash[keys.each_with_object([]).to_a]
is behaving weirdly when trying to insert a value into a key.
hash[0].push('a')
# results in the following hash:
=> {0=>["a"], 1=>["a"], 2=>["a"]}
I am just trying to insert into one key, but it's updating the value of all keys.

Yes, that each_with_object is super-weird in itself. That's not how it should be used, and the problem arises precisely because you misuse it.
keys.each_with_object([]).to_a
# => [[0, []], [1, []], [2, []]]
You see, even though it looks like these arrays are separate, it's actually the same array in all three cases. That's why if you push an element into one, it appears in all others.
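To check the claim that all three pairs share one array, a minimal sketch (the names pairs, ids and shared are just illustrative) compares the object ids of the values:

```ruby
keys = [0, 1, 2]

# Without a block, each_with_object returns an enumerator that yields
# [element, memo] pairs; the memo is the single array passed in.
pairs = keys.each_with_object([]).to_a

ids = pairs.map { |_key, value| value.object_id }
shared = ids.uniq.size == 1   # true: one array referenced three times

# Pushing through any one reference is visible through all of them.
pairs[0][1] << 'a'
p pairs
```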
Here's a better way:
h = keys.each_with_object({}) {|key, h| h[key] = []}
# => {0=>[], 1=>[], 2=>[]}
Or, say
h = keys.zip(Array.new(keys.size) { [] }).to_h
Or a number of other ways.
If you don't care about the hash having this exact set of keys and simply want all keys to have an empty array as their default value, that's possible too.
h = Hash.new { |hash, key| hash[key] = [] }

All your keys reference the same array.
A simplified version that explains the problem:
a = []
b = a
a.push('something')
p a #=> ["something"]
p b #=> ["something"]
Even though you have two variables (a and b) there is only one Array Object. So any changes to the array referenced by variable a will change the array referenced by variable b as well. Because it is the same object.
The long version of what you are trying to achieve would be:
keys = [1, 2, 3]
hash = {}
keys.each do |key|
hash[key] = []
end
And a shorter version:
[1, 2, 3].each_with_object({}) do |key, accu|
accu[key] = []
end
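As a quick check that this version gives every key its own array, here is a small sketch (the variable distinct is only for illustration):

```ruby
keys = [1, 2, 3]
hash = keys.each_with_object({}) { |key, accu| accu[key] = [] }

# Only the array stored under key 1 is changed.
hash[1].push('a')

# Each value is a separate object, so there are three distinct ids.
distinct = hash.values.map(&:object_id).uniq.size
p hash
```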

Related

Does Ruby Hash's keep a separate list of read values vs assigned values? [duplicate]

This is related to Ruby hash default value behavior
But maybe the explanation there doesn't include this part: it seems that a Ruby Hash treats its default value differently depending on whether you "read it" or look at "what is set"?
One example is:
foo = Hash.new([])
foo[123].push("hi")
p foo # => {}
p foo[123] # => ["hi"]
p foo # => {}
How foo[123] can have a value while foo is empty is somewhat beyond my comprehension. The only way I can understand it is that a Ruby Hash keeps a separate list for the "read" or "getter" values, while the "internal" assigned values are somehow different.
If one of Ruby's design principles is "to have the least amount of surprise to the programmers", then foo being empty while foo[123] returns something is, in this case, a surprise to me.
(I haven't seen that in other languages actually... if there is a case where another language has similar behavior, maybe it is easier to make a connection.)
Suppose
h = Hash.new(:cat)
h[:a] = 1
h[:b] = 2
h #=> {:a=>1, :b=>2}
Now
h[:a] #=> 1
h[:b] #=> 2
h[:c] #=> :cat
h[:d] #=> :cat
h #=> {:a=>1, :b=>2}
h = Hash.new(:cat) defines an empty hash h with a default value of :cat. This means that if h does not have a key k, h[k] will return :cat, nothing more, nothing less. As you can see above, executing h[k] does not change the hash when k is :c or :d.
On the other hand,
h[:c] = h[:c]
#=> :cat
h #=> {:a=>1, :b=>2, :c=>:cat}
Confused? Let me write this without the syntactic sugar:
h.[]=(:d, h.[](:d))
#=> :cat
h #=> {:a=>1, :b=>2, :c=>:cat, :d=>:cat}
The default value is returned by h.[](:d) (i.e., h[:d]) whereas Hash#[]= is an assignment method (that takes two arguments, a key and a value) to which the default does not apply.
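A small sketch of the same point, using Hash#default and Hash#key? to show that reading a missing key leaves the hash alone while an explicit assignment does not (the key :zzz is arbitrary):

```ruby
h = Hash.new(:cat)
h[:a] = 1

default_value = h.default    # :cat
read_value    = h[:zzz]      # :cat, returned by Hash#[] for a missing key
present_after_read = h.key?(:zzz)   # false: reading did not add the key

# Hash#[]= stores whatever it is given, here the default itself.
h[:zzz] = h[:zzz]
present_after_write = h.key?(:zzz)  # true
```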
A common use of this default is to create a counting hash:
a = [1,3,1,4,2,5,4,4]
h = Hash.new(0)
a.each { |x| h[x] = h[x] + 1 }
h #=> {1=>2, 3=>1, 4=>3, 2=>1, 5=>1}
Initially, when h is empty and x #=> 1, h[1] = h[1] + 1 will evaluate to h[1] = 0 + 1, because (since h has no key 1) h[1] on the right side of the assignment returns the default value of zero. The next time 1 is passed to the block (x #=> 1), h[1] = h[1] + 1 evaluates to h[1] = 1 + 1. This time the default value is not used because h now has a key 1.
This would normally be written (incidentally):
a.each_with_object(Hash.new(0)) { |x,h| h[x] += 1 }
#=> {1=>2, 3=>1, 4=>3, 2=>1, 5=>1}
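On newer Rubies the counting idiom is also built in. As a sketch, assuming Ruby 2.7 or later (where Enumerable#tally exists), both forms give the same counts:

```ruby
a = [1, 3, 1, 4, 2, 5, 4, 4]

# The counting hash from above.
counts = a.each_with_object(Hash.new(0)) { |x, h| h[x] += 1 }

# Enumerable#tally (Ruby 2.7+) returns the same key/count pairs,
# although the resulting hash has no default value of 0.
tallied = a.tally

same = counts == tallied   # Hash#== compares keys and values only
```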
One generally does not want the default value to be a collection, such as an array or hash. Consider the following:
h = Hash.new([])
[1,2,3].map { |n| h[n] = h[n] }
h #=> {1=>[], 2=>[], 3=>[]}
Now observe:
h[1] << 2
h #=> {1=>[2], 2=>[2], 3=>[2]}
This is normally not the desired behaviour. It has happened because
h.map { |k,v| v.object_id }
#=> [25886508, 25886508, 25886508]
That is, all the values are the same object, so if the value of one key is changed the values of all other keys are changed as well.
The way around this is to use a block when defining the hash:
h = Hash.new { |h,k| h[k]=[] }
[1,2,3].each { |n| h[n] = h[n] }
h #=> {1=>[], 2=>[], 3=>[]}
h[1] << 2
h #=> {1=>[2], 2=>[], 3=>[]}
h.map { |k,v| v.object_id }
#=> [24172884, 24172872, 24172848]
When the hash h does not have a key k the block { |h,k| h[k]=[] } is executed and returns an empty array specific to that key.
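To see that the block only runs for keys that are missing, a minimal sketch can count the calls (the counter calls is purely illustrative):

```ruby
calls = 0
h = Hash.new do |hash, key|
  calls += 1
  hash[key] = []   # store a fresh array under the missing key
end

h[:a] << 1   # :a is missing, so the block runs
h[:a] << 2   # :a now exists, so the block is not called
h[:b]        # :b is missing, so the block runs again

p calls
p h
```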
The statement:
foo = Hash.new([])
creates a new Hash that has an empty array ([]) as its default value. The default value is the value returned by Hash#[] when its argument is not a key present in the hash.
The statement:
foo[123]
invokes Hash#[] and, because the hash is empty (the key 123 is not present in the hash), it returns a reference to the default value, which is an object of type Array.
The statement above doesn't create the 123 key in the hash.
Ruby objects are always passed and returned by reference. This means that the statement above doesn't return a copy of the default value of the hash but a reference to it.
The statement:
foo[123].push("hi")
modifies the above-mentioned array. Now, the default value of the foo hash is not an empty array any more; it is the array ["hi"]. But the hash is still empty; none of the above statements added any (key, value) pair to it.
How is it that foo[123] has a value
foo[123] doesn't have any value; the key 123 is not present in the hash (the hash is empty). A subsequent call to foo[123] returns a reference to the default value again, and the default value is now ["hi"]. A call to foo[456] or foo['abc'] also returns a reference to the same default value.
You didn't actually change the value of key 123, you're just accessing the default value [] you provided during initialization. You can confirm this if you inspect a different value like foo[0].
If you would do this:
foo[123] = ["hi"]
you could see the new entry, because you've created a new array under the key 123.
Edit
When you call foo[123].push("hi"), you're mutating the (default) value instead of adding a new entry.
Calling foo[123] += ["hi"] creates a new array under the given key, replacing the previous one if it existed, which will show the behavior you desire.
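The difference between mutating the default and assigning a new array can be sketched like this (the key 456 and the string "yo" are arbitrary):

```ruby
foo = Hash.new([])

# push mutates the shared default array; no key is stored.
foo[123].push("hi")
default_after_push = foo.default.dup   # ["hi"]
empty_after_push   = foo.empty?        # true

# += expands to foo[456] = foo[456] + ["yo"]: Array#+ builds a new array
# from the (already mutated) default, and []= stores it under 456.
foo[456] += ["yo"]

p foo
p foo.default
```

Note that the new entry starts from whatever the default currently contains, which is one more reason to avoid mutating the default.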
Printing out the hash with:
p foo
only prints the values stored in the hash. It does not display the default value (or anything added to the default array).
When you execute:
p foo[123]
Because 123 does not exist, it accesses the default value.
If you added two values to the default value:
foo[123].push("hi")
foo[456].push("hello")
your output would be:
p foo # => {}
p foo[123] # => ["hi","hello"]
p foo # => {}
Here, foo[123] still does not exist, so it prints out the contents of the default value.

Modifying an existing hash value by x and returning the hash

I'm trying to increment all values of a hash by a given amount and return the hash. I am expecting:
add_to_value({"a" => 1, "c" => 2,"b"=> 3}, 1)
# => {"a" => 2, "c" => 3,"b"=> 4}
I'm thinking:
def add_to_value(hash, x)
hash.each {|key,value| value + x}
end
This returns:
{"a"=>1, "b"=>3, "c"=>2}
Why is hash sorted alphabetically?
You're super close, without any extra gems needed:
def add_to_value(hash, x)
hash.each {|key,value| hash[key] += x }
end
Just iterate the hash and update each value-by-key. #each returns the object being iterated on, so the result will be the original hash, which has been modified in place.
If you want a copy of the original hash, you can do that pretty easily, too:
def add_to_value(hash, x)
hash.each.with_object({}) {|(key, value), out| out[key] = value + x }
end
That'll define a new empty hash, pass it to the block, where it collects the new values. The new hash is returned from #with_object, and is thus returned out of add_to_value.
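If you are on Ruby 2.4 or later, Hash#transform_values is another way to get the copy. This is only a sketch of that alternative, not part of the answer above:

```ruby
# Returns a new hash; the original is left untouched.
def add_to_value(hash, x)
  hash.transform_values { |value| value + x }
end

original = { "a" => 1, "c" => 2, "b" => 3 }
result = add_to_value(original, 1)

p result
p result.keys   # keys keep their insertion order
```

Hashes have preserved insertion order since Ruby 1.9, so the keys come back in the order "a", "c", "b" rather than alphabetically.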
You can do the following to increment values:
hash = {}
{"a" => 1, "c" => 2,"b"=> 3}.each {|k,v| hash[k]=v+1}
hash
=>{"a"=>2, "c"=>3, "b"=>4}
And the hash will be sorted as you want.
The problem becomes trivial if you use certain gems, such as y_support. Type in your command line gem install y_support, and enjoy the extended hash iterators:
require 'y_support/core_ext/hash'
h = { "a"=>1, "c"=>3, "b"=>2 }
h.with_values do |v| v + 1 end
#=> {"a"=>2, "c"=>4, "b"=>3}
As for your sorting problem, I was unable to reproduce it.
Of course, a less elegant solution is possible without installing a gem:
h.each_with_object Hash.new do |(k, v), h| h[k] = v + 1 end
The gem also gives you Hash#with_keys (which modifies keys) and Hash#modify (which modifies both keys and values, kind of mapping from hash to hash), and banged versions Hash#with_values!, #with_keys! that modify the hash in place.

What is meant: "Hash.new takes a default value for the hash, which is the value of the hash for a nonexistent key"

I'm currently going through the Ruby on Rails tutorial by Michael Hartl
Not understanding the meaning of this statement found in section 4.4.1:
Hashes, in contrast, are different. While the array constructor
Array.new takes an initial value for the array, Hash.new takes a
default value for the hash, which is the value of the hash for a
nonexistent key:
Could someone help explain what is meant by this? I don't understand what the author is trying to get at regarding how hashes differ from arrays in the context of this section of the book
You can always try out the code in irb or rails console to find out what they mean.
Array.new
# => []
Array.new(7)
# => [nil, nil, nil, nil, nil, nil, nil]
h1 = Hash.new
h1['abc']
# => nil
h2 = Hash.new(7)
h2['abc']
# => 7
Arrays and hashes both have a constructor method that takes a value. What this value is used for is different between the two.
For arrays, the value is used to initialize the array (example taken from mentioned tutorial):
a = Array.new([1, 3, 2])
# `a` is equal to [1, 3, 2]
Unlike arrays, the new constructor for hashes doesn't use its passed arguments to initialize the hash. So, for example, typing h = Hash.new('a', 1) does not initialize the hash with a (key, value) pair of a and 1; Hash.new accepts at most one positional argument, so this call raises an ArgumentError:
h = Hash.new('a', 1) # NO. Raises ArgumentError; does not give you { 'a' => 1 }!
Instead, passing a value to Hash.new causes the hash to use that value as a default when a non-existent key is passed. Normally, hashes return nil for non-existent keys, but by passing a default value, you can have hashes return the default in those cases:
nilHash = { 'x' => 5 }
nilHash['x'] # Returns 5, because the key 'x' exists in nilHash
nilHash['foo'] # Returns nil, because there is no key 'foo' in nilHash
defaultHash = Hash.new(100)
defaultHash['x'] = 5
defaultHash['x'] # Returns 5, because the key 'x' exists in defaultHash
defaultHash['foo']
# Returns 100 instead of nil, because you passed 100
# as the default value for non-existent keys for this hash
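Putting the two constructors side by side, a short sketch (variable names are illustrative) shows that Array.new's arguments fill the array, while Hash.new's argument only answers lookups for missing keys:

```ruby
# Array.new: arguments become contents.
zeros = Array.new(3, 0)        # size 3, each element 0
copy  = Array.new([1, 3, 2])   # a copy of the given array

# Hash.new: the argument is only a fallback for missing keys.
lookup = Hash.new(100)
fallback    = lookup['foo']    # 100
still_empty = lookup.empty?    # true: nothing was stored

# Hash.new takes at most one positional argument.
error = begin
  Hash.new('a', 1)
  nil
rescue ArgumentError => e
  e.class
end
```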
Begin by reading the docs for the class method Hash.new. You will see there are three forms:
new → new_hash
new(obj) → new_hash
new {|hash, key| block } → new_hash
Creating an Empty Hash
The first form is used to create an empty hash:
h = Hash.new #=> {}
which is more commonly written:
h = {} #=> {}
The other two ways of creating a hash with Hash.new establish a default value that is returned when the hash does not already contain the key.
Hash.new with an argument
You can create a hash with a default value in one of two ways:
Hash.new(<default value>)
or
h = Hash.new # or h = {}
h.default = <default value>
Suppose the default value for the hash were 4; that is:
h = Hash.new(4) #=> {}
h[:pop] = 7 #=> 7
h[:pop] += 1 #=> 8
h[:pop] #=> 8
h #=> {:pop=>8}
h[:chips] #=> 4
h #=> {:pop=>8}
h[:chips] += 1 #=> 5
h #=> {:pop=>8, :chips=>5}
h[:chips] #=> 5
Notice that the default value does not affect the value of :pop. That's because it was created with an assignment:
h[:pop] = 7
h[:chips] by itself merely returns the default value (4); it does not add the key/value pair :chips=>4 to the hash! I repeat: it does not add the key/value pair to the hash. That's important!
h[:chips] += 1
is shorthand for:
h[:chips] = h[:chips] + 1
Since the hash h does not have a key :chips when h[:chips] on the right side of the equals sign is evaluated, it returns the default value of 4, then 1 is added to make it 5 and that value is assigned to h[:chips], which adds the key/value pair :chips=>5 to the hash, as seen in the following line. The last line merely reports the value for the existing key :chips.
So why would you want to establish a default value? I would venture that the main reason is to be able to initialize it with zero, so you can use:
h[k] += 1
instead of
h[k] = h.key?(k) ? h[k] + 1 : 1
or the trick:
h[k] = (h[k] ||= 0) + 1
(which only works when hash values are intended to be non-nil). Incidentally, key? is aka has_key?.
Can we make the default a string instead? Of course:
h = Hash.new('magpie')
h[:bluebird] #=> "magpie"
h #=> {}
h[:bluebird] = h[:bluebird] #=> "magpie"
h #=> {:bluebird=>"magpie"}
h[:redbird] = h[:redbird] #=> "magpie"
h #=> {:bluebird=>"magpie", :redbird=>"magpie"}
h[:bluebird] << "jay" #=> "magpiejay"
h #=> {:bluebird=>"magpiejay", :redbird=>"magpiejay"}
You may be scratching your head over the last line: why did h[:bluebird] << "jay" cause h[:redbird] to change?? Perhaps this will explain what's going on here:
h[:robin] #=> "magpiejay"
h[:robin].object_id #=> 2156227520
h[:bluebird].object_id #=> 2156227520
h[:redbird].object_id #=> 2156227520
h[:robin] merely returns the default value, which we see has been changed from "magpie" to "magpiejay". Now look at the object_id's for the default value and for the values associated with the keys :bluebird and :redbird. As you see, all values are the same object, so if we change one, we change all the others, including the default value. It is now evident why h[:bluebird] << "jay" changed the default value.
We can clarify this further by adding a stately eagle:
h[:eagle] #=> "magpiejay"
h[:eagle] += "starling" #=> "magpiejaystarling"
h[:eagle].object_id #=> 2157098780
h #=> {:bluebird=>"magpiejay", :redbird=>"magpiejay", :eagle=>"magpiejaystarling"}
Because
h[:eagle] += "starling" #=> "magpiejaystarling"
is equivalent to:
h[:eagle] = h[:eagle] + "starling"
we have created a new object on the right side of the equals sign and assigned it to h[:eagle]. That's why the values for the keys :bluebird and :redbird are unaffected and h[:eagle] has a different object_id.
We have similar problems if we write Hash.new([]) or Hash.new({}). If there are ever reasons to use those defaults, I'm not aware of them. It certainly can be very useful for the default value to be an empty string, array or hash, but for that you need the third form of Hash.new, which takes a block.
Hash.new with a block
We now consider the third and final version of Hash.new, which takes a block, like so:
Hash.new { |h,k| ??? }
You may be expecting this to be devilishly complex and subtle, certainly much harder to grasp than the other two forms of the method. If so, you'd be wrong. It's actually quite simple, if you think of it as looking like this:
Hash.new { |h,k| h[k] = ??? }
In other words, Ruby is saying to you, "The hash h doesn't have the key k. What would you like its value to be?" Now consider the following:
h7 = Hash.new { |h,k| h[k]=7 }
hs = Hash.new { |h,k| h[k]='cat' }
ha = Hash.new { |h,k| h[k]=[] }
hh = Hash.new { |h,k| h[k]={} }
h7[:a] += 3 #=> 10
hs[:b] << 'nip' #=> "catnip"
ha[:c] << 4 << 6 #=> [4, 6]
ha[:d] << 7 #=> [7]
ha #=> {:c=>[4, 6], :d=>[7]}
hh[:k].merge({b: 4}) #=> {:b=>4}
hh #=> {:k=>{}}
hh[:k].merge!({b: 4} ) #=> {:b=>4}
hh #=> {:k=>{:b=>4}}
Notice that you cannot write ha = Hash.new { |h,k| [] } (or equivalently, ha = Hash.new { [] }) and expect h[k] => [] to be added to the hash. You can do whatever you like within the block; you are neither required nor limited to specifying a value for the key. In effect, within the block Ruby is actually saying, "A key that is not in the hash has been referenced without a value. I'm giving you that reference and also a reference to the hash. That will allow you to add that key with a value to the hash, if that's what you want to do, but what you do in this block is entirely your business."
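A minimal sketch of that difference, using two hypothetical hashes named lost and kept:

```ruby
# The block returns a fresh array but never stores it.
lost = Hash.new { |_h, _k| [] }
lost[:c] << 4              # appends to a throwaway array
lost_is_empty = lost.empty?   # true: nothing was added to the hash

# The block stores the array under the missing key before returning it.
kept = Hash.new { |h, k| h[k] = [] }
kept[:c] << 4

p lost
p kept
```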
The default values for the hashes h7, hs, ha and hh are respectively the number 7 (though it would be easier to simply pass 7 as an argument), the string 'cat', an empty array and an empty hash. Probably the last two are the most common uses of Hash.new with a block, as in:
array = [[:a, 1], [:b, 3], [:a, 4], [:b, 6]]
array.each_with_object(Hash.new {|h,k| h[k] = []}) { |(k,v),h| h[k] << v }
#=> {:a=>[1, 4], :b=>[3, 6]}
That's really about all there is to the last form of Hash.new.
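For this particular grouping task there is also a route that needs no default at all. As a sketch, assuming Ruby 2.4 or later for Hash#transform_values, Enumerable#group_by produces the same result:

```ruby
array = [[:a, 1], [:b, 3], [:a, 4], [:b, 6]]

# The block-default version from above.
with_block = array.each_with_object(Hash.new { |h, k| h[k] = [] }) do |(k, v), h|
  h[k] << v
end

# group_by collects whole pairs per key; transform_values keeps
# only the second element of each pair.
grouped = array.group_by(&:first).transform_values { |pairs| pairs.map(&:last) }

p grouped
```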

How to merge array index values and create a hash

I'm trying to convert an array into a hash by using some matching. Before converting the array into a hash, I want to merge the values like this
"Desc,X1XXSC,C,CCCC4524,xxxs,xswd"
and create a hash from it. The rule is that the first value of each pair in the array is the key in the hash. The array contains repeating keys; for those keys I need to merge the values and place them under one key. Values like "Desc" are keys. My program looks like this.
p 'test sample application'
str = "Desc:X1:C:CCCC:Desc:XXSC:xxxs:xswd:C:4524"
arr = Array.new
arr = str.split(":")
p arr
test_hash = Hash[*arr]
p test_hash
I could not find a way to figure it out. If anyone can guide me, I would be thankful.
Functional approach with Facets:
require 'facets'
str.split(":").each_slice(2).map_by { |k, v| [k, v] }.mash { |k, vs| [k, vs.join] }
#=> {"Desc"=>"X1XXSC", "C"=>"CCCC4524", "xxxs"=>"xswd"}
Not that you cannot do it without Facets, but it's longer because of some basic abstractions missing in the core:
Hash[str.split(":").each_slice(2).group_by(&:first).map { |k, gs| [k, gs.map(&:last).join] }]
#=> {"Desc"=>"X1XXSC", "C"=>"CCCC4524", "xxxs"=>"xswd"}
A small variation on @Sergio Tulentsev's solution:
str = "Desc:X1:C:CCCC:Desc:XXSC:xxxs:xswd:C:4524"
str.split(':').each_slice(2).each_with_object(Hash.new{""}){|(k,v),h| h[k] += v}
# => {"Desc"=>"X1XXSC", "C"=>"CCCC4524", "xxxs"=>"xswd"}
str.split(':') results in an array; there is no need for initializing with arr = Array.new
each_slice(2) feeds the elements of this array two by two to a block or to the method following it, like in this case.
each_with_object takes those two elements (as an array) and passes them on to a block, together with an object, specified by:
(Hash.new{""}) This object is an empty Hash with special behaviour: when a key is not found then it will respond with a value of "" (instead of the usual nil).
{|(k,v),h| h[k] += v} This is the block of code which does all the work. It takes the array with the two elements and deconstructs it into two strings, assigned to k and v; the special hash is assigned to h. h[k] asks the hash for the value of key "Desc". It responds with "", to which "X1" is added. This is repeated until all elements are processed.
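The steps above can be followed one at a time by keeping the intermediate results (the names parts, pairs and merged are just for illustration):

```ruby
str = "Desc:X1:C:CCCC:Desc:XXSC:xxxs:xswd:C:4524"

# Step 1: split into a flat array of strings.
parts = str.split(':')

# Step 2: group the flat array into [key, value] pairs.
pairs = parts.each_slice(2).to_a

# Step 3: fold the pairs into a hash whose missing keys read as "".
# h[k] += v expands to h[k] = h[k] + v, so a new string is stored each time.
merged = pairs.each_with_object(Hash.new { "" }) { |(k, v), h| h[k] += v }

p pairs
p merged
```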
I believe you're looking for each_slice and each_with_object here
str = "Desc:X1:C:CCCC:Desc:XXSC:xxxs:xswd:C:4524"
hash = str.split(':').each_slice(2).each_with_object({}) do |(key, value), memo|
memo[key] ||= ''
memo[key] += value
end
hash # => {"Desc"=>"X1XXSC", "C"=>"CCCC4524", "xxxs"=>"xswd"}
Enumerable#slice_before is a good way to go.
str = "Desc:X1:C:CCCC:Desc:XXSC:xxxs:xswd:C:4524"
a = ["Desc","C","xxxs"] # collect the keys in a separate collection.
str.split(":").slice_before { |i| a.include? i }.to_a
# => [["Desc", "X1"], ["C", "CCCC"], ["Desc", "XXSC"], ["xxxs", "xswd"], ["C", "4524"]]
hsh = str.split(":").slice_before { |i| a.include? i }.each_with_object(Hash.new("")) do |i,h|
h[i[0]] += i[1]
end
hsh
# => {"Desc"=>"X1XXSC", "C"=>"CCCC4524", "xxxs"=>"xswd"}

Sort items in a nested hash by their values

I'm being sent a nested hash that needs to be sorted by its values. For example:
@foo = {"a"=>{"z"=>5, "y"=>3, "x"=>88}, "b"=>{"a"=>2, "d"=>-5}}
When running the following:
@foo["a"].sort{|a,b| a[1]<=>b[1]}
I get:
[["y", 3], ["z", 5], ["x", 88]]
This is great, it's exactly what I want. The problem is I'm not always going to know what all the keys are that are being sent to me so I need some sort of loop. I tried to do the following:
@foo.each do |e|
e.sort{|a,b| a[1]<=>b[1]}
end
This to me makes sense since if I manually call @foo.first[0] I get
"a"
and @foo.first[1] returns
{"z"=>5, "y"=>3, "x"=>88}
but for some reason this isn't sorting properly (e.g. at all). I assume this is because the each is calling sort on the entire hash object rather than on "a"'s values. How do I access the values of the nested hash without knowing what its key is?
You might want to loop over the hash like this:
@foo.each do |key, value|
@foo[key] = value.sort{ |a,b| a[1]<=>b[1] }
end
@foo = {"a"=>{"z"=>5, "y"=>3, "x"=>88}, "b"=>{"a"=>2, "d"=>-5}}
@bar = Hash[ @foo.map{ |key,values| [ key, values.sort_by(&:last) ] } ]
Or, via a less-tricky path:
@bar = {}
@foo.each do |key,values|
@bar[key] = values.sort_by{ |key,value| value }
end
In both cases @bar turns out to be:
p @bar
#=> {
#=> "a"=>[["y", 3], ["z", 5], ["x", 88]],
#=> "b"=>[["d", -5], ["a", 2]]
#=> }
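Both versions above leave arrays of pairs as the values. If you would rather keep nested hashes, here is a sketch assuming Ruby 2.4 or later (for Hash#transform_values) and using a local variable foo in place of the question's variable:

```ruby
foo = { "a" => { "z" => 5, "y" => 3, "x" => 88 }, "b" => { "a" => 2, "d" => -5 } }

# Sort each inner hash by value, then turn the sorted pairs back into a hash.
# Hashes keep insertion order, so the inner hashes iterate in sorted order.
sorted = foo.transform_values do |inner|
  inner.sort_by { |_key, value| value }.to_h
end

p sorted["a"].keys
p sorted["b"].to_a
```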
My coworker came up with a slightly more flexible solution that will recursively sort a hash of any depth:
def deep_sort_by(&block)
  Hash[self.map do |key, value|
    [if key.respond_to? :deep_sort_by
       key.deep_sort_by(&block)
     else
       key
     end,
     if value.respond_to? :deep_sort_by
       value.deep_sort_by(&block)
     else
       value
     end]
  end.sort_by(&block)]
end
You can inject it into all hashes and then just call it like this:
myMap.deep_sort_by { |obj| obj }
The code would be similar for an array. We published it as a gem for others to use, see blog post for additional details.
Disclaimer: I work for this company.
In your example, e is a temporary array containing a [key, value] pair: in this case, the string key and the nested hash. So e.sort{|a,b|...} is going to compare the string with the hash instead of comparing the nested hash's values, which is not what you want. I think you probably meant to type e[1].sort{...}. But even that is not going to work correctly, because you don't store the sorted result anywhere: @foo.each returns the original @foo and leaves it unchanged.
The better solution is the one suggested by @Pan Thomakos:
@foo.each do |key, value|
@foo[key] = value.sort{ |a,b| a[1]<=>b[1] }
end

Resources