Why the functional differences in Hash initialization? - ruby

Is there documentation on the differences of initialization? The docs on Hash didn't have anything that would explain the difference.
foo = [1,2,3,4]
test1 = Hash.new([])
test2 = Hash.new{|h,k| h[k] = []}
foo.each do |i|
test1[i] << i
test2[i] << i
end
puts "test 1: #{test1.size}" #0
puts "test 2: #{test2.size}" #4

The docs do mention this. From the documentation for Hash.new:
new(obj) → new_hash
new {|hash, key| block } → new_hash
[...] If obj is specified, this single object will be used for all default values. If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
h = Hash.new("Go Fish")
h["a"] = 100
h["b"] = 200
h["a"] #=> 100
h["c"] #=> "Go Fish"
# The following alters the single default object
h["c"].upcase! #=> "GO FISH"
h["d"] #=> "GO FISH"
h.keys #=> ["a", "b"]
# While this creates a new default object each time
h = Hash.new { |hash, key| hash[key] = "Go Fish: #{key}" }
h["c"] #=> "Go Fish: c"
h["c"].upcase! #=> "GO FISH: C"
h["d"] #=> "Go Fish: d"
h.keys #=> ["c", "d"]

This is a common gotcha. With test1 (the non-block) you are modifying the default object, the thing which you get when the key does not exist in the hash.
foo = [1,2,3,4]
test1 = Hash.new([])
test2 = Hash.new{|h,k| h[k] = []}
foo.each do |i|
test1[i] << i
test2[i] << i
p test1['doesnotexist'] #added line
end
puts "test 1: #{test1.size}" #0
puts "test 2: #{test2.size}" #4
Output:
[1]
[1, 2]
[1, 2, 3]
[1, 2, 3, 4]
test 1: 0
test 2: 4
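The sharing can also be checked directly by comparing object identity. This is a minimal sketch; the names `shared`, `fresh`, `same_default` and `same_in_fresh` are illustrative and not part of the question.

```ruby
shared = Hash.new([])                   # one array object is the default for every missing key
fresh  = Hash.new { |h, k| h[k] = [] }  # the block runs per missing key and stores a new array

shared[:a] << 1
shared[:b] << 2
fresh[:a] << 1
fresh[:b] << 2

# equal? tests object identity, not just equal contents
same_default  = shared[:a].equal?(shared[:b])  #=> true, both lookups return the default
same_in_fresh = fresh[:a].equal?(fresh[:b])    #=> false, each key owns its own array

shared.keys     #=> []
shared.default  #=> [1, 2]
fresh.keys      #=> [:a, :b]
```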

There is a difference, and in some situations it can be significant:
test1 = Hash.new([])
test2 = Hash.new{|h,k| h[k] = []}
test1['foo'] #=> []
test2['foo'] #=> []
test1.keys == test2.keys #=> false
The first construction just returns the default value without touching the hash itself, whereas the second stores a key/value pair in the hash, with the value computed by the given block.

Related

Interaction between hash value and `<<` operator [duplicate]

I expected:
h = Hash.new([])
h['a'] << 'b'
h['a'] << 'c'
h # => {}
to give {'a' => ['b','c']}, not an empty hash.
I also found out that the insert operation targets the default value, because after the code above it is equal to ['b','c']:
h.default # => ['b','c']
I am looking for an explanation on why it does not work and how to do it optimally so it works.
The reason why your line didn't work is that Hash, upon accessing a missing key, simply returns the default value (whatever you specified), without assigning it to the key. And since your default value is a complex mutable object (and it's the very same object that is returned every time), you get what you observed: all values are shoveled straight into the default value, bypassing the hash. This is probably the most common mistake with hashes and mutable default values.
To do what you want, use the third form of Hash.new
new {|hash, key| block } → new_hash
like this, for example
h = Hash.new {|h, k| h[k] = [] }
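Applied to the code from the question, the block form gives the result the asker expected. A short sketch (the block parameters are renamed to `hash` and `key` here only for readability):

```ruby
h = Hash.new { |hash, key| hash[key] = [] }  # store a fresh array under each missing key

h['a'] << 'b'   # first access creates h['a'] = [], then 'b' is appended to it
h['a'] << 'c'   # key now exists, so the block is not called again

h.keys     #=> ["a"]
h['a']     #=> ["b", "c"]
h.default  #=> nil, no shared default object is involved
```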
It's because you modify this specific object you passed as a default value. So:
h = Hash.new([])
h['a'] << 'b'
h['a'] << 'c'
h['b'] # or h['a'] or h[:virtually_anything]
# => ["b", "c"]
That happens because h has no key 'a'. If you assign the key first, << operates on the stored value instead of the default:
h = Hash.new([])
h['a'] = ['b']
h['a'] << 'c'
h['a'] #=> ["b", "c"]
h #=> {"a"=>["b", "c"]}
This behaves the same as Hash.new([]):
k = Hash.new
k.default = []
In contrast, as explained by Sergio Tulentsev (https://stackoverflow.com/a/53614695/5239030), the block form creates the key "on the fly". Try this:
k = Hash.new {|h, k| puts "Just created a new key: #{k}"; h[k] = [] }
p k['a'] << 'a'
p k['a'] << 'a'
p k['b'] << 'b'
p k
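Another option, not taken from the answers above, is to keep `Hash.new([])` but never mutate the default: `h[key] += [value]` expands to `h[key] = h[key] + [value]`, which builds a new array and assigns it to the key. A minimal sketch:

```ruby
h = Hash.new([])   # shared default, never mutated below

h['a'] += ['b']    # Array#+ returns a new array, which is then stored under 'a'
h['a'] += ['c']
h['x'] += ['y']

h.keys     #=> ["a", "x"]
h['a']     #=> ["b", "c"]
h.default  #=> [], the default object is untouched
```

The trade-off is that every `+=` allocates a new array, so the block form is usually the better fit when many values are appended.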

Multiple sub-hashes out of one hash

I have a hash:
hash = {"a_1_a" => "1", "a_1_b" => "2", "a_1_c" => "3", "a_2_a" => "3",
"a_2_b" => "4", "a_2_c" => "4"}
What's the best way to get the following sub-hashes:
[{"a_1_a" => "1", "a_1_b" => "2", "a_1_c" => "3"},
{"a_2_a" => "3", "a_2_b" => "4", "a_2_c" => "4"}]
I want them grouped by the key, based on the regexp /^a_(\d+)/. I'll have 50+ key/value pairs in the original hash, so something dynamic would work best, if anyone has any suggestions.
If you're only concerned about the middle component you can use group_by to get you most of the way there:
hash.group_by do |k,v|
k.split('_')[1]
end.values.map do |list|
Hash[list]
end
# => [{"a_1_a"=>"1", "a_1_b"=>"2", "a_1_c"=>"3"}, {"a_2_a"=>"3", "a_2_b"=>"4", "a_2_c"=>"4"}]
The final step is extracting the grouped lists and combining those back into the required hashes.
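If the grouping should follow the regexp from the question rather than `split`, a similar sketch is possible. It assumes every key matches `/^a_(\d+)/`; any key that does not match would be grouped under `nil`.

```ruby
hash = { "a_1_a" => "1", "a_1_b" => "2", "a_1_c" => "3",
         "a_2_a" => "3", "a_2_b" => "4", "a_2_c" => "4" }

# String#[] with a regexp and a capture index returns that capture group (or nil).
groups = hash.group_by { |key, _value| key[/^a_(\d+)/, 1] }

# group_by collects [key, value] pairs; to_h turns each list back into a hash.
sub_hashes = groups.values.map(&:to_h)
```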
Code
def partition_hash(hash)
hash.each_with_object({}) do |(k,v), h|
key = k[/(?<=_).+(?=_)/]
h[key] = (h[key] || {}).merge(k=>v)
end.values
end
Example
hash = {"a_1_a"=>"1", "a_1_b"=>"2", "a_1_c"=>"3", "a_2_a"=>"3", "a_2_b"=>"4", "a_2_c"=>"4"}
partition_hash(hash)
#=> [{"a_1_a"=>"1", "a_1_b"=>"2", "a_1_c"=>"3"},
# {"a_2_a"=>"3", "a_2_b"=>"4", "a_2_c"=>"4"}]
Explanation
The steps are as follows.
enum = hash.each_with_object({})
#=> #<Enumerator: {"a_1_a"=>"1", "a_1_b"=>"2", "a_1_c"=>"3", "a_2_a"=>"3",
# "a_2_b"=>"4", "a_2_c"=>"4"}:each_with_object({})>
The first element of this enumerator is generated and passed to the block, and the block variables are computed using parallel assignment.
(k,v), h = enum.next
#=> [["a_1_a", "1"], {}]
k #=> "a_1_a"
v #=> "1"
h #=> {}
and the block calculation is performed.
key = k[/(?<=_).+(?=_)/]
#=> "1"
h[key] = (h[key] || {}).merge(k=>v)
#=> h["1"] = (h["1"] || {}).merge("a_1_a"=>"1")
#=> h["1"] = (nil || {}).merge("a_1_a"=>"1")
#=> h["1"] = {}.merge("a_1_a"=>"1")
#=> h["1"] = {"a_1_a"=>"1"}
so now
h #=> {"1"=>{"a_1_a"=>"1"}}
The next value of enum is now generated and passed to the block, and the following calculations are performed.
(k,v), h = enum.next
#=> [["a_1_b", "2"], {"1"=>{"a_1_a"=>"1"}}]
k #=> "a_1_b"
v #=> "2"
h #=> {"1"=>{"a_1_a"=>"1"}}
key = k[/(?<=_).+(?=_)/]
#=> "1"
h[key] = (h[key] || {}).merge(k=>v)
#=> h["1"] = (h["1"] || {}).merge("a_1_b"=>"2")
#=> h["1"] = ({"a_1_a"=>"1"} || {}).merge("a_1_b"=>"2")
#=> h["1"] = {"a_1_a"=>"1"}.merge("a_1_b"=>"2")
#=> h["1"] = {"a_1_a"=>"1", "a_1_b"=>"2"}
After the remaining four elements of enum have been passed to the block, the following hash is returned.
h #=> {"1"=>{"a_1_a"=>"1", "a_1_b"=>"2", "a_1_c"=>"3"},
# "2"=>{"a_2_a"=>"3", "a_2_b"=>"4", "a_2_c"=>"4"}}
The final step is simply to extract the values.
h.values
#=> [{"a_1_a"=>"1", "a_1_b"=>"2", "a_1_c"=>"3"},
# {"a_2_a"=>"3", "a_2_b"=>"4", "a_2_c"=>"4"}]
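The `(h[key] || {})` step can also be replaced by a hash whose default block creates the inner hash on first access. This is a variation on the answer above, not the author's code, and `partition_hash_with_default` is a hypothetical name:

```ruby
def partition_hash_with_default(hash)
  # The accumulator stores a new empty hash under each group key on first access.
  hash.each_with_object(Hash.new { |h, k| h[k] = {} }) do |(k, v), groups|
    group_key = k[/(?<=_).+(?=_)/]  # same regexp as above: the text between the underscores
    groups[group_key][k] = v
  end.values
end
```

Unlike the `merge` version, this mutates each inner hash in place instead of building a new one per pair.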

Move elements of an array to a different array in Ruby

Simple Ruby question. Let's say I have an array of 10 strings and I want to move the elements at array[3] and array[5] into a totally new array. The new array would then only have the two elements I moved from the first array, and the first array would then only have 8 elements, since two of them have been moved out.
Use Array#slice! to remove the elements from the first array, and append them to the second array with Array#<<:
arr1 = ['Foo', 'Bar', 'Baz', 'Qux']
arr2 = []
arr2 << arr1.slice!(1)
arr2 << arr1.slice!(2)
puts arr1.inspect
puts arr2.inspect
Output:
["Foo", "Baz"]
["Bar", "Qux"]
Depending on your exact situation, you may find other methods on array to be even more useful, such as Enumerable#partition:
arr = ['Foo', 'Bar', 'Baz', 'Qux']
starts_with_b, does_not_start_with_b = arr.partition{|word| word[0] == 'B'}
puts starts_with_b.inspect
puts does_not_start_with_b.inspect
Output:
["Bar", "Baz"]
["Foo", "Qux"]
a = (0..9).map { |i| "el##{i}" }
x = [3, 5].sort_by { |i| -i }.map { |i| a.delete_at(i) }
puts x.inspect
# => ["el#5", "el#3"]
puts a.inspect
# => ["el#0", "el#1", "el#2", "el#4", "el#6", "el#7", "el#8", "el#9"]
As noted in the comments, the indices are sorted in descending order so that each delete_at leaves the positions of the remaining targets unchanged. Alternatively, first get all the desired elements using a.values_at(*indices), then delete them as above.
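The values_at variant might look like the following sketch. The names `indices` and `moved` are illustrative; the moved elements come back in the order the indices were given.

```ruby
a = (0..9).map { |i| "el##{i}" }
indices = [3, 5]

moved = a.values_at(*indices)                     # read first, in the requested order
indices.sort.reverse_each { |i| a.delete_at(i) }  # then delete from the highest index down

moved   #=> ["el#3", "el#5"]
a.size  #=> 8
```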
Code:
arr = ["null","one","two","three","four","five","six","seven","eight","nine"]
p "Array: #{arr}"
third_el = arr.delete_at(3)
fifth_el = arr.delete_at(4)
first_arr = arr
p "First array: #{first_arr}"
concat_el = third_el + "," + fifth_el
second_arr = concat_el.split(",")
p "Second array: #{second_arr}"
Output:
c:\temp>C:\case.rb
"Array: [\"null\", \"one\", \"two\", \"three\", \"four\", \"five\", \"six\", \"seven\", \"eight\", \"nine\"]"
"First array: [\"null\", \"one\", \"two\", \"four\", \"six\", \"seven\", \"eight\", \"nine\"]"
"Second array: [\"three\", \"five\"]"
Why not start deleting from the highest index?
arr = ['Foo', 'Bar', 'Baz', 'Qux']
index_array = [2, 1]
new_ary = index_array.map { |index| arr.delete_at(index) }
new_ary # => ["Baz", "Bar"]
arr # => ["Foo", "Qux"]
Here's one way:
vals = arr.values_at *pulls
arr = arr.values_at *([*(0...arr.size)] - pulls)
Try it.
arr = %w[Now is the time for all Rubyists to code]
pulls = [3,5]
vals = arr.values_at *pulls
#=> ["time", "all"]
arr = arr.values_at *([*(0...arr.size)] - pulls)
#=> ["Now", "is", "the", "for", "Rubyists", "to", "code"]
arr = %w[Now is the time for all Rubyists to code]
pulls = [5,3]
vals = arr.values_at *pulls
#=> ["all", "time"]
arr = arr.values_at *([*(0...arr.size)] - pulls)
#=> ["Now", "is", "the", "for", "Rubyists", "to", "code"]

What is meant: "Hash.new takes a default value for the hash, which is the value of the hash for a nonexistent key"

I'm currently going through the Ruby on Rails tutorial by Michael Hartl
Not understanding the meaning of this statement found in section 4.4.1:
Hashes, in contrast, are different. While the array constructor
Array.new takes an initial value for the array, Hash.new takes a
default value for the hash, which is the value of the hash for a
nonexistent key:
Could someone help explain what is meant by this? I don't understand what the author is trying to get at regarding how hashes differ from arrays in the context of this section of the book
You can always try out the code in irb or rails console to find out what they mean.
Array.new
# => []
Array.new(7)
# => [nil, nil, nil, nil, nil, nil, nil]
h1 = Hash.new
h1['abc']
# => nil
h2 = Hash.new(7)
h2['abc']
# => 7
Arrays and hashes both have a constructor method that takes a value. What this value is used for is different between the two.
For arrays, the value is used to initialize the array (example taken from mentioned tutorial):
a = Array.new([1, 3, 2])
# `a` is equal to [1, 3, 2]
Unlike arrays, the new constructor for hashes doesn't use its passed arguments to initialize the hash. So, for example, typing h = Hash.new('a', 1) does not initialize the hash with a (key, value) pair of a and 1. Hash.new accepts at most one positional argument, so this call raises an ArgumentError:
h = Hash.new('a', 1) # NO. Raises ArgumentError; does not give you { 'a' => 1 }!
Instead, passing a value to Hash.new causes the hash to use that value as a default when a non-existent key is passed. Normally, hashes return nil for non-existent keys, but by passing a default value, you can have hashes return the default in those cases:
nilHash = { 'x' => 5 }
nilHash['x'] # Returns 5, because the key 'x' exists in nilHash
nilHash['foo'] # Returns nil, because there is no key 'foo' in nilHash
defaultHash = Hash.new(100)
defaultHash['x'] = 5
defaultHash['x'] # Returns 5, because the key 'x' exists in defaultHash
defaultHash['foo']
# Returns 100 instead of nil, because you passed 100
# as the default value for non-existent keys for this hash
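Note that the default only affects lookups through `[]`; it does not make the key exist. A small sketch (the name `default_hash` is illustrative):

```ruby
default_hash = Hash.new(100)
default_hash['x'] = 5

looked_up = default_hash['foo']           #=> 100, default returned, nothing stored
has_key   = default_hash.key?('foo')      #=> false
fetched   = default_hash.fetch('foo', 0)  #=> 0, fetch ignores the hash default
# default_hash.fetch('foo') with no fallback would raise KeyError

default_hash.size  #=> 1, only 'x' is stored
```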
Begin by reading the docs for the class method Hash.new (also written Hash::new). You will see there are three forms:
new → new_hash
new(obj) → new_hash
new {|hash, key| block } → new_hash
Creating an Empty Hash
The first form is used to create an empty hash:
h = Hash.new #=> {}
which is more commonly written:
h = {} #=> {}
The other two ways of creating a hash with Hash.new establish a default value that is returned when the hash does not already contain the key.
Hash.new with an argument
You can create a hash with a default value in one of two ways:
Hash.new(<default value>)
or
h = Hash.new # or h = {}
h.default = <default value>
Suppose the default value for the hash were 4; that is:
h = Hash.new(4) #=> {}
h[:pop] = 7 #=> 7
h[:pop] += 1 #=> 8
h[:pop] #=> 8
h #=> {:pop=>8}
h[:chips] #=> 4
h #=> {:pop=>8}
h[:chips] += 1 #=> 5
h #=> {:pop=>8, :chips=>5}
h[:chips] #=> 5
Notice that the default value does not affect the value of :pop. That's because it was created with an assignment:
h[:pop] = 7
h[:chips] by itself merely returns the default value (4); it does not add the key/value pair :chips=>4 to the hash! I repeat: it does not add the key/value pair to the hash. That's important!
h[:chips] += 1
is shorthand for:
h[:chips] = h[:chips] + 1
Since the hash h does not have a key :chips when h[:chips] on the right side of the equals sign is evaluated, it returns the default value of 4, then 1 is added to make it 5 and that value is assigned to h[:chips], which adds the key value pair :chips=>5 to the hash, as seen in following line. The last line merely reports the value for the existing key :chips.
So why would you want to establish a default value? I would venture that the main reason is to be able to initialize it with zero, so you can use:
h[k] += 1
instead of
h[k] = (h.key?(k)) ? h[k] + 1 : 1
or the trick:
h[k] = (h[k] ||= 0) + 1
(which only works when hash values are intended to be non-nil). Incidentally, key? is aka has_key?.
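A typical use of the zero default is counting occurrences. The word list below is made up for illustration:

```ruby
words = %w[pop chips pop pop chips soda]

counts = Hash.new(0)                     # integers are immutable, so a shared default is safe
words.each { |word| counts[word] += 1 }  # counts[word] = counts[word] + 1

counts.keys      #=> ["pop", "chips", "soda"]
counts['pop']    #=> 3
counts['water']  #=> 0, and 'water' is still not a key
```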
Can we make the default a string instead? Of course:
h = Hash.new('magpie')
h[:bluebird] #=> "magpie"
h #=> {}
h[:bluebird] = h[:bluebird] #=> "magpie"
h #=> {:bluebird=>"magpie"}
h[:redbird] = h[:redbird] #=> "magpie"
h #=> {:bluebird=>"magpie", :redbird=>"magpie"}
h[:bluebird] << "jay" #=> "magpiejay"
h #=> {:bluebird=>"magpiejay", :redbird=>"magpiejay"}
You may be scratching your head over the last line: why did h[:bluebird] << "jay" cause h[:redbird] to change?? Perhaps this will explain what's going on here:
h[:robin] #=> "magpiejay"
h[:robin].object_id #=> 2156227520
h[:bluebird].object_id #=> 2156227520
h[:redbird].object_id #=> 2156227520
h[:robin] merely returns the default value, which we see has been changed from "magpie" to "magpiejay". Now look at the object_ids for the default value and for the values associated with the keys :bluebird and :redbird. As you see, all values are the same object, so if we change one, we change all the others, including the default value. It is now evident why h[:bluebird] << "jay" changed the default value.
We can clarify this further by adding a stately eagle:
h[:eagle] #=> "magpiejay"
h[:eagle] += "starling" #=> "magpiejaystarling"
h[:eagle].object_id #=> 2157098780
h #=> {:bluebird=>"magpiejay", :redbird=>"magpiejay", :eagle=>"magpiejaystarling"}
Because
h[:eagle] += "starling" #=> "magpiejaystarling"
is equivalent to:
h[:eagle] = h[:eagle] + "starling"
we have created a new object on the right side of the equals sign and assigned it to h[:eagle]. That's why the values for the keys :bluebird and :redbird are unaffected and h[:eagle] has a different object_id.
We have similar problems if we write Hash.new([]) or Hash.new({}). If there are ever reasons to use those defaults, I'm not aware of them. It certainly can be very useful for the default value to be an empty string, array or hash, but for that you need the third form of Hash.new, which takes a block.
Hash.new with a block
We now consider the third and final version of Hash.new, which takes a block, like so:
Hash.new { |h,k| ??? }
You may be expecting this to be devilishly complex and subtle, certainly much harder to grasp than the other two forms of the method. If so, you'd be wrong. It's actually quite simple, if you think of it as looking like this:
Hash.new { |h,k| h[k] = ??? }
In other words, Ruby is saying to you, "The hash h doesn't have the key k. What would you like its value to be?" Now consider the following:
h7 = Hash.new { |h,k| h[k]=7 }
hs = Hash.new { |h,k| h[k]='cat' }
ha = Hash.new { |h,k| h[k]=[] }
hh = Hash.new { |h,k| h[k]={} }
h7[:a] += 3 #=> 10
hs[:b] << 'nip' #=> "catnip"
ha[:c] << 4 << 6 #=> [4, 6]
ha[:d] << 7 #=> [7]
ha #=> {:c=>[4, 6], :d=>[7]}
hh[:k].merge({b: 4}) #=> {:b=>4}
hh #=> {:k=>{}} (the lookup hh[:k] stored an empty hash; merge did not change it)
hh[:k].merge!({b: 4} ) #=> {:b=>4}
hh #=> {:k=>{:b=>4}}
Notice that you cannot write ha = Hash.new { |h,k| [] } (or equivalently, ha = Hash.new { [] }) and expect h[k] => [] to be added to the hash. You can do whatever you like within the block; you are neither required nor limited to specifying a value for the key. In effect, within the block Ruby is actually saying, "A key that is not in the hash has been referenced without a value. I'm giving you that reference and also a reference to the hash. That will allow you to add that key with a value to the hash, if that's what you want to do, but what you do in this block is entirely your business."
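The difference between returning a value from the block and storing it can be seen in a short sketch (the names `no_store` and `store` are illustrative):

```ruby
no_store = Hash.new { |_hash, _key| [] }  # returns a new array but never assigns it
no_store[:c] << 4                         # the temporary array is mutated, then discarded

no_store.keys  #=> []
no_store[:c]   #=> [], the block runs again and returns another new array

store = Hash.new { |hash, key| hash[key] = [] }  # assigns, so the array is kept
store[:c] << 4

store.keys  #=> [:c]
store[:c]   #=> [4]
```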
The default values for the hashes h7, hs, ha and hh are respectively the number 7 (though it would be easier to simply enter 7 as an argument), the string 'cat', an empty array and an empty hash. Probably the last two are the most common use of Hash.new with a block, as in:
array = [[:a, 1], [:b, 3], [:a, 4], [:b, 6]]
array.each_with_object(Hash.new {|h,k| h[k] = []}) { |(k,v),h| h[k] << v }
#=> {:a=>[1, 4], :b=>[3, 6]}
That's really about all there is to the last form of Hash.new.

Hash with array as key

I'm defining a hash with an array as a key and another array as its value. For example:
for_example = {[0,1] => [:a, :b, :c]}
Everything is as expected below.
my_hash = Hash.new([])
an_array_as_key = [4,2]
my_hash[an_array_as_key] #=> []
my_hash[an_array_as_key] << "the" #=> ["the"]
my_hash[an_array_as_key] << "universal" #=> ["the", "universal"]
my_hash[an_array_as_key] << "answer" #=> ["the", "universal", "answer"]
But if I try to access the keys:
my_hash #=> {}
my_hash.keys #=> []
my_hash.count #=> 0
my_hash.values #=> []
my_hash.fetch(an_array_as_key) # KeyError: key not found: [4, 2]
my_hash.has_key?(an_array_as_key) #=> false
Rehash doesn't help:
my_hash #=> {}
my_hash.rehash #=> {}
my_hash.keys #=> []
But the values are saved:
my_hash[an_array_as_key] #=> ["the", "universal", "answer"]
Am I missing something?
To understand this, you need to understand the difference between Hash.new and Hash.new(ob). Suppose you create a hash using Hash.new or the hash literal {}. Whenever you write hsh[any_key], there are two possible results: if any_key doesn't exist, the default value nil is returned; otherwise the value associated with the key is returned.
Hash.new(ob) is the same as Hash.new, with one difference: you can set any default value you want for nonexistent keys of that hash.
my_hash = Hash.new([])
my_hash[2] # => []
my_hash[2].object_id # => 83664630
my_hash[4] # => []
my_hash[4].object_id # => 83664630
my_hash[3] << 4 # => [4]
my_hash[3] # => [4]
my_hash[3].object_id # => 83664630
my_hash[5] << 8 # => [4, 8]
my_hash[5] # => [4, 8]
my_hash[5].object_id # => 83664630
In the above example my_hash has none of the keys 2, 3, 4 or 5, yet the object_id shows that every key access returns the same array object. my_hash[2] does not add the key to my_hash; it returns the value for key 2 if that key exists, and otherwise returns the default value of my_hash. Remember that expressions like my_hash[2] and my_hash[3] are nothing but calls to the Hash#[] method.
But there is a third form, which may be what you are looking for: Hash.new {|hash, key| block }. With this style, the Hash#[] call itself can add a missing key to the hash, with a default value that is a new instance of the same class each time rather than one shared instance.
my_hash = Hash.new { |hash, key| hash[key] = []}
my_hash[2] # => []
my_hash[2].object_id # => 76312700
my_hash[3] # => []
my_hash[3].object_id # => 76312060
my_hash.keys # => [2, 3]
