Iterating through hash of hashes converts elements into Array - ruby

I'm working on processing a response of a Webhook from GitHub. The response contains a hash web_hook_response that looks like this
{:commits=>
{:modified=>
["public/en/landing_pages/dc.json",
"draft/en/landing_pages/careers/ac123.json"]
}
}
Now I have a function that processes this hash.
modified_or_deleted_files = []
web_hook_response[:commits].map do |commit|
modified_or_deleted_files << commit[:removed] << commit[:modified]
end
I get this error
TypeError: no implicit conversion of Symbol into Integer
I tried to find out the value of commit when it's inside the map block and this is what got printed
[:modified,
["public/en/landing_pages/dc.json",
"draft/en/landing_pages/careers/ac123.json"]]
Why is the modified hash converted into an array of a symbol and an array inside the map block? Can anyone explain why this is happening?

Data
This is the hash you are given (simplified slightly).
web_hook_response = {
:commits => { :modified => ["public", "draft"] }
}
It has one key-value pair, the key being :commits and the value being the hash
{ :modified => ["public", "draft"] }
which itself has one key (:modified) and one value (["public", "draft"]).
Error
Try this (with my definition of web_hook_response):
web_hook_response[:commits].map do |commit|
puts "commit = #{commit}"
modified_or_deleted_files << commit[:removed] << commit[:modified] # line 397
end
# commit = [:modified, ["public", "draft"]]
# TypeError: no implicit conversion of Symbol into Integer
from (irb):397:in `[]'
from (irb):397:in `block in irb_binding'
from (irb):395:in `each'
from (irb):395:in `map'
Note that commit equals a key-value pair from the hash web_hook_response[:commits]. As you see, an attempt is made to compute
commit[:removed]
#=> [:modified, ["public", "draft"]][:removed]
which is the syntactic sugar form of the conventional expression
[:modified, ["public", "draft"]].[](:removed)
Since [:modified, ["public", "draft"]] is an array, the method invoked is Array#[], an instance method of the class Array. (Yes, it's a funny name for a method, but that's what it is.) As explained in its doc, the method's argument must be an integer, namely, the index of the element of the array that is to be returned. Therefore, when Ruby discovers that the argument is a symbol (:removed), she raises the exception "no implicit conversion of Symbol into Integer".
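The same failure can be reproduced in isolation. This is only a minimal sketch using the simplified data from this answer; the names pair and message are illustrative and not part of the original code.

```ruby
# Hash#map yields each entry as a two-element [key, value] array.
commits = { :modified => ["public", "draft"] }
pair = commits.map { |entry| entry }.first
p pair        # [:modified, ["public", "draft"]]
p pair.class  # Array
p pair[0]     # :modified (an Integer index is fine)

# Array#[] expects an Integer index here, so a Symbol raises TypeError.
message =
  begin
    pair[:removed]
  rescue TypeError => e
    e.message
  end
puts message  # no implicit conversion of Symbol into Integer
```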
Computing modified_or_deleted_files
Given the keys :commits and :modified we may extract the hash
h = web_hook_response[:commits]
#=> { :modified=>["public", "draft"] }
and from that extract the array
a = h[:modified]
#=> ["public", "draft"]
We would normally chain these two operations to obtain the array in one statement.
web_hook_response[:commits][:modified]
#=> ["public", "draft"]
It appears you wish to simply set the value of the variable modified_or_deleted_files to this array, so simply write the following.
modified_or_deleted_files = web_hook_response[:commits][:modified]
#=> ["public", "draft"]

web_hook_response[:commits] is a hash, not an array, so when you call map on it, the block parameter commit gets key-value pairs, which are arrays of size 2.
I think what you need is to concatenate 2 arrays. You can do that with:
modified_or_deleted_files = web_hook_response[:commits].slice(:modified, :removed).values.flatten
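As a rough illustration of how that one-liner behaves when a key is absent, here is a sketch wrapped in a hypothetical helper (changed_files is not a name from the question). It assumes Ruby 2.5 or later, where Hash#slice is available.

```ruby
# Hypothetical helper: gather the file lists stored under the given keys.
def changed_files(commits, keys = [:modified, :removed])
  # Hash#slice keeps only the keys that are present, so a missing
  # :removed key is skipped instead of contributing nil.
  commits.slice(*keys).values.flatten
end

p changed_files({ :modified => ["public", "draft"] })
# ["public", "draft"]
p changed_files({ :modified => ["a.json"], :removed => ["b.json"] })
# ["a.json", "b.json"]
```

Note that a key that is present but holds nil would still leave a nil in the result; adding .compact after .flatten would be one way to handle that case.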

Your :commits is a hash. When iterating through hashes, you usually use two block arguments, one for the key and one for the value, for example:
{ :foo => 'bar' }.each do |key, value|
puts "#{key} = #{value}"
end
# outputs:
# foo = bar
When you only use one block argument you will get a key-value pair in an array:
{ :foo => 'bar' }.each do |pair|
puts pair.inspect
end
# outputs:
# [:foo, "bar"]
In your example you could just do:
commits = web_hook_response[:commits]
modified_or_deleted_files = Array(commits[:removed]) + Array(commits[:modified])
(Array(...) is used to avoid an error if commits[:removed] or commits[:modified] is nil: Array(nil) returns an empty array, and Array(an_array) returns the array itself.)
Or if you want to get fancy with the enumerators, iterators and such:
modified_or_deleted_files = web_hook_response[:commits].
values_at(:modified, :removed).
compact.
reduce(:+)
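One edge case worth checking: when neither key is present, reduce(:+) on the resulting empty array returns nil rather than an empty array. The sketch below shows that, along with the Array(...) behaviour mentioned above. The helper name files_from and the [] seed passed to reduce are additions for illustration, not part of the answer's code.

```ruby
p Array(nil)         # []
p Array(["a.json"])  # ["a.json"]

# Hypothetical variant of the chained version, seeded with [] so that
# an empty input still produces an array.
def files_from(commits)
  commits.values_at(:modified, :removed).compact.reduce([], :+)
end

p({}.values_at(:modified, :removed).compact.reduce(:+))  # nil
p files_from({})                                          # []
p files_from({ :removed => ["b.json"] })                  # ["b.json"]
```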

Related

issue with using inject to convert array to hash

data_arr = [['dog', 'Fido'], ['cat', 'Whiskers'], ['fish', 'Fluffy']]
data_hash = data_arr.inject({}) do |hsh, v|
hsh[v[0]] = v[1]
hsh
end
Hi, why do I not need to initialize data_hash as an empty hash? And why do I have to add hsh in the last line if not it will result in an error.
why do I not need to initialize data_hash as an empty hash?
You do, implicitly. The value passed to inject, i.e. {} will become the initial value for hsh which will eventually become the value for data_hash. According to the documentation:
At the end of the iteration, the final value of memo is the return value for the method.
Let's see what happens if we don't pass {}:
If you do not explicitly specify an initial value for memo, then the first element of collection is used as the initial value of memo.
The first element of your collection is the array ['dog', 'Fido']. If you omit {}, then inject would use that array as the initial value for hsh. The subsequent call to hsh[v[0]] = v[1] would fail, because of:
hsh = ['dog', 'Fido']
hsh['cat'] = 'Whiskers'
#=> TypeError: no implicit conversion of String into Integer
why do I have to add hsh in the last line
Again, let's check the documentation:
[...] the result [of the specified block] becomes the new value for memo.
inject expects you to return the new value for hsh at the end of the block.
if not it will result in an error.
That's because an assignment like hsh[v[0]] = v[1] returns the assigned value, e.g. 'Fido'. So if you omit the last line, 'Fido' becomes the new value for hsh:
hsh = 'Fido'
hsh['cat'] = 'Whiskers'
#=> IndexError: string not matched
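To make the role of memo more concrete, here is a small sketch that records what inject passes in on each step. The helper trace_inject is hypothetical, and it sums integers instead of building a hash so that both forms (with and without an initial value) run without error.

```ruby
# Record what inject passes in as memo on each step.
def trace_inject(collection, *initial)
  seen = []
  result = collection.inject(*initial) do |memo, item|
    seen << memo
    memo + item   # the block's value becomes the next memo
  end
  [seen, result]
end

p trace_inject([1, 2, 3])      # [[1, 3], 6]         memo starts as the first element
p trace_inject([1, 2, 3], 10)  # [[10, 11, 13], 16]  memo starts as the argument
```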
There's also each_with_object, which works similarly to inject but assumes that you want to mutate the same object within the block. It therefore doesn't require you to return it at the end of the block (note that the argument order is reversed):
data_hash = data_arr.each_with_object({}) do |v, hsh|
hsh[v[0]] = v[1]
end
#=> {"dog"=>"Fido", "cat"=>"Whiskers", "fish"=>"Fluffy"}
or using array decomposition:
data_hash = data_arr.each_with_object({}) do |(k, v), hsh|
hsh[k] = v
end
#=> {"dog"=>"Fido", "cat"=>"Whiskers", "fish"=>"Fluffy"}
Although to convert your array to a hash you can simply use Array#to_h, which is
[...] interpreting ary as an array of [key, value] pairs
data_arr.to_h
#=> {"dog"=>"Fido", "cat"=>"Whiskers", "fish"=>"Fluffy"}

Ruby Hash Interaction With Pushing Onto Array

So let's say I do the following:
lph = Hash.new([]) #=> {}
lph["passed"] << "LCEOT" #=> ["LCEOT"]
lph #=> {} <-- Expected that to have been {"passed" => ["LCEOT"]}
lph["passed"] #=> ["LCEOT"]
lph["passed"] = lph["passed"] << "HJKL"
lph #=> {"passed"=>["LCEOT", "HJKL"]}
I'm surprised by this. A couple questions:
Why does it not get set until I push the second string on to the array? What is happening in the background?
What is the more idiomatic Ruby way to essentially say: I have a hash, a key, and a value I want to end up in the array associated with the key? How do I push the value into an array associated with a key in a hash the first time? In all future uses of the key, I just want to add to the array.
Read the Ruby Hash.new documentation carefully - "if this hash is subsequently accessed by a key that doesn’t correspond to a hash entry, the value returned depends on the style of new used to create the hash".
new(obj) → new_hash
...If obj is specified, this single object will be used for all default values.
In your example you attempt to push something onto the value associated with a key which does not exist, so you end up mutating the same anonymous array you used to construct the hash initially.
the_array = []
h = Hash.new(the_array)
h['foo'] << 1 # => [1]
# Since the key 'foo' was not found
# ... the default value (the_array) is returned
# ... and 1 is pushed onto it (hence [1]).
the_array # => [1]
h # {} since the key 'foo' still has no value.
You probably want to use the block form:
new { |hash, key| block } → new_hash
...If a block is specified, it will be called with the hash object and the key, and should return the default value. It is the block’s responsibility to store the value in the hash if required.
For example:
h = Hash.new { |hash, key| hash[key] = [] } # Assign a new array as default for missing keys.
h['foo'] << 1 # => [1]
h['foo'] << 2 # => [1, 2]
h['bar'] << 3 # => [3]
h # => { 'foo' => [1, 2], 'bar' => [3] }
Why does it not get set until I push the second string on to the array?
In short: because you don't set anything in the hash until the point where you also add the second string to the array.
What is happening in the background?
To see what's happening in the background, let's take this one line at a time:
lph = Hash.new([]) #=> {}
This creates an empty hash, configured to return the [] object whenever a non-existing key is accessed.
lph["passed"] << "LCEOT" #=> ["LCEOT"]
This can be written as
value = lph["passed"] #=> []
value << "LCEOT" #=> ["LCEOT"]
We see that lph["passed"] returns [] as expected, and we then proceed to append "LCEOT" to [].
lph #=> {}
lph is still an empty Hash. At no point have we added anything to the Hash. We have added something to its default value, but that doesn't change lph itself.
lph["passed"] #=> ["LCEOT"]
This is where it gets interesting. Remember above when we did value << "LCEOT"? That actually changed the default value that lph returns when a key isn't found. The default value is no longer [], but has become ["LCEOT"]. That new default value is returned here.
lph["passed"] = lph["passed"] << "HJKL"
This is our first change to lph. And what we actually assign to lph["passed"] is the default value (because "passed" is still a non-existing key in lph) with "HJKL" appended. Before this, the default value was ["LCEOT"], after this it is ["LCEOT", "HJKL"].
In other words lph["passed"] << "HJKL" returns ["LCEOT", "HJKL"] which is then assigned to lph["passed"].
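The walkthrough can be checked by inspecting the hash's default value directly with Hash#default. This is just a sketch replaying the question's steps; the extra calls to default, key? and equal? are added here for inspection.

```ruby
lph = Hash.new([])
lph["passed"] << "LCEOT"
p lph.default         # ["LCEOT"]  the shared default was mutated
p lph.key?("passed")  # false      nothing was stored in the hash

lph["passed"] = lph["passed"] << "HJKL"
p lph.default         # ["LCEOT", "HJKL"]
p lph.key?("passed")  # true

# The stored value and the default are the very same object.
p lph["passed"].equal?(lph.default)  # true
```

Because the stored array and the default are one object, any later << on a missing key would also show up under "passed".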
What is the more idiomatic Ruby way
Using <<=:
>> lph = Hash.new { [] }
=> {}
>> lph["passed"] <<= "LCEOT"
=> ["LCEOT"]
>> lph
=> {"passed"=>["LCEOT"]}
Also note the change in how the Hash is initialized, using a block instead of a verbatim array. This ensures a new, blank array is created and returned whenever a new key is accessed, as opposed to the same array being used every time.
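A short sketch of why the assignment form matters with this kind of block: the block returns a fresh array but never stores it, so a bare << is lost while <<= writes the result back. The key "dropped" is made up for the demonstration.

```ruby
lph = Hash.new { [] }      # the block returns a fresh array but stores nothing

lph["dropped"] << "x"      # mutates a throwaway array
p lph                      # {}

lph["passed"] <<= "LCEOT"  # same as lph["passed"] = lph["passed"] << "LCEOT"
lph["passed"] <<= "HJKL"
p lph.keys                 # ["passed"]
p lph["passed"]            # ["LCEOT", "HJKL"]
```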

How to make Ruby var= return value assigned, not value passed in?

There's a nice idiom for adding to lists stored in a hash table:
(hash[key] ||= []) << new_value
Now, suppose I write a derivative hash class, like the ones found in Hashie, which does a deep-convert of any hash I store in it. Then what I store will not be the same object I passed to the = operator; Hash may be converted to Mash or Clash, and arrays may be copied.
Here's the problem. Ruby apparently returns, from the var= method, the value passed in, not the value that's stored. It doesn't matter what the var= method returns. The code below demonstrates this:
class C
attr_reader :foo
def foo=(value)
@foo = (value.is_a? Array) ? (value.clone) : value
end
end
c=C.new
puts "assignment: #{(c.foo ||= []) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo ||= []) << 6}"
puts "c.foo is #{c.foo}"
output is
assignment: [5]
c.foo is []
assignment: [6]
c.foo is [6]
When I posted this as a bug to Hashie, Danielle Sucher explained what was happening and pointed out that "foo.send :bar=, 1" returns the value returned by the bar= method. (Hat tip for the research!) So I guess I could do:
c=C.new
puts "clunky assignment: #{(c.foo || c.send(:foo=, [])) << 5}"
puts "c.foo is #{c.foo}"
puts "assignment: #{(c.foo || c.send(:foo=, [])) << 6}"
puts "c.foo is #{c.foo}"
which prints
clunky assignment: [5]
c.foo is [5]
assignment: [5, 6]
c.foo is [5, 6]
Is there any more elegant way to do this?
Assignments evaluate to the value that is being assigned. Period.
In some other languages, assignments are statements, so they don't evaluate to anything. Those are really the only two sensible choices. Either don't evaluate to anything, or evaluate to the value being assigned. Everything else would be too surprising.
Since Ruby doesn't have statements, there is really only one choice.
The only "workaround" for this is: don't use assignment.
c.foo ||= []
c.foo << 5
Using two lines of code isn't the end of the world, and it's easier on the eyes.
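The rule can be observed directly. In this sketch, Box is a hypothetical class whose setter stores a copy and deliberately returns an unrelated symbol; the assignment expression still evaluates to the right-hand side, while send exposes the method's own return value.

```ruby
class Box
  attr_reader :value

  def value=(new_value)
    @value = new_value.dup  # store a copy, as a deep-converting class might
    :ignored                # discarded when called via assignment syntax
  end
end

box = Box.new
original = []
result = (box.value = original)
p result.equal?(original)      # true   the expression is the right-hand side
p box.value.equal?(original)   # false  the stored object is the copy
p box.send(:value=, original)  # :ignored
```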
The prettiest way to do this is to use default value for hash:
# h = Hash.new { [] }
h = Hash.new { |h,k| h[k] = [] }
But beware that you can't use Hash.new([]) and then << because of the way Ruby stores variables:
h = Hash.new([])
h[:a] # => []
h[:b] # => []
h[:a] << 10
h[:b] # => [10] O.o
This happens because Ruby variables hold references to objects: we created only one array instance and set it as the default value, so it is shared between all hash cells (unless a cell is overwritten, e.g. by h[:a] += [10]).
It is solved by using the constructor with a block (doc), Hash.new { |h, k| h[k] = [] }. With this, the block is called each time a new key is accessed, and each value is a different array.
EDIT: Fixed the error that @Uri Agassi pointed out.

Ruby hash default value behavior

I'm going through Ruby Koans, and I hit #41 which I believe is this:
def test_default_value_is_the_same_object
hash = Hash.new([])
hash[:one] << "uno"
hash[:two] << "dos"
assert_equal ["uno","dos"], hash[:one]
assert_equal ["uno","dos"], hash[:two]
assert_equal ["uno","dos"], hash[:three]
assert_equal true, hash[:one].object_id == hash[:two].object_id
end
I could not understand the behavior, so I Googled it and found "Strange ruby behavior when using Hash default value, e.g. Hash.new([])", which answered the question nicely.
So I understand how that works. My question is: why does a default value such as an integer that gets incremented not get changed during use? For example:
puts "Text please: "
text = gets.chomp
words = text.split(" ")
frequencies = Hash.new(0)
words.each { |word| frequencies[word] += 1 }
This will take user input and count the number of times each word is used, it works because the default value of 0 is always used.
I have a feeling it has to do with the << operator but I'd love an explanation.
The other answers seem to indicate that the difference in behavior is due to Integers being immutable and Arrays being mutable. But that is misleading. The difference is not that the creator of Ruby decided to make one immutable and the other mutable. The difference is that you, the programmer decided to mutate one but not the other.
The question is not whether Arrays are mutable, the question is whether you mutate it.
You can get both the behaviors you see above, just by using Arrays. Observe:
One default Array with mutation
hsh = Hash.new([])
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => ['one', 'two']
# Because we mutated the default value, nonexistent keys return the changed value
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
One default Array without mutation
hsh = Hash.new([])
hsh[:one] += ['one']
hsh[:two] += ['two']
# This is syntactic sugar for hsh[:two] = hsh[:two] + ['two']
hsh[:nonexistent]
# => []
# We didn't mutate the default value, it is still an empty array
hsh
# => { :one => ['one'], :two => ['two'] }
# This time, we *did* mutate the hash.
A new, different Array every time with mutation
hsh = Hash.new { [] }
# This time, instead of a default *value*, we use a default *block*
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => {}
# But we never mutated the hash itself, therefore it is still empty!
hsh = Hash.new {|hsh, key| hsh[key] = [] }
# This time, instead of a default *value*, we use a default *block*
# And the block not only *returns* the default value, it also *assigns* it
hsh[:one] << 'one'
hsh[:two] << 'two'
hsh[:nonexistent]
# => []
# We *did* mutate the default value, but it was a fresh one every time.
hsh
# => { :one => ['one'], :two => ['two'], :nonexistent => [] }
This is because an Array in Ruby is a mutable object, so you can change its internal state, whereas a Fixnum (Integer since Ruby 2.4) is immutable. When you increment a value using +=, conceptually the following happens (assume that i is our reference to the Fixnum object):
get the object referenced by i
get its internal value (let's call it raw_tmp)
create a new object whose internal value is raw_tmp + 1
assign a reference to the created object to i
As you can see, we created a new object, and i now refers to something different than at the beginning.
On the other hand, when we use Array#<< it works this way:
get the object referenced by arr
append the given element to its internal state
As you can see, this is much simpler, but it can cause some bugs. One of them is the one in your question; another is a race between threads that simultaneously try to append elements, where you can sometimes end up with only some of the elements and with garbage left in memory. If you use += on arrays too, you get rid of both of these problems (or at least minimise their impact).
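The difference between rebinding a variable and mutating an object can be observed with a second reference to the same object. This is a minimal sketch; the variable names j and same are illustrative.

```ruby
i = 1
j = i
i += 1              # builds a new Integer and rebinds i
p [i, j]            # [2, 1]

arr = []
same = arr
arr << :x           # mutates the one shared array
p same              # [:x]
p arr.equal?(same)  # true

arr += [:y]         # builds a new array and rebinds arr
p same              # [:x]
p arr               # [:x, :y]
p arr.equal?(same)  # false
```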
From the doc, setting a default value has the following behaviour:
Returns the default value, the value that would be returned by hsh if key did not exist in hsh. See also Hash::new and Hash#default=.
Therefore, whenever frequencies[word] has not been set, the lookup returns 0, and += then stores 0 + 1 under that key.
The reason for the discrepancy between the two code blocks is that arrays are mutable in Ruby, while integers are not.
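A sketch of the word count with the default inspected afterwards. The helper name word_frequencies and the sample sentence are made up here, and the input is passed as an argument instead of being read with gets so the example is self-contained.

```ruby
def word_frequencies(text)
  frequencies = Hash.new(0)
  # frequencies[word] += 1 means frequencies[word] = frequencies[word] + 1:
  # the lookup returns the default 0, and the assignment stores a new Integer.
  text.split.each { |word| frequencies[word] += 1 }
  frequencies
end

counts = word_frequencies("the cat and the hat")
p counts["the"]           # 2
p counts["missing"]       # 0
p counts.default          # 0  the default itself was never changed
p counts.key?("missing")  # false
```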

Ruby hash - cannot loop data

I have this structure of data in a hash:
[{"name"=>"Peter", "surname"=>"Green"}, {"name"=>"Jane", "surname"=>"Miller"}]
But when I try to work with this hash, for example:
puts hash.count # returns nothing
hash.each do |data|
puts data.name # => undefined method `name' for #<Hash:0x00000104bcf9f8>
end
What am I missing?
Array#count without an argument should return the number of elements, but it is more natural to use length or size. And Hash does not have a method name.
puts hash.length
hash.each do |data|
puts data["name"]
end
By the way, what you refer to as hash is actually an array, which makes the name confusing.
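For illustration, here is the same data under a name that reflects its type. The variable names people and full_names are hypothetical; the data is the structure from the question.

```ruby
people = [
  { "name" => "Peter", "surname" => "Green" },
  { "name" => "Jane",  "surname" => "Miller" }
]

p people.class   # Array
p people.length  # 2

# Each element is a Hash, so values are read with [] and a string key.
full_names = people.map { |person| "#{person['name']} #{person['surname']}" }
p full_names     # ["Peter Green", "Jane Miller"]
```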
