Ruby hash vivification weirdness - ruby

I'm running ruby 2.2.2:
$ ruby -v
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
Here I am initializing a hash with one key :b that has a value of Hash.new({})
irb(main):001:0> a = { b: Hash.new({}) }
=> {:b=>{}}
Now, I'm going to attempt to auto-vivify another hash at a[:b][:c] with a key 'foo' and a value 'bar'
irb(main):002:0> a[:b][:c]['foo'] = 'bar'
=> "bar"
At this point, I expected that a would contain something like:
{ :b => { :c => { 'foo' => 'bar' } } }
However, that is not what I'm seeing:
irb(main):003:0> a
=> {:b=>{}}
irb(main):004:0> a[:b]
=> {}
irb(main):005:0> a[:b][:c]
=> {"foo"=>"bar"}
This differs from the following:
irb(main):048:0> a = { :b => { :c => { "foo" => "bar" } } }
=> {:b=>{:c=>{"foo"=>"bar"}}}
irb(main):049:0> a
=> {:b=>{:c=>{"foo"=>"bar"}}}
So what is going on here?
I suspect this is something to do with Hash.new({}) returning a default value of {}, but I'm not exactly sure how to explain the end result...

Apologies for answering my own question, but I figured out what is happening.
The answer here is that we are assigning into the default hash being returned by a[:b], NOT a[:b] directly.
As before, we're going to create a hash with a single key of b and a value of Hash.new({})
irb(main):068:0> a = { b: Hash.new({}) }
=> {:b=>{}}
As you might expect, this should make things like a[:b][:unknown_key] return an empty hash {}, like so:
irb(main):070:0> a[:b].default
=> {}
irb(main):071:0> a[:b][:unknown_key]
=> {}
irb(main):072:0> a[:b].object_id
=> 70127981905400
irb(main):073:0> a[:b].default.object_id
=> 70127981905420
Notice that the object_id for a[:b] is ...5400 while the object_id for a[:b].default is ...5420
So what happens when we do the assignment from the original question?
a[:b][:c]["foo"] = "bar"
First, a[:b][:c] is resolved:
irb(main):075:0> a[:b][:c].object_id
=> 70127981905420
That's the same object_id as the .default object, because :c is treated the same as :unknown_key from above!
Then, we assign a new key 'foo' with a value 'bar' into that hash.
Indeed, check it out, we've effectively altered the default instead of a[:b]:
irb(main):081:0> a[:b].default
=> {"foo"=>"bar"}
Oops!

The answer is probably not as esoteric as it might seem at the onset, but this is just the way Ruby is handling that Hash.
If your initial Hash is the:
a = { b: Hash.new({}) }
b[:b][:c]['foo'] = 'bar'
Then seeing that each 'layer' of the Hash is just referencing the next element, such that:
a # {:b=>{}}
a[:b] # {}
a[:b][:c] # {"foo"=>"bar"}
a[:b][:c]["foo"] # "bar"
Your idea of:
{ :b => { :c => { 'foo' => 'bar' } } }
Is somewhat already accurate, so it makes me think that you already understand what's happening, but felt unsure of what was happening due to the way IRB was perhaps displaying it.
If I'm missing some element of your question though, feel free to comment and I'll revise my answer. But I feel like you understand Hashes better than you're giving yourself credit for in this case.

Related

Incrementation in Ruby hashes

I'm trying to increment the key in a hash. For example. I'm trying to get this
{:b => "crayons", :c => "colors", :d => "apples"}
to turn into this
{:c => "crayons", :d => "colors", :e => "apples"}
I thought this code would do the trick but it doesn't. What do I need to change?
def hash(correct)
mapping = correct.each{|key, element| key.next}
Hash[correct.map {|key, element| [mapping[key], element]}]
end
Using Enumerable#each_with_object
def hash_correct(hsh)
hsh.each_with_object({}) { |(k,v), hsh| hsh[k.succ] = v }
end
hash_correct({:b => "crayons", :c => "colors", :d => "apples"})
# => {:c=>"crayons", :d=>"colors", :e=>"apples"}
def hash(correct)
Hash[correct.map{|key, element| [key.next, element]}]
end
h = {:b => "crayons", :c => "colors", :d => "apples"}
h.keys.map(&:succ).zip(h.values).to_h
#=> {:c=>"crayons", :d=>"colors", :e=>"apples"}
If the intent were to modify (not keep) the original hash, the update could be done in place:
keys = h.keys.reverse
keys.each { |k| h[k.succ] = h[k] }
h.delete(keys.last)
which could be inscrutablized to:
h.delete(h.keys.reverse.each { |k| h[k.succ] = h[k] }.last)
def hash(correct)
exp_hash = correct.map { | k, v| {k.next => v} }
Hash[*exp_hash.collect{|h| h.to_a}.flatten]
end
correct = {:b => "crayons", :c => "colors", :d => "apples"}
I thought this code would do the trick but it doesn't.
mapping = correct.each{|key, element| key.next}
If you go to the ruby Symbol docs and click on the link for next()...surprise there is no entry for next, but the description at the top of the window says:
succ
Same as sym.to_s.succ.intern.
From that you have to deduce that next() is a synonym for succ(). So Symbol#next/succ converts the symbol to a string by calling to_s(). Well, you know that you are going to get a String returned from to_s, and no matter what you do to that String, e.g. calling String#succ on it, it isn't going to effect some Symbol, e.g. your hash key. Furthermore, if you look at the docs for String#succ, it says
succ -> new_string
...so String#succ creates another String object and calling intern() on that String object, and by the way intern() is just a synonym for to_sym(), once again won't affect some Symbol...and it won't even affect the String object returned by to_s.
Finally, intern() doesn't change the second string object but instead returns a Symbol:
a String
V
key.next => key.to_s.succ.intern => Symbol
^
another String
...and because you didn't do anything with the Symbol returned by intern(), it is discarded.

IRB (apparently) not inspecting hashes correctly

I'm seeing some odd behavior in IRB 1.8.7 with printing hashes. If I initialize my hash with a Hash.new, it appears that my hash is "evaluating" to an empty hash:
irb(main):024:0> h = Hash.new([])
=> {}
irb(main):025:0> h["test"]
=> []
irb(main):026:0> h["test"] << "blah"
=> ["blah"]
irb(main):027:0> h
=> {}
irb(main):028:0> puts h.inspect
{}
=> nil
irb(main):031:0> require 'pp'
=> true
irb(main):032:0> pp h
{}
=> nil
irb(main):033:0> h["test"]
=> ["blah"]
As you can see, the data is actually present in the hash, but trying to print or display it seems to fail. Initialization with a hash literal seems to fix this problem:
irb(main):050:0> hash = { 'test' => ['testval'] }
=> {"test"=>["testval"]}
irb(main):051:0> hash
=> {"test"=>["testval"]}
irb(main):053:0> hash['othertest'] = ['secondval']
=> ["secondval"]
irb(main):054:0> hash
=> {"othertest"=>["secondval"], "test"=>["testval"]}
The issue here is that invoking h["test"] doesn't actually insert a new key into the hash - it just returns the default value, which is the array that you passed to Hash.new.
1.8.7 :010 > a = []
=> []
1.8.7 :011 > a.object_id
=> 70338238506580
1.8.7 :012 > h = Hash.new(a)
=> {}
1.8.7 :013 > h["test"].object_id
=> 70338238506580
1.8.7 :014 > h["test"] << "blah"
=> ["blah"]
1.8.7 :015 > h.keys
=> []
1.8.7 :016 > h["bogus"]
=> ["blah"]
1.8.7 :017 > h["bogus"].object_id
=> 70338238506580
1.8.7 :019 > a
=> ["blah"]
The hash itself is still empty - you haven't assigned anything to it. The data isn't present in the hash - it's present in the array that is returned for missing keys in the hash.
It looks like you're trying to create a hash of arrays. To do so, I recommend you initialize like so:
h = Hash.new { |h,k| h[k] = [] }
Your version isn't working correctly for me, either. The reason why is a little complicated to understand. From the docs:
If obj is specified, this single object will be used for all default values.
I've added the bolding. The rest of the emphasis is as-is.
You're specifying that obj is [], and it's only a default value. It doesn't actually set the contents of the hash to that default value. So when you do h["blah"] << "test", you're really just asking it to return a copy of the default value and then adding "test" to that copy. It never goes into the hash at all. (I need to give Chris Heald credit for explaining this below.)
If instead you give it a block, it calls that block EVERY TIME you do a lookup on a non-existent entry of the hash. So you're not just creating one Array anymore. You're creating one for each entry of the hash.

Difference between :symbol and symbol:?

Just going through a tutorial, and thought somewhere I saw
first_name:
And another place
:first_name
Is this right? What is the difference?
The hash syntax changed in Ruby 1.9.2 to get closer to json.
So:
{ :foo => "bar" }
Is the same as:
{ foo: "bar" }
In all other cases, the colon must come first.
:first_name is a symbol, while first_name: is a Hash key in the new Ruby 1.9.2 syntax.
Hash keys are then converted to symbols:
>> a = { foo: 10 , bar: 20 }
=> {:foo=>10, :bar=>20}
It is the same as writing:
>> a = { :foo => 10, :bar => 20 }
=> {:foo=>10, :bar=>20}

Ruby - Iterating over a Nested Hash and counting values

Despite there being many questions on nested hashes I've not found any solutions to my problem.
I'm pulling in strings and matching each character to a hash like so:
numberOfChars = {}
string.each_char do |c|
RULES.each do |key, value|
if c === key
numberOfChars[value] += 1
end
end
end
This worked fine and would output something like "a appears 3 times" until I realised that my hash needed to be nested, akin to this:
RULES = {
:limb {
:colour {
'a' => 'foo',
'b' => 'bar'
},
'c' => 'baz'
}
}
So how would I go about getting the 'leaf' key and it's value?
While it's iterating over the hash it also needs to count how many times each key appears, e.g. does 'a' appear more than 'b'? If so add a to new hash. But I'm pretty lost as to how that would work in practice without knowing how it'll be iterating over a nested hash to begin with.
It just seems to me an overly convoluted way of doing this but if anyone's got any pointers they'd be hugely appreciated!
Also, if it's not painfully clear already I'm new to Ruby so I'm probably making some fundamental mistakes.
Are you looking for something like this?
RULES = {
:limb => {
:colour => {
'a' => 'foo',
'b' => 'bar'
},
'c' => 'baz',
:color => {
'b' => 'baz'
}
}
}
def count_letters(hash, results = {})
hash.each do |key, value|
if value.kind_of?(Hash)
count_letters(value, results)
else
results[key] = (results[key] || 0) + 1
end
end
results
end
p count_letters(RULES)

Ruby Style: How to check whether a nested hash element exists

Consider a "person" stored in a hash. Two examples are:
fred = {:person => {:name => "Fred", :spouse => "Wilma", :children => {:child => {:name => "Pebbles"}}}}
slate = {:person => {:name => "Mr. Slate", :spouse => "Mrs. Slate"}}
If the "person" doesn't have any children, the "children" element is not present. So, for Mr. Slate, we can check whether he has parents:
slate_has_children = !slate[:person][:children].nil?
So, what if we don't know that "slate" is a "person" hash? Consider:
dino = {:pet => {:name => "Dino"}}
We can't easily check for children any longer:
dino_has_children = !dino[:person][:children].nil?
NoMethodError: undefined method `[]' for nil:NilClass
So, how would you check the structure of a hash, especially if it is nested deeply (even deeper than the examples provided here)? Maybe a better question is: What's the "Ruby way" to do this?
The most obvious way to do this is to simply check each step of the way:
has_children = slate[:person] && slate[:person][:children]
Use of .nil? is really only required when you use false as a placeholder value, and in practice this is rare. Generally you can simply test it exists.
Update: If you're using Ruby 2.3 or later there's a built-in dig method that does what's described in this answer.
If not, you can also define your own Hash "dig" method which can simplify this substantially:
class Hash
def dig(*path)
path.inject(self) do |location, key|
location.respond_to?(:keys) ? location[key] : nil
end
end
end
This method will check each step of the way and avoid tripping up on calls to nil. For shallow structures the utility is somewhat limited, but for deeply nested structures I find it's invaluable:
has_children = slate.dig(:person, :children)
You might also make this more robust, for example, testing if the :children entry is actually populated:
children = slate.dig(:person, :children)
has_children = children && !children.empty?
With Ruby 2.3 we'll have support for the safe navigation operator:
https://www.ruby-lang.org/en/news/2015/11/11/ruby-2-3-0-preview1-released/
has_children now could be written as:
has_children = slate[:person]&.[](:children)
dig is being added as well:
has_children = slate.dig(:person, :children)
Another alternative:
dino.fetch(:person, {})[:children]
You can use the andand gem:
require 'andand'
fred[:person].andand[:children].nil? #=> false
dino[:person].andand[:children].nil? #=> true
You can find further explanations at http://andand.rubyforge.org/.
One could use hash with default value of {} - empty hash. For example,
dino = Hash.new({})
dino[:pet] = {:name => "Dino"}
dino_has_children = !dino[:person][:children].nil? #=> false
That works with already created Hash as well:
dino = {:pet=>{:name=>"Dino"}}
dino.default = {}
dino_has_children = !dino[:person][:children].nil? #=> false
Or you can define [] method for nil class
class NilClass
def [](* args)
nil
end
end
nil[:a] #=> nil
Traditionally, you really had to do something like this:
structure[:a] && structure[:a][:b]
However, Ruby 2.3 added a feature that makes this way more graceful:
structure.dig :a, :b # nil if it misses anywhere along the way
There is a gem called ruby_dig that will back-patch this for you.
def flatten_hash(hash)
hash.each_with_object({}) do |(k, v), h|
if v.is_a? Hash
flatten_hash(v).map do |h_k, h_v|
h["#{k}_#{h_k}"] = h_v
end
else
h[k] = v
end
end
end
irb(main):012:0> fred = {:person => {:name => "Fred", :spouse => "Wilma", :children => {:child => {:name => "Pebbles"}}}}
=> {:person=>{:name=>"Fred", :spouse=>"Wilma", :children=>{:child=>{:name=>"Pebbles"}}}}
irb(main):013:0> slate = {:person => {:name => "Mr. Slate", :spouse => "Mrs. Slate"}}
=> {:person=>{:name=>"Mr. Slate", :spouse=>"Mrs. Slate"}}
irb(main):014:0> flatten_hash(fred).keys.any? { |k| k.include?("children") }
=> true
irb(main):015:0> flatten_hash(slate).keys.any? { |k| k.include?("children") }
=> false
This will flatten all the hashes into one and then any? returns true if any key matching the substring "children" exist.
This might also help.
dino_has_children = !dino.fetch(person, {})[:children].nil?
Note that in rails you can also do:
dino_has_children = !dino[person].try(:[], :children).nil? #
Here is a way you can do a deep check for any falsy values in the hash and any nested hashes without monkey patching the Ruby Hash class (PLEASE don't monkey patch on the Ruby classes, such is something you should not do, EVER).
(Assuming Rails, although you could easily modify this to work outside of Rails)
def deep_all_present?(hash)
fail ArgumentError, 'deep_all_present? only accepts Hashes' unless hash.is_a? Hash
hash.each do |key, value|
return false if key.blank? || value.blank?
return deep_all_present?(value) if value.is_a? Hash
end
true
end
Simplifying the above answers here:
Create a Recursive Hash method whose value cannot be nil, like as follows.
def recursive_hash
Hash.new {|key, value| key[value] = recursive_hash}
end
> slate = recursive_hash
> slate[:person][:name] = "Mr. Slate"
> slate[:person][:spouse] = "Mrs. Slate"
> slate
=> {:person=>{:name=>"Mr. Slate", :spouse=>"Mrs. Slate"}}
slate[:person][:state][:city]
=> {}
If you don't mind creating empty hashes if the value does not exists for the key :)
You can try to play with
dino.default = {}
Or for example:
empty_hash = {}
empty_hash.default = empty_hash
dino.default = empty_hash
That way you can call
empty_hash[:a][:b][:c][:d][:e] # and so on...
dino[:person][:children] # at worst it returns {}
Given
x = {:a => {:b => 'c'}}
y = {}
you could check x and y like this:
(x[:a] || {})[:b] # 'c'
(y[:a] || {})[:b] # nil
Thks #tadman for the answer.
For those who want perfs (and are stuck with ruby < 2.3), this method is 2.5x faster:
unless Hash.method_defined? :dig
class Hash
def dig(*path)
val, index, len = self, 0, path.length
index += 1 while(index < len && val = val[path[index]])
val
end
end
end
and if you use RubyInline, this method is 16x faster:
unless Hash.method_defined? :dig
require 'inline'
class Hash
inline do |builder|
builder.c_raw '
VALUE dig(int argc, VALUE *argv, VALUE self) {
rb_check_arity(argc, 1, UNLIMITED_ARGUMENTS);
self = rb_hash_aref(self, *argv);
if (NIL_P(self) || !--argc) return self;
++argv;
return dig(argc, argv, self);
}'
end
end
end
You can also define a module to alias the brackets methods and use the Ruby syntax to read/write nested elements.
UPDATE: Instead of overriding the bracket accessors, request Hash instance to extend the module.
module Nesty
def []=(*keys,value)
key = keys.pop
if keys.empty?
super(key, value)
else
if self[*keys].is_a? Hash
self[*keys][key] = value
else
self[*keys] = { key => value}
end
end
end
def [](*keys)
self.dig(*keys)
end
end
class Hash
def nesty
self.extend Nesty
self
end
end
Then you can do:
irb> a = {}.nesty
=> {}
irb> a[:a, :b, :c] = "value"
=> "value"
irb> a
=> {:a=>{:b=>{:c=>"value"}}}
irb> a[:a,:b,:c]
=> "value"
irb> a[:a,:b]
=> {:c=>"value"}
irb> a[:a,:d] = "another value"
=> "another value"
irb> a
=> {:a=>{:b=>{:c=>"value"}, :d=>"another value"}}
I don't know how "Ruby" it is(!), but the KeyDial gem which I wrote lets you do this basically without changing your original syntax:
has_kids = !dino[:person][:children].nil?
becomes:
has_kids = !dino.dial[:person][:children].call.nil?
This uses some trickery to intermediate the key access calls. At call, it will try to dig the previous keys on dino, and if it hits an error (as it will), returns nil. nil? then of course returns true.
You can use a combination of & and key? it is O(1) compared to dig which is O(n) and this will make sure person is accessed without NoMethodError: undefined method `[]' for nil:NilClass
fred[:person]&.key?(:children) //=>true
slate[:person]&.key?(:children)

Resources