Hash#compare_by_identity with string literals - ruby

I'm running Ruby 2.2.1.
The following code runs as expected as string hash keys are duped and frozen:
f = 'foo'
h = {f => 'bar'}
h.compare_by_identity
h[f] # => nil
h['foo'] # => nil
h[h.keys.first] # => "bar"
But I can't for the life of me figure out what is going on here:
h = {'foo' => 'bar'}
h.compare_by_identity
h.keys.first.frozen? # => true
'foo'.frozen? # => false
h.keys.first.object_id # => 20421220
'foo'.object_id # => 20067280
h['foo'] # => "bar"
h['foo'.dup] # => nil
It's interesting to note that the docs for #compare_by_identity started using #dup at 2.2.0. So it seems this behavior change is known.
2.1.7:
h1["a"] #=> nil # different objects.
2.2.0:
h1["a".dup] #=> nil # different objects.
However, the source is the same.
The same does not happen with other literals like arrays. Any ideas on why this behavior changed for string literals? The docs give no hints as to why.

Related

Ruby hash vivification weirdness

I'm running ruby 2.2.2:
$ ruby -v
ruby 2.2.2p95 (2015-04-13 revision 50295) [x86_64-linux]
Here I am initializing a hash with one key :b that has a value of Hash.new({})
irb(main):001:0> a = { b: Hash.new({}) }
=> {:b=>{}}
Now, I'm going to attempt to auto-vivify another hash at a[:b][:c] with a key 'foo' and a value 'bar'
irb(main):002:0> a[:b][:c]['foo'] = 'bar'
=> "bar"
At this point, I expected that a would contain something like:
{ :b => { :c => { 'foo' => 'bar' } } }
However, that is not what I'm seeing:
irb(main):003:0> a
=> {:b=>{}}
irb(main):004:0> a[:b]
=> {}
irb(main):005:0> a[:b][:c]
=> {"foo"=>"bar"}
This differs from the following:
irb(main):048:0> a = { :b => { :c => { "foo" => "bar" } } }
=> {:b=>{:c=>{"foo"=>"bar"}}}
irb(main):049:0> a
=> {:b=>{:c=>{"foo"=>"bar"}}}
So what is going on here?
I suspect this is something to do with Hash.new({}) returning a default value of {}, but I'm not exactly sure how to explain the end result...
Apologies for answering my own question, but I figured out what is happening.
The answer here is that we are assigning into the default hash being returned by a[:b], NOT a[:b] directly.
As before, we're going to create a hash with a single key of b and a value of Hash.new({})
irb(main):068:0> a = { b: Hash.new({}) }
=> {:b=>{}}
As you might expect, this should make things like a[:b][:unknown_key] return an empty hash {}, like so:
irb(main):070:0> a[:b].default
=> {}
irb(main):071:0> a[:b][:unknown_key]
=> {}
irb(main):072:0> a[:b].object_id
=> 70127981905400
irb(main):073:0> a[:b].default.object_id
=> 70127981905420
Notice that the object_id for a[:b] is ...5400 while the object_id for a[:b].default is ...5420
So what happens when we do the assignment from the original question?
a[:b][:c]["foo"] = "bar"
First, a[:b][:c] is resolved:
irb(main):075:0> a[:b][:c].object_id
=> 70127981905420
That's the same object_id as the .default object, because :c is treated the same as :unknown_key from above!
Then, we assign a new key 'foo' with a value 'bar' into that hash.
Indeed, check it out, we've effectively altered the default instead of a[:b]:
irb(main):081:0> a[:b].default
=> {"foo"=>"bar"}
Oops!
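For contrast, here is a minimal sketch of the shared-default behaviour next to the block form of Hash.new, which does store a fresh hash under each missing key. The variable names `shared` and `vivify` are made up for illustration; they are not from the question.

```ruby
# Shared default: every missing key returns the one hash passed to Hash.new.
shared = { b: Hash.new({}) }
shared[:b][:c]['foo'] = 'bar'
shared[:b].key?(:c)   # false, nothing was ever stored under :c
shared[:b].default    # the single default hash, now holding "foo" => "bar"

# Block form: the block runs on a missing key and assigns a new hash to it.
# Passing the same default_proc down makes the nesting work at any depth.
vivify = Hash.new { |hash, key| hash[key] = Hash.new(&hash.default_proc) }
vivify[:b][:c]['foo'] = 'bar'
vivify == { b: { c: { 'foo' => 'bar' } } }   # true
```

The key difference is the assignment `hash[key] = ...` inside the block: without it, the block's return value would be handed back but still not stored.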
The answer is probably not as esoteric as it might seem at the outset; this is just the way Ruby is handling that Hash.
If your initial Hash is:
a = { b: Hash.new({}) }
a[:b][:c]['foo'] = 'bar'
then each 'layer' of the Hash just references the next element, such that:
a # {:b=>{}}
a[:b] # {}
a[:b][:c] # {"foo"=>"bar"}
a[:b][:c]["foo"] # "bar"
Your idea of:
{ :b => { :c => { 'foo' => 'bar' } } }
Is somewhat already accurate, so it makes me think that you already understand what's happening, but felt unsure of what was happening due to the way IRB was perhaps displaying it.
If I'm missing some element of your question though, feel free to comment and I'll revise my answer. But I feel like you understand Hashes better than you're giving yourself credit for in this case.

IRB (apparently) not inspecting hashes correctly

I'm seeing some odd behavior in IRB 1.8.7 with printing hashes. If I initialize my hash with a Hash.new, it appears that my hash is "evaluating" to an empty hash:
irb(main):024:0> h = Hash.new([])
=> {}
irb(main):025:0> h["test"]
=> []
irb(main):026:0> h["test"] << "blah"
=> ["blah"]
irb(main):027:0> h
=> {}
irb(main):028:0> puts h.inspect
{}
=> nil
irb(main):031:0> require 'pp'
=> true
irb(main):032:0> pp h
{}
=> nil
irb(main):033:0> h["test"]
=> ["blah"]
As you can see, the data is actually present in the hash, but trying to print or display it seems to fail. Initialization with a hash literal seems to fix this problem:
irb(main):050:0> hash = { 'test' => ['testval'] }
=> {"test"=>["testval"]}
irb(main):051:0> hash
=> {"test"=>["testval"]}
irb(main):053:0> hash['othertest'] = ['secondval']
=> ["secondval"]
irb(main):054:0> hash
=> {"othertest"=>["secondval"], "test"=>["testval"]}
The issue here is that invoking h["test"] doesn't actually insert a new key into the hash - it just returns the default value, which is the array that you passed to Hash.new.
1.8.7 :010 > a = []
=> []
1.8.7 :011 > a.object_id
=> 70338238506580
1.8.7 :012 > h = Hash.new(a)
=> {}
1.8.7 :013 > h["test"].object_id
=> 70338238506580
1.8.7 :014 > h["test"] << "blah"
=> ["blah"]
1.8.7 :015 > h.keys
=> []
1.8.7 :016 > h["bogus"]
=> ["blah"]
1.8.7 :017 > h["bogus"].object_id
=> 70338238506580
1.8.7 :019 > a
=> ["blah"]
The hash itself is still empty - you haven't assigned anything to it. The data isn't present in the hash - it's present in the array that is returned for missing keys in the hash.
It looks like you're trying to create a hash of arrays. To do so, I recommend you initialize like so:
h = Hash.new { |h,k| h[k] = [] }
Your version isn't working correctly for me, either. The reason why is a little complicated to understand. From the docs:
If obj is specified, this single object will be used for all default values.
I've added the bolding. The rest of the emphasis is as-is.
You're specifying that obj is [], and it's only a default value. It doesn't actually set the contents of the hash to that default value. So when you do h["blah"] << "test", you're really just asking it to return the default value itself (the same object every time, not a copy) and then appending "test" to that one shared array. Nothing ever goes into the hash at all. (I need to give Chris Heald credit for explaining this below.)
If instead you give it a block, it calls that block EVERY TIME you do a lookup on a non-existent entry of the hash. So you're not just creating one Array anymore. You're creating one for each entry of the hash.
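The two forms can be compared side by side. This is only a sketch; the names `shared` and `per_key` are invented for the example.

```ruby
# Object form: one array is shared by every missing key and never stored.
shared = Hash.new([])
shared['test'] << 'blah'
shared.keys        # [] because the hash itself is still empty
shared['other']    # the same array again, already containing "blah"

# Block form: the block stores a new array under the key on first lookup.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key['test'] << 'blah'
per_key.keys       # ["test"]
per_key['other']   # a separate, empty array (and "other" is now a key too)
```

Note that with the block form even a plain read of a missing key inserts it; use `fetch` or `key?` when a lookup must not change the hash.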

The confusing Ruby method returns value

I have Ruby code:
def test_111(hash)
n = nil
3.times do |c|
if n
n[c] = c
else
n = hash
end
end
end
a = {}
test_111(a)
p a
Why does it print {1=>1, 2=>2} and not {}?
Inside the test_111 method, do hash and a use the same memory?
How can the value of a be changed inside the test_111 method?
I can't understand this.
Hashes are passed by reference (strictly speaking, the method receives a copy of the reference to the same object). So when you mutate a method parameter that is a Hash, you change the original hash.
To avoid this, you should clone the hash.
test_111(a.dup)
This will create a shallow copy (that is, it will not clone child hashes that you may have).
A little illustration of what shallow copy is:
def mutate hash
hash[:new] = 1
hash[:existing][:value] = 2
hash
end
h = {existing: {value: 1}}
mutate h # => {:existing=>{:value=>2}, :new=>1}
# new member added, existing member changed
h # => {:existing=>{:value=>2}, :new=>1}
h = {existing: {value: 1}}
mutate h.dup # => {:existing=>{:value=>2}, :new=>1}
# existing member changed, no new members
h # => {:existing=>{:value=>2}}
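If the nested hashes must be protected as well, one common approach is a deep copy via Marshal. This is a sketch under the assumption that the data contains only marshalable objects (no procs, no hashes with default blocks, no IO objects); `deep_copy` and `original` are names chosen for this example.

```ruby
def mutate(hash)
  hash[:new] = 1
  hash[:existing][:value] = 2
  hash
end

original = { existing: { value: 1 } }

# Serialize and deserialize to get a copy that shares no nested objects.
deep_copy = Marshal.load(Marshal.dump(original))
mutate(deep_copy)

original    # unchanged: the nested :value is still 1
deep_copy   # has :new added and the nested :value set to 2
```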
In ruby, just about every object is passed by reference. This means when you do something as simple as
a = b
unless a was one of the simple types, after this assignment a and b will point to the same thing.
This means if you alter the second variable, the first is affected the same way:
irb(main):001:0> x = "a string"
=> "a string"
irb(main):002:0> y = x
=> "a string"
irb(main):003:0> x[1,0] = "nother"
=> "nother"
irb(main):004:0> x
=> "another string"
irb(main):005:0> y
=> "another string"
irb(main):006:0>
and of course the same applies for hashes:
irb(main):006:0> a = { :a => 1 }
=> {:a=>1}
irb(main):007:0> b = a
=> {:a=>1}
irb(main):008:0> a[:b] = 2
=> 2
irb(main):009:0> a
=> {:a=>1, :b=>2}
irb(main):010:0> b
=> {:a=>1, :b=>2}
irb(main):011:0>
If you don't want this to happen, use .dup or .clone:
irb(main):001:0> a = "a string"
=> "a string"
irb(main):002:0> b = a.dup
=> "a string"
irb(main):003:0> a[1,0] = "nother"
=> "nother"
irb(main):004:0> a
=> "another string"
irb(main):005:0> b
=> "a string"
irb(main):006:0>
For most people dup and clone have the same effect.
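Where the two do differ is in what extra state they carry over: clone keeps the frozen state and any singleton methods, dup does not. A small sketch (the method name `shout` is invented for the example):

```ruby
original = String.new('a string')

# A singleton method defined on this one object only.
def original.shout
  upcase
end
original.freeze

dup_copy   = original.dup
clone_copy = original.clone

dup_copy.frozen?                # false, dup drops the frozen state
clone_copy.frozen?              # true, clone keeps it
dup_copy.respond_to?(:shout)    # false, dup drops singleton methods
clone_copy.respond_to?(:shout)  # true, clone keeps them
```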
So if you write a function that modifies one of its parameters, unless you specifically want those changes to be seen by the code that calls the function, you should first dup the parameter being modified:
def test_111(hash)
hash = hash.dup
# etc
end
The behavior of your code is called a side effect - a change to the program's state that isn't a core part of the function. Side effects are generally to be avoided.
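Applied to the method from the question, a side-effect-free variant might look like the sketch below. Returning the copy is an addition made here so the result is usable; the original method returned the value of 3.times.

```ruby
def test_111(hash)
  n = nil
  3.times do |c|
    if n
      n[c] = c
    else
      n = hash.dup   # work on a copy instead of the caller's object
    end
  end
  n                  # hand the modified copy back to the caller
end

a = {}
result = test_111(a)
a        # still empty
result   # contains 1 => 1 and 2 => 2
```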

Finding out variable class and difference between Hash and Array

Is it possible to clearly identify a class of variable?
something like:
@users.who_r_u? #=> Class (some information)
@packs.who_r_u? #=> Array (some information)
etc.
Can someone provide a clear, short explanation of the difference between Class, Hash, Array, associative array, etc.?
You can use:
@users.class
Test it in irb:
1.9.3p0 :001 > 1.class
=> Fixnum
1.9.3p0 :002 > "1".class
=> String
1.9.3p0 :003 > [1].class
=> Array
1.9.3p0 :004 > {:a => 1}.class
=> Hash
1.9.3p0 :005 > (1..10).class
=> Range
Or:
1.9.3p0 :010 > class User
1.9.3p0 :011?> end
=> nil
1.9.3p0 :012 > @user = User.new
=> #<User:0x0000010111bfc8>
1.9.3p0 :013 > @user.class
=> User
These were only quick irb examples, hope it's enough to see the use of .class in ruby.
You could also use kind_of? to test whether its receiver is an instance of a given class, such as Array or anything else.
@users.kind_of?(Array) # => true
You can find these methods in the Ruby documentation: http://ruby-doc.org/core-1.9.3/Object.html
@user.class # => User
@user.is_a?(User) # => true
@user.kind_of?(User) # => true
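How these checks behave with inheritance can be sketched as follows. The classes `User` and `Admin` are stand-ins made up for the example.

```ruby
class User; end
class Admin < User; end

admin = Admin.new

admin.class                # Admin, the exact class of the object
admin.is_a?(User)          # true, because Admin inherits from User
admin.kind_of?(User)       # true, kind_of? is an alias of is_a?
admin.instance_of?(User)   # false, instance_of? ignores superclasses
[1, 2].is_a?(Enumerable)   # true, included modules count as well
```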
found helpful: <%= debug @users %>
The difference between Class and Hash? They are too different for a simple comparison. A Hash is basically an array with unique keys, where each key has its associated value. That's why it's also called an associative array.
Here is some explanation:
array = [1,2,3,4]
array[0] # => 1
array[-1] # => 4
array[0..2] # => [1,2,3]
array.size # => 4
Check out more Array methods here: http://ruby-doc.org/core-1.9.3/Array.html
hash = {:foo => 1, :bar => 34, :baz => 22}
hash[:foo] # => 1
hash[:bar] # => 34
hash.keys # => [:foo, :bar, :baz]
hash.values # => [1, 34, 22]
hash.merge! :foo => 3921
hash # => {:foo => 3921, :bar => 34, :baz => 22}
Since Ruby 1.9 a Hash keeps the order in which keys were inserted (in 1.8 the order was unspecified). It also preserves the uniqueness of keys, so you can easily retrieve values.
However, if you do this:
hash.merge! "foo" => 12
you will get
hash # => {:foo => 3921, :bar => 34, :baz => 22, "foo" => 12}
It created new key-value pair since :foo.eql? "foo" returns false.
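One detail worth spelling out: merge returns a new hash and leaves the receiver alone, while merge! changes the receiver in place. A short sketch using the same data:

```ruby
hash = { :foo => 1, :bar => 34, :baz => 22 }

merged = hash.merge(:foo => 3921)   # returns a new hash
hash[:foo]                          # still 1, the receiver is untouched
merged[:foo]                        # 3921

hash.merge!("foo" => 12)            # modifies hash itself
hash.keys                           # [:foo, :bar, :baz, "foo"]
:foo.eql?("foo")                    # false, so both keys coexist
```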
For more Hash methods check out this: http://www.ruby-doc.org/core-1.9.3/Hash.html
The Class object is a bit too complex to explain briefly, but if you want to learn more about it, refer to some online tutorials.
And remember, API is your friend.

How can I check a word is already all uppercase?

I want to be able to check if a word is already all uppercase. And it might also include numbers.
Example:
GO234 => yes
Go234 => no
You can compare the string with the same string but in uppercase:
'go234' == 'go234'.upcase #=> false
'GO234' == 'GO234'.upcase #=> true
a = "Go234"
a.match(/\p{Lower}/) # => #<MatchData "o">
b = "GO234"
b.match(/\p{Lower}/) # => nil
c = "123"
c.match(/\p{Lower}/) # => nil
d = "µ"
d.match(/\p{Lower}/) # => #<MatchData "µ">
So when the match result is nil, it is in uppercase already, else something is in lowercase.
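Wrapped into a predicate, the check could look like this. The method name `all_upcase?` is invented here, and the sketch assumes UTF-8 strings.

```ruby
# True when the word contains no lowercase letter at all.
# Digits and punctuation are neither upper nor lower, so they are allowed.
def all_upcase?(word)
  word !~ /\p{Lower}/
end

all_upcase?('GO234')             # true
all_upcase?('Go234')             # false
all_upcase?('123')               # true, there is nothing lowercase in it
all_upcase?("\u00DCber234")      # false ("Über234" contains b, e, r)
all_upcase?("\u00DC\u00D6234")   # true ("ÜÖ234")
```

Under this definition a string with no letters at all counts as uppercase; add a separate check for `/\p{Upper}/` if at least one uppercase letter is required.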
Thanks to @mu is too short for pointing out that we should use /\p{Lower}/ instead, so that non-English lowercase letters are matched as well.
I am using the solution by @PeterWong and it works great as long as the string you're checking against doesn't contain any special characters (as pointed out in the comments).
However if you want to use it for strings like "Überall", just add this slight modification:
utf_pattern = Regexp.new("\\p{Lower}".force_encoding("UTF-8"))
a = "Go234"
a.match(utf_pattern) # => #<MatchData "o">
b = "GO234"
b.match(utf_pattern) # => nil
b = "ÜÖ234"
b.match(utf_pattern) # => nil
b = "Über234"
b.match(utf_pattern) # => #<MatchData "b">
Have fun!
You could either compare the string and string.upcase for equality (as shown by JCorc..)
irb(main):007:0> str = "Go234"
=> "Go234"
irb(main):008:0> str == str.upcase
=> false
OR
you could call arg.upcase! and check for nil. (But this will modify the original argument, so you may have to create a copy)
irb(main):001:0> "GO234".upcase!
=> nil
irb(main):002:0> "Go234".upcase!
=> "GO234"
Update: If you want this to work for Unicode (multi-byte) strings, String#upcase won't work on Ruby versions before 2.4; there you'd need the unicode-util gem mentioned in this SO question

Resources