String values as hash keys, copy created - ruby

While experimenting in irb with strings, I noticed that when a variable referencing a string is used as a key in a Hash, a new copy of the String is created rather than a reference to the original object. This isn't the case with an Array:
1.9.3-p448 :051 > a = 'str1'
=> "str1"
1.9.3-p448 :052 > b = 'str2'
=> "str2"
1.9.3-p448 :053 > arr = [a,b]
=> ["str1", "str2"]
1.9.3-p448 :054 > arr[0].object_id == a.object_id
=> true
1.9.3-p448 :055 > hash = { a => b }
=> {"str1"=>"str2"}
1.9.3-p448 :056 > hash.keys[0].object_id == a.object_id
=> false
I understand if I just stuck to symbols I wouldn't be asking this question.
What is the purpose of making a copy of the String? I understand that a string comparison would still work, but surely an object_id comparison would be quicker?

From Hash.[]= documentation:
key should not have its value changed while it is in use as a key (an
unfrozen String passed as a key will be duplicated and frozen).
Since strings are mutable by default in Ruby, you could in theory change them after using them as keys in your hash. If you did, the hash would become invalid, because it would no longer be able to find those keys properly.
Since strings are ubiquitous and are often passed around by reference, Ruby protects its hashes this way from unexpected bugs that are very hard to detect.
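A minimal sketch of the behaviour quoted from the documentation, runnable in irb. The variable names are only illustrative, and String.new is used so the example starts from an unfrozen string even if frozen string literals are enabled.

```ruby
# Hash#[]= stores a frozen copy of an unfrozen String key.
original = String.new("str1")   # guaranteed to be unfrozen
hash = {}
hash[original] = "value"

stored_key = hash.keys.first

p stored_key.frozen?            # true: the stored key is frozen
p original.frozen?              # false: the caller's string is untouched
p stored_key.equal?(original)   # false: the key is a separate object

# Mutating the original cannot corrupt the hash, because the hash
# never held a reference to it.
original << "-changed"
p hash["str1"]                  # "value"
p hash[original]                # nil: "str1-changed" was never a key
```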

Most of the usual kinds of keys are immutable: numbers, symbols, dates. Strings however are mutable, and as Uri Agassi writes, Ruby protects the hash from bugs. It does not do so for arrays used as keys, perhaps for performance reasons (possibly large arrays) or perhaps arrays are not commonly used as keys. Hashes normally compare by the result of the hash method which every object has. If you want it to compare by object_id then you can switch it on: hash.compare_by_identity.
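To illustrate the last point, here is a small sketch contrasting a normal hash with one switched to compare_by_identity. The variable names are made up for the example. The final line relies on the identity-hash branch of the MRI code, which inserts the key without copying it.

```ruby
a = String.new("str1")

by_value = { a => 1 }               # default: keys compared with hash / eql?
by_identity = {}.compare_by_identity
by_identity[a] = 1                  # keys compared by object identity

same_content = String.new("str1")   # equal content, different object

p by_value[same_content]            # 1
p by_identity[same_content]         # nil: not the same object
p by_identity[a]                    # 1
p by_identity.keys.first.equal?(a)  # true: no copy is made in identity mode
```

Note that an identity hash keeps a reference to the original string, so mutating that string is visible through the hash.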

Related

In Ruby, how to choose whether a symbol or string to be used in a given scenario?

I see this answer in various websites:
If the contents (the sequence of characters) of the object are
important, use a string If the identity of the object is important,
use a symbol
But, what does this actually mean? Please give me an explanation which even a layman can understand.
a = :foo
b = :foo
a and b refer to the same object in memory (same identity)
a.object_id # => 898908
b.object_id # => 898908
Strings behave differently
a = 'foo'
b = 'foo'
a.object_id # => 70127643805220
b.object_id # => 70127643805200
So, you use strings to store data and perform manipulations on data (replace characters or whatnot) and you use symbols to name things (keys in a hash or something). Also see this answer for more use cases for symbol.

Is there a way to get a random element from an array and also delete just that single element?

I want:
1.9.3p392 :015 > a=["birds","things","people","people"]
=> ["birds", "things", "people","people"]
1.9.3p392 :016 > a.sample
=> "people"
1.9.3p392 :017 > a
=> ["birds", "things","people"]
1.9.3p392 :018 >
but it doesn't look like sample supports this. Is there anything I'm missing in sample's arguments? I'm aware I could delete what is returned, but that would delete ALL the members with that value, not just that single instance.
thx
Here is a more traditional version I initially suggested which removes the element by index. This method mutates the original array, but also preserves order.
a = ["birds","things","people","people"]
i = rand(a.size)
# delete_at returns the element removed, or nil
elm = a.delete_at(i)
Here is another solution that doesn't have side effects (it uses the sample(n) form, which returns n random elements). This approach changes the order of the remaining elements and may be "less efficient" for large arrays.
a = ["birds","things","people","people"]
elm, *rest = a.sample(a.size)
I actually would not use the 2nd solution and would manually duplicate the array first if that is what I wanted - after reviewing it, it seems too "clever" and convoluted.
You could do this:
a = ["birds","things","people","people"]
a.delete_at(a.find_index(a.sample))
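The index-based approach can be wrapped in a small helper. This is only a sketch: take_random! is a hypothetical name, not part of Array, and it mutates the array it is given.

```ruby
# Hypothetical helper: remove and return one randomly chosen element.
# Returns nil for an empty array.
def take_random!(array)
  return nil if array.empty?
  array.delete_at(rand(array.size))
end

a = ["birds", "things", "people", "people"]
original = a.dup          # kept only to compare afterwards

picked = take_random!(a)

p picked                  # one of the four elements
p a.size                  # 3: exactly one element was removed
```

If the original array must stay untouched, calling the helper on a.dup gives the same result without side effects.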

Why is a string key for a hash frozen?

According to the specification, strings that are used as a key to a hash are duplicated and frozen. Other mutable objects do not seem to have such special consideration. For example, with an array key, the following is possible.
a = [0]
h = {a => :a}
h.keys.first[0] = 1
h # => {[1] => :a}
h[[1]] # => nil
h.rehash
h[[1]] # => :a
On the other hand, a similar thing cannot be done with a string key.
s = "a"
h = {s => :s}
h.keys.first.upcase! # => RuntimeError: can't modify frozen String
Why is string designed to be different from other mutable objects when it comes to a hash key? Is there any use case where this specification becomes useful? What other consequences does this specification have?
I actually have a use case where the absence of this special treatment of strings would be useful. I read, with the yaml gem, a manually written YAML file that describes a hash. The keys may be strings, and I would like to allow case insensitivity in the original YAML file. When I read a file, I might get a hash like this:
h = {"foo" => :foo, "Bar" => :bar, "BAZ" => :baz}
And I want to normalize the keys to lower case to get this:
h = {"foo" => :foo, "bar" => :bar, "baz" => :baz}
by doing something like this:
h.keys.each(&:downcase!)
but that returns an error for the reason explained above.
In short it's just Ruby trying to be nice.
When a key is entered in a Hash, a special number is calculated, using the hash method of the key. The Hash object uses this number to retrieve the key. For instance, if you ask what the value of h['a'] is, the Hash calls the hash method of string 'a' and checks if it has a value stored for that number. The problem arises when someone (you) mutates the string object, so the string 'a' is now something else, let's say 'aa'. The Hash would not find a hash number for 'aa'.
The most common types of keys for hashes are strings, symbols and integers. Symbols and integers are immutable, but strings are not. Ruby tries to protect you from the confusing behaviour described above by dupping and freezing string keys. I guess it's not done for other types because there could be nasty performance side effects (think of large arrays).
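A short sketch of the "special number" described above, using the hash and eql? methods that Hash relies on. The variable names are illustrative only.

```ruby
s1 = String.new("a")
s2 = String.new("a")

# Equal content gives equal hash codes, so either string finds the same entry.
same_code_before = (s1.hash == s2.hash)
p same_code_before      # true
p s1.eql?(s2)           # true

code_before = s1.hash
s1 << "a"               # s1 is now "aa"

# The content changed, so s1 no longer matches s2, and its hash code is
# recomputed from the new content (almost certainly a different number).
p s1.eql?(s2)           # false
p s1.hash == code_before
```

A key stored under the old number and then mutated would be looked up under the new number, which is the failure that dupping and freezing string keys prevents.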
Immutable keys make sense in general because their hash codes will be stable.
This is why strings are specially-converted, in this part of MRI code:
if (RHASH(hash)->ntbl->type == &identhash || rb_obj_class(key) != rb_cString) {
    st_insert(RHASH(hash)->ntbl, key, val);
}
else {
    st_insert2(RHASH(hash)->ntbl, key, val, copy_str_key);
}
In a nutshell, in the string-key case, st_insert2 is passed a pointer to a function that will trigger the dup and freeze.
So if we theoretically wanted to support immutable lists and immutable hashes as hash keys, then we could modify that code to something like this:
VALUE key_klass;
key_klass = rb_obj_class(key);
if (key_klass == rb_cArray || key_klass == rb_cHash) {
    st_insert2(RHASH(hash)->ntbl, key, val, freeze_obj);
}
else if (key_klass == rb_cString) {
    st_insert2(RHASH(hash)->ntbl, key, val, copy_str_key);
}
else {
    st_insert(RHASH(hash)->ntbl, key, val);
}
Where freeze_obj would be defined as:
static st_data_t
freeze_obj(st_data_t obj)
{
    return (st_data_t)rb_obj_freeze((VALUE) obj);
}
So that would solve the specific inconsistency that you observed, where the array-key was mutable. However to be really consistent, more types of objects would need to be made immutable as well.
Not all types, however. For example, there'd be no point to freezing immediate objects like Fixnum because there is effectively only one instance of Fixnum corresponding to each integer value. This is why only String needs to be special-cased this way, not Fixnum and Symbol.
Strings are a special exception simply as a matter of convenience for Ruby programmers, because strings are very often used as hash keys.
Conversely, the reason that other object types are not frozen like this, which admittedly leads to inconsistent behavior, is mostly a matter of convenience for Matz & Company to not support edge cases. In practice, comparatively few people will use a container object like an array or a hash as a hash key. So if you do so, it's up to you to freeze before insertion.
Note that this is not strictly about performance, because the act of freezing a non-immediate object simply involves flipping the FL_FREEZE bit on the basic.flags bitfield that's present on every object. That's of course a cheap operation.
Also speaking of performance, note that if you are going to use string keys, and you are in a performance-critical section of code, you might want to freeze your strings before doing the insertion. If you don't, then a dup is triggered, which is a more-expensive operation.
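A sketch of the difference, assuming the behaviour described in the Hash#[]= documentation (only unfrozen String keys are duplicated). The variable names are made up for the example.

```ruby
unfrozen = String.new("first")
frozen   = String.new("second").freeze

h = {}
h[unfrozen] = 1   # the hash stores a frozen copy
h[frozen]   = 2   # already frozen, so the hash stores this very object

copied_key, kept_key = h.keys

p copied_key.equal?(unfrozen)   # false: a dup was made
p kept_key.equal?(frozen)       # true: no dup was needed
```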
Update: @sawa pointed out that leaving your array key simply frozen means the original array might be unexpectedly immutable outside of the key-use context, which could also be an unpleasant surprise (although, on the other hand, it would serve you right for using an array as a hash key, really). If you therefore surmise that dup + freeze is the way out of that, then you would in fact incur a possibly noticeable performance cost. On the third hand, leave it unfrozen altogether, and you get the OP's original weirdness. Weirdness all around. Another reason for Matz et al. to defer these edge cases to the programmer.
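The two manual options discussed above can be sketched as follows. This is ordinary user code, not something Hash does for you, and the variable names are illustrative.

```ruby
# Option 1: freeze the key itself. The caller's array becomes immutable too.
shared = [0]
h1 = { shared.freeze => :a }
begin
  shared << 1
rescue RuntimeError => e   # FrozenError, a RuntimeError subclass, on Ruby >= 2.5
  puts "shared can no longer be modified: #{e.class}"
end

# Option 2: store a frozen copy. The caller's array stays mutable,
# at the cost of one dup per insertion.
original = [0]
h2 = { original.dup.freeze => :a }
original << 1              # fine, the hash holds a separate object

p h2[[0]]                  # :a
p h2.keys.first.frozen?    # true
p original                 # [0, 1]
```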
See this thread on the ruby-core mailing list for an explanation (freakily, it happened to be the first mail I stumbled across when I opened up the mailing list in my mail app!).
I've no idea about the first part of your question, but here is a practical answer for the second part:
new_hash = {}
h.each_pair do |k,v|
  new_hash.merge!({k.downcase => v})
end
h.replace new_hash
There's lots of permutations of this kind of code,
Hash[ h.map{|k,v| [k.downcase, v] } ]
being another (and you're probably aware of these, but sometimes it's best to take the practical route:)
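On newer Rubies there is a built-in for this. The sketch below assumes Ruby 2.5 or later, where Hash#transform_keys is available; it is not an option on the 1.9 interpreters discussed in this thread.

```ruby
h = { "foo" => :foo, "Bar" => :bar, "BAZ" => :baz }

# Builds a new hash; each key is replaced by the block's result.
normalized = h.transform_keys(&:downcase)

p normalized   # {"foo"=>:foo, "bar"=>:bar, "baz"=>:baz} (inspect format varies by version)
p h.key?("Bar")  # true: the original hash is unchanged
```

Hash#transform_keys! does the same in place. If two keys downcase to the same string, the later pair wins.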
You are asking two different questions: theoretical and practical. Lain was the first to answer, but I would like to provide what I consider a proper, lazier solution to your practical question:
Hash.new { |hsh, key| # this block gets called only if a key is absent
  downcased = key.to_s.downcase
  unless downcased == key # if downcasing makes a difference
    hsh[key] = hsh[downcased] if hsh.has_key? downcased # define a new hash pair
  end # (otherwise just return nil)
}
The block used with Hash.new constructor is only invoked for those missing keys, that are actually requested. The above solution also accepts symbols.
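A usage sketch of this lazy approach. It assumes the stored keys are already lowercase strings (a key stored as "Bar" would not be found by asking for "bar"), and the name lookup is only illustrative.

```ruby
lookup = Hash.new do |hsh, key|
  downcased = key.to_s.downcase
  # Only when downcasing changes the key and the downcased key exists:
  # cache the pair under the requested key and return the value.
  hsh[key] = hsh[downcased] if downcased != key && hsh.key?(downcased)
end
lookup.update("foo" => :foo, "bar" => :bar)

p lookup["FOO"]    # :foo
p lookup[:Bar]     # :bar (symbols go through to_s first)
p lookup["nope"]   # nil
p lookup.size      # 4: the two successful lookups were cached as new pairs
```

The caching means the hash grows with every distinct spelling that is requested, which is the trade-off for not normalizing the keys up front.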
A very old question - but if anyone else is trying to answer the "how can I get around the hash keys are freezing strings" part of the question...
A simple trick you could do to solve the String special case is:
class MutableString < String
end
s = MutableString.new("a")
h = {s => :s}
h.keys.first.upcase! # no error here: only keys whose class is exactly String are duped and frozen
puts h.inspect
This doesn't work unless you are the one creating the keys, and you then need to be careful that it doesn't cause problems with anything that strictly requires the class to be exactly "String".

Usage of integers as hash keys

Is it appropriate to use integers as keys in a Ruby hash?
Every example from documentation shows a string or symbol being used as a key, but never an integer.
Internally, would integers somehow get converted to strings? I have seen some conflicting information on the subject.
In other words, is there any significant disadvantage to using integer keys to a hash?
Others looking at the answers here might find it interesting to know that a syntax error occurs when you try to use an integer as a key with the symbol-key shorthand {symbol: value}:
hash = {1: 'one'} # will not work
hash = {1 => 'one'} # will work
Requested Explanation:
The simplest answer for why the first example fails is that the key: value shorthand only accepts keys that can be written as symbol literals, and :1 is not a valid symbol literal. (to_sym is also not a method that's been implemented for Fixnum integers.)
To go into more depth on why that is: one of the main benefits of using symbols is that two identical symbols are in fact the same object, so they share the same object id.
:foo.object_id == :foo.object_id
=> true
Strings that are the same do not share the same objects, and therefore do not share the same object ids.
"foo".object_id == "foo".object_id
=> false
Like symbols, Fixnum integers that are the same will have the same object ids. Therefore you don't really need to convert them into symbols.
one = 1
=> 1
uno = 1
=> 1
one.object_id
=> 3
one.object_id == uno.object_id
=> true
of course you can use integers as keys...
h = {1 => 'one', 2 => 'two', 3 => 'three'}
(1..3).each do |i|
  puts h[i]
end
=>
one
two
three
irb is your friend! try it..
In fact you can use any Ruby object as the key (or the value).
We usually don't think about using Hashes like this, but it could be quite useful.
Edit:
As Óscar López points out, the object just has to respond to .hash for it to work as a key in a Ruby Hash.
The only requirement for using an object as a hash key is that it must respond to the message hash with a hash value, and the hash value for a given key must not change. For instance, if you call this:
1.hash()
You can see that the number 1 indeed responds to the hash message
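For your own classes, a hash lookup also calls eql? to confirm that two keys with the same hash value really are equal, so both methods are normally defined together. A minimal sketch with a made-up Point class:

```ruby
# Hypothetical value class usable as a hash key.
class Point
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Hash uses eql? to compare keys that share a hash value.
  def eql?(other)
    other.is_a?(Point) && x == other.x && y == other.y
  end
  alias_method :==, :eql?

  # Equal points must return equal hash values.
  def hash
    [x, y].hash
  end
end

h = { Point.new(1, 2) => "a" }

p h[Point.new(1, 2)]   # "a": a different object with equal content finds the entry
p h[Point.new(2, 1)]   # nil
```

Point has no setters here, so its hash value cannot change while it is in use as a key.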
There are already answers to the "is it possible?" part.
Here is an explanation of why there are no examples with integers as Hash keys.
Hash keys usually carry a meaning. A key may be an attribute name paired with its value (e.g. :color => 'red').
When you have an integer as a key, the meaning may be 'first, second, ...' (1). But in that case you would normally use an array, not a hash, to store your values.
(1) A counterexample may be a foreign key in a database.
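A sketch of that counterexample, with made-up ids and names standing in for database rows. The ids are sparse, so a hash fits better than an array indexed by id.

```ruby
# Hypothetical records keyed by their database id.
users_by_id = { 1042 => "alice", 7 => "bob", 990_001 => "carol" }

p users_by_id[7]        # "bob"
p users_by_id.key?(8)   # false
p users_by_id.size      # 3

# An array indexed by the same ids would need 990_002 slots,
# almost all of them nil.
```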

How to convert a ruby integer into a symbol

I have a Ruby array like this
q_id = [1,2,3,4,5,...,100]
I want to iterate through the array and convert into a hash like this
{
:1 => { #some hash} ,
:2 => { #another hash},
...
:100 => {#yet another hash}
}
What is the shortest and most elegant way to accomplish this?
[EDIT : the to_s.to_sym while being handy is not how I want it. Apologies for not mentioning it earlier.]
For creating a symbol, either of these work:
42.to_s.to_sym
:"#{42}"
The #inspect representation of these shows :"42" only because :42 is not a valid Symbol literal. Rest assured that the double-quotes are not part of the symbol itself.
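A quick check that the quotes belong only to the printed form, not to the symbol's name:

```ruby
sym = 42.to_s.to_sym

p sym             # :"42"
p sym.to_s        # "42"
p sym.to_s.size   # 2: just the two digits, no quote characters
p sym == :"42"    # true
```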
To create a hash, there is no reason to convert the keys to symbols, however. You should simply do this:
q_id = (1..100).to_a
my_hash_indexed_by_value = {}
q_id.each{ |val| my_hash_indexed_by_value[val] = {} }
Or this:
my_hash = Hash[ *q_id.map{ |v| [v,{}] }.flatten ]
Or this:
# Every time a previously-absent key is indexed, assign and return a new hash
my_hash = Hash.new{ |h,val| h[val] = {} }
With all of these you can then index your hash directly with an integer and get a unique hash back, e.g.
my_hash[42][:foo] = "bar"
Unlike JavaScript, where every key to an object must be a string, Hashes in Ruby accept any object as the key.
To translate an integer into a symbol, use to_s.to_sym, e.g.:
1.to_s.to_sym
Note that a symbol is more related to a string than an integer. It may not be as useful for things like sorting anymore.
Actually "symbol numbers" aren't a thing in Ruby (try to call the to_sym method on a number). The benefit of using symbols in a hash is about performance, since they always have the same object_id (try to call object_id on strings, booleans, numbers, and symbols).
Numbers are immediate value and, like Symbol objects, they always have the same object_id.
Anyway, using the new hash syntax implies using symbols as keys, but you can always use the good old "hash rocket" syntax:
awesome_hash = { 1 => "hello", 2 => "my friend" }
Read about immediate values here:
https://books.google.de/books?id=jcUbTcr5XWwC&pg=PA73&lpg=PA73&dq=immediate+values+singleton+method&source=bl&ots=fIFlAe8xjy&sig=j7WgTA1Cft0WrHwq40YdTA50wk0&hl=en&sa=X&ei=0kHSUKCVB-bW0gHRxoHQAg&redir_esc=y#v=onepage&q&f=false
If you are creating a hard-coded constant numeric symbol, there's a simpler way:
:'99'
This produces the same results as the more complex methods in other answers:
irb(main):001:0> :'99'
=> :"99"
irb(main):002:0> :"#{99}"
=> :"99"
irb(main):003:0> 99.to_s.to_sym
=> :"99"
Of course, this will not work if you're dynamically creating a symbol from a variable, in which case one of the other two approaches is required.
As already stated, :1 is not a valid symbol. Here's one way to do what you're wanting, but with the keys as strings:
Hash[a.collect{|n| [n.to_s, {}] }]
An array of the objects you want in your hash would be so much easier to use, wouldn't it? Even a hash of integers would work pretty well, wouldn't it?
You can use
1.to_s.to_sym
but this will make symbols like :"1"
You can make symbolic keys with Hash[]:
a = Hash[(1..100).map{ |x| ["#{x}".to_sym, {}] }]
Check type of hash keys:
puts a.keys.map(&:class)
=>
Symbol
...
Symbol
Symbol
