Testing order using a hashmap - ruby

I'm a ruby beginner and reading that a Hash does not have order. I tried to play with that concept but found I could still order things like so:
Travel_Plans = Hash.new
Travel_Plans[4] = "Colorado Springs"
Travel_Plans[1] = "Santa Fe"
Travel_Plans[2] = "Raton"
Travel_Plans[5] = "Denver"
Travel_Plans[3] = "Pueblo"
puts Travel_Plans.sort
Could someone explain what is meant by "Hash does not have order"?
If you could provide a simple example that would be great.

Ruby's Hash class represents a "hash map" or "key-value dictionary" in conventional terms. These are intended to be structures that allow quick random-access to individual elements, but the elements themselves have no intrinsic ordering.
Internally Ruby's Hash organizes elements into their various locations in memory using the hash method each object must provide to be used as a key. Ruby's Hash is unusually, if not ludicrously flexible, in that an object, any object, can be used as a key, and it's preserved exactly as-is. Contrast with JavaScript where keys must be strings and strings only.
That means you can do this:
{ 1 => 'Number One', '1' => 'String One', :one => 'Symbol One', 1.0 => 'Float One }
Where that has four completely different keys.
This is in contrast to Array where the ordering is an important part of how an array works. You don't want to have a queue where things go in one order and come out in another.
Now Ruby's Hash class used to have no intrinsic order, but due to popular demand now it stores order in terms of insertion. That is, the first items inserted are "first". Normally you don't depend on this behaviour explicitly, but it does show up if you're paying attention:
a = { x: '1', y: '2' }
# => {:x=>"1, :y=>"2"}
b = { }
b[:y] = '2'
b[:x] = '1'
b
# => {:y=>"2", :x=>"1"}
Note that the order of the keys in b are reversed due to inserting them in reverse order. They're still equivalent though:
a == b
# => true
When you call sort on a Hash you actually end up converting it to an array of key/value pairs, then sorting each of those:
b.sort
# => [[:x, "1"], [:y, "2"]]
Which you could convert back into a Hash if you want:
b.sort.to_h
# => {:x=>"1", :y=>"2"}
So now it's "ordered" properly. In practice this rarely matters, though, as you'll be accessing the keys individually as necessary. b[:x] doesn't care where the :x key is, it always returns the right value regardless.
Some things to note about Ruby:
Don't use Hash.new, instead just use { } to represent an empty Hash structure.
Don't use capital letters for variables, they have significant meaning in Ruby. Travel_Plans is a constant, not a variable, because it starts with a capital letter. Those are reserved for ClassName and CONSTANT_NAME type usage. This should be travel_plans.

First, the statement "[h]ash does not have order" is wrong as of today. It used to be true for really old and outdated versions of Ruby. You seem to have picked outdated information, which would be unreliable as of today.
Second, the code you provided:
puts Travel_Plans.sort
is irrelevant to showing your point that the hash Travel_Plans has, i.e. preserves, the order. What you should have done to check whether the order is preserved, is to simply do:
p Travel_Plans
which would always result in showing the keys in the order 4, 1, 2, 5, 3, which matches the order in which you assigned the key-values to the hash, thus indeed shows that hash preserves the order.

Related

Unexpected FrozenError when appending elements via <<

I have a hash containing names and categories:
hash = {
'Dog' => 'Fauna',
'Rose' => 'Flora',
'Cat' => 'Fauna'
}
and I want to reorganize it so that the names are grouped by their corresponding category:
{
'Fauna' => ['Dog', 'Cat'],
'Flora' => ['Rose']
}
I am adding each names via <<:
new_hash = Hash.new
hash.each do |name , category|
if new_hash.key?(category)
new_file[category] << name
else
new_hash[category] = name
end
end
But I am being told that this operation is being performed on a frozen element:
`<<' : Can’t modify frozen string (FrozenError)
I suppose this is because each yields frozen objects. How can I restructure this code so the '.each' doesn't provide frozen variables?
I needed to add the first name to an array and then that array to the hash.
new_hash = Hash.new
hash.each do |name , category|
if new_hash.key?(category)
new_file[category] << name
else
new_hash[category] = [name] # <- must be an array
end
end
How can I restructure this code so the '.each' doesn't provide frozen variables?
Short answer: you can't.
Hash#each doesn't "provide frozen variables".
First off, there is no such thing as a "frozen variable". Variables aren't frozen. Objects are. The distinction between variables and objects is fundamental, not just in Ruby but in any programming language (and in fact pretty much everywhere else, too). If I have a sticker with the name "Seamus" on it, then this sticker is not you. It is simply a label that refers to you.
Secondly, Hash#each doesn't provide "variables". In fact, it doesn't provide anything that is not in the hash already. It simply yields the objects that are already in the hash.
Note that, in order to avoid confusion and bugs, strings are automatically frozen when used as keys. So, you can't modify string keys. You can either make sure they are correct from the beginning, or you can create a new hash with new string keys. (You can also add the new keys to the existing hash and delete the old keys, but that is a lot of complexity for little gain.)

How symbols equal each others?

When I do :symbol == :symbol I find that its true. They are the same.
If this is the case, how can we create arrays like this:
a = [{:name=>"Michael"},{:name=>"John"}]
Look the below code :
a = [{:name=>"Michael"},{:name=>"John"}]
a.map(&:object_id) # => [70992070, 70992050]
This is because a is an array of Hash, but they are 2 different hash objects. In Ruby, Hash must have uniq key. But 2 different hash can have same named symbols as keys.
You seem to be confused about hash keys. One hash cannot contain the same key twice, but two different hashes can have the same object as a key. For example:
a_key = "hello"
spanish = { a_key => "hola" }
french = { a_key => "bonjour" }
some_array = [spanish, french]
On top of that, it is possible for arrays to contain duplicate objects (e.g. [1, 2, 1] is valid) -- but these aren't even duplicates. Two hashes that contain the same key are still different objects.
There's nothing at all unusual about an array like that. In fact, it's normal for hashes in an array to have keys in common, because usually if you want to put things in an array, it means they have something in common that you can use to deal with them in the same way.

Why is a string key for a hash frozen?

According to the specification, strings that are used as a key to a hash are duplicated and frozen. Other mutable objects do not seem to have such special consideration. For example, with an array key, the following is possible.
a = [0]
h = {a => :a}
h.keys.first[0] = 1
h # => {[1] => :a}
h[[1]] # => nil
h.rehash
h[[1]] # => :a
On the other hand, a similar thing cannot be done with a string key.
s = "a"
h = {s => :s}
h.keys.first.upcase! # => RuntimeError: can't modify frozen String
Why is string designed to be different from other mutable objects when it comes to a hash key? Is there any use case where this specification becomes useful? What other consequences does this specification have?
I actually have a use case where absence of such special specification about strings may be useful. That is, I read with the yaml gem a manually written YAML file that describes a hash. the keys may be strings, and I would like to allow case insensitivity in the original YAML file. When I read a file, I might get a hash like this:
h = {"foo" => :foo, "Bar" => :bar, "BAZ" => :baz}
And I want to normalize the keys to lower case to get this:
h = {"foo" => :foo, "bar" => :bar, "baz" => :baz}
by doing something like this:
h.keys.each(&:downcase!)
but that returns an error for the reason explained above.
In short it's just Ruby trying to be nice.
When a key is entered in a Hash, a special number is calculated, using the hash method of the key. The Hash object uses this number to retrieve the key. For instance, if you ask what the value of h['a'] is, the Hash calls the hash method of string 'a' and checks if it has a value stored for that number. The problem arises when someone (you) mutates the string object, so the string 'a' is now something else, let's say 'aa'. The Hash would not find a hash number for 'aa'.
The most common types of keys for hashes are strings, symbols and integers. Symbols and integers are immutable, but strings are not. Ruby tries to protect you from the confusing behaviour described above by dupping and freezing string keys. I guess it's not done for other types because there could be nasty performance side effects (think of large arrays).
Immutable keys make sense in general because their hash codes will be stable.
This is why strings are specially-converted, in this part of MRI code:
if (RHASH(hash)->ntbl->type == &identhash || rb_obj_class(key) != rb_cString) {
st_insert(RHASH(hash)->ntbl, key, val);
}
else {
st_insert2(RHASH(hash)->ntbl, key, val, copy_str_key);
}
In a nutshell, in the string-key case, st_insert2 is passed a pointer to a function that will trigger the dup and freeze.
So if we theoretically wanted to support immutable lists and immutable hashes as hash keys, then we could modify that code to something like this:
VALUE key_klass;
key_klass = rb_obj_class(key);
if (key_klass == rb_cArray || key_klass == rb_cHash) {
st_insert2(RHASH(hash)->ntbl, key, val, freeze_obj);
}
else if (key_klass == rb_cString) {
st_insert2(RHASH(hash)->ntbl, key, val, copy_str_key);
}
else {
st_insert(RHASH(hash)->ntbl, key, val);
}
Where freeze_obj would be defined as:
static st_data_t
freeze_obj(st_data_t obj)
{
return (st_data_t)rb_obj_freeze((VALUE) obj);
}
So that would solve the specific inconsistency that you observed, where the array-key was mutable. However to be really consistent, more types of objects would need to be made immutable as well.
Not all types, however. For example, there'd be no point to freezing immediate objects like Fixnum because there is effectively only one instance of Fixnum corresponding to each integer value. This is why only String needs to be special-cased this way, not Fixnum and Symbol.
Strings are a special exception simply as a matter of convenience for Ruby programmers, because strings are very often used as hash keys.
Conversely, the reason that other object types are not frozen like this, which admittedly leads to inconsistent behavior, is mostly a matter of convenience for Matz & Company to not support edge cases. In practice, comparatively few people will use a container object like an array or a hash as a hash key. So if you do so, it's up to you to freeze before insertion.
Note that this is not strictly about performance, because the act of freezing a non-immediate object simply involves flipping the FL_FREEZE bit on the basic.flags bitfield that's present on every object. That's of course a cheap operation.
Also speaking of performance, note that if you are going to use string keys, and you are in a performance-critical section of code, you might want to freeze your strings before doing the insertion. If you don't, then a dup is triggered, which is a more-expensive operation.
Update #sawa pointed out that leaving your array-key simply frozen means the original array might be unexpectedly immutable outside of the key-use context, which could also be an unpleasant surprise (although otoh it would serve you right for using an array as a hash-key, really). If you therefore surmise that dup + freeze is the way out of that, then you would in fact incur possible noticeable performance cost. On the third hand, leave it unfrozen altogether, and you get the OP's original weirdness. Weirdness all around. Another reason for Matz et al to defer these edge cases to the programmer.
See this thread on the ruby-core mailing list for an explanation (freakily, it happened to be the first mail I stumbled across when I opened up the mailing list in my mail app!).
I've no idea about the first part of your question, but hHere is a practical answer for the 2nd part:
new_hash = {}
h.each_pair do |k,v|
new_hash.merge!({k.downcase => v})
end
h.replace new_hash
There's lots of permutations of this kind of code,
Hash[ h.map{|k,v| [k.downcase, v] } ]
being another (and you're probably aware of these, but sometimes it's best to take the practical route:)
You are askin 2 different questions: theoretical and practical. Lain was the first to answer, but I would like to provide what I consider a proper, lazier solution to your practical question:
Hash.new { |hsh, key| # this block get's called only if a key is absent
downcased = key.to_s.downcase
unless downcased == key # if downcasing makes a difference
hsh[key] = hsh[downcased] if hsh.has_key? downcased # define a new hash pair
end # (otherways just return nil)
}
The block used with Hash.new constructor is only invoked for those missing keys, that are actually requested. The above solution also accepts symbols.
A very old question - but if anyone else is trying to answer the "how can I get around the hash keys are freezing strings" part of the question...
A simple trick you could do to solve the String special case is:
class MutableString < String
end
s = MutableString.new("a")
h = {s => :s}
h.keys.first.upcase! # => RuntimeError: can't modify frozen String
puts h.inspect
Doesn't work unless you are creating the keys, and unless you are then careful that it doesn't cause any problems with anything that strictly requires that the class is exactly "String"

Cannot understand what the following code does

Can somebody explain to me what the below code is doing. before and after are hashes.
def differences(before, after)
before.diff(after).keys.sort.inject([]) do |diffs, k|
diff = { :attribute => k, :before => before[k], :after => after[k] }
diffs << diff; diffs
end
end
It is from the papertrail differ gem.
It's dense code, no question. So, as you say before and after are hash(-like?) objects that are handed into the method as parameters. Calling before.diff(after) returns another hash, which then immediately has .keys called on it. That returns all the keys in the hash that diff returned. The keys are returned as an array, which is then immediately sorted.
Then we get to the most complex/dense bit. Using inject on that sorted array of keys, the method builds up an array (called diffs inside the inject block) which will be the return value of the inject method.
That array is made up of records of differences. Each record is a hash - built up by taking one key from the sorted array of keys from the before.diff(after) return value. These hashes store the attribute that's being diffed, what it looked like in the before hash and what it looks like in the after hash.
So, in a nutshell, the method gets a bunch of differences between two hashes and collects them up in an array of hashes. That array of hashes is the final return value of the method.
Note: inject can be, and often is, much, much simpler than this. Usually it's used to simply reduce a collection of values to one result, by applying one operation over and over again and storing the results in an accumlator. You may know inject as reduce from other languages; reduce is an alias for inject in Ruby. Here's a much simpler example of inject:
[1,2,3,4].inject(0) do |sum, number|
sum + number
end
# => 10
0 is the accumulator - the initial value. In the pair |sum, number|, sum will be the accumulator and number will be each number in the array, one after the other. What inject does is add 1 to 0, store the result in sum for the next round, add 2 to sum, store the result in sum again and so on. The single final value of the accumulator sum will be the return value. Here 10. The added complexity in your example is that the accumulator is different in kind from the values inside the block. That's less common, but not bad or unidiomatic. (Edit: Andrew Marshall makes the good point that maybe it is bad. See his comment on the original question. And #tokland points out that the inject here is just a very over-complex alternative for map. It is bad.) See the article I linked to in the comments to your question for more examples of inject.
Edit: As #tokland points out in a few comments, the code seems to need just a straightforward map. It would read much easier then.
def differences(before, after)
before.diff(after).keys.sort.map do |k|
{ :attribute => k, :before => before[k], :after => after[k] }
end
end
I was too focused on explaining what the code was doing. I didn't even think of how to simplify it.
It finds the entries in before and after that differ according to the underlying objects, then builds up a list of those differences in a more convenient format.
before.diff(after) finds the entries that differ.
keys.sort gives you the keys (of the map of differences) in sorted order
inject([]) is like map, but starts with diffs initialized to an empty array.
The block creates a diff line (a hash) for each of these differences, and then appends it to diffs.

Usage of integers as hash keys

Is it appropriate to use integers as keys in a Ruby hash?
Every example from documentation shows a string or symbol being used as a key, but never an integer.
Internally, would integers somehow get converted to strings? I have seen some conflicting information on the subject.
In other words, is there any significant disadvantage to using integer keys to a hash?
Others looking at the answers here might find it interesting to know that an exception happens when you use integers as symbol keys in a Ruby hash {symbol: value}
hash = {1: 'one'} # will not work
hash = {1 => 'one'} # will work
Requested Explanation:
The simplest answer for why the first example fails is probably that to_sym is not a method that's been implemented for Fixnum integers.
To go more in depth to maybe explaining why that is, one of the main benefits to using symbols is that two symbols are in fact "the same object". Or at least they share the same object ids.
:foo.object_id == :foo.object_id
=> true
Strings that are the same do not share the same objects, and therefore do not share the same object ids.
"foo".object_id == "foo".object_id
=> false
Like symbols, Fixnum integers that are the same will have the same object ids. Therefore you don't really need to convert them into symbols.
one = 1
=> 1
uno = 1
=> 1
one.object_id
=> 3
one.object_id == uno.object_id
=> true
of course you can use integers as keys...
h = {1 => 'one', 2 => 'two', 3 => 'three'}
(1..3).each do |i|
puts h[i]
end
=>
one
two
there
irb is your friend! try it..
In fact you can use any Ruby object as the key (or the value).
We usually don't think about using Hashes like this, but it could be quite useful.
Edit:
As Óscar López points out, the object just has to respond to .hash for it to work as a key in a Ruby Hash.
The only requirement for using an object as a hash key is that it must respond to the message hash with a hash value, and the hash value for a given key must not change. For instance, if you call this:
1.hash()
You can see that the number 1 indeed responds to the hash message
There are already answers about the is it possible?.
An explanation, why there are no examples with integers as Hash-keys.
Hash-keys have (most of the times) a meaning. It may be an attribute name and its value (e.g. :color => 'red'...).
When you have an integer as a key, your semantic may be 'first, second ...' (1). But then you don't use a hash, but an array to store your values.
(1) A counterexample may be a foreign key in a database.

Resources