Behavior of altered array keys in hashes - ruby

Ruby allows for a mutable object to be used as a hash key, and I was curious how this worked when the object is updated. It seems like the referenced object is irretrievable from key requests if it's updated.
key = [1,2]
test = {key => 12}
test # => {[1, 2] => 12}
test[key] # => 12
test[[1,2]] # => 12
test[[1,2,3]] # => nil
key << 3
test # => {[1, 2, 3] => 12}
test[key] # => nil
test[[1,2]] # => nil
test[[1,2,3]] # => nil
Why does this work this way? Why can't I provide a key to the hash which will return the value associated with the list I originally used as a key?

According to the documentation:
Two objects refer to the same hash key when their hash value is identical and the two objects are eql? to each other.
Mutating a key doesn't change the hash it's stored under. After you mutate the key, trying to index with [1,2] matches the hash but not eql?, while [1,2,3] matches the eql? but isn't found by hash.
See this article for a more elaborate explanation.
You can rehash test, however, to recalculate the hashes based on current key values:
test.rehash
test[[1,2,3]] # => 12
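The mismatch between the stored hash number and the eql? check can be made visible with a minimal sketch. The variable names hash_before and hash_after are only illustrative:

```ruby
key = [1, 2]
hash_before = key.hash            # the number the entry is filed under
test = { key => 12 }

key << 3                          # mutate the key in place
hash_after = key.hash

puts hash_before == [1, 2].hash   # true: [1, 2] still points at the right slot...
puts key.eql?([1, 2])             # false: ...but the stored key no longer matches
p test[[1, 2]]                    # nil

# [1, 2, 3] is eql? to the stored key, but the entry is still filed
# under hash_before rather than hash_after.
puts hash_after == [1, 2, 3].hash # true

test.rehash                       # re-file every entry under its current hash number
p test[[1, 2, 3]]                 # 12
```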

class D
end
p D.new.methods.include?(:hash) #=> true
# so the D instance has a hash method. What does it do?
p D.new.hash #=> -332308361 # just some number
(Almost) every object in Ruby has a hash method. The Hash calls this method when the object is used as a key, and uses the resulting number to store and retrieve the key. (There are smart procedures to handle duplicate numbers (hash collisions)). Retrieving goes like this:
a_hash[[1,2,3]]
# the a_hash calls the hash method to the [1,2,3] object
# and checks if it has stored a value for the resulting number.
This number is only created once: when the key is added to the hash instance.
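As a rough sketch of how a Hash relies on these methods, here is a hypothetical Coord class (not part of the original answer) that defines both hash and eql? so that two instances with the same content act as the same key. Plain is a second hypothetical class that keeps the defaults:

```ruby
# Hypothetical class: two Coord objects with equal x and y count as the same key.
class Coord
  attr_reader :x, :y

  def initialize(x, y)
    @x = x
    @y = y
  end

  # Hash uses eql? to confirm a candidate key once the hash numbers match.
  def eql?(other)
    other.is_a?(Coord) && x == other.x && y == other.y
  end

  # Hash uses this number to decide where the entry is stored.
  def hash
    [x, y].hash
  end
end

# Hypothetical class that keeps the default, identity-based hash and eql?.
class Plain
  def initialize(x)
    @x = x
  end
end

by_content  = { Coord.new(1, 2) => :treasure }
by_identity = { Plain.new(1) => :treasure }

p by_content[Coord.new(1, 2)]   # :treasure
p by_identity[Plain.new(1)]     # nil
```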
Problems arise when you start messing with the key after including it in a hash: the number returned by the object's hash method will differ from the one stored in the hash.
Don't do that, or
consider not using mutable objects as keys, or
remember to do a timely:
a_hash.rehash
which will recalculate all hash numbers.
Note: For string keys, the hash stores a frozen copy of the string, so modifying the original string afterwards won't matter.
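A small sketch of the string behaviour, assuming the key is assigned with Hash#[]= (the variable names are illustrative):

```ruby
name = String.new("abc")      # an unfrozen, mutable string
h = {}
h[name] = 1                   # the hash keeps its own frozen duplicate as the key

name << "d"                   # mutate the original string

p h["abc"]                    # 1
p h[name]                     # nil, because name is now "abcd"
p h.keys.first.frozen?        # true
p h.keys.first.equal?(name)   # false: the stored key is a different object
```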

It would be inconvenient if the identity of an array matters as the hash key. If you have a hash with a key [1, 2], you want to be able to access that with a different array object [1, 2] that has the same content. You want access by the content, not the identity. That would mean that what particular object (with the particular object id) is stored as a key does not matter for a hash. All that matters is the content of the key at the time it was assigned to the hash.
Therefore, after doing key << 3, it makes sense that test[key] or test[[1, 2, 3]] does not return the stored value anymore because key at the time of assignment to test was [1, 2].
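The tricky thing is that test[[1, 2]] also returns nil. That is a limitation of Ruby: the entry is still filed under the hash number computed for [1, 2], but the stored key now reads [1, 2, 3], so the eql? comparison fails.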
If you want the hash to reflect the change made in the key objects, there is a method Hash#rehash.
test.rehash
test[key] # => 12
test[[1,2]] # => nil
test[[1,2,3]] # => 12
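If the goal is to avoid the situation altogether, one option (a sketch, not something the answers above prescribe) is to freeze the array before using it as a key, so that an accidental mutation raises instead of silently breaking lookups:

```ruby
key = [1, 2].freeze
test = { key => 12 }

begin
  key << 3                    # raises because the array is frozen
rescue => e
  puts e.class                # FrozenError on current Rubies
end

p test[[1, 2]]                # 12

longer = key.dup << 3         # work on a copy when a modified array is needed
p longer                      # [1, 2, 3]
```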

Related

How to get the index of a key in a hash?

I'm trying to get the index of a key in a hash.
I know how to do this in an array:
arr = ['Done', 13, 0.4, true]
a = arr.index('Done')
puts a
Is there a method or some way to do something like this with a key in a hash? Thanks!
Hashes aren't usually treated as ordered structures; they simply have a list of keys and values corresponding to those keys.
It's true that in Ruby hashes are technically ordered, but there's very rarely an actual use case for treating them as such.
If what you want to do is find the key corresponding to a value in a hash, you can simply use the Hash#key method:
hash = { a: 1, b: 2 }
hash.key(1) # => :a
I suppose you could use hash.keys.index(hash.key(1)) to get 0 since it's the first key, but again, I wouldn't advise doing this because it's not typical use of the data structure.
There are at least a couple of ways to get this information. The two that come to mind are Enumerable's find_index method, which passes each element to a block so you can check for your key:
hash.find_index { |key, _| key == 'Done' }
or you could get all the keys from your hash as an array and then look up the index as you've been doing:
hash.keys.index('Done')
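Both approaches can be tried on a small hash. The tasks hash below is made up for illustration:

```ruby
tasks = { 'Todo' => 3, 'Done' => 13, 'Blocked' => 0 }

# Index via the array of keys (insertion order).
p tasks.keys.index('Done')                              # 1

# Index via Enumerable#find_index; each element is a [key, value] pair.
p tasks.find_index { |key, _value| key == 'Done' }      # 1

# A key that is not present gives nil either way.
p tasks.keys.index('Missing')                           # nil

# If the position is only needed while iterating, each_with_index provides it.
tasks.each_with_index do |(key, value), position|
  puts "#{position}: #{key} => #{value}"
end
```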

What is the meaning of * and flatten in Ruby

I am new to the Ruby language. When I was trying to sort a hash by value, I used this method:
movie_popularity.sort_by{|m,p| p}.reverse
but the sort method returns an array while I need a hash to be returned, so I used this command:
movie_popularity=Hash[*movie_popularity.sort_by{|m,p| p}.reverse.flatten]
My question is: what is the meaning of * and flatten in the above line?
Thanks =)
The * is called the "splat operator"; I'm not sure I could give you the technical definition (though I'm sure you'd find it soon enough with Google's help), but the way I'd describe it is that it basically takes the place of hand-writing multiple comma-separated values in code.
To make this more concrete, consider the case of Hash[] which you've used in your example. The Hash class has a [] class method which takes a variable number of arguments and can normally be called like this:
# Returns { "foo" => 1, "bar" => 2 }
h = Hash["foo", 1, "bar", 2]
Notice how that isn't an array or a hash or anything that I passed in; it's a (hand-written) sequence of values. The * operator allows you to achieve basically the same thing using an array--in your case, the one returned by movie_popularity.sort_by{|m,p| p}.reverse.flatten.
As for that flatten call: when you call sort_by on a hash, you're really leveraging the Enumerable module which is included in a variety of classes (most notably Array and Hash) that provide enumeration. In the case of a hash, you've probably noticed that instead of iterating over one like this:
hash.each { |value| ... }
Instead you do this:
hash.each { |key, value| ... }
That is, iterating over a hash yields two values on each iteration. So your sort_by call on its own would return a sequence of pairs. Calling flatten on this result collapses the pairs into a one-dimensional sequence of values, like this:
# Returns [1, 2, 3, 4]
[[1, 2], [3, 4]].flatten
'flatten' flattens an array: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-flatten
'*' is the splat operator: http://theplana.wordpress.com/2007/03/03/ruby-idioms-the-splat-operator/
The pertinent bit in the last url is this:
a = [[:planes, 21], [:cars, 36]]
h = Hash[*a.flatten] # => { :planes=>21, :cars=>36}
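Putting the pieces together, the steps can be traced on a small example. The movie_popularity data is invented, and the Array#to_h variant at the end is an alternative that was not part of the question:

```ruby
movie_popularity = { "Alien" => 8, "Heat" => 9, "Cats" => 2 }

# sort_by yields [key, value] pairs and returns an array of pairs.
pairs = movie_popularity.sort_by { |_movie, score| score }.reverse
p pairs        # [["Heat", 9], ["Alien", 8], ["Cats", 2]]

# flatten collapses the pairs into one flat list of alternating keys and values.
flat = pairs.flatten
p flat         # ["Heat", 9, "Alien", 8, "Cats", 2]

# The splat passes each element of flat as a separate argument to Hash[].
via_splat = Hash[*flat]
p via_splat == { "Heat" => 9, "Alien" => 8, "Cats" => 2 }   # true

# Array#to_h builds the same hash directly from the pairs.
via_to_h = pairs.to_h
p via_to_h == via_splat                                     # true
```

Note that flatten without an argument flattens recursively, so this idiom breaks if the keys or values are themselves arrays; converting the pairs directly avoids that.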

Ruby array hash keys

Basically I'm working with a 2D matrix. I can access elements of the matrix by specifying an (x,y) pair to get the corresponding value at that position.
Now I also want to be able to keep track of certain pairs that are arbitrarily determined at run-time. For example, maybe I need to keep track of the values at (1,2), (3,4), and (5,6), and maybe I need to retrieve the value at that position frequently.
So I was thinking, how about just making a hash:
liked_elements = {[1,2] => M[1,2], [3,4] =>M[3,4], [5,6]=>M[5,6]}
Or something like that.
Then I can quickly iterate over the hash and get the elements that I like.
Are there any issues with using arrays as hash keys?
Just don't modify the array afterward (or remember to rehash the hash if you do).
If it's truly a matrix (an array of arrays), then you can just pass in coordinates like this
matrix = [[:a, :b, :c],[:d, :e, :f], [:g, :h, :i]]
matrix[0][1] # returns :b
matrix[1][2] # returns :f
matrix[2][3] # returns nil, since 3 is out of bounds
Yes, you can create an array as a hash key.
h = Hash[[0,1], matrix[0][1]]
h[[0,1]] # returns :b
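As a sketch of the idea from the question, the hash of liked elements could be built from a list of coordinates. The name liked_positions and the choice of [row, column] pairs are assumptions, and the keys are frozen so they cannot be modified afterwards:

```ruby
matrix = [[:a, :b, :c], [:d, :e, :f], [:g, :h, :i]]
liked_positions = [[0, 1], [1, 2], [2, 0]]   # hypothetical [row, column] pairs

# Build { [row, col] => element }, freezing each key array.
liked_elements = liked_positions.each_with_object({}) do |(row, col), memo|
  memo[[row, col].freeze] = matrix[row][col]
end

p liked_elements[[1, 2]]   # :f

# Iterate over the tracked positions and their values.
liked_elements.each do |(row, col), value|
  puts "(#{row}, #{col}) -> #{value}"
end
```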

Why does hash.keys.class return Arrays?

New to Ruby, I'm just missing something basic here. Are the keys in a Hash considered an Array unto themselves?
Yes, Hash#keys returns the hash's keys as a new array, i.e., the hash and the array returned by Hash#keys are completely independent of each other:
a = {}
b = a.keys
c = a.keys
b << :foo
a # still {}
b # [:foo]
c # still []
a[:bar] = :baz
a # {:bar => :baz}
b # still [:foo]
c # still []
From the documentation of hash.keys:
Returns a new array populated with the keys from this hash. See also Hash#values.
So the class is Array because the return value is an array.
About your question "Are the keys in a Hash considered an Array unto themselves?": they "kind of" are. Hashes in Ruby are implemented as a struct (st_table) that contains a list of pointers to each of its entries (st_table_entry). Each st_table_entry contains the key and its value, so I guess what the keys method does is just traverse that list, taking out each of the keys.
You can read this article by Ilya Grigorik, where he explains hashes in Ruby much better: http://www.igvita.com/2009/02/04/ruby-19-internals-ordered-hash/
Do you think there's something paradoxical about this? Keep in mind that hashes aren't arrays in Ruby.

Ruby: hash that doesn't remember key values

Is there a hash implementation around that doesn't remember key values? I have to make a giant hash but I don't care what the keys are.
Edit:
Ruby's hash implementation stores the key's value. I would like a hash that doesn't remember the key's value. It just uses the hash function to store your value and forgets the key. The reason for this is that I need to make a hash for about 5 GB of data and I don't care what the key values are after creating it. I only want to be able to look up the values based on other keys.
Edit Edit:
The language is kind of confusing. By key's value I mean this:
hsh['value'] = data
I don't care what 'value' is after the hash function stores data in the hash.
Edit^3:
Okay so here's what I am doing: I am generating every 35-letter (nucleotide) kmer for a set of multiple genes. Each gene has an ID. The hash looks like this:
kmers = { 'A...G' => [1, 5, 3], 'G...T' => [4, 9, 9, 3] }
So the hash key is the kmer, and the value is an array containing IDs for the gene(s)/string(s) that have that kmer.
I am querying the hash for kmers in another dataset to quickly find matching genes. I don't care what the hash keys are, I just need to get the array of numbers from a kmer.
>> kmers['A...G']
=> [1, 5, 3]
>> kmers.keys.first
=> "Sorry Dave, I can't do that"
I guess you want a set, although it stores unique keys and no values. It has the fast lookup time of a hash.
Set is included in the standard library.
require 'set'
s = Set.new
s << 'aaa'
p s.merge(['ccc', 'ddd']) #=> #<Set: {"aaa", "ccc", "ddd"}>
Even if there was an oddball hash that just recorded existence (which is how I understand the question) you probably wouldn't want to use it, as the built-in Hash would be simpler, faster, not require a gem, etc. So just set...
h[k] = k
...and call it a day...
I assume the 5 gb string is a genome, and the kmers are 35 base pair nucleotide sequences.
What I'd probably do (slightly simplified) is:
require 'set'

human_genome = File.read("human_genome.txt")
human_kmers = Set.new
# String has no each_cons, so slide a 35-character window over each_char
human_genome.each_char.each_cons(35) do |chars|
  human_kmers << chars.join   # a Set ignores duplicates on its own
end
unknown_gene = File.read("unknown_gene.txt")
related_to_humans = unknown_gene.each_char.each_cons(35).any? do |chars|
  human_kmers.include?(chars.join)
end
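If the gene IDs per kmer are needed, as in the question's hash, a plain Hash with a default block is one way to build the index. This is only a sketch on toy data: the genes hash is invented and K is 4 rather than 35 to keep the output small.

```ruby
# Hypothetical toy data: gene ID => nucleotide sequence.
K = 4
genes = { 1 => "ACGTAC", 2 => "CGTACG" }

# Each unseen kmer starts with an empty list of gene IDs.
kmers = Hash.new { |hash, kmer| hash[kmer] = [] }

genes.each do |id, sequence|
  # Slide a K-character window over the sequence.
  sequence.each_char.each_cons(K) do |chars|
    kmers[chars.join] << id
  end
end

p kmers["CGTA"]              # [1, 2]
p kmers["ACGT"]              # [1]
p kmers.fetch("TTTT", [])    # [] (fetch does not create a new entry)
```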
I have to make a giant hash but I don't care what the keys are.
That is called an array. Just use an array. A hash without keys is not a hash at all and loses its value. If you don't need key-value lookup then you don't need a hash.
Use an Array. An Array indexes by integers instead of keys. http://www.ruby-doc.org/core/classes/Array.html
a = []
a << "hello"
p a #=> ["hello"]

Resources