Why is Array * referencing, and not copying, values in Ruby?

I want to duplicate a hash using the same keys but different values. I coded up the following snippet, and encountered something I didn't expect:
hsh = {:foo => 'foo', :bar => 'bar'}
hsh_copy = Hash[hsh.keys.zip([[]] * hsh.length)] # => {:foo=>[], :bar=>[]}
hsh_copy[:foo] << 1
hsh_copy[:bar] << 2
hsh_copy # => {:foo=>[1, 2], :bar=>[1, 2]}
It seems that instead of copying the nested array when using the * operator, it just continues to reference the first array.
I'd be very happy if someone could explain why this is happening. Additionally, a better way of duplicating the hash would be appreciated, but I'm more concerned with understanding why * doesn't work as expected here.

If Array#* copied the elements of the array, it would break when used on arrays with non-copyable elements (which includes, among others, numbers), which would not be desirable.
As for how to do what you want to do: Replace hsh.keys.zip([[]] * hsh.length) with hsh.map {|k,v| [k, []] }.

The * operator builds a new array by concatenating copies of the original array until the requested length is reached.
Those copies are shallow: each slot in the result is a new array element, but if the original element referenced an object, the new slot references that same object rather than a copy of it.
For example:
irb(main):012:0> ([[]] * 3).map { |e| e.object_id }
=> [2149128060, 2149128060, 2149128060]
In your case, you could just create new elements with .map and let Ruby create a new object with [] each time, but for a general solution, start with:
irb(main):013:0> ([[]] * 3).map { |e| e.clone.object_id }
=> [2149106700, 2149106660, 2149106640]
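To tie this back to the hash in the question, here is a minimal sketch of both behaviours. The variable names (keys, shared, shared_same) are only illustrative and do not come from the original post.

```ruby
keys = [:foo, :bar]

# Array#* repeats the reference: every slot points at one and the same array.
shared = [[]] * keys.length
shared_same = shared[0].equal?(shared[1])   # true, both slots are the same object

# Creating the value inside a block yields a fresh array for each key.
hsh_copy = Hash[keys.map { |k| [k, []] }]
hsh_copy[:foo] << 1
hsh_copy[:bar] << 2

# Each key now owns its own array, so the pushes no longer leak across keys.
hsh_copy[:foo]   # => [1]
hsh_copy[:bar]   # => [2]
```

The block form is usually preferable to calling clone afterwards, since it never creates the shared object in the first place.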


Ruby how to return an element of a dictionary?

@dictionary = {"cat"=>"Sam"}
This returns a key
This returns a value
How do I return the entire element?
should do the trick for you
Whatever expression is evaluated last in a Ruby method is that method's return value.
If you want to return the hash as a whole, the last line of the method should look like the line I have written above.
Your example is a bit misleading in the sense that the hash has only one pair (which need not be the case) and you want to get one pair. What you call a "dictionary" is actually a hash map (called a hash among Rubyists).
A hashrocket (=>) is part of the hash literal syntax and can't be used outside it. That is, you can't get just one pair without constructing a new hash. So a new such pair would look like this: { key => value }.
So in order to do that, you'll need a key and a value in context of your code somewhere. And you've specified ways to get both if you have one. If you only have a value, then:
{ @dictionary.key(x) => x }
...and if just a key, then:
{ x => @dictionary[x] }
...but there is no practical need for this. If you want to process each pair in a hash, use an iterator to feed each pair into some code as an argument list:
@dictionary.each do |key, value|
  # do stuff with key and value
end
This way the block of code will receive each pair in the hash exactly once.
If you want the pairs the hash is built from rather than a hash, you can convert the hash to an array:
# => [["cat", "Sam"]]
# Note the double braces! And see below.
# Let's say we have this:
@dictionary2 = { 1 => 2, 3 => 4 }
# => 2
# => [[1, 2], [3, 4]]
# Now double braces make sense, huh?
It returns an array of pairs (which are arrays as well) of all elements (keys and values) that your hashmap contains.
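A small sketch of the conversion and the iteration described above. The local variable dictionary and its second entry ("dog") are assumptions added for illustration, so that the nested array has more than one pair.

```ruby
dictionary = { "cat" => "Sam", "dog" => "Diva" }

# Hash#to_a returns one [key, value] array per entry, in insertion order.
pairs = dictionary.to_a            # [["cat", "Sam"], ["dog", "Diva"]]

# A single pair can be destructured into two variables.
first_key, first_value = pairs.first

# Hash#each yields the same pairs to a block, one at a time.
collected = []
dictionary.each do |key, value|
  collected << "#{key}=#{value}"
end
```

With only one entry the result of to_a would be [["cat", "Sam"]], which is why the outer brackets look redundant in the single-pair case.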
If you wish to return one element of a hash h, you will need to specify the key to identify the element. As the value for key k is h[k], the key-value pair, expressed as an array, is [k, h[k]]. If you wish to make that a hash with a single element, use Hash[[[k, h[k]]]].
For example, if
h = { "cat"=>"Sam", "dog"=>"Diva" }
and you only wanted the element with key "cat", that would be
["cat", h["cat"]] #=> ["cat", "Sam"]
Hash[[["cat", h["cat"]]]] #=> {"cat"=>"Sam"}
With Ruby 2.1 or later you could alternatively get the hash like this:
[["cat", h["cat"]]].to_h #=> {"cat"=>"Sam"}
Let's look at a little more interesting case. Suppose you have an array arr containing some or all of the keys of a hash h. Then you can get all the key-value pairs for those keys by using the methods Enumerable#zip and Hash#values_at:
Suppose, for example,
h = { "cat"=>"Sam", "dog"=>"Diva", "pig"=>"Petunia", "owl"=>"Einstein" }
arr = ["dog", "owl"]
#=> [["dog", "Diva"], ["owl", "Einstein"]]
In steps:
a = h.values_at(*arr)
#=> h.values_at(*["dog", "owl"])
#=> h.values_at("dog", "owl")
#=> ["Diva", "Einstein"]
#=> [["dog", "Diva"], ["owl", "Einstein"]]
To instead express as a hash:
#=> {"dog"=>"Diva", "owl"=>"Einstein"}
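Put together, the zip and values_at steps described above could look like the following sketch. The intermediate names values, pairs and subset are illustrative; Hash[] is used for the last step, but to_h would work equally well on Ruby 2.1 or later.

```ruby
h   = { "cat" => "Sam", "dog" => "Diva", "pig" => "Petunia", "owl" => "Einstein" }
arr = ["dog", "owl"]

# Look up the values for the requested keys, in the order the keys are given.
values = h.values_at(*arr)         # ["Diva", "Einstein"]

# Pair each key with its value.
pairs = arr.zip(values)            # [["dog", "Diva"], ["owl", "Einstein"]]

# Turn the pairs back into a hash containing only the requested keys.
subset = Hash[pairs]
```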
You can get the key and value in one go - resulting in an array:
#h = {"cat"=>"Sam", "dog"=>"Phil"}
key, value = p h.assoc("cat") # => ["cat", "Sam"]
Use rassoc to search by value ( .rassoc("Sam") )
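For completeness, a short sketch of both lookups, including what happens when nothing matches. The variable names by_value and missing are made up for this example.

```ruby
h = { "cat" => "Sam", "dog" => "Phil" }

# assoc searches by key and returns the [key, value] pair.
key, value = h.assoc("cat")        # key is "cat", value is "Sam"

# rassoc searches by value and returns the first matching pair.
by_value = h.rassoc("Phil")        # ["dog", "Phil"]

# Both return nil when there is no match.
missing = h.assoc("owl")           # nil
```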

Getting an array of hash values given specific keys

Given certain keys, I want to get an array of values from a hash (in the order I gave the keys). I had done this:
class Hash
  def values_for_keys(*keys_requested)
    result = []
    keys_requested.each do |key|
      result << self[key]
    end
    return result
  end
end
I modified the Hash class because I do plan to use it almost everywhere in my code.
But I don't really like the idea of modifying a core class. Is there a builtin solution instead? (couldn't find any, so I had to write this).
You should be able to use values_at:
values_at(key, ...) → array
Return an array containing the values associated with the given keys. Also see Hash.select.
h = { "cat" => "feline", "dog" => "canine", "cow" => "bovine" }
h.values_at("cow", "cat") #=> ["bovine", "feline"]
The documentation doesn't specifically say anything about the order of the returned array but:
The example implies that the array will match the key order.
The standard implementation does things in the right order.
There's no other sensible way for the method to behave.
For example:
>> h = { :a => 'a', :b => 'b', :c => 'c' }
=> {:a=>"a", :b=>"b", :c=>"c"}
>> h.values_at(:c, :a)
=> ["c", "a"]
I would suggest this:
your_hash.select{|key,value| given_keys.include?(key)}.values
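One thing worth checking before choosing between the two suggestions is the order of the result, since the question asks for values in the order the keys were given. The sketch below reuses the hash from the documentation example; given_keys and the two result names are illustrative.

```ruby
h = { "cat" => "feline", "dog" => "canine", "cow" => "bovine" }
given_keys = ["cow", "cat"]

# values_at follows the order of the keys passed in.
in_requested_order = h.values_at(*given_keys)    # ["bovine", "feline"]

# select walks the hash itself, so the values come back in the hash's
# insertion order, not in the order of given_keys.
in_hash_order = h.select { |key, _value| given_keys.include?(key) }.values
                                                 # ["feline", "bovine"]
```

The two also differ for keys that are not in the hash: values_at returns nil in that position, while the select version simply omits the key.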

Looping on array

I'm having some trouble figuring out the right way to do this:
I have an array and a separate array of arrays that I want to compare to the first array. The first array is a special Enumerable object that happens to contain an array.
Logic tells me that I should be able to do this:
[1,2,3].delete_if do |n|
  [[2,4,5], [3,6,7]].each do |m|
    ! m.include?(n)
  end
end
Which I would expect to return
=> [2,3]
But it returns [] instead.
This idea works if I do this:
[1,2,3].delete_if do |n|
  ! [2,4,5].include?(n)
end
It will return
=> [2]
I can't assign the values to another object, as the [1,2,3] array must stay its special Enumerable object. I'm sure there is a much simpler explanation to this than what I'm trying. Anybody have any ideas?
You can also flatten the multi-dimensional array and use the Array#& intersection operator to get the same result:
# cast your enumerable to array with to_a
e = [1,2,3].each
e.to_a & [[2,4,5], [3,6,7]].flatten
# => [2, 3]
Can't you just add the two inner arrays together and check for inclusion in the concatenated array?
[1,2,3].delete_if do |n|
  !([2,4,5] + [3,6,7]).include?(n)
end
The problem is that the return value of each is the array being iterated over, not the boolean (which is lost). Since the array is truthy, the value returned back to delete_if is always true, so all elements are deleted. You should instead use any?:
[1,2,3].delete_if do |n|
  ![[2,4,5], [3,6,7]].any? do |m|
    m.include?(n)
  end
end
#=> [2, 3]
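The following sketch shows both halves of that explanation: what each actually returns, and the any? version written with braces so the negation is easy to read. The names lists, each_result and kept are illustrative, and a plain array stands in for the special Enumerable object from the question.

```ruby
lists = [[2, 4, 5], [3, 6, 7]]

# each returns its receiver; whatever the block evaluates to is thrown away.
each_result = lists.each { |m| m.include?(1) }

# any? returns true or false, which is what delete_if needs.
kept = [1, 2, 3].delete_if do |n|
  !lists.any? { |m| m.include?(n) }
end
# kept is [2, 3]
```

If the double negative is hard to read, lists.none? { |m| m.include?(n) } expresses the same condition.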

Why does array.each behavior depend on Array.new syntax?

I'm using Ruby 1.9.2-p290 and found:
a = Array.new(2, []).each {|i| i.push("a")}
=> [["a", "a"], ["a", "a"]]
Which is not what I would expect. But the following constructor style does do what I would expect:
b = Array.new(2) {Array.new}.each {|i| i.push("b")}
=> [["b"], ["b"]]
Is the first example the expected behavior?
In ruby-doc it looks like my size=2 argument is the same kind of argument for both constructors. I think that if the each method is getting passed that argument that it would use it the same way for both constructors.
This is a common misunderstanding. In your first example you are creating an array with 2 elements, and both of them are references to the same inner array. So when you iterate through the outer array you push 2 elements onto that one inner array, which then shows up twice in your output.
Compare these:
> array = Array.new(5, [])
=> [[], [], [], [], []]
# Note - 5 identical object IDs (memory locations)
> array.map { |o| o.object_id }
=> [70228709214620, 70228709214620, 70228709214620, 70228709214620, 70228709214620]
> array = Array.new(5) { [] }
=> [[], [], [], [], []]
# Note - 5 different object IDs (memory locations)
> array.map { |o| o.object_id }
=> [70228709185900, 70228709185880, 70228709185860, 70228709185840, 70228709185780]
In the first case you're using a single instance of an Array as a default for the elements of the main Array:
a = Array.new(2, []).each {|i| i.push("a")}
The second argument is simply recycled, so the push is applied to the same instance twice. You've only created one instance here, the one being supplied as an argument, so it gets used over and over.
The second method is the correct way to do this:
b = Array.new(2) {Array.new}.each {|i| i.push("b")}
This deliberately creates a new instance of an Array for each position in the main Array. The important difference here is the use of the block { ... } which executes once for each position in the new Array. A short-form version of this would be:
b = Array.new(2) { [ ] }.each {|i| i.push("b")}
From the ruby documentation:
new(size=0, obj=nil)
new(array)
new(size) {|index| block }
Returns a new array. In the first form, the new array is empty. In the second it is created with size copies of obj (that is, size references to the same obj). The third form creates a copy of the array passed as a parameter (the array is generated by calling to_ary on the parameter). In the last form, an array of the given size is created.
Thus, in the a array you create, you have two references to the same array, so each push works on that one array. That is, you're pushing "a" onto the same array twice. In the b array, you're actually creating a new array for each element.
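A compact way to see the difference is to compare object identity directly rather than reading object_id values. This is only a sketch; the names shared and separate are not from the question.

```ruby
# Second-argument form: one array object, referenced from both positions.
shared = Array.new(2, [])
shared.each { |inner| inner.push("a") }
# shared is [["a", "a"], ["a", "a"]]

# Block form: the block runs once per position and returns a new array each time.
separate = Array.new(2) { [] }
separate.each { |inner| inner.push("b") }
# separate is [["b"], ["b"]]

shared[0].equal?(shared[1])       # true, same object
separate[0].equal?(separate[1])   # false, two different objects
```

The second-argument form is still fine for immutable values such as integers or nil, where sharing one object cannot cause surprises.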

Convert array-of-hashes to a hash-of-hashes, indexed by an attribute of the hashes

I've got an array of hashes representing objects as a response to an API call. I need to pull data from some of the hashes, and one particular key serves as an id for the hash object. I would like to convert the array into a hash with the keys as the ids, and the values as the original hash with that id.
Here's what I'm talking about:
api_response = [
  { :id => 1, :foo => 'bar' },
  { :id => 2, :foo => 'another bar' },
  # ..
]
ideal_response = {
  1 => { :id => 1, :foo => 'bar' },
  2 => { :id => 2, :foo => 'another bar' },
  # ..
}
There are two ways I could think of doing this.
1. Map the data to the ideal_response (below)
2. Use api_response.find { |x| x[:id] == i } for each record I need to access.
3. A method I'm unaware of, possibly involving a way of using map to build a hash, natively.
My method of mapping:
keys = data.map { |x| x[:id] }
mapped = Hash[*keys.zip(data).flatten]
I can't help but feel like there is a more performant, tidier way of doing this. Option 2 is very performant when there are a very minimal number of records that need to be accessed. Mapping excels here, but it starts to break down when there are a lot of records in the response. Thankfully, I don't expect there to be more than 50-100 records, so mapping is sufficient.
Is there a smarter, tidier, or more performant way of doing this in Ruby?
Ruby <= 2.0
> Hash[api_response.map { |r| [r[:id], r] }]
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
However, Hash::[] is pretty ugly and breaks the usual left-to-right OOP flow. That's why Facets proposed Enumerable#mash:
> require 'facets'
> api_response.mash { |r| [r[:id], r] }
#=> {1=>{:id=>1, :foo=>"bar"}, 2=>{:id=>2, :foo=>"another bar"}}
This basic abstraction (convert enumerables to hashes) was asked to be included in Ruby long ago, alas, without luck.
Note that your use case is covered by Active Support: Enumerable#index_by
Ruby >= 2.1
[UPDATE] Still no love for Enumerable#mash, but now we have Array#to_h. It creates an intermediate array, but it's better than nothing:
> object = api_response.map { |r| [r[:id], r] }.to_h
Something like:
ideal_response = api_response.group_by{|i| i[:id]}
#=> {1=>[{:id=>1, :foo=>"bar"}], 2=>[{:id=>2, :foo=>"another bar"}]}
It uses Enumerable's group_by, which works on collections, returning matches for whatever key value you want. Because it expects to find multiple occurrences of matching key-value hits it appends them to arrays, so you end up with a hash of arrays of hashes. You could peel back the internal arrays if you wanted but could run a risk of overwriting content if two of your hash IDs collided. group_by avoids that with the inner array.
Accessing a particular element is easy:
ideal_response[1][0] #=> {:id=>1, :foo=>"bar"}
ideal_response[1][0][:foo] #=> "bar"
The way you show at the end of the question is another valid way of doing it. Both are reasonably fast and elegant.
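To make the collision point concrete, here is a sketch with a deliberately duplicated id. The third record is hypothetical and not part of the original API response; grouped and unwrapped are illustrative names.

```ruby
api_response = [
  { :id => 1, :foo => 'bar' },
  { :id => 2, :foo => 'another bar' },
  { :id => 1, :foo => 'duplicate id' }   # hypothetical colliding record
]

# group_by keeps every record, collecting records with the same id in one array.
grouped = api_response.group_by { |record| record[:id] }
grouped[1].length                        # 2

# Peeling off the inner arrays forces a choice; here the last record per id wins
# and the earlier one is silently dropped.
unwrapped = Hash[grouped.map { |id, records| [id, records.last] }]
unwrapped[1][:foo]                       # "duplicate id"
```

If ids are guaranteed to be unique, the inner arrays carry no extra information and a flat hash is the simpler structure.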
For this I'd probably just go:
ideal_response = api_response.each_with_object(Hash.new) { |o, h| h[o[:id]] = o }
Not super pretty with the multiple brackets in the block but it does the trick with just a single iteration of the api_response.
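For a side-by-side view, the sketch below builds the same index three ways: Hash[], map followed by to_h (Ruby 2.1 or later), and each_with_object. The names via_brackets, via_to_h and via_each are illustrative only.

```ruby
api_response = [
  { :id => 1, :foo => 'bar' },
  { :id => 2, :foo => 'another bar' }
]

# Build [id, record] pairs, then hand them to Hash[].
via_brackets = Hash[api_response.map { |r| [r[:id], r] }]

# Same pairs, converted with Array#to_h.
via_to_h = api_response.map { |r| [r[:id], r] }.to_h

# Fill the hash directly in a single pass, without an intermediate array of pairs.
via_each = api_response.each_with_object({}) { |r, h| h[r[:id]] = r }

# The values are the original record objects, not copies.
same_object = via_to_h[1].equal?(api_response.first)   # true
```

All three produce equal hashes, so for 50 to 100 records the choice is mostly a matter of readability.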
