Why does array.each behavior depend on Array.new syntax? - ruby

I'm using Ruby 1.9.2-p290 and found:
a = Array.new(2, []).each {|i| i.push("a")}
=> [["a", "a"], ["a", "a"]]
Which is not what I would expect. But the following constructor style does do what I would expect:
b = Array.new(2) {Array.new}.each {|i| i.push("b")}
=> [["b"], ["b"]]
Is the first example the expected behavior?
In ruby-doc it looks like my size=2 argument is the same kind of argument for both constructors. I think that if the each method is getting passed that argument that it would use it the same way for both constructors.

This is a common misunderstanding. In your first example you are creating an array with 2 elements, both of which are references to the same array. So when you iterate through the outer array, you add 2 elements to that one inner array, which then appears twice in your output.
Compare these:
> array = Array.new(5, [])
=> [[], [], [], [], []]
# Note - 5 identical object IDs (memory locations)
> array.map { |o| o.object_id }
=> [70228709214620, 70228709214620, 70228709214620, 70228709214620, 70228709214620]
> array = Array.new(5) { [] }
=> [[], [], [], [], []]
# Note - 5 different object IDs (memory locations)
> array.map { |o| o.object_id }
=> [70228709185900, 70228709185880, 70228709185860, 70228709185840, 70228709185780]

In the first case you're using a single instance of an Array as a default for the elements of the main Array:
a = Array.new(2, []).each {|i| i.push("a")}
The second argument is simply recycled, so the push is applied to the same instance twice. You've only created one instance here, the one being supplied as an argument, so it gets used over and over.
The second method is the correct way to do this:
b = Array.new(2) {Array.new}.each {|i| i.push("b")}
This deliberately creates a new instance of an Array for each position in the main Array. The important difference here is the use of the block { ... } which executes once for each position in the new Array. A short-form version of this would be:
b = Array.new(2) { [ ] }.each {|i| i.push("b")}
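As a minimal sketch of the difference between the two forms (the variable names shared and fresh are only illustrative), an identity check with equal? shows whether the slots hold one object or two:

```ruby
# Array.new(size, obj) stores the very same object in every slot.
shared = Array.new(2, [])
shared[0] << "a"
p shared                        # => [["a"], ["a"]]
p shared[0].equal?(shared[1])   # => true (one object, two references)

# Array.new(size) { ... } runs the block once per slot.
fresh = Array.new(2) { [] }
fresh[0] << "b"
p fresh                         # => [["b"], []]
p fresh[0].equal?(fresh[1])     # => false (two distinct objects)
```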

From the ruby documentation:
new(size=0, obj=nil)
new(array)
new(size) {|index| block }
Returns a new array. In the first form, the new array is empty. In the second it is created with size copies of obj (that is, size references to the same obj). The third form creates a copy of the array passed as a parameter (the array is generated by calling to_ary on the parameter). In the last form, an array of the given size is created.
Thus, in the a array you create, you have two references to the same array, so the push affects both of them. That is, you're pushing "a" onto the same array twice. In the b array you create, you're actually creating a new array for each element.
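The forms described in the documentation excerpt can be tried side by side. This is only a sketch; the variable names are made up for illustration:

```ruby
empty    = Array.new                    # no arguments: an empty array
filled   = Array.new(3, 0)              # three references to the same object (harmless for an immutable Integer)
original = [1, 2]
copy     = Array.new(original)          # a new array with the same elements
squares  = Array.new(3) { |i| i * i }   # the block receives each index

p empty                   # => []
p filled                  # => [0, 0, 0]
p copy                    # => [1, 2]
p copy.equal?(original)   # => false
p squares                 # => [0, 1, 4]
```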

Related

Unexpected result with splat operator

I have a hash, whose values are an array of size 1:
hash = {:start => [1]}
I want to unpack the arrays as in:
hash.each_pair{ |key, value| hash[key] = value[0] } # => {:start=>1}
and I thought the *-operator as in the following would work, but it does not give the expected result:
hash.each_pair{ |key, value| hash[key] = *value } # => {:start=>[1]}
Why does *value return [1] and not 1?
Because the []= method of a hash takes exactly one argument in addition to the key (which goes inside the [] part). A splatted array is in general a sequence of values (which happens to be a single element in this particular case), and such a sequence cannot be accepted directly as that one argument. So it ends up being passed to []= as an array after all.
In other words, an argument (of the []= method) must be an object, but splatted elements (such as :foo, :bar, :baz) are not an object. The only way to interpret them as an object is to put them back into an array (such as [:foo, :bar, :baz]).
Using the splat operator, you can do it like this:
hash.each_pair{|key, value| hash.[]= key, *value}
sawa and Ninigi already pointed out why the assignment doesn't work as expected. Here's my attempt.
Ruby's assignment features work regardless of whether you're assigning to a variable, a constant or by implicitly invoking an assignment method like Hash#[]= with the assignment operator. For the sake of simplicity, I'm using a variable in the following examples.
Using the splat operator in an assignment does unpack the array, i.e.
a = *[1, 2, 3]
is evaluated as:
a = 1, 2, 3
But Ruby also allows you to implicitly create arrays during assignment by listing multiple values. Therefore, the above is in turn equivalent to:
a = [1, 2, 3]
That's why *[1] results in [1] - it's unpacked, just to be converted back to an array.
Elements can be assigned separately using multiple assignment:
a, b = [1, 2, 3]
a #=> 1
b #=> 2
or just:
a, = [1, 2, 3]
a #=> 1
You could use this in your code (note the comma after hash[key]):
hash = {:start => [1]}
hash.each_pair { |key, values| hash[key], = values }
#=> {:start=>1}
But there's another and more elegant way: you can unpack the array by putting parentheses around the array argument:
hash = {:start => [1]}
hash.each_pair { |key, (value)| hash[key] = value }
#=> {:start=>1}
The parentheses will decompose the array, assigning the first array element to value.
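The three behaviours discussed above can be checked together in a short sketch. The second hash, unpacked, is an assumption added here so that the original hash is not modified while it is being iterated:

```ruby
a = *[1]    # splatted, then collected back into an array
b, = [1]    # trailing comma: multiple assignment takes the first element
p a         # => [1]
p b         # => 1

hash = { :start => [1], :stop => [2] }
unpacked = {}
# The parentheses destructure each one-element array in the block parameters.
hash.each_pair { |key, (value)| unpacked[key] = value }
p unpacked[:start]   # => 1
p unpacked[:stop]    # => 2
```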
Because Ruby is acting unexpectedly smart here.
True, the splat operator will "fold" and "unfold" an array, but the catch in your code is what you do with that unfolded value.
Take this code into account:
array = ['a', 'b']
some_var = *array
some_var # => ['a', 'b']
As you can see the splat operator seemingly does nothing to your array, while this:
some_var, some_other_var = *array
some_var # => "a"
some_other_var # => "b"
will do what you'd expect.
It seems Ruby just "figures" that if you splat an array into a single variable, you want the array, not the values.
EDIT: As sawa pointed out in the comments, hash[key] = is not identical to variable =. []= is an instance method of Hash, with its own C code under the hood, which COULD (in theory) lead to different behaviour in some instances. I don't know of any example, but that does not mean there is none.
But for the sake of simplicity, we can assume that regular variable assignment behaves identically to hash[key] =.

How to remove elements of array in place returning the removed elements

I have an array arr. I want to destructively remove elements from arr based on a condition, returning the removed elements.
arr = [1,2,3]
arr.some_method{|a| a > 1} #=> [2, 3]
arr #=> [1]
My first try was reject!:
arr = [1,2,3]
arr.reject!{|a| a > 1}
but the return value and arr's value are both [1].
I could write a custom function, but I think there is an explicit method for this. What would that be?
Update after the question was answered:
partition method turns out to be useful for implementing this behavior for hash as well. How can I remove elements of a hash, returning the removed elements and the modified hash?
hash = {:x => 1, :y => 2, :z => 3}
comp_hash, hash = hash.partition{|k,v| v > 1}.map{|a| Hash[a]}
comp_hash #=> {:y=>2, :z=>3}
hash #=> {:x=>1}
I'd use partition here. It doesn't modify self inplace, but returns two new arrays. By assigning the second array to arr again, it gets the results you want:
comp_arr, arr = arr.partition { |a| a > 1 }
See the documentation of partition.
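If the array really has to be modified in place, the partition approach can be wrapped in a small helper. The name extract! is hypothetical and not part of Ruby's core API; this is a sketch that assumes the caller passes a block as the condition:

```ruby
# Hypothetical helper: removes the matching elements from `array` in place
# and returns them.
def extract!(array)
  removed, kept = array.partition { |element| yield(element) }
  array.replace(kept)   # mutate the original array object
  removed
end

arr = [1, 2, 3]
removed = extract!(arr) { |a| a > 1 }
p removed   # => [2, 3]
p arr       # => [1]
```

Array#replace keeps the same array object, so any other variable referring to arr sees the change as well.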
All methods with a trailing bang (!) modify the receiver, and it seems to be a convention that these methods return the resulting object, because the non-bang versions do so.
What you can do, though, is something like this:
b = (arr.dup - arr.reject!{|a| a>1 })
b # => [2,3]
arr #=> [1]
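One caveat with the expression above: reject! returns nil when no element is removed, so the subtraction would raise a TypeError in that case. A guarded variant, sketched under the assumption that an empty array is the desired result when nothing matches (the name before is illustrative):

```ruby
arr = [1]
result = arr.reject! { |a| a > 1 }   # nothing matches the condition
p result    # => nil

# Take the snapshot first, then compare against the array itself
# rather than against the return value of reject!.
before = arr.dup
arr.reject! { |a| a > 1 }
removed = before - arr
p removed   # => []
p arr       # => [1]
```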
Here is a link to a Ruby style guide which has a section on naming, although it's rather short.
To remove (in place) elements of array returning the removed elements one could use delete method, as per Array class documentation:
a = [ "a", "b", "b", "b", "c" ]
a.delete("b") #=> "b"
a #=> ["a", "c"]
a.delete("z") #=> nil
a.delete("z") { "not found" } #=> "not found"
It also accepts a block, whose result is returned when the item is not found, so custom behavior can be added as needed.

Why does Array#each return an array with the same elements?

I'm learning the details of how each works in ruby, and I tried out the following line of code:
p [1,2,3,4,5].each { |element| element }
And the result is an array of
[1,2,3,4,5]
But I don't think I fully understand why. Why is the return value of each the same array? Doesn't each just provide a method for iterating? Or is it just common practice for the each method to return the original value?
Array#each returns the [array] object it was invoked upon: the result of the block is discarded. Thus if there are no icky side-effects to the original array then nothing will have changed.
Perhaps you mean to use map?
p [1,2,3,4,5].map { |i| i*i }
If you want, for some reason, to suppress the output (for example when debugging in the console), here is how you can achieve that:
[1,2,3,4,5].each do |nr|
puts nr.inspect
end;nil
Array#each
The block form of Array#each returns the original Array object. You generally use #each when you want to do something with each element of an array inside the block. For example:
[1, 2, 3, 4, 5].each { |element| puts element }
This will print out each element, but returns the original array. You can verify this with:
array = [1, 2, 3, 4, 5]
array.each { |element| element }.object_id === array.object_id
=> true
Array#map
If you want to return a new array, you want to use Array#map or one of its synonyms. The block form of #map returns a different Array object. For example:
array.object_id
=> 25659920
array.map { |element| element }.object_id
=> 20546920
array.map { |element| element }.object_id === array.object_id
=> false
You will generally want to use #map when you want to operate on a modified version of the original array, while leaving the original unchanged.
All methods return something. Even if it's just a nil object, it returns something.
It may as well return the original object rather than return nil.
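A compact comparison of the two return values, as a sketch (seen is an illustrative side-effect target, not something from the question):

```ruby
numbers = [1, 2, 3]
seen = []

# each: the block's value is discarded and the receiver itself comes back.
result = numbers.each { |n| seen << n * 10 }
p result.equal?(numbers)   # => true
p seen                     # => [10, 20, 30]

# map: the block's values are collected into a new array.
mapped = numbers.map { |n| n * 10 }
p mapped                   # => [10, 20, 30]
p mapped.equal?(numbers)   # => false
p numbers                  # => [1, 2, 3]
```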

Replace array values with converted data?

I have this array of time values:
["00:04:48.563044", "00:05:29.835918", "00:09:38.622569"]
But I need to pass each array item through a parser (in this case, chronic_duration), and then put the results back into an array.
So each array item would need to get passed through:
ChronicDuration.parse('00:04:48.563044')
And then put back in an array:
[288.563044, 329.835918, 578.622569]
Two obvious options; new array, or in-place.
pry(main)> arr = ["00:04:48.563044", "00:05:29.835918", "00:09:38.622569"];
pry(main)> arr.collect! { |s| ChronicDuration.parse s }
=> [288.563044, 329.835918, 578.622569]
To create a new array, leave off the exclamation point ("!") on the collect call:
pry(main)> new_arr = arr.collect { |s| ChronicDuration.parse s }
You might want to map from one to the other:
pry(main)> h = Hash[arr.collect { |s| [s, ChronicDuration.parse(s)] }]
=> {"00:04:48.563044"=>288.563044,
"00:05:29.835918"=>329.835918,
"00:09:38.622569"=>578.622569}
Or switch the keys/values to allow easy sorting; either switching the collect array, or inverting:
pry(main)> inv = h.invert; inv.keys.sort.each_with_index {|k, i| puts "#{i+1}: #{inv[k]}"}
1: 00:04:48.563044
2: 00:05:29.835918
3: 00:09:38.622569
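To try the new-array and in-place variants without installing the gem, here is a sketch that uses a hypothetical stand-in, parse_clock, in place of ChronicDuration.parse. It is an assumption made for illustration and handles only the "HH:MM:SS.fraction" shape shown in the question:

```ruby
# Hypothetical stand-in for ChronicDuration.parse; only understands "HH:MM:SS.fraction".
def parse_clock(text)
  hours, minutes, seconds = text.split(":")
  hours.to_i * 3600 + minutes.to_i * 60 + seconds.to_f
end

times = ["00:04:48.563044", "00:05:29.835918", "00:09:38.622569"]

# New array: times keeps its strings.
durations = times.map { |t| parse_clock(t) }

# In place: the same array object now holds the numbers.
in_place = times.dup
in_place.map! { |t| parse_clock(t) }

durations.each { |d| puts d.round(6) }
```

map and collect are aliases, as are map! and collect!, so either spelling gives the same result.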

Why is Array * referencing, and not copying, values in Ruby?

I want to duplicate a hash using the same keys but different values. I coded up the following snippet, and encountered something I didn't expect:
hsh = {:foo => 'foo', :bar => 'bar'}
hsh_copy = Hash[hsh.keys.zip([[]] * hsh.length)] # => {:foo=>[], :bar=>[]}
hsh_copy[:foo] << 1
hsh_copy[:bar] << 2
hsh_copy # => {:foo=>[1, 2], :bar=>[1, 2]}
It seems that instead of copying the nested array when using the * operator, it just continues to reference the first array.
I'd be very happy if someone could explain why this is happening. Additionally, a better way of duplicating the hash would be appreciated, but I'm more concerned with understanding why * doesn't work as expected here.
If Array#* copied the elements of the array, it would break when used on arrays with non-copyable elements (which includes, among others, numbers), which would not be desirable.
As for how to do what you want to do: Replace hsh.keys.zip([[]] * hsh.length) with hsh.map {|k,v| [k, []] }.
The * operator concatenates copies of the array together to meet the new length.
If an array element references an object, when it is duplicated a new array element is in fact created, but it's a new array element that references the same object.
For example:
irb(main):012:0> ([[]] * 3).map { |e| e.object_id }
=> [2149128060, 2149128060, 2149128060]
In your case, you could just create new elements with .map and let Ruby create a new object with [] each time, but for a general solution, start with:
irb(main):013:0> ([[]] * 3).map { |e| e.clone.object_id }
=> [2149106700, 2149106660, 2149106640]
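Applied to the hash from the question, the difference looks like this. It is only a sketch, and the names shared and fresh are illustrative:

```ruby
hsh = { :foo => "foo", :bar => "bar" }

# [[]] * n repeats a reference to one inner array, so every key shares it.
shared = Hash[hsh.keys.zip([[]] * hsh.length)]
shared[:foo] << 1
p shared[:bar]   # => [1]

# The [] literal inside the block is evaluated once per key.
fresh = Hash[hsh.map { |key, _value| [key, []] }]
fresh[:foo] << 1
p fresh[:foo]    # => [1]
p fresh[:bar]    # => []
```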

Resources