Clone or duplicate the results array of Ruby's scan method?

Suppose I pass the string "abcd" through Ruby's scan method using the regular expression /(a)|(b)/. This returns an array:
>> results_orig = "abcd".scan(/(a)|(b)/)
#=> [["a", nil], [nil, "b"]]
Now, if I duplicate (.dup) or clone (.clone) this array,
>> results_copy = results_orig.dup
#=> [["a", nil], [nil, "b"]]
and modify an element of one of the inner arrays of this copy, the original array also gets modified!
>> results_copy[0][0]="hello"
#=> "hello"
>> results_copy
#=> [["hello", nil], [nil, "b"]]
>> results_orig
#=> [["hello", nil], [nil, "b"]]
This is strange, since, first, the arrays have different object IDs (results_orig.object_id == results_copy.object_id returns false), and, second, it does not happen if the array was not the product of the scan method. To see the latter, consider the following example.
>> a = [1, 2, 3]
>> b = a.dup
>> b[0] = "hello"
>> a
#=> [1, 2, 3]
>> b
#=> ["hello", 2, 3]
My current solution is to run scan twice and capture each array in a separate variable, that is, r_orig = "abca".scan(/(a)|(b)/); r_copy = "abca".scan(/(a)|(b)/). But this is going to be very inefficient when I have to scan hundreds of strings.
Is there a proper way to duplicate the array from scan's results so that I can modify the copy while leaving the original results array unharmed?
Edit #1: I am running Ruby 2.0.0-p353 on Mac OS X 10.9.2.
Edit #2: It appears the issue exists when the array structure is nested... simple (single-level) arrays don't seem to have this problem. Corrected my example to reflect this.
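The nesting observation in Edit #2 can be checked directly with equal?, which tests object identity. This is a small illustrative sketch (the extra assignment of [nil, "z"] is added here only for contrast): dup creates a new outer array, but both arrays still point at the same inner arrays.

```ruby
results_orig = "abcd".scan(/(a)|(b)/)
results_copy = results_orig.dup

puts results_orig.equal?(results_copy)       # false: two distinct outer arrays
puts results_orig[0].equal?(results_copy[0]) # true: the inner arrays are shared

# Replacing a top-level element only touches the copy...
results_copy[1] = [nil, "z"]
# ...but mutating a shared inner array is visible through both outer arrays.
results_copy[0][0] = "hello"

p results_orig # [["hello", nil], [nil, "b"]]
p results_copy # [["hello", nil], [nil, "z"]]
```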

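You need to make a deep copy: dup and clone are shallow, so the copy is a new outer array that still holds references to the same inner arrays. Check out this article for more information. Essentially, you need to do: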
copied_array = Marshal.load(Marshal.dump(complex_array))
Code source: http://thingsaaronmade.com/blog/ruby-shallow-copy-surprise.html. Marshalling works for arrays, but not for every object. A more robust method to perform a deep copy is in the answer to this question.
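A sketch comparing the Marshal round-trip from this answer with a lighter alternative that is not part of the original answer: when the structure is known to be exactly two levels deep (as with scan results), duplicating each inner array is enough to make element assignment safe. The variable names are illustrative.

```ruby
original = "abcd".scan(/(a)|(b)/)

# Option 1 (from the answer): a full deep copy via Marshal. Fine for plain data
# such as arrays, strings and nil; raises TypeError for objects that cannot be
# marshalled (for example Procs or IO objects).
deep = Marshal.load(Marshal.dump(original))

# Option 2 (an alternative, assuming exactly two levels of nesting): dup each
# inner array. The strings inside are still shared, so in-place string
# mutation (e.g. with <<) would still leak back into the original.
two_level = original.map(&:dup)

deep[0][0] = "hello"
two_level[1][1] = "world"

p original  # [["a", nil], [nil, "b"]]
p deep      # [["hello", nil], [nil, "b"]]
p two_level # [["a", nil], [nil, "world"]]
```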

Related

Differences between these 2 Ruby enumerators: [1,2,3].map vs. [1,2,3].group_by

In Ruby, is there a functional difference between these two Enumerators?
irb> enum_map = [1,2,3].map
=> #<Enumerator: [1, 2, 3]:map> # ends with "map>"
irb> enum_group_by = [1,2,3].group_by
=> #<Enumerator: [1, 2, 3]:group_by> # ends with "group_by>"
irb> enum_map.methods == enum_group_by.methods
=> true # they have the same methods
What can #<Enumerator: [1, 2, 3]:map> do that #<Enumerator: [1, 2, 3]:group_by> can't do, and vice versa?
Thanks!
From the documentation of group_by:
Groups the collection by result of the block. Returns a hash where the
keys are the evaluated result from the block and the values are arrays
of elements in the collection that correspond to the key.
If no block is given an enumerator is returned.
(1..6).group_by { |i| i%3 } #=> {0=>[3, 6], 1=>[1, 4], 2=>[2, 5]}
From the documentation of map:
Returns a new array with the results of running block once for every
element in enum.
If no block is given, an enumerator is returned instead.
(1..4).map { |i| i*i } #=> [1, 4, 9, 16]
(1..4).collect { "cat" } #=> ["cat", "cat", "cat", "cat"]
As you can see, each does something different and serves a different purpose. Concluding that two APIs are the same because they expose the same interface misses the point of object-oriented programming: different services are supposed to expose the same interface to enable polymorphism.
There's a difference in what they do, but fundamentally they are both of the same class: Enumerator.
When they're used, the values emitted by the enumerators will be different, yet the interface to them is identical.
Two objects of the same class generally have the same methods. It is possible to augment an instance with additional methods, but this is not normally done.
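A short sketch of the point made above: both objects are plain Enumerators with the same interface, but giving each one a block completes the original method call, so the results differ in kind (an Array for map, a Hash for group_by).

```ruby
enum_map = [1, 2, 3].map
enum_group_by = [1, 2, 3].group_by

# Same class, hence the same methods.
puts enum_map.class == enum_group_by.class # true

# Supplying a block runs the underlying method with that block.
map_result = enum_map.each { |i| i.odd? }        # [true, false, true]
group_result = enum_group_by.each { |i| i.odd? } # a Hash: true => [1, 3], false => [2]

# Shared Enumerator helpers such as with_index work on both.
indexed = enum_map.with_index { |x, i| x * i }   # [0, 2, 6]

p map_result, group_result, indexed
```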

Why does .map produce a row of nils when used to enumerate over hashes?

test = [
{:content=>"type_name", :content_length=>9, :array_index=>0},
{:content=>"product_id", :content_length=>10, :array_index=>1},
{:content=>"First Item", :content_length=>10, :array_index=>0},
{:content=>"1111", :content_length=>4, :array_index=>1}
]
pp test.map {|x| puts x} #=>
{:content=>"type_name", :content_length=>9, :array_index=>0}
{:content=>"product_id", :content_length=>10, :array_index=>1}
{:content=>"First Item", :content_length=>10, :array_index=>0}
{:content=>"1111", :content_length=>4, :array_index=>1}
[nil, nil, nil, nil]
What is the cause of that array of nils? The map works perfectly, but then it causes these nils!
The trouble is that #map is designed to transform an array into a different array. Generally, the block of #map will not have side effects. Here's a use of #map to double all the numbers in an array:
[1, 2, 3].map { |n| n * 2} # => [2, 4, 6]
If the purpose of your loop is solely to have side effects (such as printing the elements), you want #each instead:
[1, 2, 3].each { |n| puts n }
# prints:
# 1
# 2
# 3
In this case, we don't care about the return value of #each. All we care about is that each number gets printed.
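A minimal sketch of where the nils come from: puts returns nil, and map collects whatever the block returns, whereas each ignores the block's value and returns the receiver.

```ruby
items = [1, 2, 3]

# puts prints its argument and returns nil, so map gathers three nils.
mapped = items.map { |n| puts n }
p mapped # [nil, nil, nil]

# each is meant for side effects; it returns the original array itself.
returned = items.each { |n| puts n }
p returned.equal?(items) # true
```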
Argh, what a stupid error!
This fixes it:
test.map {|x| puts x}
I was pretty-printing the array returned by map, and since puts returns nil, that array was four nils (irb will still echo it as the value of the expression).

How to remove elements of array in place returning the removed elements

I have an array arr. I want to destructively remove elements from arr based on a condition, returning the removed elements.
arr = [1,2,3]
arr.some_method{|a| a > 1} #=> [2, 3]
arr #=> [1]
My first try was reject!:
arr = [1,2,3]
arr.reject!{|a| a > 1}
but both the return value and arr's value are [1].
I could write a custom function, but I think there is an explicit method for this. What would that be?
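For reference, the custom function mentioned above could be sketched like this. The name extract_where! is hypothetical and not part of Ruby's Array API; it simply records each element that reject! removes.

```ruby
# Hypothetical helper (not a built-in): removes, in place, the elements for
# which the block returns a truthy value, and returns the removed elements.
def extract_where!(array)
  removed = []
  array.reject! do |element|
    matched = yield(element)
    removed << element if matched
    matched
  end
  removed
end

arr = [1, 2, 3]
removed = extract_where!(arr) { |a| a > 1 }
p removed # [2, 3]
p arr     # [1]
```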
Update after the question was answered:
The partition method turns out to be useful for implementing this behavior for a hash as well, i.e. removing elements of a hash and getting back both the removed elements and the modified hash:
hash = {:x => 1, :y => 2, :z => 3}
comp_hash, hash = hash.partition{|k,v| v > 1}.map{|a| Hash[a]}
comp_hash #=> {:y=>2, :z=>3}
hash #=> {:x=>1}
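The same idea broken into steps, as a sketch: on a Hash, partition yields key/value pairs and returns two arrays of pairs. Here Array#to_h (available since Ruby 2.1) is used instead of Hash[...]; both turn the pairs back into hashes.

```ruby
hash = { x: 1, y: 2, z: 3 }

# partition returns [matching_pairs, non_matching_pairs].
pairs_removed, pairs_kept = hash.partition { |_key, value| value > 1 }
p pairs_removed # [[:y, 2], [:z, 3]]

# Convert each array of pairs back into a hash.
comp_hash = pairs_removed.to_h
hash = pairs_kept.to_h
```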
I'd use partition here. It doesn't modify self in place, but returns two new arrays. By assigning the second array to arr again, you get the results you want:
comp_arr, arr = arr.partition { |a| a > 1 }
See the documentation of partition.
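A sketch of this approach, including one consequence of it not being in place: arr is rebound to a new array, so any other reference to the old array (original here, added only for illustration) still sees all three elements.

```ruby
arr = [1, 2, 3]
original = arr # a second reference to the same array object

# partition returns [elements where the block is true, the rest].
comp_arr, arr = arr.partition { |a| a > 1 }

p comp_arr # [2, 3]
p arr      # [1]
p original # [1, 2, 3] (the old array object was not modified)
```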
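Methods with a trailing bang (!) conventionally modify the receiver, and they return the modified receiver, mirroring the non-bang versions that return the resulting object. Note that reject! returns nil instead when nothing was removed.
What you can do, though, is something like this (the || arr covers the case where reject! returns nil):
b = arr.dup - (arr.reject! { |a| a > 1 } || arr)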
b # => [2,3]
arr #=> [1]
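A sketch of the reject! return values described above, followed by a variant of the difference trick (an alternative formulation, not from the answer) that snapshots the array first so it never depends on reject!'s return value.

```ruby
arr = [1, 2, 3]

# reject! returns the receiver itself when it removes something...
first = arr.reject! { |a| a > 1 }
p first             # [1]
p first.equal?(arr) # true

# ...and nil when there was nothing to remove.
second = arr.reject! { |a| a > 1 }
p second # nil

# Snapshot, remove in place, then diff to recover the removed elements.
arr = [1, 2, 3]
before = arr.dup
arr.reject! { |a| a > 1 }
removed = before - arr
p removed # [2, 3]
```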
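Here is a link to a Ruby style guide, which has a section on naming, although it's rather short.
To remove elements of an array in place and get the removed element back, one could use the delete method, as per the Array class documentation. Note that delete removes every element equal to the given value (rather than elements matching a condition) and returns that value: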
a = [ "a", "b", "b", "b", "c" ]
a.delete("b") #=> "b"
a #=> ["a", "c"]
a.delete("z") #=> nil
a.delete("z") { "not found" } #=> "not found"
It also accepts a block, whose value is returned when the item is not found.

Weird behavior on new (nested) Array in Ruby [duplicate]

This question already has an answer here: Ruby Array Initialization
Why don't both pieces of code print the same thing? I intended the first piece to produce the output of the second.
a=Array.new(5,Array.new(3))
for i in (0...a[0].length)
a[0][i]=2
end
p a
# this prints [[2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2], [2, 2, 2]]
a=Array.new(5).map{|d|d=Array.new(3)}
for i in (0...a[0].length)
a[0][i]=2
end
p a
# this prints [[2, 2, 2], [nil, nil, nil], [nil, nil, nil], [nil, nil, nil], [nil, nil, nil]]
This one
a=Array.new(5,Array.new(3))
Creates an array that contains the same array object within it five times. It's kinda like doing this:
a = []
b = a
a[0] = 123
puts b[0] #=> 123
Whereas this one:
a=Array.new(5).map{ Array.new(3) }
Creates a new 3-item array for each item in the parent array. So when you alter the first item, it doesn't touch the others.
This is also why you should be careful with the default-value arguments of the Array and Hash constructors: they don't always work the way you might expect.
Array.new(5,Array.new(3))
In the first example, your array contains 5 references to the same array. You create a single instance of an array with Array.new(3), and a reference to it is used for each of the 5 elements of the outer array. When you modify a[0][0], you're also modifying a[1][0], a[2][0], etc. They are all references to the same array.
Array.new(5).map{ |d| Array.new(3) }
In the second example, your array contains 5 different arrays. Your block is invoked 5 times, Array.new(3) is invoked 5 times, and 5 different arrays are created. a[0] is a different array than a[1], etc.
The following are equivalent:
Array.new(5,Array.new(3))
[Array.new(3)] * 5
inside = Array.new(3); [inside, inside, inside, inside, inside]
They will all produce an array containing the same array. I mean the exact same object. That's why if you modify its contents, you will see that new value 5 times.
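One way to see "the exact same object" concretely is to count distinct object_id values among the inner arrays. A small sketch comparing the default-argument form with the block form:

```ruby
shared = Array.new(5, Array.new(3))     # one inner array, referenced 5 times
separate = Array.new(5) { Array.new(3) } # block runs 5 times: 5 inner arrays

p shared.map(&:object_id).uniq.size   # 1
p separate.map(&:object_id).uniq.size # 5

shared[0][0] = 2
separate[0][0] = 2
p shared[4]   # [2, nil, nil]
p separate[4] # [nil, nil, nil]
```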
As you want independent arrays, you want to make sure that the "inside" arrays are not the same object. This can be achieved in different ways, for example:
Array.new(5){ Array.new(3) }
5.times.map { Array.new(3) }
Array.new(5).map { Array.new(3) }
# or dup your array manually:
inside = Array.new(3); [inside.dup, inside.dup, inside.dup, inside.dup, inside]
Note that the Array.new(5, []) form you used first doesn't copy the object for you; it reuses it. As that's not what you want, you should use the block form Array.new(5){ [] }, which calls the block 5 times and creates a new array each time.
The Hash class also has two constructor forms and is even trickier.
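A sketch of why Hash is trickier, using the two standard Hash.new forms: with a default object, every missing key returns that same object and nothing is stored; with a block, the block runs per missing key, and assigning inside it stores a fresh value.

```ruby
# Default-object form: one shared Array, and no keys are ever stored.
shared_default = Hash.new([])
shared_default[:a] << 1    # mutates the shared default, stores nothing
p shared_default[:b]       # [1]
p shared_default.size      # 0

# Block form: a new Array is created and stored for each missing key.
per_key = Hash.new { |hash, key| hash[key] = [] }
per_key[:a] << 1
p per_key[:b]    # []
p per_key.keys   # [:a, :b] (reading :b stored it too)
```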

Arrays misbehaving

Here's the code:
# a = Array.new(3, Array.new(3))
a = [[nil,nil,nil],[nil,nil,nil]]
a[0][0] = 1
a.each {|line| p line}
With the output:
[1, nil, nil]
[nil, nil, nil]
but using the commented line:
[1, nil, nil]
[1, nil, nil]
[1, nil, nil]
So why is that?
The commented line is assigning three of the same reference to the array, so a change to one array will propagate across the other references to it.
As for the 2 arrays vs 3, that's simply a matter of the first line specifying 3 as its first parameter and only specifying 2 array literals in the second line.
To create the nested arrays without having any shared references:
a = Array.new(3) {Array.new(3)}
When passed a block ({...} or do ... end), Array.new will call the block to obtain the value of each element of the array.
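A sketch reproducing the question's code with the block form, plus the fact that Array.new passes the element's index to the block, which is handy for building initial values:

```ruby
a = Array.new(3) { Array.new(3) }
a[0][0] = 1
a.each { |line| p line }
# [1, nil, nil]
# [nil, nil, nil]
# [nil, nil, nil]

# The block receives the index of the element being created.
grid = Array.new(3) { |i| Array.new(3, i) }
p grid # [[0, 0, 0], [1, 1, 1], [2, 2, 2]]
```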
