Changing one array in an array of arrays changes them all; why? - ruby

a = Array.new(3,[])
a[1][0] = 5
a #=> [[5], [5], [5]]
This doesn't make sense to me! Shouldn't it be a #=> [[], [5], []]?
Or is this some sort of Ruby feature?

Use this instead:
a = Array.new(3){ [] }
With your code, the same object is used as the value of every entry, so once you mutate it through one reference, you see the change through all the others. With the block form above, Ruby instead invokes the block each time it needs a new value, and the block returns a new array each time.
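A quick way to see the difference is to compare object identity with equal?. This is a minimal sketch; the variable names shared and separate are only for illustration.

```ruby
# Array.new(size, obj) stores the very same obj in every slot.
shared = Array.new(3, [])
shared[1][0] = 5
p shared                       # prints [[5], [5], [5]]
p shared[0].equal?(shared[2])  # prints true (one object, three references)

# Array.new(size) { ... } calls the block once per slot,
# so each slot gets its own freshly created array.
separate = Array.new(3) { [] }
separate[1][0] = 5
p separate                         # prints [[], [5], []]
p separate[0].equal?(separate[2])  # prints false
```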
This is similar in nature to the common new-user question about why the following does not work as expected:
str.gsub(/<([a-z]+)>/, "-->#{$1}<--")
In the above, string interpolation occurs before the gsub method is ever called, so it cannot use the then-current value of $1 in your string. Similarly, in your question you create an object and pass it to Array.new before Ruby starts creating array slots. Yes, the runtime could call dup on the item by default, but that would be potentially disastrous and slow. Hence the block form, which lets you decide for yourself how each initial value is created.
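To make the gsub analogy concrete, here is a small sketch with a made-up sample string. It compares the eager interpolation with two forms that are evaluated per match: the block form, and a \1 backreference inside a single-quoted replacement string. An earlier match is run first so that $1 holds a stale value, which shows exactly what the eager version picks up.

```ruby
str = "<b>bold</b> and <i>italic</i>"

# Leave a stale value in $1 from an earlier, unrelated match.
/<([a-z]+)>/ =~ "<x>"   # $1 is now "x"

# Eager: "-->#{$1}<--" is built once, before gsub runs, so every match
# is replaced with the same stale text.
eager = str.gsub(/<([a-z]+)>/, "-->#{$1}<--")
#=> "-->x<--bold</b> and -->x<--italic</i>"

# Block form: gsub sets $1 for each match before calling the block.
lazy = str.gsub(/<([a-z]+)>/) { "-->#{$1}<--" }
#=> "-->b<--bold</b> and -->i<--italic</i>"

# Backreference: gsub itself substitutes \1 for each match.
backref = str.gsub(/<([a-z]+)>/, '-->\1<--')
#=> same result as the block form
```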

Is method cascading possible here?

I have three lines of code, shown below:
local = headers.zip(*data_rows).transpose
local = local[1..-1].map {|dataRow| local[0].zip(dataRow).to_h}
p local
Looking at the three lines above, I have to store the result of the first line in the variable local because it is used in two places in the second line. Can't I cascade the second line onto the first somehow? I tried using tap like this:
local = headers.zip(*data_rows).transpose.tap{|h|h[1..-1].map {|dataRow| h[0].zip(dataRow).to_h}}
tap returns self, as the documentation explains, so I can't get the final result when I use tap. Is there any other way to achieve this in one single line, so that I don't have to use a local variable?
If you're on Ruby 2.5.0 or later, you can use yield_self for this.
local = headers.zip(*data_rows).transpose.yield_self { |h| h[1..-1].map { |dataRow| h[0].zip(dataRow).to_h } }
yield_self is similar to tap in that they both yield self to the block. The difference is in what is returned by each of the two methods.
Object#tap yields self to the block and then returns self. Kernel#yield_self yields self to the block and then returns the result of the block.
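Running both methods on the same receiver makes the difference visible. This is a minimal sketch with made-up sample data; to_hashes is just an illustrative lambda, and the last line uses then, an alias of yield_self that was added in Ruby 2.6, so it needs 2.6 or later.

```ruby
headers   = [1, 2, 3]
data_rows = [[11, 12, 13], [21, 22, 23]]

# Illustrative helper: turns [header_row, *rows] into an array of hashes.
to_hashes = ->(h) { h[1..-1].map { |row| h[0].zip(row).to_h } }

# tap yields the receiver, discards the block's result and returns the receiver.
tapped = headers.zip(*data_rows).transpose.tap { |h| to_hashes.call(h) }
#=> [[1, 2, 3], [11, 12, 13], [21, 22, 23]]

# yield_self (Ruby 2.5+) yields the receiver and returns the block's result.
yielded = headers.zip(*data_rows).transpose.yield_self { |h| to_hashes.call(h) }
#=> [{1=>11, 2=>12, 3=>13}, {1=>21, 2=>22, 3=>23}]

# then (Ruby 2.6+) behaves exactly like yield_self.
thened = headers.zip(*data_rows).transpose.then { |h| to_hashes.call(h) }
```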
Here's an answer to a previous question where I gave a couple of further examples of where each of these methods can be useful.
It's often helpful to run working code on sample data to better understand what is being computed. Seeing transpose and zip, which are often interchangeable, used together was a clue that a simplification might be possible: for example, with a = [1,2,3] and b = [4,5,6], both a.zip(b) and [a,b].transpose return [[1, 4], [2, 5], [3, 6]].
Here's my data:
headers=[1,2,3]
data_rows=[[11,12,13],[21,22,23],[31,32,33],[41,42,43]]
and here's what the working code returns:
local = headers.zip(*data_rows).transpose
local[1..-1].map {|dataRow| local[0].zip(dataRow).to_h}
#=> [{1=>11, 2=>12, 3=>13}, {1=>21, 2=>22, 3=>23},
# {1=>31, 2=>32, 3=>33}, {1=>41, 2=>42, 3=>43}]
It would seem that this might be computed more simply:
data_rows.map { |row| headers.zip(row).to_h }
#=> [{1=>11, 2=>12, 3=>13}, {1=>21, 2=>22, 3=>23},
# {1=>31, 2=>32, 3=>33}, {1=>41, 2=>42, 3=>43}]
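Why the simplification works: when every row has the same length as headers (as in the data above), transposing headers.zip(*data_rows) just rebuilds [headers, *data_rows], so local[0] is headers and local[1..-1] is data_rows. A quick check of both claims, as a sketch using the same data:

```ruby
headers   = [1, 2, 3]
data_rows = [[11, 12, 13], [21, 22, 23], [31, 32, 33], [41, 42, 43]]

local = headers.zip(*data_rows).transpose
# zip followed by transpose undoes itself: row 0 is headers, the rest is data_rows.
p local == [headers, *data_rows]  # prints true

original   = local[1..-1].map { |data_row| local[0].zip(data_row).to_h }
simplified = data_rows.map { |row| headers.zip(row).to_h }
p original == simplified          # prints true
```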

Correct semantic usage of map

I am trying to understand the semantically correct way to use map. Since map can behave the same way as each, you could modify the array any way you like. But my colleague told me that after map is applied, the resulting array should have the same order and the same size.
For example, that would mean that using map to return a filtered array, as below, is not the right way to use map:
array = [1,2,3,4]
array.map{|num| num unless num == 2 || num == 4}.compact
I've been using map and other Enumerator methods for ages and never thought about this too much. I would appreciate advice from experienced Ruby developers.
According to Wikipedia, in computer science:
In many programming languages, map is the name of a higher-order function that applies a given function to each element of a list, returning a list of results in the same order.
This statement implies that the value returned by map should have the same length (because we're applying the function to each element) and that the returned elements should be in the same order. So when you use map, this is what the reader expects.
How not to use map
arr = [1, 2, 3]
arr.map { |i| arr.pop } #=> [3, 2]
This clearly betrays the intention of map: a different number of elements is returned, and they are not even in the original order of application. So don't use map like this. See "How to use ruby's value_at to get subhashes in a hash" and subsequent comments for further clarification, and thanks to #meager for originally pointing this out to me.
Meditate on this:
array = [1,2,3,4]
array.map{|num| num unless num == 2 || num == 4} # => [1, nil, 3, nil]
.compact # => [1, 3]
The intermediate value is an array of the same size, but it contains undesirable values, forcing the use of compact. As a result, CPU time is wasted generating the nil values and then deleting them. In addition, memory is wasted on an intermediate array that is the same size as the input when it doesn't need to be. Imagine the CPU and memory cost in a loop that processes thousands of elements.
Instead, using the right tool cleans up the code and avoids wasting CPU or memory:
array.reject { |num| num == 2 || num == 4 } # => [1, 3]
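For reference, here is a short sketch of which tool fits which intent. filter_map only exists in Ruby 2.7 and later; it is the built-in way to combine mapping and filtering in one pass when you genuinely need both, instead of map followed by compact.

```ruby
array = [1, 2, 3, 4]

# Filtering only: select/reject return a new array holding a subset of elements.
rejected = array.reject { |num| num == 2 || num == 4 }  #=> [1, 3]
selected = array.select(&:odd?)                         #=> [1, 3]

# Transforming only: map returns one result per element, in the same order.
doubled = array.map { |num| num * 2 }                   #=> [2, 4, 6, 8]

# Transforming and filtering (Ruby 2.7+): filter_map keeps only truthy
# block results, without an intermediate array full of nils.
odd_squares = array.filter_map { |num| num * num if num.odd? }  #=> [1, 9]
```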
I've been using map and other Enumerator methods for ages and never thought about this too much.
I'd recommend thinking about it. It's the little things like this that can make or break code or a system, and everything we do when programming needs to be done deliberately, avoiding all negative side-effects we can foresee.

Can I count on partition preserving order?

Say I have a sorted Array, such as this:
myArray = [1, 2, 3, 4, 5, 6]
Suppose I call Enumerable#partition on it:
p myArray.partition(&:odd?)
Must the output always be the following?
[[1, 3, 5], [2, 4, 6]]
The documentation doesn't state this; this is what it says:
partition { |obj| block } → [ true_array, false_array ]
partition → an_enumerator
Returns two arrays, the first containing the elements of enum for which the block evaluates to true, the second containing the rest.
If no block is given, an enumerator is returned instead.
But it seems logical to assume partition works this way.
Testing with Matz's interpreter (MRI), it appears that the output does work like this, and it makes complete sense for it to be like this. However, can I count on partition working this way regardless of the Ruby version or interpreter?
Note: I tagged this implementation-agnostic because I couldn't find any other tag that describes my concern. Feel free to change the tag to something better if you know of one.
No, you can't rely on the order. The reason is parallelism.
A traditional serial implementation of partition would loop through each element of the array, evaluating the block one element at a time in order. As each call to odd? returns, the element is immediately pushed into the appropriate true or false array.
Now imagine an implementation that takes advantage of multiple CPU cores. It still iterates through the array in order, but the calls to odd? can complete out of order: myArray[2].odd? might return before myArray[0].odd?, resulting in [[3, 1, 5], [2, 4, 6]].
List processing idioms such as partition which run a list through a function (most of Enumerable) benefit greatly from parallel processing, and most computers these days have multiple cores. I wouldn't be surprised if a future Ruby implementation took advantage of this. The writers of the API documentation for Enumerable likely carefully omitted any mention of process ordering to leave this optimization possibility open.
The documentation makes no explicit mention of this, but judging from the official code, it does retain ordering:
static VALUE
partition_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, arys))
{
    struct MEMO *memo = MEMO_CAST(arys);
    VALUE ary;
    ENUM_WANT_SVALUE();

    if (RTEST(enum_yield(argc, i))) {
        ary = memo->v1;
    }
    else {
        ary = memo->v2;
    }
    rb_ary_push(ary, i);
    return Qnil;
}
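This function is the per-element callback behind the public Enumerable#partition: it is called once for each element the receiver's each yields.
Because every element is appended with rb_ary_push in the order it is yielded, the order in which your enumerable emits objects is retained in both result arrays.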
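Whichever answer you find more convincing, you can observe MRI's behavior directly, and if you would rather not depend on it, an explicit partition built on each puts the ordering guarantee in your own code. ordered_partition below is a hypothetical helper name, not a core method; it relies only on each yielding elements in order, which is how Array#each is defined.

```ruby
# Hypothetical helper: an explicitly order-preserving partition. Each element
# is appended to one of two arrays in the order that `each` yields it.
def ordered_partition(enum)
  enum.each_with_object([[], []]) do |item, (trues, falses)|
    (yield(item) ? trues : falses) << item
  end
end

my_array = [1, 2, 3, 4, 5, 6]
builtin  = my_array.partition(&:odd?)           # observed on MRI: [[1, 3, 5], [2, 4, 6]]
explicit = ordered_partition(my_array, &:odd?)  # [[1, 3, 5], [2, 4, 6]]
```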

Ruby object references vs collection references

I was going through The Well-Grounded Rubyist and got confused by the following example.
Suppose we have an array of strings:
numbers = ["one", "two", "three"]
If I freeze this array, I can't do the following:
numbers[2] = "four"
That statement raises a RuntimeError (FrozenError, a subclass of RuntimeError, in Ruby 2.5 and later), but this:
numbers[2].replace("four")
is not.
The book explains that in the first of those two statements, we are trying to access the array. That's what I found confusing, because I thought we were trying to access the third element of the array, which is a string object. How is that different from the last statement?
It's different because in the statement that works you are calling String#replace. As you might expect, a call to Array#replace will fail.
numbers.replace [1,2,3]
TypeError: can't modify frozen array
The object referenced at any given array index might be arbitrarily complicated, and it's not the frozen array's job to keep those objects from changing; it only keeps the array itself from changing. You can see this:
ree-1.8.7> numbers[2].object_id
=> 2149301040
ree-1.8.7> numbers[2].replace "four"
=> "four"
ree-1.8.7> numbers[2].object_id
=> 2149301040
numbers[2] has the same object_id after String#replace runs; the Array did not actually change.
An array is a list of references to objects (you can think of it as a list of object_ids). String#replace changes the string in place, so the string keeps its object_id. The list of object_ids therefore does not change, and the array does not detect any change.
You can freeze every string of the array. String#replace would then result in an error.
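Both points can be checked in one short sketch. It assumes Ruby 2.5 or later, where frozen-object errors are FrozenError (older versions raise RuntimeError, or TypeError in 1.8 as in the session above); the variable names are only for illustration.

```ruby
numbers = ["one", "two", "three"].freeze

# Assigning to a slot changes the array itself, which a frozen array refuses.
array_error = begin
  numbers[2] = "four"
  nil
rescue FrozenError => e
  e.class
end

# Mutating the string stored in a slot leaves the array's references untouched.
id_before = numbers[2].object_id
numbers[2].replace("four")
same_object = numbers[2].object_id == id_before

# Freezing each element as well makes String#replace fail too.
deep = ["one", "two", "three"].map(&:freeze).freeze
string_error = begin
  deep[2].replace("four")
  nil
rescue FrozenError => e
  e.class
end

p array_error   # prints FrozenError
p same_object   # prints true
p numbers       # prints ["one", "two", "four"]
p string_error  # prints FrozenError
```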

Ruby Array find_first object?

Am I missing something in the Array documentation? I have an array which contains up to one object satisfying a certain criterion. I'd like to efficiently find that object. The best idea I have from the docs is this:
candidates = my_array.select { |e| e.satisfies_condition? }
found_it = candidates.first if !candidates.empty?
But I am unsatisfied for two reasons:
That select made me traverse the whole array, even though we could have bailed after the first hit.
I needed a line of code (with a condition) to flatten the candidates.
Both operations are wasteful, given the foreknowledge that there are only 0 or 1 satisfying objects.
What I'd like is something like:
array.find_first(block)
which returns nil or the first object for which the block evaluates to true, ending the traversal at that object.
Must I write this myself? All those other great methods in Array make me think it's there and I'm just not seeing it.
Either I don't understand your question, or Enumerable#find is the thing you were looking for.
Use the array's detect method if you want to return the first value for which the block returns true:
[1,2,3,11,34].detect(&:even?) #=> 2
OR
[1,2,3,11,34].detect{|i| i.even?} #=> 2
If you want to return all values for which the block returns true, use select:
[1,2,3,11,34].select(&:even?) #=> [2, 34]
Guess you just missed the find method in the docs:
my_array.find {|e| e.satisfies_condition? }
Do you need the object itself, or do you just need to know whether there is an object that satisfies the condition?
If the former then yes: use find:
found_object = my_array.find { |e| e.satisfies_condition? }
otherwise you can use any?
found_it = my_array.any? { |e| e.satisfies_condition? }
The latter will bail with "true" when it finds one that satisfies the condition.
The former will do the same, but return the object.
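The early exit is easy to observe by recording which elements the block actually sees. A small sketch using the numbers from the detect example above (visited is just an illustrative name):

```ruby
numbers = [1, 2, 3, 11, 34]

# find (alias detect) stops at the first element for which the block is truthy.
visited = []
first_even = numbers.find { |i| visited << i; i.even? }
p first_even  # prints 2
p visited     # prints [1, 2]: the traversal stopped at the first hit

# It returns nil when nothing matches, so no separate emptiness check is needed.
no_match = [1, 3, 5].find(&:even?)
p no_match    # prints nil

# any? short-circuits the same way but only answers yes or no.
has_even = numbers.any?(&:even?)
p has_even    # prints true

# select, by contrast, always walks the whole array.
all_evens = numbers.select(&:even?)
p all_evens   # prints [2, 34]
```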
