Can I count on partition preserving order? - ruby

Say I have a sorted Array, such as this:
myArray = [1, 2, 3, 4, 5, 6]
Suppose I call Enumerable#partition on it:
p myArray.partition(&:odd?)
Must the output always be the following?
[[1, 3, 5], [2, 4, 6]]
The documentation doesn't state this; this is what it says:
partition { |obj| block } → [ true_array, false_array ]
partition → an_enumerator
Returns two arrays, the first containing the elements of enum for which the block evaluates to true, the second containing the rest.
If no block is given, an enumerator is returned instead.
But it seems logical to assume partition works this way.
Through testing Matz's interpreter, it appears to be the case that the output works like this, and it makes full sense for it to be like this. However, can I count on partition working this way regardless of the Ruby version or interpreter?
Note: I made implementation-agnostic because I couldn't find any other tag that describes my concern. Feel free to change the tag to something better if you know about it.

No, you can't rely on the order. The reason is parallelism.
A traditional serial implementation of partition would loop through each element of the array evaluating the block one at a time in order. As each call to odd returns, it's immediately pushed into the appropriate true or false array.
Now imagine an implementation which takes advantage of multiple CPU cores. It still iterates through the array in order, but each call to odd can return out of order. odd(myArray[2]) might return before odd(myArray[0]) resulting in [[3, 1, 5], [2, 4, 6]].
List processing idioms such as partition which run a list through a function (most of Enumerable) benefit greatly from parallel processing, and most computers these days have multiple cores. I wouldn't be surprised if a future Ruby implementation took advantage of this. The writers of the API documentation for Enumerable likely carefully omitted any mention of process ordering to leave this optimization possibility open.

The documentation makes no explicit mention of this, but judging from the official code, it does retain ordering:
static VALUE
partition_i(RB_BLOCK_CALL_FUNC_ARGLIST(i, arys))
{
struct MEMO *memo = MEMO_CAST(arys);
VALUE ary;
ENUM_WANT_SVALUE();
if (RTEST(enum_yield(argc, i))) {
ary = memo->v1;
}
else {
ary = memo->v2;
}
rb_ary_push(ary, i);
return Qnil;
}
This code gets called from the public interface.
Essentially, the ordering in which your enumerable emits objects gets retained with the above logic.

Related

method cascading is possible here?

I have a three lines of code here like shown below
local = headers.zip(*data_rows).transpose
local = local[1..-1].map {|dataRow| local[0].zip(dataRow).to_h}
p local
Now if you watch the above three lines, I have to store the result of the first line in the variable called local since it would be used in two places in the second line as I have shown,So Can't I cascade the second line with first line anyway? I tried using tap like this
local = headers.zip(*data_rows).transpose.tap{|h|h[1..-1].map {|dataRow| h[0].zip(dataRow).to_h}}
tap is returning the self as explained in the document so can't I get the result final result when I use tab? Anyway other way to achieve this result in one single line so that I don't have to use local variable?
If you're on Ruby 2.5.0 or later, you can use yield_self for this.
local = headers.zip(*data_rows).transpose.yield_self { |h| h[1..-1].map { |dataRow| h[0].zip(dataRow).to_h } }
yield_self is similar to tap in that they both yield self to the block. The difference is in what is returned by each of the two methods.
Object#tap yields self to the block and then returns self. Kernel#yield_self yields self to the block and then returns the result of the block.
Here's an answer to a previous question where I gave a couple of further examples of where each of these method can be useful.
It's often helpful to execute working code with data, to better understand what is to be computed. Seeing transpose and zip, which are often interchangeable, used together, was a clue that a simplification might be possible (a = [1,2,3]; b = [4,5,6]; a.zip(b) => [[1, 4], [2, 5], [3, 6]] <= [a,b].transpose).
Here's my data:
headers=[1,2,3]
data_rows=[[11,12,13],[21,22,23],[31,32,33],[41,42,43]]
and here's what the working code returns:
local = headers.zip(*data_rows).transpose
local[1..-1].map {|dataRow| local[0].zip(dataRow).to_h}
#=> [{1=>11, 2=>12, 3=>13}, {1=>21, 2=>22, 3=>23},
# {1=>31, 2=>32, 3=>33}, {1=>41, 2=>42, 3=>43}]
It would seem that this might be computed more simply:
data_rows.map { |row| headers.zip(row).to_h }
#=> [{1=>11, 2=>12, 3=>13}, {1=>21, 2=>22, 3=>23},
# {1=>31, 2=>32, 3=>33}, {1=>41, 2=>42, 3=>43}]

Correct semantic usage of map

I am trying to understand what is a semantically right way to use map. As map can behave the same way as each, you could modify the array any way you like. But I've been told by my colleague that after map is applied, array should have
the same order and the same size.
For example, that would mean using the map to return an updated array won't be the right way to use map:
array = [1,2,3,4]
array.map{|num| num unless num == 2 || num == 4}.compact
I've been using map and other Enumerator methods for ages and never thought about this too much. Would appreciate advice from experienced Ruby Developers.
In Computer Science, map according to Wikipedia:
In many programming languages, map is the name of a higher-order
function that applies a given function to each element of a list,
returning a list of results in the same order
This statement implies the returned value of map should be of the same length (because we're applying the function to each element). And the returned-elements are to be in the same order. So when you use map, this is what the reader expects.
How not to use map
arr.map {|i| arr.pop } #=> [3, 2]
This clearly betrays the intention of map since we have a different number of elements returned and they are not even in the original order of application. So don't use map like this. See "How to use ruby's value_at to get subhashes in a hash" and subsequent comments for further clarification and thanks to #meager for originally pointing this out to me.
Meditate on this:
array = [1,2,3,4]
array.map{|num| num unless num == 2 || num == 4} # => [1, nil, 3, nil]
.compact # => [1, 3]
The intermediate value is an array of the same size, however it contains undesirable values, forcing the use of compact. The fallout of this is CPU time is wasted generating the nil values, then deleting them. In addition, memory is being wasted generating another array that is the same size when it shouldn't be. Imagine the CPU and memory cost in a loop that is processing thousands of elements in an array.
Instead, using the right tool cleans up the code and avoids wasting CPU or memory:
array.reject { |num| num == 2 || num == 4 } # => [1, 3]
I've been using map and other Enumerator methods for ages and never thought about this too much.
I'd recommend thinking about it. It's the little things like this that can make or break code or a system, and everything we do when programming needs to be done deliberately, avoiding all negative side-effects we can foresee.

Understanding Ruby array sorting syntax

I really don't understand the following sorting method:
books = ["Charlie and the Chocolate Factory", "War and Peace", "Utopia", "A Brief History of Time", "A Wrinkle in Time"]
books.sort! { |firstBook, secondBook| firstBook <=> secondBook }
How does the this work? In the ruby books, they had one parameter for example |x| represent each of the values in the array. If there is more than one parameter (firstBook and secondBook in this example) what does it represent??
Thank you!
The <=> operator returns the result of a comparison.
So "a" <=> "b" returns -1, "b" <=> "a" returns 1, and "a" <=> "a" returns 0.
That's how sort is able to determine the order of elements.
Array#sort (and sort!) called without a block will do comparisons with <=>, so the block is redundant. These all accomplish the same thing:
books.sort!
books.sort_by!{|x| x}
books.sort!{|firstBook, secondBook| firstBook <=> secondBook}
Since you are not overriding the default behavior, the second and third forms are needlessly complicated.
So how does this all work?
The first form sorts the array by using some sorting algorithm -- it's not relevant which one -- which needs to be able to compare two elements to decide which comes first. (More on this below.) It automatically, behind the scenes, follows the same logic as the third line above.
The middle form lets you choose what to sort on. For example: instead of, for each item, just sorting on that item (which is the default), you can sort on that item's length:
books.sort_by!{|title| title.length}
Then books is sorted from shortest title to longest title. If all you are doing is calling a method on each item, there's another shortcut available. This does the same thing:
books.sort_by!(&:length)
In the final form, you have control over the comparison itself. For example, you could sort backwards:
books.sort!{|first, second| second <=> first}
Why does sort need two items passed into the block, and what do they represent?
Array#sort (and sort!) with a block is how you override the comparison step of sorting. Comparison has to happen at some point during a sort in order to figure out what order to put things in. You don't need to override the comparison in most cases, but if you do, this is the form that allows that, so it needs two items passed into the block: the two items that need to be compared right now. Let's look at an example in action:
[4, 3, 2, 1].sort{|x, y| puts "#{x}, #{y}"; x <=> y}
This outputs:
4, 2
2, 1
3, 2
3, 4
This shows us that in this case, sort compared 4 and 2, then 2 and 1, then 3 and 2, and then finally 3 and 4, in order to sort the array. The precise details are irrelevant to this discussion and depend on the sorting algorithm being used, but again, all sorting algorithms need to be able to compare items in order to sort.
The block given inside {} is passed as a comparing function for method sort. |a, b| tells us that this comparing function takes 2 parameters (which is expected number of arguments since we need to compare).
This block is executed for each element in array but if we need one more argument we take next element after this.
See http://ruby-doc.org/core-2.0/Array.html#method-i-sort for an explanation. As for a single-parameter method referred to in your books, I can only guess you were looking at sort_by. Can you give an example?

Changing one array in an array of arrays changes them all; why?

a = Array.new(3,[])
a[1][0] = 5
a => [[5], [5], [5]]
I thought this doesn't make sense!
isn't it should a => [[], [5], []]
or this's sort of Ruby's feature ?
Use this instead:
a = Array.new(3){ [] }
With your code the same object is used for the value of each entry; once you mutate one of the references you see all others change. With the above you instead invoke the block each time a new value is needed, which returns a new array each time.
This is similar in nature to the new user question about why the following does not work as expected:
str.gsub /(<([a-z]+)>/, "-->#{$1}<--"
In the above, string interpolation occurs before the gsub method is ever called, so it cannot use the then-current value of $1 in your string. Similarly, in your question you create an object and pass it to Array.new before Ruby starts creating array slots. Yes, the runtime could call dup on the item by default…but that would be potentially disastrous and slow. Hence you get the block form to determine on your own how to create the initial values.

Mocking Sort With Mocha

How can I mock an array's sort expect a lambda expression?
This is a trivial example of my problem:
# initializing the data
l = lambda { |a,b| a <=> b }
array = [ 1, 2, 3, 4, 5 ]
sorted_array = [ 2, 3, 8, 9, 1]
# I expect that sort will be called using the lambda as a parameter
array.expects(:sort).with( l ).returns( sorted_array )
# perform the sort using the lambda expression
temp = array.sort{|a,b| l.call(a,b) }
Now, at first I expected that this would work; however, I got the following error:
- expected exactly once, not yet invoked: [ 1, 2, 3, 4, 5 ].sort(#<Proc:0xb665eb48>)
I realize that this will not work because l is not passed as a parameter to l. However, is there another way to do what this code is trying to accomplish?
NOTE: I have figured out how to solve my issue without figuring out how to do the above. I will leave this open just in case someone else has a similar problem.
Cheers,
Joseph
Mocking methods with blocks can be quite confusing. One of the keys is to be clear about what behaviour you want to test. I can't tell from your sample code exactly what it is that you want to test. However, you might find the documentation for Mocha::Expectation#yields (or even Mocha::Expectation#multiple_yields) useful.

Categories

Resources