Merge N sorted arrays in Ruby lazily

How does one merge N sorted arrays (or other list-like data structures) lazily in Ruby? For example, in Python you would use heapq.merge. There must be something like this built into Ruby, right?

Here's a (slightly golfed) solution that should work on arrays of any 'list-like' collections that support #first, #shift, and #empty?. Note that it is destructive - each call to lazymerge removes one item from one collection.
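def minheap a,i
  r=(l=2*(m=i)+1)+1 # l and r are the indices of i's left and right children
  m = l if l< a.size and a[l].first < a[m].first
  m = r if r< a.size and a[r].first < a[m].first
  # move the collection with the smallest head up to i, then fix the subtree below
  (a[i],a[m]=a[m],a[i];minheap(a,m)) if (m!=i)
end
def lazymerge a
  (a.size/2).downto(0){|i|minheap(a,i)} # re-heapify on every call, root included
  r = a[0].shift
  a[0]=a.pop if a[0].empty?
  return r
end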
p arrs = [ [1,2,3], [2,4,5], [4,5,6],[3,4,5]]
v=true
puts "Extracted #{v=lazymerge (arrs)}. Arr= #{arrs.inspect}" while v
Output:
[[1, 2, 3], [2, 4, 5], [4, 5, 6], [3, 4, 5]]
Extracted 1. Arr= [[2, 3], [2, 4, 5], [4, 5, 6], [3, 4, 5]]
Extracted 2. Arr= [[3], [2, 4, 5], [4, 5, 6], [3, 4, 5]]
Extracted 2. Arr= [[4, 5], [3], [4, 5, 6], [3, 4, 5]]
Extracted 3. Arr= [[4, 5], [3, 4, 5], [4, 5, 6]]
Extracted 3. Arr= [[4, 5], [4, 5], [4, 5, 6]]
Extracted 4. Arr= [[5], [4, 5], [4, 5, 6]]
Extracted 4. Arr= [[5], [5], [4, 5, 6]]
Extracted 4. Arr= [[5, 6], [5], [5]]
Extracted 5. Arr= [[6], [5], [5]]
Extracted 5. Arr= [[5], [6]]
Extracted 5. Arr= [[6]]
Extracted 6. Arr= [[]]
Extracted . Arr= [[]]
Note also that this algorithm is lazy about maintaining the heap property: it is not maintained between calls. This probably causes it to do more work than needed, since it does a complete heapify on every call. This could be fixed by doing a complete heapify once up front, then calling minheap(a,0) before the return r line.
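The fix suggested above could be sketched roughly as follows. The helper names sift_down, build_merge_heap and next_merged are hypothetical (they are not part of the answer's code), and this version works on shallow copies of the inputs so the caller's arrays are left alone. It also assumes the merged values are never nil or false, because nil is used to signal that everything has been consumed.

```ruby
# Restore the heap property for the subtree rooted at index i.
# The heap is ordered by the first (smallest) element of each array.
def sift_down(heap, i)
  loop do
    left = 2 * i + 1
    right = left + 1
    smallest = i
    smallest = left if left < heap.size && heap[left].first < heap[smallest].first
    smallest = right if right < heap.size && heap[right].first < heap[smallest].first
    break if smallest == i
    heap[i], heap[smallest] = heap[smallest], heap[i]
    i = smallest
  end
end

# Hypothetical helper: copy the non-empty inputs and heapify once, up front.
def build_merge_heap(arrays)
  heap = arrays.reject(&:empty?).map(&:dup)
  (heap.size / 2 - 1).downto(0) { |i| sift_down(heap, i) }
  heap
end

# Hypothetical helper: remove and return the smallest remaining value,
# or nil once every array has been consumed.
def next_merged(heap)
  return nil if heap.empty?
  value = heap[0].shift
  if heap[0].empty?
    last = heap.pop                     # drop the exhausted array
    heap[0] = last unless heap.empty?   # move the last array to the root
  end
  sift_down(heap, 0) unless heap.empty? # only the root can be out of place
  value
end

heap = build_merge_heap([[1, 2, 3], [2, 4, 5], [4, 5, 6], [3, 4, 5]])
merged = []
while (value = next_merged(heap))
  merged << value
end
p merged
```

Each extraction now costs one sift-down from the root instead of a full heapify.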

I ended up writing it myself using the data structures from the 'algorithms' gem. It wasn't as bad as I expected.
require 'algorithms'
class LazyHeapMerger
  def initialize(sorted_arrays)
    @heap = Containers::Heap.new { |x, y| (x.first <=> y.first) == -1 }
    sorted_arrays.each do |a|
      q = Containers::Queue.new(a)
      @heap.push([q.pop, q])
    end
  end
  def each
    while @heap.length > 0
      value, q = @heap.pop
      @heap.push([q.pop, q]) if q.size > 0
      yield value
    end
  end
end
m = LazyHeapMerger.new([[1, 2], [3, 5], [4]])
m.each do |o|
puts o
end

Here's an implementation which should work on any Enumerable, even infinite ones. It returns an Enumerator.
def lazy_merge *list
list.map!(&:enum_for) # get an enumerator for each collection
Enumerator.new do |yielder|
hash = list.each_with_object({}){ |enum, hash|
begin
hash[enum] = enum.next
rescue StopIteration
# skip empty enumerators
end
}
loop do
raise StopIteration if hash.empty?
enum, value = hash.min_by{|k,v| v}
yielder.yield value
begin
hash[enum] = enum.next
rescue StopIteration
hash.delete(enum) # remove enumerator that we already processed
end
end
end
end
Infinity = 1.0/0 # easy way to get infinite range
p lazy_merge([1, 3, 5, 8], (2..4), (6..Infinity), []).take(12)
#=> [1, 2, 3, 3, 4, 5, 6, 7, 8, 8, 9, 10]

No, there's nothing built in to do that. At least, nothing that springs instantly to mind. However, there was a GSoC project to implement the relevant data types a couple of years ago, which you could use.

Related

Ruby, remove super-arrays

If I have an array of arrays, A, and want to get rid of every array in A that has a sub-array in A, how would I do that? In this context, array_1 is a sub-array of array_2 if array_1 - array_2 = []. If multiple arrays are simply rearranged versions of the same elements, bonus points if you can get rid of all but one of them, but you can handle this however you want if it's easier.
In Python, I could easily use a comprehension, with A being a set of frozen sets:
A = {a for a in A if all(b-a for b in A-{a})}
Is there a simple way to write this in Ruby? I don't care if the order of A or its arrays is preserved at all. Also, in my program, none of the arrays have duplicate elements, if that makes things any easier/faster.
Example
A = [[1,6],[1,2],[2,4],[3,5],[1,3,6],[2,3,6]]
# [1,6] is a subarray of [1,3,6], so [1,3,6] should be removed
remove_super_arrays(A)
> A = [[1,6],[1,2],[2,4],[3,5],[2,3,6]]
A = [[1,2,4],[2,3,4],[1,4,5],[2,6]]
# although there is overlap, there are no subarrays, so nothing should be removed
remove_super_arrays(A)
> A = [[1,2,4],[2,3,4],[1,4,5],[2,6]]
A = [[1],[2,1,3],[2,4],[1,4]]
# [1] is a subarray of [2,1,3] and [1,4]
remove_super_arrays(A)
> A = [[1],[2,4]]
Code
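def remove_super_arrays(arr)
  order = arr.each_with_index.to_a.to_h # remember each array's original position
  sorted = arr.sort_by(&:size)
  sorted.reject.with_index do |a,i|
    # every array smaller than a sits before index i in sorted
    sorted[0,i].any? { |aa| (aa.size < a.size) && (aa-a).empty? }
  end.sort_by { |a| order[a] }
end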
Examples
remove_super_arrays([[1,6],[1,2],[2,4],[3,5],[1,3,6],[2,3,6]] )
#=> [[1,6],[1,2],[2,4],[3,5],[2,3,6]]
remove_super_arrays([[1,2,4],[2,3,4],[1,4,5],[2,6]])
#=> [[1,2,4],[2,3,4],[1,4,5],[2,6]]
remove_super_arrays([[1],[2,1,3],[2,4],[1,4]])
#=> [[1],[2,4]]
Explanation
Consider the first example.
arr = [[1,6],[1,2],[2,4],[3,5],[1,3,6],[2,3,6]]
We first save the positions of the elements of arr:
order = arr.each_with_index.to_a.to_h # save original order
#=> {[1, 6]=>0, [1, 2]=>1, [2, 4]=>2, [3, 5]=>3, [1, 3, 6]=>4, [2, 3, 6]=>5}
Then reject elements of arr:
b = arr.sort_by(&:size)
#=> [[1, 6], [1, 2], [2, 4], [3, 5], [1, 3, 6], [2, 3, 6]]
c = b.reject.with_index do |a,i|
  b[0,i].any? { |aa| (aa.size < a.size) && (aa-a).empty? }
end
#=> [[1, 6], [1, 2], [2, 4], [3, 5], [2, 3, 6]]
Lastly, reorder c to correspond to the original ordering of the elements of arr.
c.sort_by { |a| order[a] }
#=> [[1, 6], [1, 2], [2, 4], [3, 5], [2, 3, 6]]
which in this case happens to be the same order as the elements of c.
Let's look more carefully at the calculation of c:
enum1 = b.reject
#=> #<Enumerator: [[1, 6], [1, 2], [2, 4], [3, 5], [1, 3, 6],
# [2, 3, 6]]:reject>
enum2 = enum1.with_index
#=> #<Enumerator: #<Enumerator: [[1, 6], [1, 2], [2, 4], [3, 5],
# [1, 3, 6], [2, 3, 6]]:reject>:with_index>
The first element is generated by the enumerator enum2, passed to the block, and assigned to the block variables:
a, i = enum2.next
#=> [[1, 6], 0]
a #=> [1, 6]
i #=> 0
The block calculation is then performed:
d = b[0,i]
#=> []
d.any? { |aa| (aa.size < a.size) && (aa-a).empty? }
#=> false
so a ([1, 6]) is not rejected. The next pair passed to the block by enum2 is [[1, 2], 1]. That value is retained as well, but let's skip ahead to the first element that is rejected, [1, 3, 6]:
a, i = enum2.next
#=> [[1, 2], 1]
a, i = enum2.next
#=> [[2, 4], 2]
a, i = enum2.next
#=> [[3, 5], 3]
a, i = enum2.next
#=> [[1, 3, 6], 4]
a #=> [1, 3, 6]
i #=> 4
Perform the block calculation:
d = b[0,i]
#=> [[1, 6], [1, 2], [2, 4], [3, 5]]
d.any? { |aa| (aa.size < a.size) && (aa-a).empty? }
#=> true
As true is returned, a is rejected. In the last calculation the first element of d is passed to the block and the following calculation is performed:
aa = [1, 6]
(aa.size < a.size)
#=> 2 < 3 => true
(aa-a).empty?
#=> ([1, 6] - [1, 3, 6]).empty? => [].empty? => true
As true && true #=> true, a ([1, 3, 6]) is rejected.
Alternative calculation
The following is a closer match to the OP's Python equivalent, but less efficient:
def remove_super_arrays(arr)
arr.select do |a|
(arr-[a]).all? { |aa| aa.size > a.size || (aa-a).any? }
end
end
or
def remove_super_arrays(arr)
arr.reject do |a|
(arr-[a]).any? { |aa| (aa.size < a.size) && (aa-a).empty? }
end
end
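For comparison, here is a sketch that stays closer to the Python set-of-frozensets formulation by using Ruby's Set class. The method name remove_super_sets is hypothetical, and the sketch assumes, as the question states, that no array contains duplicate elements. Because two sets with the same members are equal regardless of insertion order, uniq also collapses rearranged versions of the same array down to one.

```ruby
require 'set'

# Hypothetical name. Keeps only the arrays that have no proper subset
# elsewhere in the input; rearranged duplicates collapse to the first one.
def remove_super_sets(arr)
  sets = arr.map(&:to_set).uniq
  sets.reject { |a| sets.any? { |b| b.proper_subset?(a) } }.map(&:to_a)
end

p remove_super_sets([[1, 6], [1, 2], [2, 4], [3, 5], [1, 3, 6], [2, 3, 6]])
p remove_super_sets([[1], [2, 1, 3], [2, 4], [1, 4]])
p remove_super_sets([[1, 2], [2, 1], [1, 2, 3]])
```

This is still quadratic in the number of arrays, like the alternatives above; the gain is mostly readability.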
This was a nice exercise for me. I have used the logic from here.
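My code iterates over each sub-array except the first and subtracts it from the first array. When the result is empty, the sub-array contains every element of the first array, so it is left out. Note that this only compares against arr.first, which is enough for this example but does not detect sub-arrays that appear later in arr.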
def remove_super_arrays(arr)
arr.each_with_index.with_object([]) do |(sub_array, index), result|
next if index == 0
result << sub_array unless (arr.first - sub_array).empty?
end.unshift(arr.first)
end
arr = [[1,6],[1,2],[2,4],[3,5],[1,3,6],[2,3,6]]
p remove_super_arrays(arr)
#=> [[1, 6], [1, 2], [2, 4], [3, 5], [2, 3, 6]]

Problems using `elsif` in a `sort` block

I have this array:
ary = [[1, 6, 7], [1, 4, 9], [1, 8, 3]]
I want to sort it by the first odd number, or the last number if they are all even, in each subarray.
Since the first element in each array is the same object 1 for this particular ary, I can solve this like this:
ary2 = ary.sort_by { |a, b, c| b.odd? ? b : c }
But when I try a more general one:
ary2 = ary.sort_by { |a, b, c| a.odd? ? a : b.odd? ? b : c }
ary2 comes back unsorted.
I tried removing the ternary operators like this:
ary2 = ary.sort_by do |a, b, c|
if a.odd?
a
elsif b.odd?
b
else
c
end
end
with the same effect (i.e., none).
Is there some reason that elsif can't be used in blocks passed to the sort_by method?
Edit: Axiac pointed out the problem with my logic. It looks like the conditional logic has to deal with all of the possible combinations of odd and even values. This works:
arr2 = ary.sort_by do |a, b, c|
  # every branch returns an Array so that any two keys can be compared
  if a.odd?
    if b.odd?
      if c.odd?
        [a, b, c]
      else
        [a, b]
      end
    elsif c.odd?
      [a, c]
    else
      [a]
    end
  elsif b.odd?
    if c.odd?
      [b, c]
    else
      [b]
    end
  else
    [c]
  end
end
There is a more succinct and less brittle way to do it, though: sort by the odd elements of each sub-array, falling back to the last element when there are none:
arr2 = ary.sort_by do |sub_arr|
  temp = sub_arr.select do |e|
    e.odd?
  end
  temp.empty? ? Array(sub_arr.last) : temp
end
I'll see myself out.
Regarding your original question: as axiac points out in the comment, the result of the sorting should be exactly the same as the input array. Every sub-array is sorted by its first odd element, which is 1 in all three, so the keys are all equal and there is nothing to reorder. (Ruby does not guarantee that sort_by is stable, but the input order was preserved here.)
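To see why nothing moves, it can help to print the keys that sort_by compares. In this small sketch the lambda name first_odd_key is hypothetical; it is meant to reproduce the rule from the question (the first odd number, or the last element if none is odd).

```ruby
ary = [[1, 6, 7], [1, 4, 9], [1, 8, 3]]

# Hypothetical name for the key used in the question:
# the first odd number, or the last element when no element is odd.
first_odd_key = ->(a) { a.find(&:odd?) || a.last }

keys = ary.map(&first_odd_key)
p keys            # the key computed for each sub-array
p keys.uniq.size  # how many distinct keys there are
```

With only one distinct key, sort_by has no basis for preferring any order over another.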
Regarding your question after the edit, my answer would be:
ary.sort_by{|a| a[0...-1].select(&:odd?) << a.last}
# => [[1, 8, 3], [1, 6, 7], [1, 4, 9]]
I am fairly confident that this matches what you described after the edit, but I am not sure it is what you actually want, since the sorting mechanism looks strange to me.
I find the statement of the question ambiguous. I will give an answer that is consistent with one interpretation. If that is not what you want, please clarify the question.
def my_sort(arr)
arr.sort_by {|a| a.any?(&:odd?) ? a.map {|e| e.odd? ? e : Float::INFINITY} : [a.last]}
end
my_sort [[1, 6, 7], [1, 4, 9], [1, 2, 3]]
#=> [[1, ∞, 7], [1, ∞, 9], [1, ∞, 3]] (sort_by)
#=> [[1, 2, 3], [1, 6, 7], [1, 4, 9]]
my_sort [[3, 6, 7], [4, 1, 9], [5, 8, 1]]
#=> [[3, ∞, 7], [∞, 1, 9], [5, ∞, 1]] (sort_by)
#=> [[3, 6, 7], [5, 8, 1], [4, 1, 9]]
my_sort [[2, 6, 8], [4, 1, 4], [8, 6, 2]]
#=> [[8], [∞, 1, ∞], [2]] (sort_by)
#=> [[8, 6, 2], [2, 6, 8], [4, 1, 4]]
my_sort [[8, 6, 2], [5, 1, 1], [6, 8, 4]]
#=> [[2], [5, 1, 1], [4]] (sort_by)
#=> [[8, 6, 2], [6, 8, 4], [5, 1, 1]]
For each example I've shown the arrays used by sort_by to produce the sort shown on the following line.

Inserting elements into new array and then deleting from old array, some elements getting ignored

I'm trying to remove pairs of the smallest and largest elements from an Array and store them in a second one. Is there a better way to do this or a Ruby method I don't know about that could accomplish something like this?
Here's my code:
nums = [1, 2, 3, 4, 5, 6]
pairs = []; for n in nums
pairs << [n, nums.last]
nums.delete nums.last
nums.delete n
end
Current result:
nums
#=> [2, 4]
pairs
#=> [[1, 6], [3, 5]]
Expected result:
nums
#=> []
pairs
#=> [[1, 6], [2, 5], [3, 4]]
Assuming nums is sorted and can be modified, I like this way because it has a mechanical feel about it:
pairs = (nums.size/2).times.map { [nums.shift, nums.pop] }
#=> [[1, 6], [2, 5], [3, 4]]
nums
#=> []
I see @Drenmi has the same idea of using shift and pop.
If you don't want to modify nums, you could of course operate on a copy.
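A non-destructive variant could look like the following sketch. The method name min_max_pairs is hypothetical, and the input is assumed to be sorted already. It pairs the first half with the reversed second half, so nums is never modified; with an odd number of elements the middle one is left unpaired.

```ruby
# Hypothetical helper: pair smallest with largest without touching the input.
# Assumes sorted is already in ascending order.
def min_max_pairs(sorted)
  half = sorted.size / 2
  sorted.first(half).zip(sorted.last(half).reverse)
end

nums = [1, 2, 3, 4, 5, 6]
pairs = min_max_pairs(nums)
p pairs
p nums # unchanged
```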
Enumerating over an Array while deleting its contents is generally not advisable. Here's an alternative solution:
nums = *(1..6)
#=> [1, 2, 3, 4, 5, 6]
pairs = []
#=> []
until nums.size < 2 do
pairs << [nums.shift, nums.pop]
end
pairs
#=> [[1, 6], [2, 5], [3, 4]]
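To illustrate why the loop in the question skips elements, this sketch records which values the iteration actually visits. It uses each, which is what for n in nums calls under the hood; the visited array is only there for illustration.

```ruby
nums = [1, 2, 3, 4, 5, 6]
visited = []

nums.each do |n|
  visited << n
  nums.delete(nums.last) # the array shrinks from the back...
  nums.delete(n)         # ...and from the front, shifting later elements left
end

p visited
p nums
```

The iteration advances by index, so after 1 is deleted the element 2 slides into index 0 and is never visited; the same happens to 4 after 3 is deleted.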

How to access a nested element, passing array with coordinates

Is there any short way to access an element of a nested array, passing the array with coordinates? I mean something like:
matrix = [[1,2,3,4],[5,6,7,8]]
array = [1,1]
matrix [array]
# => 6
I just wonder if there is a shorter version than:
matrix [array[0]][array[1]]
I believe you want to use the Matrix class:
require 'matrix'
arr = [[1,2,3,4],[5,6,7,8]]
matrix = Matrix[*arr] #=> Matrix[[1, 2, 3, 4], [5, 6, 7, 8]]
matrix[1,1] #=> 6
matrix.row(1) #=> Vector[5, 6, 7, 8]
c = matrix.column(1) #=> Vector[2, 6]
c.to_a #=> [2, 6]
m = matrix.transpose #=> Matrix[[1, 5], [2, 6], [3, 7], [4, 8]]
m.to_a #=> [[1, 5], [2, 6], [3, 7], [4, 8]]
array.inject(matrix, :fetch)
# => 6
matrix[1][1]
should equal 6. matrix[1] is the 2nd array, matrix[1][1] is the second element in that array.
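Another option, assuming Ruby 2.3 or later, is Array#dig, which accepts the coordinates as separate arguments, so the coordinate array can be splatted into it. Unlike the fetch-based version, it returns nil rather than raising when a coordinate is out of range.

```ruby
matrix = [[1, 2, 3, 4], [5, 6, 7, 8]]
array  = [1, 1]

value = matrix.dig(*array)  # same as matrix[1][1]
p value

missing = matrix.dig(5, 0)  # row 5 does not exist, so this is nil
p missing
```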

How can I interweave items from two arrays?

How can I go from this:
for number in [1,2] do
puts 1+number
puts 2+number
puts 3+number
end
which will return 2,3,4 then 3,4,5 -> 2,3,4,3,4,5. This is just an example, and clearly not the real use.
Instead, I would like it to return 2,3 3,4 4,5 -> 2,3,3,4,4,5. I would like each of the puts to be iterated for each of the possible values of number; In this case 1 and 2 are the two possible values of 'number', before moving on to the next puts.
One way to do this is to create two lists, [2,3,4] and [3,4,5], and then use the zip method to combine them and flatten the result: [2,3,4].zip([3,4,5]).flatten -> [2,3,3,4,4,5].
zip is good. You should also look at each_cons:
1.9.2p290 :006 > [2,3,4].each_cons(2).to_a
=> [[2, 3], [3, 4]]
1.9.2p290 :007 > [2,3,4,5,6].each_cons(2).to_a
=> [[2, 3], [3, 4], [4, 5], [5, 6]]
1.9.2p290 :008 > [2,3,4,5,6].each_cons(3).to_a
=> [[2, 3, 4], [3, 4, 5], [4, 5, 6]]
Because each_cons returns an Enumerator when no block is given, you can either pass it a block, as mentioned in its documentation, or convert it to an array using to_a like I did above. That returns the array of arrays, which can be flattened to get a single array:
[2,3,4,5].each_cons(2).to_a.flatten
=> [2, 3, 3, 4, 4, 5]
From the ri docs:
Iterates the given block for each array of consecutive elements. If no
block is given, returns an enumerator.
e.g.:
(1..10).each_cons(3) {|a| p a}
# outputs below
[1, 2, 3]
[2, 3, 4]
[3, 4, 5]
[4, 5, 6]
[5, 6, 7]
[6, 7, 8]
[7, 8, 9]
[8, 9, 10]
Maybe not the most readable code, but you could use inject on the first range to build an array by adding each value of the second range:
(1..3).inject([]){|m,n| (1..2).each{|i| m<<n+i }; m }
=> [2, 3, 3, 4, 4, 5]
This might be a little more readable
res=[]
(1..3).each{|r1| (1..2).each{|r2| res<<r1+r2 } }
[1, 2, 3].each { |i| [1, 2].each { |y| puts i + y } }
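The nested loops above amount to a Cartesian product, so one more way to sketch it is with Array#product. The variable names offsets and numbers are made up for this example.

```ruby
offsets = [1, 2, 3] # the constants added in each puts line
numbers = [1, 2]    # the values that number takes

# product yields every [offset, number] pair, with the offset varying slowest
sums = offsets.product(numbers).map { |offset, number| offset + number }
p sums
```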
