ruby array intersection performance issue - ruby

I need to compute the intersection of n arrays with millions of elements (database IDs).
This code works, but it is slow with very big arrays. How can I improve it?
[[1,2,3,4],[2,4,6,8],[4,5,8]].inject([]){|c,v| c = v if c.size==0; c = c&v if c.size>0; c }

[1,2,3,4] & [2,4,6,8] & [4,5,8] #=> [4]
The intersection method (Array#&) uses a hash internally, so it should be quick.

Ruby provides an intersection operator.
May I suggest you try this:
> [[1,2,3,4],[2,4,6,8],[4,5,8]].reduce{ |accum, arr| accum & arr }
=> [4]
Edit:
This can be written a little more concisely, though readability suffers:
[[1,2,3,4],[2,4,6,8],[4,5,8]].reduce(:&)
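For very large inputs the order of the arrays may matter, since the running result of a chain of & calls can only shrink. Below is a minimal sketch that starts from the smallest array; intersect_all is a hypothetical helper name, not part of Ruby, and whether the reordering helps depends on the data, so it is worth benchmarking. Unlike the inject([]) version in the question, the result stays empty once the running intersection is empty (there, an empty intermediate result is replaced by the next array).

```ruby
# Hypothetical helper: intersect any number of arrays, smallest first.
def intersect_all(arrays)
  return [] if arrays.empty?

  # Starting with the smallest array keeps every intermediate result small.
  arrays.sort_by(&:size).reduce(:&)
end

intersect_all([[1, 2, 3, 4], [2, 4, 6, 8], [4, 5, 8]]) # => [4]
intersect_all([[1], [2], [3]])                         # => []
```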

Related

What are some nice ways to reverse a nested hash?

Suppose we have
b = {"b"=>"c"}
By doing b.invert we can easily obtain the result of
{"c"=>"b"}
That's when I thought of trying something pretty cool. Suppose we have
a = {"a"=>{"b"=>"c"}}
What's a fairly efficient way to turn this into {{"c"=>"b"}=>"a"}? (Here we reverse the innermost hash and work our way out.)
Of course, it would be best to extend this to any number of hashes nested within each other. I've been looking for similar questions but haven't found any.
Thanks.
This can be accomplished with a recursive method for inverting the keys of the hash (and values, if desired). For example:
hsh = {{"c"=>"b"}=>"a"}
def recursive_invert(hsh)
  hsh.each_with_object({}) do |(k, v), inverted_hsh|
    if k.is_a? Hash
      k = recursive_invert(k)
    end
    inverted_hsh[v] = k
  end
end
recursive_invert(hsh) # {"a"=>{"b"=>"c"}}
Here's a recursive solution that will work in both directions.
def deep_invert(h)
  h.each_with_object({}) do |(k,v),obj|
    k = deep_invert(k) if k.is_a?(Hash)
    v = deep_invert(v) if v.is_a?(Hash)
    obj[v] = k
  end
end
Example:
a = {"a"=>{"b"=>"c"}}
deep_invert(a)
#=> {{"c"=>"b"}=>"a"}
deep_invert(deep_invert(a)) == a
#=> true
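To check that this extends to deeper nesting, here is a small sketch that repeats the deep_invert definition so it runs on its own. The sample hashes are made up for illustration. It also shows one caveat: like Hash#invert, duplicate values collapse into a single key, so the round trip is only lossless when values are unique.

```ruby
def deep_invert(h)
  h.each_with_object({}) do |(k, v), obj|
    k = deep_invert(k) if k.is_a?(Hash)
    v = deep_invert(v) if v.is_a?(Hash)
    obj[v] = k
  end
end

# Three levels of nesting are inverted from the inside out.
nested = { "a" => { "b" => { "c" => "d" } } }
deep_invert(nested) # => {{{"d"=>"c"}=>"b"}=>"a"}

# Duplicate values collapse to a single key; the last pair wins.
deep_invert({ "x" => 1, "y" => 1 }) # => {1=>"y"}
```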

Ruby - Find element not in common for two arrays

I've been thinking about the following problem: there are two arrays, and I need to find the elements that are not common to both. For example:
a = [1,2,3,4]
b = [1,2,4]
And the expected answer is [3].
So far I've been doing it like this:
a.select { |elem| !b.include?(elem) }
But it gives me O(N ** 2) time complexity. I'm sure it can be done faster ;)
Also, I've been thinking about getting it somehow like this (using some method opposite to & which gives common elements of 2 arrays):
a !& b #=> doesn't work of course
Another way might be to add two arrays and find the unique element with some method similar to uniq, so that:
[1,1,2,2,3,4,4].some_method #=> would return 3
The simplest (in terms of using only the arrays already in place and stock array methods, anyway) solution is the union of the differences:
a = [1,2,3,4]
b = [1,2,4]
(a-b) | (b-a)
=> [3]
This may or may not be better than O(n**2). There are other options which are likely to give better performance (see other answers/comments).
Edit: Here's a quick-ish implementation of the sort-and-iterate approach (this assumes no array has repeated elements; otherwise it will need to be modified depending on what behavior is wanted in that case). If anyone can come up with a shorter way to do it, I'd be interested. The limiting factor is the sort used. I assume Ruby uses some sort of Quicksort, so complexity averages O(n log n) with possible worst-case of O(n**2); if the arrays are already sorted, then of course the two calls to sort can be removed and it will run in O(n).
def diff a, b
  a = a.sort
  b = b.sort
  result = []
  bi = 0
  ai = 0
  while (ai < a.size && bi < b.size)
    if a[ai] == b[bi]
      ai += 1
      bi += 1
    elsif a[ai]<b[bi]
      result << a[ai]
      ai += 1
    else
      result << b[bi]
      bi += 1
    end
  end
  result += a[ai, a.size-ai] if ai<a.size
  result += b[bi, b.size-bi] if bi<b.size
  result
end
As @iamnotmaynard noted in the comments, this is traditionally a set operation (called the symmetric difference). Ruby's Set class includes this operation, so the most idiomatic way to express it would be with a Set:
require 'set'
Set.new(a) ^ b
That should give O(n) performance (since a set membership test is constant-time).
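A minimal sketch of the Set approach applied to the arrays from the question, converted back to an array at the end. The variable name sym_diff is only illustrative. Note that a Set drops duplicate elements, and the order of the result is not guaranteed to match either input.

```ruby
require 'set'

a = [1, 2, 3, 4]
b = [1, 2, 4]

# Set#^ accepts any enumerable, so b does not need to be converted first.
sym_diff = (Set.new(a) ^ b).to_a
sym_diff # => [3]
```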
a = [1, 2, 3]
b = [2, 3, 4]
a + b - (a & b)
# => [1, 4]
One solution for array divergence (the elements the two arrays do not share) looks like this:
a = [1, 2, 3]
b = [2, 3, 4]
(a - b) | (b - a)
# => [1, 4]

Add two array of arrays in Ruby

I have two arrays:
a = [[1,2],[3,4]];
b = [[5,6],[7,8]];
I want the resultant array to be their sum, i.e.,
c = [[6,8],[10,12]];
Would there be an elegant way to do so?
Note:
I currently know that to simply add a = [1,2] with b = [3,4] to get c = [4,6] I need to do
c = [a,b].transpose.map{|x| x.reduce(:+)};
but I'm not sure how to, if possible, extend this to my problem.
a.zip(b).map { |x,y| x.zip(y).map { |s| s.inject(:+) } }
c = [a, b].transpose.map{|ary| ary.transpose.map{|ary| ary.inject(:+)}}
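The same transpose idea can be wrapped in a small method that accepts any number of equally shaped nested arrays. This is only a sketch: add_matrices is a hypothetical name, and it assumes Ruby 2.4 or later for Array#sum.

```ruby
# Hypothetical helper: element-wise sum of equally shaped 2-D arrays.
def add_matrices(*matrices)
  # The first transpose groups matching rows; the second groups matching cells.
  matrices.transpose.map { |rows| rows.transpose.map(&:sum) }
end

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
add_matrices(a, b) # => [[6, 8], [10, 12]]
```

Array#transpose raises an IndexError when the shapes do not match, so mismatched input fails loudly rather than producing a partial sum.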
An alternative, with better expression for manipulating numbers, would be to use 'narray'
require 'narray'
a = NArray[[1,2],[3,4]]
b = NArray[[5,6],[7,8]]
c = a + b
Yes, really: c = a + b, and it is much faster too.
You do pay for this, though: NArray expects all the elements to contain the same type of object. If that's the case, and especially if your real-world problem has much larger matrices, then I highly recommend narray for handling this kind of data.

Is there a pre-built function to add the elements in two arrays?

If I have two arrays:
a = [1,2,3]
b = [2,3,4]
Is there a pre-built function to add the two arrays to give
c = a + b = [3,5,7]
i.e. add the values of each element in the array?
No, there isn't one method for this. But you can combine zip and map like this:
c = a.zip(b).map { |x, y| x + y }
I think the closest thing to what you ask is:
[1,2,3].zip([2,3,4]).map{|x| x.reduce(:+)}
It works even with more arrays:
[1,2,3].zip([2,3,4], [3,4,5], [4,5,6]).map{|x| x.reduce(:+)}
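The zip approach can be wrapped in a small helper. This is a sketch under assumptions: add_arrays is a hypothetical name, Array#sum needs Ruby 2.4 or later, and all arrays are expected to have the same length (zip pads shorter arrays with nil, which would make sum raise a TypeError).

```ruby
# Hypothetical helper: element-wise sum of any number of equal-length arrays.
def add_arrays(first, *rest)
  first.zip(*rest).map(&:sum)
end

add_arrays([1, 2, 3], [2, 3, 4])                       # => [3, 5, 7]
add_arrays([1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]) # => [10, 14, 18]
```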
That looks a lot like vector addition. Here's one way to accomplish that:
require 'matrix'
a = Vector[1,2,3]
b = Vector[2,3,4]
puts a+b
#=> Vector[3,5,7]
Simply use to_a on a Vector to get an array.

Will array.each work properly if the array is updated during each iteration?

Will each still process every element of an Array correctly if the array is being updated within the code block of the each loop?
For example:
arr.each do |x|
  if (x != 2)
    arr.push(x+4)
  end
end
Will the loop still iterate over every element within the array, even though it is being lengthened?
Maybe
Yes, if you are talking about MRI, and the question is: "Will the iterator traverse my new elements?".
If you are talking about Ruby as a language, "maybe". There is no specification so MRI serves as the reference implementation.
But having said that, this just seems like something that would be implementation-specific, partly because requiring any specific behavior would impose a constraint on implementations for no clear benefit, but with certain performance trade-offs.
It's also quite imperative, so it's perhaps not "the Ruby way", which leans more to functional styles.
Here is how I think a good Ruby program should express that sort of loop. The expression returns the original array a if nothing changes; otherwise it builds a new array in a functional style, so there is never any doubt about what the result will be:
>> a = [1, 2, 3]
=> [1, 2, 3]
>> a.inject(a) { |m, e| e < 99 ? m + [99] : m }
=> [1, 2, 3, 99, 99, 99]
A faster (if lots of new elements are added) semi-functional expression would be:
t = a.inject(a.dup) { |m, e| e < 99 ? m << 99 : m }
Yes, it will keep looping until it reaches the end of the array, which will probably never happen since the array keeps getting new entries with every iteration.
So I would strongly advise against your current code, since you will probably be stuck in an infinite loop.
I'm not sure exactly what you are going for, but this code would be much better since it has a clear ending:
arr.each do |x|
  if x < 2
    arr.push x + 4
  end
end
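To see what actually happens, here is a small sketch that records which elements the block visits. The starting array and the names seen, arr2 and visited are made up for illustration, and the first result reflects MRI behavior, which, as noted above, is not guaranteed by a language specification. The second loop iterates over a copy, so elements pushed during the loop are never visited regardless of implementation.

```ruby
# Iterating the array being mutated: under MRI, the pushed element is visited too.
arr = [1, 2, 3]
seen = []
arr.each do |x|
  seen << x
  arr.push(x + 4) if x < 2
end
seen # => [1, 2, 3, 5] under MRI

# Iterating a snapshot: only the original elements are visited.
arr2 = [1, 2, 3]
visited = []
arr2.dup.each do |x|
  visited << x
  arr2.push(x + 4) if x < 2
end
visited # => [1, 2, 3]
arr2    # => [1, 2, 3, 5]
```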
