How to match data from 2 arrays efficiently with Ruby - ruby

Right now I have this:
array1 = [obj1, obj2, obj3, obj4]
array2 = [obj1, obj2, obj5, obj6]
array1.each do |item1|
  array2.each do |item2|
    if (item1[0] == item2[0]) && (item1[1] == item2[1])
      p "do stuff"
    end
  end
end
I need to match 2 pieces of data from each array but both arrays are very large and I'm wondering if there is a faster way to do this.
My current setup requires looking at each element in the second array for each element in the first array which seems terribly inefficient.

Combining map and intersection:
(array1.map { |a| a.first 2 } & array2.map { |a| a.first 2 }).each do
p "do_stuff"
end
Performance should be good. Memory intensive though.
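One thing worth knowing about the intersection approach: Array#& returns each shared key only once, so the block runs once per distinct key rather than once per matching pair of rows. A minimal sketch, assuming hypothetical sample rows (the question gave none):

```ruby
# Hypothetical sample rows; the first two fields are the matching key.
array1 = [[1, "a", :x], [1, "a", :y], [2, "b", :z]]
array2 = [[1, "a", :p], [3, "c", :q]]

# Array#& keeps only keys present in both arrays and removes duplicates.
shared_keys = array1.map { |row| row.first(2) } & array2.map { |row| row.first(2) }

shared_keys.each { |key| p key }
# Two rows in array1 carry the key [1, "a"], but it is reported once.
```

If you need the matching rows themselves and not just the keys, an index keyed on the first two fields is probably the better fit.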

If the two arrays are all you have, you can't avoid the O(n^2) complexity that DigitalRoss mentioned. However, you can index the data so that you don't have to do it all again next time. In the simplest case, you can use hashes to allow direct access to your data based on the criteria used in the test:
index1 = array1.each_with_object({}){|e, acc|
acc[[e[0], e[1]]] ||= []
acc[[e[0], e[1]]] << e
}
and the same thing for the other array. Then, your loop would look like:
index1.each do |key1, vals1|
  if vals2 = index2[key1]
    vals1.product(vals2).each do |e1, e2|
      p "do stuff"
    end
  end
end
which is, I believe, O(n).
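The same indexing idea can be sketched with Enumerable#group_by, which builds the key => rows hash in one call. The sample rows below are made up for illustration, since the question did not include any data:

```ruby
# Hypothetical sample rows; only the first two fields form the matching key.
array1 = [[1, "a", :x], [2, "b", :y], [1, "a", :z]]
array2 = [[1, "a", :p], [3, "c", :q]]

# One index per array: key => all rows sharing that key.
index1 = array1.group_by { |row| row.first(2) }
index2 = array2.group_by { |row| row.first(2) }

matches = []
index1.each do |key, rows1|
  rows2 = index2[key]
  next unless rows2
  # Pair every row from array1 with every row from array2 under this key.
  rows1.product(rows2).each { |r1, r2| matches << [r1, r2] }
end

matches.each { |r1, r2| p [r1, r2] }
```

Building both indexes is linear in the array sizes. The product step only adds noticeable work when many rows share the same key.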

This is very slow because - as DigitalRoss already stated - it is O(n^2). Assuming that eql? is just as fine for you as ==, you can build an index and iterate over that instead, which will be O(n+m):
array1 = [obj1, obj2, obj3, obj4]
array2 = [obj1, obj2, obj5, obj6]
index = {}
found = []
array1.each do |item1| index[item1.first(2)] = item1 end
array2.each do |item2|
  item1 = index[item2.first(2)]
  found << [item1, item2] if item1
end
found.each do |item1, item2| puts "do something" end
This assumes that the combination of the first 2 elements of each element in array1 is unique within array1. If that's not the case, the code will be slightly more complex:
array1 = [obj1, obj2, obj3, obj4]
array2 = [obj1, obj2, obj5, obj6]
index = {}
found = []
array1.each do |item1|
key = item1.first(2)
index[key] ||= []
index[key] << item1
end
array2.each do |item2|
items_from_1 = index[item2.first(2)]
if items_from_1 then
found.concat(items_from_1.map { |item1| [item1,item2] })
end
end
found.each do |item1, item2| puts "do something" end
Since you didn't provide any sample data, I didn't test the code.
I hope that helps.
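To illustrate the eql? caveat mentioned above: Hash lookups compare keys with eql?, which is stricter than == for mixed numeric types. A small sketch with made-up values:

```ruby
# The index key uses an Integer; the probe below uses a Float.
index = { [1, "a"] => :row_from_array1 }

equal_by_value = ([1, "a"] == [1.0, "a"])      # == treats 1 and 1.0 as equal
equal_by_eql   = [1, "a"].eql?([1.0, "a"])     # eql? does not
lookup_float   = index[[1.0, "a"]]             # misses, because Hash uses eql?
lookup_int     = index[[1, "a"]]               # hits

p equal_by_value  # true
p equal_by_eql    # false
p lookup_float    # nil
p lookup_int      # :row_from_array1
```

So if your key fields can mix Integers and Floats (or otherwise rely on a loose ==), you would need to normalize them before building the index.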

Related

Ruby - Initialize hash key-value in a loop

I have a hash of key value pairs, similar to -
myhash={'test1' => 'test1', 'test2' => 'test2', ...}
how can I initialize such a hash in a loop? Basically I need it to go from 1..50 with the same test$i values but I cannot figure out how to initialize it properly in a loop instead of doing it manually.
I know how to loop through each key-value pair individually:
myhash.each_pair do |key, value|
but that doesn't help with init
How about:
hash = (1..50).each.with_object({}) do |i, h|
h["test#{i}"] = "test#{i}"
end
If you want to do this lazily, you could do something like below:
hash = Hash.new { |hash, key| key =~ /^test\d+/ ? hash[key] = key : nil}
p hash["test10"]
#=> "test10"
p hash
#=> {"test10"=>"test10"}
The block passed to Hash constructor will be invoked whenever a key is not found in hash, we check whether key follows a certain pattern (based on your need), and create a key-value pair in hash where value is equal to key passed.
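To make the "invoked whenever a key is not found" part concrete, here is a small sketch that counts how often the default block runs. The counter and the anchored pattern (\A...\z, a bit stricter than the one above) are my own additions for illustration:

```ruby
calls = 0
hash = Hash.new do |h, key|
  calls += 1
  # Store and return the key only when it looks like "test<digits>".
  key =~ /\Atest\d+\z/ ? (h[key] = key) : nil
end

hash["test10"]  # miss: block runs and stores the pair
hash["test10"]  # hit: block is not run
hash["other"]   # miss: block runs, nothing is stored
hash["other"]   # miss again: block runs again

p calls               # 3
p hash.size           # 1
p hash.key?("test7")  # false, key? does not trigger the default block
```

Note that the hash only contains the keys that have actually been looked up, so iterating over it will not give you all 50 entries unless you touch each key first.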
(1..50).map { |i| ["test#{i}"] * 2 }.to_h
The solution above is more DRY than two other answers, since "test" is not repeated twice :)
It is also, BTW, approx 10% faster (that would not be the case when keys and values differ):
require 'benchmark'
n = 500000
Benchmark.bm do |x|
x.report { n.times do ; (1..50).map { |i| ["test#{i}"] * 2 }.to_h ; end }
x.report { n.times do ; (1..50).each.with_object({}) do |i, h| ; h["test#{i}"] = "test#{i}" ; end ; end }
end
       user     system      total        real
  17.630000   0.000000  17.630000 ( 17.631221)
  19.380000   0.000000  19.380000 ( 19.372783)
Or one might use eval:
hash = {}
(1..50).map { |i| eval "hash['test#{i}'] = 'test#{i}'" }
or even JSON.parse:
require 'json'
JSON.parse("{" << (1..50).map { |i| %Q|"test#{i}": "test#{i}"| }.join(',') << "}")
First of all, there's Array#to_h, which converts an array of key-value pairs into a hash.
Second, you can just initialize such a hash in a loop, just do something like this:
target = {}
1.upto(50) do |i|
  target["test#{i}"] = "test#{i}"
end
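For comparison, here is a sketch of the to_h route next to the plain loop. The block form of to_h needs Ruby 2.6 or newer; the variable names are just for illustration:

```ruby
# Array of [key, value] pairs converted with Array#to_h.
from_pairs = (1..50).map { |i| ["test#{i}", "test#{i}"] }.to_h

# Block form of to_h (Ruby 2.6+): the block returns one [key, value] pair.
from_block = (1..50).to_h { |i| ["test#{i}", "test#{i}"] }

# Plain loop, as above.
from_loop = {}
1.upto(50) { |i| from_loop["test#{i}"] = "test#{i}" }

p from_pairs == from_block  # true
p from_block == from_loop   # true
p from_loop.size            # 50
```

All three produce the same hash, so the choice is mostly a matter of taste.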
You can also do this:
hash = Hash.new{|h, k| h[k] = k.itself}
(1..50).each{|i| hash["test#{i}"]}
hash # => ...

Compare two dimensional arrays

I have two two-dimensional arrays,
a = [[17360, "Z51.89"],
[17361, "S93.601A"],
[17362, "H66.91"],
[17363, "H25.12"],
[17364, "Z01.01"],
[17365, "Z00.121"],
[17366, "Z00.129"],
[17367, "K57.90"],
[17368, "I63.9"]]
and
b = [[17360, "I87.2"],
[17361, "s93.601"],
[17362, "h66.91"],
[17363, "h25.12"],
[17364, "Z51.89"],
[17365, "z00.121"],
[17366, "z00.129"],
[17367, "k55.9"],
[17368, "I63.9"]]
I would like to count similar rows in both the arrays irrespective of the character case, i.e., "h25.12" would be equal to "H25.12".
I tried,
count = a.count - (a - b).count
But (a - b) returns
[[17360, "Z51.89"],
[17361, "S93.601A"],
[17362, "H66.91"],
[17363, "H25.12"],
[17364, "Z01.01"],
[17365, "Z00.121"],
[17366, "Z00.129"],
[17367, "K57.90"]]
I need the count as 5 since there are five similar rows when we do not consider the character case.
Instead of a - b you should do this:
a.map{|k,v| [k,v.downcase]} - b.map{|k,v| [k,v.downcase]} # case-insensitive
You can convert Arrays to Hash, and use Enumerable#count with a block.
b_hash = b.to_h
a.to_h.count {|k, v| b_hash[k] && b_hash[k].downcase == v.downcase }
# => 5
This converts the second element of each inner array to upper case in both arrays; you can then perform the subtraction, which returns the rows that differ regardless of case:
a.map{|first,second| [first,second.upcase]} - b.map{|first,second| [first,second.upcase]}
You can zip them and then use the block form of count:
a.zip(b).count{|e| e[0][1].downcase == e[1][1].downcase}
a.count - (a.map{|e| [e[0],e[1].downcase] } - b.map{|e| [e[0],e[1].downcase] }).count
The above maps a and b to new arrays in which the second element of each sub-array is downcased.
You want to count similar rows, so the & (intersection) operation is more suitable.
(a.map { |k, v| [k, v.upcase] } & b.map { |k, v| [k, v.upcase] }).count
Using Proc and '&':
procedure = Proc.new { |i, j| [i, j.upcase] }
(a.map(&procedure) & b.map(&procedure)).count
#=> 5
For better understanding, let's simplify it:
new_a = a.map {|i, j| [i, j.upcase]}
new_b = b.map {|i, j| [i, j.upcase]}
# Set intersection using '&'
(new_a & new_b).count
#=> 5
I have assumed that the ith element of a is to be compared with the ith element of b. (Edit: a subsequent comment by the OP confirmed this interpretation.)
I would be inclined to use indices to avoid the construction of relatively large temporary arrays. Here are two ways that might be done.
#1 Use indices
[a.size, b.size].min.times.count do |i|
  af, al = a[i]
  bf, bl = b[i]
  af == bf && al.downcase == bl.downcase
end
#=> 5
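A variation on the index approach, as a sketch: String#casecmp? (Ruby 2.4+) compares two strings case-insensitively without building downcased copies. This assumes, as above, that row i of a is compared with row i of b:

```ruby
a = [[17360, "Z51.89"], [17361, "S93.601A"], [17362, "H66.91"],
     [17363, "H25.12"], [17364, "Z01.01"], [17365, "Z00.121"],
     [17366, "Z00.129"], [17367, "K57.90"], [17368, "I63.9"]]
b = [[17360, "I87.2"], [17361, "s93.601"], [17362, "h66.91"],
     [17363, "h25.12"], [17364, "Z51.89"], [17365, "z00.121"],
     [17366, "z00.129"], [17367, "k55.9"], [17368, "I63.9"]]

# Count positions where the ids match and the codes match ignoring case.
count = [a.size, b.size].min.times.count do |i|
  id_a, code_a = a[i]
  id_b, code_b = b[i]
  id_a == id_b && code_a.casecmp?(code_b)
end

p count  # 5
```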
#2 Use Refinements
My purpose in giving this solution is to illustrate the use of Refinements. I would not argue for its use for the problem at hand, but this problem provides a good vehicle for showing how the technique can be applied.
I could not figure out how best to do this, so I posted this question on SO. I've applied @ZackAnderson's answer below.
module M
refine String do
alias :dbl_eql :==
def ==(other)
downcase.dbl_eql(other.downcase)
end
end
refine Array do
def ==(other)
zip(other).all? {|x, y| x == y}
end
end
end
'a' == 'A' #=> false (as expected)
[1,'a'] == [1,'A'] #=> false (as expected)
using M
'a' == 'A' #=> true
[1,'a'] == [1,'A'] #=> true
I could use Enumerable#zip, but for variety I'll use Object#to_enum and Kernel#loop in conjunction with Enumerator#next:
ea, eb = a.to_enum, b.to_enum
cnt = 0
loop do
cnt += 1 if ea.next == eb.next
end
cnt #=> 5

Values in the while loop do not modify outside values

I have a long piece of code, but I tried to reduce my problem to as few lines as possible. I have a method which creates a 2D array of 0s and 1s:
array1 = newValue(2) # the number 2 represents how many 1s the array has
array2 = newValue(3)
and this loop
(0..9).each do |i|
  (0..9).each do |j|
    while (array1[i][j] == array2[i][j]) && (array2[i][j] == 1) do
      array1 = newValue(2)
      array2 = newValue(3)
    end
  end
end
I'm using the while loop so that I won't have a 1 in the same position in both arrays. But what is inside the while loop doesn't modify the values of the arrays. I also tried using map!/collect!, but I think I did something wrong because nothing happened. I hope you can understand what I was trying to do.
Edit:
def newValue(value)
  value = value.to_i
  array = Array.new(10) { Array.new(10 , 0) }
  # (a lot of conditions on how to position the items in the array)
  return array
end
Here's my take... hopefully it'll help out. It seems that what you noticed was true. The arrays are not getting reset. Probably because inside the each blocks, the scope is lost. This is probably because they are arrays. I took a slightly different approach. Put everything in a class so you can have instance variables that you can control and you know where they are and that they are always the same.
I pulled out the compare_arrays function, which just returns the coordinates of the match if there is one. If not, it returns nil. Then, your while loop is simplified in the reprocess method. If you found a match, reprocess until you don't have a match any more. I used a dummy newValue method that just returns another 2D array (as you suggested yours does). This seems to do the trick from what I can tell. Give it a whirl and see what you think. You can access the two arrays after all the processing with processor.array1, as you can see I did at the bottom.
# generate a random 2d array with 0's and val's
def generateRandomArray(val=1)
array = []
(0..9).each do |i|
(0..9).each do |j|
array[i] ||= []
array[i][j] = (rand > 0.1) ? 0 : val
end
end
array
end
array1 = generateRandomArray
array2 = generateRandomArray
def newValue(val)
generateRandomArray(val)
end
class Processor
  attr_reader :array1, :array2
  def initialize(array1, array2)
    @array1 = array1
    @array2 = array2
  end
  # Returns [row, col] of the first cell that is 1 in both arrays, or nil.
  def compare_arrays
    found = false
    for ii in 0..9
      break unless for jj in 0..9
        if ((@array2[ii][jj] == 1) && (@array1[ii][jj] == 1))
          found = true
          break
        end
      end
    end
    [ii,jj] if found
  end
  # Regenerates both arrays for as long as compare_arrays finds a match.
  def reprocess
    while compare_arrays
      puts "Reprocessing"
      @array1 = newValue(2)
      @array2 = newValue(3)
      reprocess
    end
  end
end
processor = Processor.new(array1, array2)
processor.reprocess
puts processor.array1.inspect
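For what it's worth, plain local variables can also be reassigned inside a loop or block in Ruby, so a class is not strictly required. Below is a rough sketch that rechecks the whole grid after every regeneration. new_grid and overlap? are hypothetical helpers of mine (new_grid stands in for the question's newValue), and the seeded Random is only there to make the run repeatable:

```ruby
# Hypothetical stand-in for newValue: a 10x10 grid with `ones` cells set to 1.
def new_grid(ones, rng)
  grid = Array.new(10) { Array.new(10, 0) }
  positions = (0...100).to_a.sample(ones, random: rng)
  positions.each { |pos| grid[pos / 10][pos % 10] = 1 }
  grid
end

# True when some cell holds a 1 in both grids.
def overlap?(g1, g2)
  g1.each_with_index.any? do |row, i|
    row.each_with_index.any? { |cell, j| cell == 1 && g2[i][j] == 1 }
  end
end

rng = Random.new(42)
array1 = new_grid(2, rng)
array2 = new_grid(3, rng)

# Regenerate both grids until no position holds a 1 in both.
while overlap?(array1, array2)
  array1 = new_grid(2, rng)
  array2 = new_grid(3, rng)
end

puts array1.inspect
```

The difference from the loop in the question is that the check covers every cell each time, instead of only the current [i][j] position.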

Ruby code to iterate through two arrays simultaneously

How can I iterate over two arrays in Ruby simultaneously? I don't want to use a for loop.
For example, these are my arrays:
array1 = ["a","b","c","d"]
array2 = [1,2,3,4]
You can use the zip method, for example like this:
array1.zip(array2).each do |array1_var, array2_var|
  # whatever you want to do with array1_var and array2_var
end
You can use Array#zip (no need to use each, because zip accepts an optional block):
array1 = ["a","b","c","d"]
array2 = [1,2,3,4]
array1.zip(array2) do |a, b|
p [a,b]
end
Or, Array#transpose:
[array1, array2].transpose.each do |a, b|
p [a,b]
end
You can zip them together and then iterate through the pairs using each.
array1.zip(array2).each do |pair|
p pair
end
If both arrays are the same size, you can do:
array1 = ["a","b","c","d"]
array2 = [1,2,3,4]
for i in 0...array1.length do
  # here you do what you want with array1[i] and array2[i]
end
Assuming both arrays are the same size, you can use each_with_index to iterate through them, using the index for the second array:
array1.each_with_index do |item1, index|
item2 = array2[index]
# do something with item1, item2
end
Like so:
irb(main):007:0> array1.each_with_index do |item1, index|
irb(main):008:1* item2 = array2[index]
irb(main):009:1> puts item1, item2
irb(main):010:1> end
a
1
b
2
c
3
d
4
When both arrays are the same size, you could do as below:
array1=["a","b","c","d"]
array2=[1,2,3,4]
array2.each_index{|i| p "#{array2[i]},#{array1[i]} at location #{i}"}
# >> "1,a at location 0"
# >> "2,b at location 1"
# >> "3,c at location 2"
# >> "4,d at location 3"
And if there is a chance that one array is larger than the other, then each_index has to be called on the larger array.
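Since unequal sizes came up, here is a short sketch of how zip and transpose behave in that case (the arrays are made up for illustration). zip always produces as many pairs as the receiver has elements, padding with nil or dropping extras, while transpose raises an IndexError:

```ruby
letters = ["a", "b", "c", "d"]
numbers = [1, 2]

# Receiver is longer: missing values are padded with nil.
long_first = letters.zip(numbers)
p long_first   # [["a", 1], ["b", 2], ["c", nil], ["d", nil]]

# Receiver is shorter: extra elements of the argument are dropped.
short_first = numbers.zip(letters)
p short_first  # [[1, "a"], [2, "b"]]

# transpose refuses rows of different lengths.
transpose_failed =
  begin
    [letters, numbers].transpose
    false
  rescue IndexError
    true
  end
p transpose_failed  # true
```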

How would you implement this idiom in ruby?

As someone who came from Java background and being a newbie to Ruby,
I was wondering if there is a simple way of doing this with ruby.
new_values = foo(bar)
if new_values
if arr
arr << new_values
else
arr = new_values
end
end
Assuming "arr" is either an array or nil, I would use:
arr ||= []
arr << new_values
If you're doing this in a loop or some such, there might be more idiomatic ways to do it. For example, if you're iterating a list, passing each value to foo(), and constructing an array of results, you could just use:
arr = bars.map {|bar| foo(bar) }
If I'm understanding you correctly, I would probably do:
# Start with an empty array if it hasn't already been set
@arr ||= []
# Add the values to the array as elements
@arr.concat foo(bar)
If you use @arr << values you are adding the entire array of values to the end of the array as a single nested entry.
arr = [*arr.to_a + [*new_values.to_a]]
Start with:
arr ||= []
And then, depending on whether new_values is an array or not
arr += new_values # if array
arr << new_values # if not
arr += [*new_values] # if it could be either
Furthermore, you can get rid of the test on new_values by taking advantage of the fact that NilClass implements a .to_a => [] method and reduce everything to:
arr ||= []
arr += [*new_values.to_a]
But wait, we can use that trick again and make the entire thing into a one-liner:
arr = [*arr.to_a + [*new_values.to_a]]
I don't intend to write an inscrutable one-liner, but I think this is quite clear. Assuming, as Phrogz does, that what you really need is an extend (concat):
arr = (arr || []).concat(foo(bar) || [])
Or:
(arr ||= []).concat(foo(bar) || [])
I would use:
new_values = foo(bar)
arr ||= []
arr << new_values if new_values
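If this comes up in several places, the idiom could be wrapped in a small helper. The sketch below assumes concat semantics (new values are appended as elements, not nested). append_values is a hypothetical name, and Kernel#Array is used so that a single non-array value is handled too:

```ruby
# Hypothetical helper: appends new_values to arr, creating arr if it is nil.
# Returns arr unchanged when new_values is nil.
def append_values(arr, new_values)
  return arr if new_values.nil?
  # Array() wraps a single value in an array and leaves arrays as they are.
  (arr || []).concat(Array(new_values))
end

p append_values(nil, [1, 2])   # [1, 2]
p append_values([1], [2, 3])   # [1, 2, 3]
p append_values([1], nil)      # [1]
p append_values([1], 5)        # [1, 5]
```

One caveat: Array() turns a Hash into an array of [key, value] pairs, so this helper is only a good fit when new_values is an array, a single scalar, or nil.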

Resources