How to group Arrays with similar values - ruby

I'm looking for a smart way to group any number of arrays with similar values (not necessarily in the same order). The language I'm using is ruby but I guess the problem is pretty language agnostic.
Given
a = ['foo', 'bar']
b = ['bar', 'foo']
c = ['foo', 'bar', 'baz']
d = ['what', 'ever', 'else']
e = ['foo', 'baz', 'bar']
I'd like to have a function that tells me that
a & b are in one group
c & e are in one group
d is its own group
I can think of a number of inefficient ways of doing this; for example, I could compare each array's values to every other array's values.
Or I could check if ((a - b) + (b - a)).length == 0 for all combinations of arrays and group the ones that result in 0. Or I could check if a.sort == b.sort for all combinations of arrays.
I'm sure someone before me has solved this problem way more efficiently. I just can't seem to find how.

You can do it with sort, but instead of comparing "all combinations of arrays", you sort each array only once and use the sorted copy as the grouping key (a Schwartzian transform).
arrays = [a, b, c, d, e]
arrays.group_by{|array| array.sort}.values
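A complete, runnable version of this one-liner, using the arrays from the question, might look like the sketch below. Note that sorting treats duplicates as significant, so ['foo', 'foo', 'bar'] would not land in the same group as ['foo', 'bar']; if duplicates should be ignored, a key such as array.uniq.sort would be one option (an assumption about the desired behavior, not part of the original answer).

```ruby
a = ['foo', 'bar']
b = ['bar', 'foo']
c = ['foo', 'bar', 'baz']
d = ['what', 'ever', 'else']
e = ['foo', 'baz', 'bar']

arrays = [a, b, c, d, e]

# Sort each array once; the sorted copy is the grouping key, so arrays with
# the same elements in any order end up under the same key.
groups = arrays.group_by(&:sort).values

groups.each { |group| p group }
# a and b share a group, c and e share a group, d is alone.
```

This costs one sort per array plus one hash insertion, rather than one comparison for every pair of arrays.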

Related

Using a pair of values as a key

Very frequently I've had the need to hash a pair of values. Often, I just generate a range between num1 and num2 and hash that as a key, but that's pretty slow because the distance between those two numbers can be quite large.
How can one go about hashing a pair of values to a table? For example, say I'm iterating through an array and want to hash every single possible pair of values into a hash table, where the key is the pair of nums and the value is their sum. What's an efficient way to do this? I've also thought about hashing an array as the key, but that doesn't work.
Also, how would one go about extending this to 3, 4, or 5 numbers?
EDIT:
I'm referring to hashing for O(1) lookup in a hashtable.
Just do it.
You can simply hash on the array...
Verification
Let me show a little experiment:
array = [ [1,2], [3,4], ["a", "b"], ["c", 5] ]
hash = {}
array.each do |e|
  e2 = e.clone
  e << "dummy"
  e2 << "dummy"
  hash[e] = (hash[e] || 0) + 1
  hash[e2] = (hash[e2] || 0) + 1
  puts "e == e2: #{(e==e2).inspect}, e.id = #{e.object_id}, e.hash = #{e.hash}, e2.id = #{e2.object_id}, e2.hash = #{e2.hash}"
end
puts hash.inspect
As you can see, I take a few arrays, clone them, and modify each copy separately. After this, e and e2 are definitely different arrays (different object IDs), but they contain the same elements. The two arrays are then used as hash keys, and since they have the same content, they map to the same entry.
e == e2: true, e.id = 19797864, e.hash = -769884714, e2.id = 19797756, e2.hash = -769884714
e == e2: true, e.id = 19797852, e.hash = -642596098, e2.id = 19797588, e2.hash = -642596098
e == e2: true, e.id = 19797816, e.hash = 104945655, e2.id = 19797468, e2.hash = 104945655
e == e2: true, e.id = 19797792, e.hash = -804444135, e2.id = 19797348, e2.hash = -804444135
{[1, 2, "dummy"]=>2, [3, 4, "dummy"]=>2, ["a", "b", "dummy"]=>2, ["c", 5, "dummy"]=>2}
So not only can you use arrays as keys, Hash also recognizes arrays with equal content as the "same" key, rather than comparing them by object identity (which it could also have done).
Caveat
Obviously this works only to a point. The contents of the arrays must recursively be well-defined with regards to hashing. I.e., you can use sane things like strings, numbers, other arrays, even nil in there.
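One more caveat in the same spirit, consistent with the Hash documentation quoted below: an array key is filed under the hash value it had when it was inserted, so mutating the array afterwards makes the entry unreachable until you call Hash#rehash. A small sketch (the pair and sums names are just for illustration):

```ruby
pair = [1, 2]
sums = { pair => 3 }

# Mutate the array that is already being used as a key.
pair << 3

# The stored key object is now [1, 2, 3], so a lookup with [1, 2] fails the
# eql? check even though it matches the hash value the entry was filed under.
before_rehash = sums[[1, 2]]
p before_rehash      # => nil

# Rehashing re-files every key under its current hash value.
sums.rehash
p sums[[1, 2, 3]]    # => 3
```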
Reference
From http://ruby-doc.org/core-2.4.0/Hash.html :
Two objects refer to the same hash key when their hash value is identical and the two objects are eql? to each other.
From http://ruby-doc.org/core-2.4.0/Array.html#method-i-eql-3F :
eql?(other) → true or false
Returns true if self and other are the same object, or are both arrays with the same content (according to Object#eql?).
hash → integer
Compute a hash-code for this array.
Two arrays with the same content will have the same hash code (and will compare using eql?).
Emphasis mine.
If you are using a range or array, then you can also call hash on it and use that.
(num1..num2).hash
[num1, num2].hash
That returns an Integer that you can use as a hash key. I have no idea whether this is efficient; the Range and Array documentation show the source code for these methods.
Another way I would do it is to turn the numbers into strings. This is the better solution if you are worried about hash collisions.
"#{num1}:#{num2}"
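A minimal sketch of the string-key approach; pair_key is a hypothetical helper name. The separator matters: without it, the pairs (1, 23) and (12, 3) would both become "123".

```ruby
# Hypothetical helper: encode two numbers as a single "num1:num2" string key.
def pair_key(num1, num2)
  "#{num1}:#{num2}"
end

sums = {}
# combination(2) yields each pair once; the block destructures it into x and y.
[1, 4, 6, 8].combination(2) { |x, y| sums[pair_key(x, y)] = x + y }

p sums[pair_key(4, 6)]                 # => 10
p pair_key(1, 23) == pair_key(12, 3)   # => false, thanks to the ":" separator
```

For integer pairs, two different pairs never produce the same string, unlike Integer keys from arr.hash, which can in principle collide; the price is building a string for every key.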
And the ruby-esque ways that I would solve your problem are:
number_array.combination(2).each { |arr| my_hash[arr.hash] = arr }
number_array.combination(2).each { |arr| my_hash[arr.join(":")] = arr }
A hash table, where the key is the pair of nums and the value is their sum:
h = {}
[1,4,6,8].combination(2){|ar| h[ar] = ar.sum}
p h #=>{[1, 4]=>5, [1, 6]=>7, [1, 8]=>9, [4, 6]=>10, [4, 8]=>12, [6, 8]=>14}
Note that using arrays as hash keys is no problem at all. To extend this to 3, 4, or 5 numbers, use combination(3) (or 4, or 5).
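A sketch of the generalized version; sums_of_combinations is a hypothetical helper name. It also shows a detail worth knowing: combination yields elements in the order they appear in the input, so a lookup with the same numbers in a different order misses unless the key is normalized the same way. Sorting the lookup key works here only because the input array is itself sorted.

```ruby
# Hypothetical helper: map every k-element combination of numbers to its sum.
def sums_of_combinations(numbers, k)
  table = {}
  numbers.combination(k) { |combo| table[combo] = combo.sum }
  table
end

triples = sums_of_combinations([1, 4, 6, 8], 3)
p triples.size             # => 4

# Same numbers, different order: a different key.
p triples[[6, 1, 4]]       # => nil
# Normalizing the lookup key (sorted, like the input) finds the entry.
p triples[[6, 1, 4].sort]  # => 11
```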

Ruby - Find element not in common for two arrays

I've been thinking about a following problem - there are two arrays, and I need to find elements not common for them both, for example:
a = [1,2,3,4]
b = [1,2,4]
And the expected answer is [3].
So far I've been doing it like this:
a.select { |elem| !b.include?(elem) }
But it gives me O(N ** 2) time complexity. I'm sure it can be done faster ;)
Also, I've been thinking about getting it somehow like this (using some method opposite to & which gives common elements of 2 arrays):
a !& b #=> doesn't work of course
Another way might be to add two arrays and find the unique element with some method similar to uniq, so that:
[1,1,2,2,3,4,4].some_method #=> would return 3
The simplest (in terms of using only the arrays already in place and stock array methods, anyway) solution is the union of the differences:
a = [1,2,3,4]
b = [1,2,4]
(a-b) | (b-a)
=> [3]
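The Ruby docs note that Array#- compares elements using their hash and eql? methods for efficiency, so in practice this is usually much better than O(n**2). There are other options which are likely to give better performance (see other answers/comments).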
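One such option, sketched here as an illustration, implements the "add the arrays and find the unique elements" idea from the question with Array#tally (available since Ruby 2.7). It makes a single counting pass, but it assumes neither array contains repeated elements; otherwise an element repeated within one array would be dropped.

```ruby
a = [1, 2, 3, 4]
b = [1, 2, 4]

# Count occurrences across both arrays in one pass, then keep the elements
# seen exactly once. Assumes no array repeats an element.
not_common = (a + b).tally.select { |_elem, count| count == 1 }.keys
p not_common  # => [3]
```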
Edit: Here's a quick-ish implementation of the sort-and-iterate approach (this assumes no array has repeated elements; otherwise it will need to be modified depending on what behavior is wanted in that case). If anyone can come up with a shorter way to do it, I'd be interested. The limiting factor is the sort used. I assume Ruby uses some sort of Quicksort, so complexity averages O(n log n) with possible worst-case of O(n**2); if the arrays are already sorted, then of course the two calls to sort can be removed and it will run in O(n).
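def diff(a, b)
  a = a.sort
  b = b.sort
  result = []
  bi = 0
  ai = 0
  # Walk both sorted arrays in step; unequal heads are not in common.
  while ai < a.size && bi < b.size
    if a[ai] == b[bi]
      ai += 1
      bi += 1
    elsif a[ai] < b[bi]
      result << a[ai]
      ai += 1
    else
      result << b[bi]
      bi += 1
    end
  end
  # Whatever remains in the array that was not exhausted is also not in common.
  result += a[ai, a.size - ai] if ai < a.size
  result += b[bi, b.size - bi] if bi < b.size
  result
end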
As @iamnotmaynard noted in the comments, this is traditionally a set operation (called the symmetric difference). Ruby's Set class includes this operation, so the most idiomatic way to express it would be with a Set:
require 'set'
Set.new(a) ^ b
That should give expected O(n) performance, since a set membership test takes constant time on average. The result is a Set; call to_a on it if you need an array.
a = [1, 2, 3]
b = [2, 3, 4]
a + b - (a & b)
# => [1, 4]
The symmetric difference of two arrays can also be written as:
a = [1, 2, 3]
b = [2, 3, 4]
(a - b) | (b - a)
# => [1, 4]
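The approaches above agree when neither array has duplicates, but they can differ when one does. A small comparison (the input arrays are made up for illustration):

```ruby
require 'set'

a = [1, 1, 2, 3]
b = [2, 3, 4]

union_of_differences = (a - b) | (b - a)     # Array#| removes duplicates
sum_minus_common     = a + b - (a & b)       # keeps every remaining occurrence
via_set              = (Set.new(a) ^ b).to_a # a Set never holds duplicates

p union_of_differences  # => [1, 4]
p sum_minus_common      # => [1, 1, 4]
p via_set.sort          # => [1, 4]
```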
You can also read my blog post about Array coherences

Add two array of arrays in Ruby

I have two arrays:
a = [[1,2],[3,4]];
b = [[5,6],[7,8]];
I want the resultant array to be their sum, i.e.,
c = [[6,8],[10,12]];
Would there be an elegant way to do so?
Note:
I currently know that to simply add a = [1,2] with b = [3,4] to get c = [4,6] I need to do
c = [a,b].transpose.map{|x| x.reduce(:+)};
but I'm not sure how to, if possible, extend this to my problem.
a.zip(b).map { |x,y| x.zip(y).map { |s| s.inject(:+) } }
c = [a, b].transpose.map { |rows| rows.transpose.map { |pair| pair.inject(:+) } }
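If the nesting depth can vary, the same zip-and-map idea can be applied recursively. A sketch with a hypothetical deep_add helper, assuming both inputs have exactly the same shape:

```ruby
# Hypothetical helper: element-wise sum of two equally shaped nested arrays,
# recursing until it reaches non-array elements.
def deep_add(x, y)
  return x + y unless x.is_a?(Array)
  x.zip(y).map { |xi, yi| deep_add(xi, yi) }
end

a = [[1, 2], [3, 4]]
b = [[5, 6], [7, 8]]
p deep_add(a, b)  # => [[6, 8], [10, 12]]
```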
An alternative, with better expression for manipulating numbers, would be to use 'narray'
require 'narray'
a = NArray[[1,2],[3,4]]
b = NArray[[5,6],[7,8]]
c = a + b
. . . yes really, c = a + b and it is much faster too.
You do pay for this, though: NArray expects all the elements to contain the same type of object. If that's the case, and especially if your real-world problem has much larger matrices, then I highly recommend narray for handling this kind of data.

Two indexes in Ruby for loop

Can you have a Ruby for loop that has two indexes?
i.e.:
for i,j in 0..100
do something
end
I can't find anything on Google.
EDIT: Adding in more details
I need to compare two different arrays like such
Index  Array1  Array2
0      a       a
1      a       b
2      a       b
3      a       b
4      b       b
5      c       b
6      d       b
7      d       b
8      e       c
9      e       d
10     e       d
11     e
12     e
But knowing that they both have the same items (abcde)
This is my logic in pseudocode; let's assume this whole thing is inside a loop
#these two if statements are for handling end-of-array cases
If Array1[index_a1] == nil
Errors += Array1[index_a1-1]
break
If Array2[index_a2] == nil
Errors += Array2[index_a2-1]
break
#this is for handling a mismatch
If Array1[index_a1] != Array2[index_a2]
Errors += Array1[index_a1-1] #of course, first entry of array will always be same
if Array1[index_a1] != Array1[index_a1 - 1]
index_a2++ until Array1[index_a1] == Array2[index_a2]
index_a2 -=1 (these two lines are for the loop's sake in next iteration)
index_a1 -=1
if Array2[index_a2] != Array2[index_a2 - 1]
index_a1++ until Array1[index_a1] == Array2[index_a2]
index_a2 -=1 (these two lines are for the loop's sake in next iteration)
index_a1 -=1
In a nutshell, for the example above,
Errors would look like this:
a,b,e
since c and d are fine.
You could iterate over two arrays using Enumerators instead of numerical indices. This example iterates over a1 and a2 simultaneously, echoing the first word in a2 that starts with the corresponding letter in a1, skipping duplicates in a2:
a1 = ["a", "b", "c", "d"]
a2 = ["apple", "angst", "banana", "clipper", "crazy", "dizzy"]
e2 = a2.each
a1.each do |letter|
  puts e2.next
  e2.next while e2.peek.start_with?(letter) rescue nil
end
(It assumes all letters in a1 have at least one word in a2 and that both are sorted -- but you get the idea.)
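Closer to the literal "two indexes" question, a while loop can advance two indices independently. A sketch of the same walk with explicit indices, under the same assumptions (both arrays sorted, every letter in a1 has at least one word in a2); the firsts array is introduced here for illustration instead of printing:

```ruby
a1 = ["a", "b", "c", "d"]
a2 = ["apple", "angst", "banana", "clipper", "crazy", "dizzy"]

firsts = []
i = 0 # index into a1
j = 0 # index into a2, advanced independently of i
while i < a1.size && j < a2.size
  firsts << a2[j]
  # Skip every word in a2 that starts with the current letter.
  j += 1 while j < a2.size && a2[j].start_with?(a1[i])
  i += 1
end
p firsts  # => ["apple", "banana", "clipper", "dizzy"]
```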
The for loop is not the best way to iterate over an array in Ruby. With the clarification of your question, I think you have a few possible strategies.
You have two arrays, a and b.
If both arrays are the same length:
a.each_index do |index|
if a[index] == b[index]
do something
else
do something else
end
end
This also works if a is shorter than b.
If you don't know which one is shorter, you could write something like:
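control_array = a.length < b.length ? a : b to pick the shorter array, then use control_array.each_index. Or you could use (0...[a.length, b.length].min).each { |index| ... } to accomplish the same thing; note the three-dot exclusive range, since a two-dot range would run one index past the end of the shorter array.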
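A short sketch of the index-range approach with made-up arrays of different lengths, collecting the indices where the two arrays disagree:

```ruby
a = [:a, :b, :c, :d]
b = [:a, :x, :c]

# Only walk the indices both arrays have (exclusive range 0...n).
n = [a.length, b.length].min
mismatches = (0...n).reject { |index| a[index] == b[index] }
p mismatches  # => [1]
```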
Looking over your edit to your question, I think I can rephrase it like this: given an array with duplicates, how can I obtain a count of each item in each array and compare the counts? In your case, I think the easiest way to do that would be like this:
a = [:a,:a,:a,:b,:b,:c,:c,:d,:e,:e,:e]
b = [:a,:a,:b,:b,:b,:c,:c,:c,:d,:e,:e,:e]
not_alike = []
a.uniq.each{|value| not_alike << value if a.count(value) != b.count(value)}
not_alike
Running that code gives me [:a,:b,:c].
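Array#count scans the whole array for every distinct value. An alternative sketch, assuming Ruby 2.7+ for Array#tally, counts each array once and also covers symbols that appear in only one of the arrays:

```ruby
a = [:a,:a,:a,:b,:b,:c,:c,:d,:e,:e,:e]
b = [:a,:a,:b,:b,:b,:c,:c,:c,:d,:e,:e,:e]

counts_a = a.tally
counts_b = b.tally

# (a | b) lists every symbol from either array; a missing symbol counts as 0.
not_alike = (a | b).select { |sym| counts_a.fetch(sym, 0) != counts_b.fetch(sym, 0) }
p not_alike  # => [:a, :b, :c]
```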
If it is possible that a does not contain every symbol, iterate over an array that contains all the symbols (for example a | b) instead of a.uniq. Array#count returns 0 for a symbol that does not appear, so the comparison still works for missing symbols.
The two arrays are practically the same, except for a few elements that I have to skip in one or the other every once in a while.
Instead of skipping during iterating, could you pre-select the non-skippable ones?
a.select{ ... }.zip( b.select{ ... } ).each do |a1,b1|
# a1 is an entry from a's subset
# b1 is the paired entry from b's subset
end

Is there a pre-built function to add the elements in two arrays?

If I have two arrays:
a = [1,2,3]
b = [2,3,4]
Is there a pre-built function to add the two arrays to give
c = a + b = [3,5,7]
i.e. add the values of each element in the array?
No, there isn't one method for this. But you can combine zip and map like this:
c = a.zip(b).map { |x, y| x + y }
I think the closest thing to what you ask is:
[1,2,3].zip([2,3,4]).map{|x| x.reduce(:+)}
It also works with more arrays:
[1,2,3].zip([2,3,4], [3,4,5], [4,5,6]).map{|x| x.reduce(:+)}
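When the arrays are already collected in one array, transpose plus Array#sum (Ruby 2.4+) is another way to write the same element-wise sum. This sketch assumes all arrays have the same length; transpose raises IndexError otherwise.

```ruby
arrays = [[1, 2, 3], [2, 3, 4], [3, 4, 5], [4, 5, 6]]

# transpose groups the i-th elements together; sum adds up each group.
totals = arrays.transpose.map(&:sum)
p totals  # => [10, 14, 18]
```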
That looks a lot like vector addition. Here's one way to accomplish that:
require 'matrix'
a = Vector[1,2,3]
b = Vector[2,3,4]
puts a+b
#=> Vector[3, 5, 7]
Simply use to_a on a Vector to get an array.
