Best practice for writing a complex, three-part, interchangeable "uniq" Ruby block

I have an array of hashes:
array = [
  {foo: 1, bar1: 2, bar2: 3, bar3: 4},
  {foo: 2, bar1: 3, bar2: 4, bar3: 5},
  {foo: 3, bar1: 4, bar2: 5, bar4: 6},
  # etc.
]
I want to eliminate some redundant results from this array. Specifically, I want to eliminate any results where foo, bar1, and bar2 are identical across multiple objects, which can easily be done like so:
array.uniq! { |object| [object[:foo], object[:bar1], object[:bar2]] }
However, there is an additional edge case where I must also eliminate one of the following objects, which I don't know how to solve:
{foo: 1, bar1: 3, bar2: 2, ...}
{foo: 1, bar1: 2, bar2: 3, ...}
Specifically, bar1 and bar2 may be switched in some of the data, and I want to only have unique results where those two are collectively the same pair. (2, 3 should be considered redundant as 3, 2).

After fully writing up this question I realized I had an answer, but I'm not sure how ideal it is. I simply combined the two interchangeable variables into a single array and then sorted them, which guarantees that the key will always be identical even if the two values are switched:
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
I'd love to know if anyone has better solutions.
Also, unsurprisingly, inserting a uniq! method into a large sorting action is causing some performance issues, so I'm exploring ways to further optimize it by adding additional filters etc. This is all for a cache for an API endpoint.

Since you have special equality rules, it seems like the most performant solution would be to override the Object#hash and Object#eql? functions as these are what is used by Array#uniq. If you have millions of records this may well be necessary for adequate performance.
require 'pp'
class MyHash < Hash
  def hash
    # Note that the XOR operator is commutative, so the three values
    # can be in any order and still output the same hash.
    self[:foo].hash ^ self[:bar1].hash ^ self[:bar2].hash
  end

  def eql?(other)
    # I think this is a bit ugly, and welcome suggestions for better
    # performance and readability.
    # && binds tighter than ||, so both bar comparisons sit inside one
    # group; otherwise the swapped case would skip the foo check.
    self[:foo] == other[:foo] && (
      (self[:bar1] == other[:bar1] && self[:bar2] == other[:bar2]) ||
      (self[:bar1] == other[:bar2] && self[:bar2] == other[:bar1])
    )
  end
end
a = MyHash[foo: 10, bar1: 2, bar2: 3, ignored: 'a']
b = MyHash[foo: 10, bar1: 3, bar2: 2, ignored: 'b']
c = MyHash[foo: 20, bar1: 2, bar2: 3, ignored: 'c']
d = MyHash[foo: 20, bar1: 3, bar2: 2, ignored: 'd']
e = MyHash[foo: 2, bar1: 20, bar2: 3, ignored: 'e']
f = MyHash[foo: 3, bar1: 2, bar2: 20, ignored: 'f']
puts a.hash #=> 3556565295874809176
puts b.hash #=> 3556565295874809176
puts c.hash #=> 2914353897173641784
puts d.hash #=> 2914353897173641784
puts e.hash #=> 2914353897173641784
puts f.hash #=> 2914353897173641784
array = [a, b, c, d, e, f]
pp array #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
# {:foo=>10, :bar1=>3, :bar2=>2, :ignored=>"b"},
# {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
# {:foo=>20, :bar1=>3, :bar2=>2, :ignored=>"d"},
# {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
# {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
pp array.uniq #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
# {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
# {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
# {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
If you just have thousands of records then the solution you proposed should be completely fine.
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
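For a quick check of the block-based approach, here is a minimal, self-contained sketch. The sample records and the :note key are made up for illustration; only the uniq block mirrors the code above.

```ruby
# Hypothetical sample data; :note only labels each record for the output.
records = [
  { foo: 1, bar1: 2, bar2: 3, note: "first" },
  { foo: 1, bar1: 3, bar2: 2, note: "swapped pair" },
  { foo: 2, bar1: 2, bar2: 3, note: "different foo" }
]

# Sorting the pair makes [2, 3] and [3, 2] produce the same key.
unique = records.uniq { |r| [r[:foo], [r[:bar1], r[:bar2]].sort] }

unique.each { |r| puts r[:note] }
# prints "first" and "different foo"; uniq keeps the first record of each group
```

Note that sort raises an ArgumentError when the two values cannot be compared with each other (for example an Integer and nil), so this sketch assumes bar1 and bar2 always hold comparable values.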

Related

How to stop shift from affecting other instances of an array

I need to shift through an array and keep a copy of the original array for future.
I tried creating another variable using a = b, but both are affected when I shift a.
irb(main):001:0> a = [1,2,3,4,5]
# => [1, 2, 3, 4, 5]
irb(main):002:0> b = a
# => [1, 2, 3, 4, 5]
irb(main):003:0> c = a.shift
# => 1
irb(main):004:0> a
# => [2, 3, 4, 5]
irb(main):005:0> b
# => [2, 3, 4, 5]
irb(main):006:0> c
# => 1
Is there a way to keep this from happening?
In Ruby it's important to remember that variables are object references which behave a lot like pointers, so b = a does not make a copy; it creates another reference to the same object.
To make a copy you must be explicit and use dup or clone to achieve this:
b = a.dup
If you're ever confused by Ruby's behaviour, stop and look at the objects you're dealing with:
a = [ 1 ]
b = a
a.object_id == b.object_id
# => true
They're exactly the same object, but when cloned:
b = a.dup
a.object_id == b.object_id
# => false
Now they're independent, at least on the top-level.
Note that this comes with some caveats, as this is only a shallow copy:
a = [ [ 1 ] ]
b = a.dup
b[0].object_id == a[0].object_id
# => true
This is where deep-copy tools come in handy if you need a complete clone. They are available from various gems, most popularly ActiveSupport from Rails, which provides deep_dup.
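If pulling in a gem is not an option, one standard-library way to sketch a deep copy is a Marshal round trip. This is an alternative technique, not something the answer above prescribes, and the variable names are only illustrative.

```ruby
original = [[1, 2], [3]]

shallow = original.dup                          # copies only the outer array
deep    = Marshal.load(Marshal.dump(original))  # copies the nested arrays too

original[0] << 99

p shallow[0]  # => [1, 2, 99]  (the inner array is still shared)
p deep[0]     # => [1, 2]      (fully independent)
```

Marshal cannot serialize everything (procs and IO objects, for example, raise a TypeError), so this only suits plain data structures.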
Ruby also tends to steer towards a more functional style. For example, if you want to strip an element from a without mangling b:
a = [ 1, 2, 3, 4, 5 ]
b = a
a = a.drop(1)
# => [2, 3, 4, 5]
Where drop skips over the first N entries and returns the rest as a copy:
b
# => [1, 2, 3, 4, 5]

Create a Hash from two arrays of different sizes and iterate until none of the keys are empty

Having two arrays of different sizes, I'd like to get the longer array as keys and the shorter one as values. However, I don't want any keys to remain empty, so that is why I need to keep iterating on the shorter array until all keys have a value.
EDIT: I want to keep array longer intact, but without empty values, that means keep iterating on shorter until all keys have a value.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
Hash[longer.zip(shorter)]
#=> {1=>"a", 2=>"b", 3=>"c", 4=>nil, 5=>nil, 6=>nil, 7=>nil}
Expected Result
#=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
Here's an elegant one. You can "loop" the short array
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
longer.zip(shorter.cycle).to_h # => {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
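This works because zip stops at the length of its receiver while cycle keeps supplying elements. The same wrap-around can be written with explicit index arithmetic, which some readers may find easier to follow. This is an alternative sketch, not part of the answer above.

```ruby
longer  = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']

# index % shorter.size wraps back to 0 once the short array is exhausted.
pairs = longer.each_with_index.map do |key, index|
  [key, shorter[index % shorter.size]]
end

result = pairs.to_h
p result  # same hash as longer.zip(shorter.cycle).to_h
```

Both versions assume shorter is not empty: the modulo version raises ZeroDivisionError in that case.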
A crude way until you find something more elegant:
Slice the longer array as per length of shorter one, and iterate over it to re-map the values.
mapped = longer.each_slice(shorter.length).to_a.map do |slice|
  Hash[slice.zip(shorter)]
end
# => [{1=>"a", 2=>"b", 3=>"c"}, {4=>"a", 5=>"b", 6=>"c"}, {7=>"a"}]
Merge all hashes within the mapped array into a single hash:
final = mapped.reduce Hash.new, :merge
# => {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
Here's a fun answer.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
h = Hash.new do |_hash, k|
  idx = longer.index(k)
  idx ? shorter[idx % shorter.size] : nil
end
#=> {}
h[1] #=> "a"
h[2] #=> "b"
h[3] #=> "c"
h[4] #=> "a"
h[5] #=> "b"
h[6] #=> "c"
h[7] #=> "a"
h[8] #=> nil
h #=> {}
h.values_at(3,5) #=> ["c", "b"]
If this is not good enough (e.g., if you wish to use Hash methods such as keys, key?, merge, to_a and so on), you could create the associated hash quite easily:
longer.each { |n| h[n] = h[n] }
h #=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}

Confusion with splat operator and Range in Ruby

I was trying to see how the splat operator works with a range in Ruby. To do so, I ran the code below in IRB:
*a = (1..8)
#=> 1..8
The above is fine, but what happens below? Why does a give []?
*a,b = (1..8)
#=> 1..8
b
#=> 1..8
a
#=> []
And why does b give [] here?
a,*b = (1..8)
#=> 1..8
a
#=> 1..8
b
#=> []
What precedence applies to the rvalues below?
a,*b = *(2..8),*3,*5
# => [2, 3, 4, 5, 6, 7, 8, 3, 5]
b
# => [3, 4, 5, 6, 7, 8, 3, 5]
a
# => 2
Here is another experiment with the splat operator (*).
I know that parallel assignment cannot use more than one splatted variable on the left-hand side, but why is the same not true when several splats are used in the rvalues?
*a,*b = [1,2,3,4,5]
SyntaxError: (irb):1: syntax error, unexpected tSTAR
*a,*b = [1,2,3,4,5]
^
from /usr/bin/irb:12:in `<main>'
The above is as expected.
a = *2,*3,*5
#=> [2, 3, 5]
But I couldn't understand the above.
I think of parallel assignment as setting an array of variables equal to another array with pattern matching.
One point is that a range is a single value until you convert it to an array or splat it. For instance, [1..5] is a one-element array containing the range 1..5, not [1,2,3,4,5]. To get the array of ints you need (1..5).to_a or [*(1..5)].
The first one, I think, is the trickiest. If the splatted variable is assigned a single element, the variable itself must be a one-element array:
*a = 5
a
# => [ 5 ]
For the next two, splat takes 0 or more not already assigned values into an array. So the following makes sense:
*a, b = (1..8)
is like
*a, b = "hey"
which is like
*a, b = [ "hey" ]
so a is [] and b is "hey": the splat has nothing left to collect, so a must be an empty array. The same idea applies to
a, *b = (1..5)
For the next one, the range is splatted, so the assignment makes a lot of sense again:
[*(2..4), 9, 5]
# => [2, 3, 4, 9, 5]
And parallel assignment with a splat again. Next one is similar:
[*3, *4, *5]
# => [3, 4, 5]
So that's like
a = 3, 4, 5
which is like
a = [3, 4, 5]
Splat has very low precedence; almost anything else will be evaluated before the splat.
The code is splatting but the result is thrown away: b = *a = (1..8); p b #=> [1, 2, 3, 4, 5, 6, 7, 8]
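The difference between assigning a range and assigning a splatted range can be seen side by side in this short sketch. The variable names are only illustrative.

```ruby
# A Range does not respond to to_ary, so it is assigned as one single value.
range_head, *range_tail = (1..8)
p range_head  # => 1..8
p range_tail  # => []

# Splatting the range expands it into its elements before the assignment.
head, *tail = *(1..8)
p head        # => 1
p tail        # => [2, 3, 4, 5, 6, 7, 8]

# The splatted variable can also sit at the front and collect the leading values.
*init, last = *(1..8)
p init        # => [1, 2, 3, 4, 5, 6, 7]
p last        # => 8
```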

Basic Array Iteration in Ruby

What's a better way to traverse an array while iterating through another array? For example, if I have two arrays like the following:
names = [ "Rover", "Fido", "Lassie", "Calypso"]
breeds = [ "Terrier", "Lhasa Apso", "Collie", "Bulldog"]
Assuming the arrays correspond with one another - that is, Rover is a Terrier, Fido is a Lhasa Apso, etc. - I'd like to create a dog class, and a new dog object for each item:
class Dog
  attr_reader :name, :breed
  def initialize(name, breed)
    @name = name
    @breed = breed
  end
end
I can iterate through names and breeds with the following:
index = 0
names.each do |name|
Dog.new("#{name}", "#{breeds[index]}")
index = index.next
end
However, I get the feeling that using the index variable is the wrong way to go about it. What would be a better way?
dogs = names.zip(breeds).map { |name, breed| Dog.new(name, breed) }
Array#zip interleaves the target array with elements of the arguments, so
irb> [1, 2, 3].zip(['a', 'b', 'c'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
You can use arrays of different lengths (in which case the target array determines the length of the resulting array, with the extra entries filled in with nil).
irb> [1, 2, 3, 4, 5].zip(['a', 'b', 'c'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'], [4, nil], [5, nil] ]
irb> [1, 2, 3].zip(['a', 'b', 'c', 'd', 'e'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
You can also zip more than two arrays together:
irb> [1,2,3].zip(['a', 'b', 'c'], [:alpha, :beta, :gamma])
#=> [ [1, 'a', :alpha], [2, 'b', :beta], [3, 'c', :gamma] ]
Array#map is a great way to transform an array, since it returns an array where each entry is the result of running the block on the corresponding entry in the target array.
irb> [1,2,3].map { |n| 10 - n }
#=> [ 9, 8, 7 ]
When using iterators over arrays of arrays, if you give a multiple parameter block, the array entries will be automatically broken into those parameters:
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each { |array| p array }
[ 1, 'a' ]
[ 2, 'b' ]
[ 3, 'c' ]
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each do |num, char|
...> puts "number: #{num}, character: #{char}"
...> end
number: 1, character: a
number: 2, character: b
number: 3, character: c
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
Like Matt Briggs mentioned, #each_with_index is another good tool to know about. It iterates through the elements of an array, passing each element and its index to the block in turn.
irb> ['a', 'b', 'c'].each_with_index do |char, index|
...> puts "character #{char} at index #{index}"
...> end
character a at index 0
character b at index 1
character c at index 2
#=> [ 'a', 'b', 'c' ]
When using an iterator like #each_with_index you can use parentheses to break up array elements into their constituent parts:
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each_with_index do |(num, char), index|
...> puts "number: #{num}, character: #{char} at index #{index}"
...> end
number: 1, character: a at index 0
number: 2, character: b at index 1
number: 3, character: c at index 2
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
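Putting zip and map together for the original question, a complete version might look like the sketch below. It assumes the Dog class from the question, with its instance variables written as @name and @breed.

```ruby
class Dog
  attr_reader :name, :breed

  def initialize(name, breed)
    @name = name
    @breed = breed
  end
end

names  = ["Rover", "Fido", "Lassie", "Calypso"]
breeds = ["Terrier", "Lhasa Apso", "Collie", "Bulldog"]

# zip pairs each name with the breed at the same position;
# the two block parameters unpack each [name, breed] pair.
dogs = names.zip(breeds).map { |name, breed| Dog.new(name, breed) }

dogs.each { |dog| puts "#{dog.name} is a #{dog.breed}" }
# first line printed: Rover is a Terrier
```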
each_with_index leaps to mind; it is a better way to do what you are already doing. rampion has a better overall answer though, since this situation is exactly what zip is for.
This is adapted from Flanagan and Matz, "The Ruby Programming Language", 5.3.5 "External Iterators", Example 5-1, p. 139:
++++++++++++++++++++++++++++++++++++++++++
require 'enumerator' # needed for Ruby 1.8
names = ["Rover", "Fido", "Lassie", "Calypso"]
breeds = ["Terrier", "Lhasa Apso", "Collie", "Bulldog"]
class Dog
  attr_reader :name, :breed
  def initialize(name, breed)
    @name = name
    @breed = breed
  end
end
def bundle(*enumerables)
enumerators = enumerables.map {|e| e.to_enum}
loop {yield enumerators.map {|e| e.next} }
end
bundle(names, breeds) {|x| p Dog.new(*x) }
+++++++++++++++++++++++++++++++++++++++++++
Output:
#<Dog:0x10014b648 @name="Rover", @breed="Terrier">
#<Dog:0x10014b0d0 @name="Fido", @breed="Lhasa Apso">
#<Dog:0x10014ab80 @name="Lassie", @breed="Collie">
#<Dog:0x10014a770 @name="Calypso", @breed="Bulldog">
which I think is what we wanted!
As well as each_with_index (mentioned by Matt), there's each_index. I sometimes use this because it makes the program more symmetrical, and therefore wrong code will look wrong.
names.each_index do |i|
  name, breed = names[i], breeds[i] # Can also use names.fetch(i) if you want to fail fast
  Dog.new(name, breed)
end

How can I check if a Ruby array includes one of several values?

I have two Ruby arrays, and I need to see if they have any values in common. I could just loop through each of the values in one array and do include?() on the other, but I'm sure there's a better way. What is it? (The arrays both hold strings.)
Thanks.
Set intersect them:
a1 & a2
Here's an example:
> a1 = [ 'foo', 'bar' ]
> a2 = [ 'bar', 'baz' ]
> a1 & a2
=> ["bar"]
> !(a1 & a2).empty? # Returns true if there are any elements in common
=> true
To find any values in common, you can use the intersection operator, &:
[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]
If you are looking for a full intersection (with duplicates), however, the problem is more complex. There is already a Stack Overflow question about it: How to return a Ruby array intersection with duplicate elements? (problem with bigrams in Dice Coefficient)
Alternatively, a quick snippet could define a real_intersection method that satisfies the following test:
require 'test/unit'

class ArrayIntersectionTests < Test::Unit::TestCase
  def test_real_array_intersection
    assert_equal [2], [2, 2, 2, 3, 7, 13, 49] & [2, 2, 2, 5, 11, 107]
    assert_equal [2, 2, 2], [2, 2, 2, 3, 7, 13, 49].real_intersection([2, 2, 2, 5, 11, 107])
    assert_equal ['a', 'c'], ['a', 'b', 'a', 'c'] & ['a', 'c', 'a', 'd']
    assert_equal ['a', 'a', 'c'], ['a', 'b', 'a', 'c'].real_intersection(['a', 'c', 'a', 'd'])
  end
end
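The snippet that defines real_intersection is not shown here, so the following is only one possible implementation that would satisfy the test, not the original author's code. It assumes Ruby 2.7 or later for Enumerable#tally, and it reopens Array purely because the test calls the method on array literals.

```ruby
# Hypothetical implementation; reopening Array is only for illustration.
class Array
  # Intersection that keeps duplicates, up to the number of times an
  # element occurs in both arrays. Order follows the receiver.
  def real_intersection(other)
    remaining = other.tally # element => occurrences still available to match
    select do |item|
      next false unless remaining.fetch(item, 0) > 0

      remaining[item] -= 1
      true
    end
  end
end

p([2, 2, 2, 3, 7, 13, 49].real_intersection([2, 2, 2, 5, 11, 107]))
# => [2, 2, 2]
p(['a', 'b', 'a', 'c'].real_intersection(['a', 'c', 'a', 'd']))
# => ["a", "a", "c"]
```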
Using intersection looks nice, but it is inefficient: it builds the whole result array even though only a yes/no answer is needed. I would use "any?" on the first array (so that iteration stops when one of the elements is found in the second array). Also, using a Set on the second array will make membership checks fast, i.e.:
require 'set' # needed before Ruby 3.2

a = [:a, :b, :c, :d]
b = Set.new([:c, :d, :e, :f])
c = [:a, :b, :g, :h]
# Do a and b have at least a common value?
a.any? {|item| b.include? item}
# => true
# Do c and b have at least a common value?
c.any? {|item| b.include? item}
# => false
Array#intersect? (Ruby 3.1+)
Starting from Ruby 3.1, there is a new Array#intersect? method, which checks whether two arrays have at least one element in common.
Here is an example:
a = [1, 2, 3]
b = [3, 4, 5]
c = [7, 8, 9]
# 3 is the common element
a.intersect?(b)
# => true
# No common elements
a.intersect?(c)
# => false
Also, Array#intersect? can be much faster than the alternatives, since it avoids creating an intermediate array, returns true as soon as it finds a common element, and is implemented in C.
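For code that must also run on Rubies older than 3.1, a small wrapper can fall back to the Set-based check. The helper name common_element? is hypothetical and not part of Ruby; this is a sketch under that assumption.

```ruby
require "set"

# Hypothetical helper: uses Array#intersect? when the running Ruby has it,
# otherwise falls back to any? with a Set lookup.
def common_element?(left, right)
  return left.intersect?(right) if left.respond_to?(:intersect?)

  lookup = right.to_set
  left.any? { |item| lookup.include?(item) }
end

p common_element?([1, 2, 3], [3, 4, 5])  # => true
p common_element?([1, 2, 3], [7, 8, 9])  # => false
```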
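Try this. Subtracting a2 removes any shared elements from a1, so the result differs from a1 whenever the arrays have something in common:
a1 = [ 'foo', 'bar' ]
a2 = [ 'bar', 'baz' ]
a1 - a2 != a1
# => true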
