I have an array of hashes:
array = [
{foo: 1, bar1: 2, bar2: 3, bar3: 4},
{foo: 2, bar1: 3, bar2: 4, bar3: 5},
{foo: 3, bar1: 4, bar2: 5, bar4: 6},
# etc.
]
I want to eliminate some redundant results from this array. Specifically, I want to eliminate any results where foo, bar1, and bar2 are identical across multiple objects, which can easily be done like so:
array.uniq! { |object| [object[:foo], object[:bar1], object[:bar2]] }
However, there is an additional edge case where I must also eliminate one of the following objects, which I don't know how to solve:
{foo: 1, bar1: 3, bar2: 2,...}
{foo: 1, bar1: 2, bar2: 3,...}
Specifically, bar1 and bar2 may be switched in some of the data, and I only want unique results where those two are collectively the same pair. (2, 3 should be treated as a duplicate of 3, 2.)
After fully writing up this question I realized I had an answer, but I'm not sure how ideal it is. I simply combined the two interchangeable values into a single array and sorted it, which guarantees that the key is identical even if the two values are switched:
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
I'd love to know if anyone has better solutions.
Also, unsurprisingly, inserting a uniq! method into a large sorting action is causing some performance issues, so I'm exploring ways to further optimize it by adding additional filters etc. This is all for a cache for an API endpoint.
Since you have special equality rules, it seems like the most performant solution would be to override the Object#hash and Object#eql? methods, since these are what Array#uniq uses. If you have millions of records this may well be necessary for adequate performance.
require 'pp'
class MyHash < Hash
def hash
# Note that the XOR operator is commutative, so the three values
# can be in any order and still output the same hash.
self[:foo].hash ^ self[:bar1].hash ^ self[:bar2].hash
end
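def eql?(other)
# I think this is a bit ugly, and welcome suggestions for better
# performance and readability.
# Both pair checks sit inside one set of parentheses: && binds tighter
# than ||, so without them foo would be ignored for swapped pairs.
self[:foo] == other[:foo] && (
(self[:bar1] == other[:bar1] && self[:bar2] == other[:bar2]) ||
(self[:bar1] == other[:bar2] && self[:bar2] == other[:bar1])
)
end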
end
a = MyHash[foo: 10, bar1: 2, bar2: 3, ignored: 'a']
b = MyHash[foo: 10, bar1: 3, bar2: 2, ignored: 'b']
c = MyHash[foo: 20, bar1: 2, bar2: 3, ignored: 'c']
d = MyHash[foo: 20, bar1: 3, bar2: 2, ignored: 'd']
e = MyHash[foo: 2, bar1: 20, bar2: 3, ignored: 'e']
f = MyHash[foo: 3, bar1: 2, bar2: 20, ignored: 'f']
puts a.hash #=> 3556565295874809176
puts b.hash #=> 3556565295874809176
puts c.hash #=> 2914353897173641784
puts d.hash #=> 2914353897173641784
puts e.hash #=> 2914353897173641784
puts f.hash #=> 2914353897173641784
array = [a, b, c, d, e, f]
pp array #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
# {:foo=>10, :bar1=>3, :bar2=>2, :ignored=>"b"},
# {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
# {:foo=>20, :bar1=>3, :bar2=>2, :ignored=>"d"},
# {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
# {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
pp array.uniq #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
# {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
# {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
# {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
If you just have thousands of records then the solution you proposed should be completely fine.
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
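As a quick check of the sorted-pair key on plain hashes, here is a minimal sketch. The sample records, the tag field, and the dedup_key helper are made up for illustration and are not part of the question's data.

```ruby
# Hypothetical sample data: the first two records differ only in the
# order of bar1 and bar2.
records = [
  { foo: 1, bar1: 2, bar2: 3, tag: 'a' },
  { foo: 1, bar1: 3, bar2: 2, tag: 'b' },
  { foo: 2, bar1: 2, bar2: 3, tag: 'c' }
]

# Hypothetical helper: builds an order-insensitive key from foo plus
# the sorted (bar1, bar2) pair.
def dedup_key(record)
  [record[:foo], [record[:bar1], record[:bar2]].sort]
end

# uniq keeps the first record seen for each key.
unique = records.uniq { |record| dedup_key(record) }

p unique.map { |record| record[:tag] }  # => ["a", "c"]
```

Because uniq keeps the first record for each key, the order of the input decides which of two swapped records survives.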
I need to shift through an array and keep a copy of the original array for future.
I tried creating another variable using a = b, but both are affected when I shift a.
irb(main):001:0> a = [1,2,3,4,5]
# => [1, 2, 3, 4, 5]
irb(main):002:0> b = a
# => [1, 2, 3, 4, 5]
irb(main):003:0> c = a.shift
# => 1
irb(main):004:0> a
# => [2, 3, 4, 5]
irb(main):005:0> b
# => [2, 3, 4, 5]
irb(main):006:0> c
# => 1
Is there a way to keep this from happening?
In Ruby it's important to remember that variables are object references which behave a lot like pointers, so b = a does not make a copy; it creates another reference to the same object.
To make a copy you must be explicit and use dup or clone to achieve this:
b = a.dup
If you're ever confused by Ruby's behaviour, stop and look at the objects you're dealing with:
a = [ 1 ]
b = a
a.object_id == b.object_id
# => true
They're exactly the same object, but when cloned:
b = a.dup
a.object_id == b.object_id
# => false
Now they're independent, at least on the top-level.
Note that this comes with some caveats, as this is only a shallow copy:
a = [ [ 1 ] ]
b = a.dup
b[0].object_id == a[0].object_id
# => true
This is where deep-copy tools come in handy if you need a complete clone. Various gems provide one, most popularly ActiveSupport from Rails (as deep_dup).
One thing you'll find is that Ruby tends to steer towards a more functional style. For example, if you wanted to strip an element from a and avoid mangling b:
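For plain nested data, one standard-library way to get a deep copy is a Marshal round trip. This is a sketch of that idiom, not the gem-based approach mentioned above, and it assumes every nested object can be marshalled (procs and IO objects, for example, cannot).

```ruby
a = [[1], [2, 3]]

shallow = a.dup                          # new outer array, same inner arrays
deep    = Marshal.load(Marshal.dump(a))  # serialises and rebuilds every level

shallow[0] << 99  # also visible through a, because a[0] is shared
deep[1] << 42     # affects only the deep copy

p a     # => [[1, 99], [2, 3]]
p deep  # => [[1], [2, 3, 42]]
```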
a = [ 1, 2, 3, 4, 5 ]
b = a
a = a.drop(1)
# => [2, 3, 4, 5]
Here drop skips over the first N entries and returns the rest as a new array, so b still refers to the untouched original:
b
# => [1, 2, 3, 4, 5]
Having two arrays of different sizes, I'd like to get the longer array as keys and the shorter one as values. However, I don't want any keys to remain empty, so that is why I need to keep iterating on the shorter array until all keys have a value.
EDIT: I want to keep array longer intact, but without empty values, that means keep iterating on shorter until all keys have a value.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
Hash[longer.zip(shorter)]
#=> {1=>"a", 2=>"b", 3=>"c", 4=>nil, 5=>nil, 6=>nil, 7=>nil}
Expected Result
#=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
Here's an elegant one. You can "loop" the short array with cycle:
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
longer.zip(shorter.cycle).to_h # => {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
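To see what cycle contributes, the sketch below prints a finite prefix of the endless enumerator and compares the zip result with an index-based version that wraps around using modulo. The by_index name is only illustrative.

```ruby
longer  = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']

# cycle returns an endless enumerator; first(n) takes a finite prefix.
p shorter.cycle.first(7)  # => ["a", "b", "c", "a", "b", "c", "a"]

# Index-based equivalent: wrap the index with modulo instead of cycling.
by_index = longer.each_with_index.map { |key, i| [key, shorter[i % shorter.size]] }.to_h

p by_index == longer.zip(shorter.cycle).to_h  # => true
```

Both versions assume shorter is not empty: the modulo version would raise ZeroDivisionError, and cycle on an empty array yields nothing.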
A crude way until you find something more elegant:
Slice the longer array as per length of shorter one, and iterate over it to re-map the values.
mapped = longer.each_slice(shorter.length).to_a.map do |slice|
Hash[slice.zip(shorter)]
end
=> [{1=>"a", 2=>"b", 3=>"c"}, {4=>"a", 5=>"b", 6=>"c"}, {7=>"a"}]
Merge all hashes within the mapped array into a single hash:
final = mapped.reduce Hash.new, :merge
=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
Here's a fun answer.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
h = Hash.new do |h,k|
idx = longer.index(k)
idx ? shorter[idx % shorter.size] : nil
end
#=> {}
h[1] #=> "a"
h[2] #=> "b"
h[3] #=> "c"
h[4] #=> "a"
h[5] #=> "b"
h[6] #=> "c"
h[7] #=> "a"
h[8] #=> nil
h #=> {}
h.values_at(3,5) #=> ["c", "b"]
If this is not good enough (e.g., if you wish to use Hash methods such as keys, key?, merge, to_a and so on), you could create the associated hash quite easily:
longer.each { |n| h[n] = h[n] }
h #=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
I was trying to see how the splat operator works with a range in Ruby. To do so, I ran the code below in IRB:
*a = (1..8)
#=> 1..8
The above is fine, but what happens below? Why does a give []?
*a,b = (1..8)
#=> 1..8
b
#=> 1..8
a
#=> []
And here, why does b give []?
a,*b = (1..8)
#=> 1..8
a
#=> 1..8
b
#=> []
What precedence applies to the rvalues below?
a,*b = *(2..8),*3,*5
# => [2, 3, 4, 5, 6, 7, 8, 3, 5]
b
# => [3, 4, 5, 6, 7, 8, 3, 5]
a
# => 2
Here is another experiment with the splat operator (*):
I know that parallel assignment can't have more than one splatted variable on the left-hand side, but why doesn't the same restriction apply when splat is used on rvalues?
*a,*b = [1,2,3,4,5]
SyntaxError: (irb):1: syntax error, unexpected tSTAR
*a,*b = [1,2,3,4,5]
^
from /usr/bin/irb:12:in `<main>'
The above is as expected.
a = *2,*3,*5
#=> [2, 3, 5]
But I couldn't understand the above.
I think of parallel assignment as setting an array of variables equal to another array with pattern matching.
One point is that a range is a single value until you convert it to an array or splat it. For instance, [1..5] is a one-element array containing the range 1..5, not [1,2,3,4,5]. To get the array of ints you need to do (1..5).to_a or [*(1..5)].
The first one, I think, is the trickiest. If the splatted var is assigned a single element, the var itself must be a one-element array:
*a = 5
a
# => [ 5 ]
For the next two, splat takes 0 or more not already assigned values into an array. So the following makes sense:
*a, b = (1..8)
is like
*a, b = "hey"
which is like
*a, b = [ "hey" ]
so b is "hey" and nothing is left over for *a, which means a must be an empty array. Same idea for
a, *b = (1..5)
For the next one, the range is splatted, so the assignment makes a lot of sense again:
[*(2..4), 9, 5]
# => [2, 3, 4, 9, 5]
And parallel assignment with a splat again. Next one is similar:
[*3, *4, *5]
# => [3, 4, 5]
So that's like
a = 3, 4, 5
which is like
a = [3, 4, 5]
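The cases discussed above can be checked side by side. This is a small sketch; the variable names are arbitrary.

```ruby
# Splatting the range on the right-hand side expands it into elements,
# so the left-hand splat has something to collect.
first, *rest = *(1..5)
p first  # => 1
p rest   # => [2, 3, 4, 5]

*init, last = *(1..5)
p init   # => [1, 2, 3, 4]
p last   # => 5

# Without the right-hand splat the range is one single value, so the
# plain variable takes it and the splatted variable gets nothing.
whole, *leftover = (1..5)
p whole     # => 1..5
p leftover  # => []
```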
Splat has a very low precedence; almost anything will be evaluated before the splat.
The code is splatting but the result is thrown away: b = *a = (1..8); p b #=> [1, 2, 3, 4, 5, 6, 7, 8]
What's a better way to traverse an array while iterating through another array? For example, if I have two arrays like the following:
names = [ "Rover", "Fido", "Lassie", "Calypso"]
breeds = [ "Terrier", "Lhasa Apso", "Collie", "Bulldog"]
Assuming the arrays correspond with one another - that is, Rover is a Terrier, Fido is a Lhasa Apso, etc. - I'd like to create a dog class, and a new dog object for each item:
class Dog
attr_reader :name, :breed
def initialize(name, breed)
@name = name
@breed = breed
end
end
I can iterate through names and breeds with the following:
index = 0
names.each do |name|
Dog.new("#{name}", "#{breeds[index]}")
index = index.next
end
However, I get the feeling that using the index variable is the wrong way to go about it. What would be a better way?
dogs = names.zip(breeds).map { |name, breed| Dog.new(name, breed) }
Array#zip interleaves the target array with elements of the arguments, so
irb> [1, 2, 3].zip(['a', 'b', 'c'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
You can use arrays of different lengths (in which case the target array determines the length of the resulting array, with the extra entries filled in with nil).
irb> [1, 2, 3, 4, 5].zip(['a', 'b', 'c'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'], [4, nil], [5, nil] ]
irb> [1, 2, 3].zip(['a', 'b', 'c', 'd', 'e'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
You can also zip more than two arrays together:
irb> [1,2,3].zip(['a', 'b', 'c'], [:alpha, :beta, :gamma])
#=> [ [1, 'a', :alpha], [2, 'b', :beta], [3, 'c', :gamma] ]
Array#map is a great way to transform an array, since it returns an array where each entry is the result of running the block on the corresponding entry in the target array.
irb> [1,2,3].map { |n| 10 - n }
#=> [ 9, 8, 7 ]
When using iterators over arrays of arrays, if you give a multiple parameter block, the array entries will be automatically broken into those parameters:
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each { |array| p array }
[ 1, 'a' ]
[ 2, 'b' ]
[ 3, 'c' ]
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each do |num, char|
...> puts "number: #{num}, character: #{char}"
...> end
number: 1, character: a
number: 2, character: b
number: 3, character: c
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
Like Matt Briggs mentioned, #each_with_index is another good tool to know about. It iterates through the elements of an array, passing each element and its index to the block in turn.
irb> ['a', 'b', 'c'].each_with_index do |char, index|
...> puts "character #{char} at index #{index}"
...> end
character a at index 0
character b at index 1
character c at index 2
#=> [ 'a', 'b', 'c' ]
When using an iterator like #each_with_index you can use parentheses to break up array elements into their constituent parts:
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each_with_index do |(num, char), index|
...> puts "number: #{num}, character: #{char} at index #{index}"
...> end
number: 1, character: a at index 0
number: 2, character: b at index 1
number: 3, character: c at index 2
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
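Putting zip, map, and each_with_index together on the data from the question gives something like the sketch below. Struct is used here as a stand-in for the Dog class, and the length check is an added assumption about how mismatched arrays should be handled.

```ruby
# Struct stands in for the Dog class from the question (an assumption
# made to keep the example short).
Dog = Struct.new(:name, :breed)

names  = ["Rover", "Fido", "Lassie", "Calypso"]
breeds = ["Terrier", "Lhasa Apso", "Collie", "Bulldog"]

# zip pads missing entries with nil, so check the lengths up front.
raise ArgumentError, "names and breeds differ in length" unless names.size == breeds.size

dogs = names.zip(breeds).map { |name, breed| Dog.new(name, breed) }

dogs.each_with_index do |dog, index|
  puts "#{index}: #{dog.name} is a #{dog.breed}"
end
```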
each_with_index leaps to mind; it is a cleaner version of what you are already doing. rampion has a better overall answer though: this situation is what zip is for.
This is adapted from Flanagan and Matz, "The Ruby Programming Language", 5.3.5 "External Iterators", Example 5-1, p. 139:
++++++++++++++++++++++++++++++++++++++++++
require 'enumerator' # needed for Ruby 1.8
names = ["Rover", "Fido", "Lassie", "Calypso"]
breeds = ["Terrier", "Lhasa Apso", "Collie", "Bulldog"]
class Dog
attr_reader :name, :breed
def initialize(name, breed)
@name = name
@breed = breed
end
end
def bundle(*enumerables)
enumerators = enumerables.map {|e| e.to_enum}
loop {yield enumerators.map {|e| e.next} }
end
bundle(names, breeds) {|x| p Dog.new(*x) }
+++++++++++++++++++++++++++++++++++++++++++
Output:
#<Dog:0x10014b648 @name="Rover", @breed="Terrier">
#<Dog:0x10014b0d0 @name="Fido", @breed="Lhasa Apso">
#<Dog:0x10014ab80 @name="Lassie", @breed="Collie">
#<Dog:0x10014a770 @name="Calypso", @breed="Bulldog">
which I think is what we wanted!
As well as each_with_index (mentioned by Matt), there's each_index. I sometimes use this because it makes the program more symmetrical, and therefore wrong code will look wrong.
names.each_index do |i|
name, breed = names[i], breeds[i] # Can also use names.fetch(i) if you want to fail fast
Dog.new(name, breed)
end
I have two Ruby arrays, and I need to see if they have any values in common. I could just loop through each of the values in one array and do include?() on the other, but I'm sure there's a better way. What is it? (The arrays both hold strings.)
Thanks.
Set intersect them:
a1 & a2
Here's an example:
> a1 = [ 'foo', 'bar' ]
> a2 = [ 'bar', 'baz' ]
> a1 & a2
=> ["bar"]
> !(a1 & a2).empty? # Returns true if there are any elements in common
=> true
Any value in common? You can use the intersection operator, &:
[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]
If you are looking for a full intersection, however (one that keeps duplicates), the problem is more complex. There is already a Stack Overflow question about it: "How to return a Ruby array intersection with duplicate elements? (problem with bigrams in Dice Coefficient)"
Or a quick snippet which defines "real_intersection" and validates the following test
class ArrayIntersectionTests < Test::Unit::TestCase
def test_real_array_intersection
assert_equal [2], [2, 2, 2, 3, 7, 13, 49] & [2, 2, 2, 5, 11, 107]
assert_equal [2, 2, 2], [2, 2, 2, 3, 7, 13, 49].real_intersection([2, 2, 2, 5, 11, 107])
assert_equal ['a', 'c'], ['a', 'b', 'a', 'c'] & ['a', 'c', 'a', 'd']
assert_equal ['a', 'a', 'c'], ['a', 'b', 'a', 'c'].real_intersection(['a', 'c', 'a', 'd'])
end
end
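One way a duplicate-preserving intersection could be written is sketched below. It is a standalone method with a made-up name, multiset_intersection, rather than the real_intersection method the test refers to, and it assumes Ruby 2.7 or later for Enumerable#tally.

```ruby
# Hypothetical helper: keeps an element of left once for every matching
# occurrence that is still unused in right.
def multiset_intersection(left, right)
  remaining = right.tally  # element => number of unused occurrences
  left.select do |item|
    if remaining.fetch(item, 0) > 0
      remaining[item] -= 1
      true
    else
      false
    end
  end
end

p multiset_intersection([2, 2, 2, 3, 7, 13, 49], [2, 2, 2, 5, 11, 107])
# => [2, 2, 2]
p multiset_intersection(['a', 'b', 'a', 'c'], ['a', 'c', 'a', 'd'])
# => ["a", "a", "c"]
```

The result follows the order of the first array, which matches the expectations in the test above.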
Using intersection looks nice, but it is inefficient. I would use "any?" on the first array (so that iteration stops when one of the elements is found in the second array). Also, using a Set on the second array will make membership checks fast. i.e.:
require 'set' # needed before Ruby 3.2, harmless afterwards
a = [:a, :b, :c, :d]
b = Set.new([:c, :d, :e, :f])
c = [:a, :b, :g, :h]
# Do a and b have at least a common value?
a.any? {|item| b.include? item}
# => true
# Do c and b have at least a common value?
c.any? {|item| b.include? item}
# => false
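The same idea can be wrapped in a small helper so that callers pass two plain arrays. The name share_any? is hypothetical; the sketch simply builds the Set inside the method.

```ruby
require 'set'

# Hypothetical helper: true as soon as one element of left is found in right.
def share_any?(left, right)
  lookup = right.to_set                       # hash-based membership checks
  left.any? { |item| lookup.include?(item) }  # stops at the first hit
end

p share_any?([:a, :b, :c, :d], [:c, :d, :e, :f])  # => true
p share_any?([:a, :b, :g, :h], [:c, :d, :e, :f])  # => false
```

Building the Set costs one pass over the second array, so this pays off mainly when that array is large or is checked against repeatedly.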
Array#intersect? (Ruby 3.1+)
Starting from Ruby 3.1, there is a new Array#intersect? method,
which checks whether two arrays have at least one element in common.
Here is an example:
a = [1, 2, 3]
b = [3, 4, 5]
c = [7, 8, 9]
# 3 is the common element
a.intersect?(b)
# => true
# No common elements
a.intersect?(c)
# => false
Array#intersect? can also be much faster than the alternatives: it avoids creating an intermediate array, returns true as soon as it finds a common element, and is implemented in C.
Sources:
Ruby 3.1 adds Array#intersect?.
Pull Request.
Discussion.
Source code.
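If the code also has to run on Rubies older than 3.1, a small wrapper can fall back to the & operator. This is a sketch; the method name overlap? is made up for the example.

```ruby
# Hypothetical helper: prefers Array#intersect? when the running Ruby
# provides it, otherwise falls back to building the intersection.
def overlap?(left, right)
  if left.respond_to?(:intersect?)
    left.intersect?(right)   # Ruby 3.1+
  else
    !(left & right).empty?   # older Rubies
  end
end

p overlap?([1, 2, 3], [3, 4, 5])  # => true
p overlap?([1, 2, 3], [7, 8, 9])  # => false
```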
Try this
a1 = [ 'foo', 'bar' ]
a2 = [ 'bar', 'baz' ]
a1-a2 != a1
# => true