How can I check if a Ruby array includes one of several values? - ruby

I have two Ruby arrays, and I need to see if they have any values in common. I could just loop through each of the values in one array and do include?() on the other, but I'm sure there's a better way. What is it? (The arrays both hold strings.)
Thanks.

Set intersect them:
a1 & a2
Here's an example:
> a1 = [ 'foo', 'bar' ]
> a2 = [ 'bar', 'baz' ]
> a1 & a2
=> ["bar"]
> !(a1 & a2).empty? # Returns true if there are any elements in common
=> true

Any value in common? You can use the intersection operator, &:
[ 1, 1, 3, 5 ] & [ 1, 2, 3 ] #=> [ 1, 3 ]
If you are looking for a full intersection (one that keeps duplicates), however, the problem is more complex; there is already a Stack Overflow question about it: How to return a Ruby array intersection with duplicate elements? (problem with bigrams in Dice Coefficient)
Or write a quick snippet that defines real_intersection and passes the following test:
require 'test/unit'
class ArrayIntersectionTests < Test::Unit::TestCase
  def test_real_array_intersection
    assert_equal [2], [2, 2, 2, 3, 7, 13, 49] & [2, 2, 2, 5, 11, 107]
    assert_equal [2, 2, 2], [2, 2, 2, 3, 7, 13, 49].real_intersection([2, 2, 2, 5, 11, 107])
    assert_equal ['a', 'c'], ['a', 'b', 'a', 'c'] & ['a', 'c', 'a', 'd']
    assert_equal ['a', 'a', 'c'], ['a', 'b', 'a', 'c'].real_intersection(['a', 'c', 'a', 'd'])
  end
end
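The test above calls an Array#real_intersection method that this answer does not show. Here is a minimal sketch of one, assuming Ruby 2.7+ (for Enumerable#tally) and assuming "real intersection" means keeping each shared element as many times as it occurs in both arrays, in the receiver's order. The method name comes from the test; the implementation is my own guess, not the one from the linked question.

```ruby
class Array
  # Hypothetical helper (not part of core Ruby): multiset intersection.
  # Each element is kept min(count in self, count in other) times,
  # in the order it appears in the receiver.
  def real_intersection(other)
    remaining = other.tally # element => occurrences of it still unused in other
    each_with_object([]) do |item, result|
      next unless remaining.fetch(item, 0).positive?

      remaining[item] -= 1
      result << item
    end
  end
end

p [2, 2, 2, 3, 7, 13, 49].real_intersection([2, 2, 2, 5, 11, 107]) # => [2, 2, 2]
p ['a', 'b', 'a', 'c'].real_intersection(['a', 'c', 'a', 'd'])     # => ["a", "a", "c"]
```

Reopening Array like this affects the whole program; a refinement would keep the change local if that matters to you.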

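Using intersection looks nice, but it is inefficient when you only need a yes/no answer: & builds a complete new array of every common element. I would use any? on the first array, so that iteration stops as soon as one of its elements is found in the second array. Also, putting the second array in a Set makes each membership check fast (a hash lookup rather than a scan of the whole array). For example:
require 'set' # not needed on Ruby 3.2+, where Set is available by default
a = [:a, :b, :c, :d]
b = Set.new([:c, :d, :e, :f])
c = [:a, :b, :g, :h]
# Do a and b have at least one common value?
a.any? { |item| b.include? item }
# => true
# Do c and b have at least one common value?
c.any? { |item| b.include? item }
# => false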

Array#intersect? (Ruby 3.1+)
Starting from Ruby 3.1, there is a new Array#intersect? method,
which checks whether two arrays have at least one element in common.
Here is an example:
a = [1, 2, 3]
b = [3, 4, 5]
c = [7, 8, 9]
# 3 is the common element
a.intersect?(b)
# => true
# No common elements
a.intersect?(c)
# => false
Also, Array#intersect? can be much faster than the alternatives: it avoids creating an intermediate array, it returns true as soon as it finds a common element, and it is implemented in C.
Sources:
Ruby 3.1 adds Array#intersect?.
Pull Request.
Discussion.
Source code.
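If the same code has to run on Rubies older than 3.1, one option is to use intersect? when it exists and fall back to the intersection check from the first answer otherwise. A small sketch; the helper name common_value? is hypothetical:

```ruby
# Hypothetical helper: true if the two arrays share at least one element.
# Uses Array#intersect? when available (Ruby 3.1+); otherwise falls back
# to checking that the & intersection is non-empty.
def common_value?(a, b)
  if a.respond_to?(:intersect?)
    a.intersect?(b)
  else
    !(a & b).empty?
  end
end

p common_value?(['foo', 'bar'], ['bar', 'baz']) # => true
p common_value?(['foo'], ['baz'])               # => false
```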

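Try this. Subtracting a2 removes every element of a1 that also appears in a2, so the result differs from a1 exactly when the two arrays share a value:
a1 = [ 'foo', 'bar' ]
a2 = [ 'bar', 'baz' ]
a1 - a2 != a1
# => true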

Related

Best practice for writing complex, three-part, interchangeable "uniq" ruby block

I have an array of hashes:
array = [
  {foo: 1, bar1: 2, bar2: 3, bar3: 4},
  {foo: 2, bar1: 3, bar2: 4, bar3: 5},
  {foo: 3, bar1: 4, bar2: 5, bar4: 6},
  # etc.
]
I want to eliminate some redundant results from this array. Specifically, I want to eliminate any results where foo, bar1, and bar2 are identical across multiple objects, which can easily be done like so:
array.uniq! { |object| [object[:foo], object[:bar1], object[:bar2]] }
However, there is an additional edge case where I must also eliminate one of the following objects, which I don't know how to solve:
{foo: 1, bar1: 3, bar2: 2, ...}
{foo: 1, bar1: 2, bar2: 3, ...}
Specifically, bar1 and bar2 may be switched in some of the data, and I only want unique results where those two form the same pair collectively (2, 3 should be considered a duplicate of 3, 2).
After fully writing up this question I realized I had an answer, but I'm not sure how ideal it is. I simply combined the two interchangeable values into a single array and then sorted it, which guarantees that the pair is always identical even if the two values are switched:
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
I'd love to know if anyone has better solutions.
Also, unsurprisingly, inserting a uniq! method into a large sorting action is causing some performance issues, so I'm exploring ways to further optimize it by adding additional filters etc. This is all for a cache for an API endpoint.
Since you have special equality rules, it seems like the most performant solution would be to override the hash and eql? methods (inherited from Object), as these are what Array#uniq uses to compare elements. If you have millions of records, this may well be necessary for adequate performance.
require 'pp'
class MyHash < Hash
  def hash
    # Note that the XOR operator is commutative, so the three values
    # can be in any order and still output the same hash.
    self[:foo].hash ^ self[:bar1].hash ^ self[:bar2].hash
  end

  def eql?(other)
    # foo must always match; bar1/bar2 may match directly or swapped.
    # The outer parentheses matter: && binds tighter than ||, so without
    # them the swapped case would be accepted even when foo differs.
    self[:foo] == other[:foo] && (
      (self[:bar1] == other[:bar1] && self[:bar2] == other[:bar2]) ||
      (self[:bar1] == other[:bar2] && self[:bar2] == other[:bar1])
    )
  end
end
a = MyHash[foo: 10, bar1: 2, bar2: 3, ignored: 'a']
b = MyHash[foo: 10, bar1: 3, bar2: 2, ignored: 'b']
c = MyHash[foo: 20, bar1: 2, bar2: 3, ignored: 'c']
d = MyHash[foo: 20, bar1: 3, bar2: 2, ignored: 'd']
e = MyHash[foo: 2, bar1: 20, bar2: 3, ignored: 'e']
f = MyHash[foo: 3, bar1: 2, bar2: 20, ignored: 'f']
# Hash values are randomized per process, so only the equalities matter:
puts a.hash == b.hash #=> true
puts c.hash == d.hash #=> true
# XOR ignores order, so e and f collide with c; eql? still tells them apart.
puts c.hash == e.hash #=> true
puts c.hash == f.hash #=> true
array = [a, b, c, d, e, f]
pp array #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
# {:foo=>10, :bar1=>3, :bar2=>2, :ignored=>"b"},
# {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
# {:foo=>20, :bar1=>3, :bar2=>2, :ignored=>"d"},
# {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
# {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
pp array.uniq #=> [{:foo=>10, :bar1=>2, :bar2=>3, :ignored=>"a"},
# {:foo=>20, :bar1=>2, :bar2=>3, :ignored=>"c"},
# {:foo=>2, :bar1=>20, :bar2=>3, :ignored=>"e"},
# {:foo=>3, :bar1=>2, :bar2=>20, :ignored=>"f"}]
If you just have thousands of records then the solution you proposed should be completely fine.
array.uniq! { |object| [ object[:foo], [object[:bar1], object[:bar2]].sort ] }
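To try the sort-based key on data, here is a sketch using the question's field names on a few made-up sample records. It swaps sort for Array#minmax, which for a two-element array returns the same [smaller, larger] pair; I have not measured whether it is faster. Like sort, it assumes bar1 and bar2 are mutually comparable (no nils mixed with numbers).

```ruby
records = [
  { foo: 1, bar1: 2, bar2: 3 },
  { foo: 1, bar1: 3, bar2: 2 }, # same pair as the first record, swapped
  { foo: 2, bar1: 3, bar2: 4 }
]

# Key = foo plus the (bar1, bar2) pair in canonical [min, max] order.
# uniq keeps the first record seen for each key.
unique = records.uniq { |r| [r[:foo], r.values_at(:bar1, :bar2).minmax] }

p unique.size # => 2
```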

Create a Hash from two arrays of different sizes and iterate until none of the keys are empty

Having two arrays of different sizes, I'd like to get the longer array as keys and the shorter one as values. However, I don't want any keys to remain empty, so that is why I need to keep iterating on the shorter array until all keys have a value.
EDIT: I want to keep the array longer intact, but without empty values; that means cycling through shorter until every key has a value.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
Hash[longer.zip(shorter)]
#=> {1=>"a", 2=>"b", 3=>"c", 4=>nil, 5=>nil, 6=>nil, 7=>nil}
Expected Result
#=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
Here's an elegant one: you can "loop" the short array with cycle.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
longer.zip(shorter.cycle).to_h # => {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
A crude way until you find something more elegant:
Slice the longer array as per length of shorter one, and iterate over it to re-map the values.
mapped = longer.each_slice(shorter.length).to_a.map do |slice|
Hash[slice.zip(shorter)]
end
=> [{1=>"a", 2=>"b", 3=>"c"}, {4=>"a", 5=>"b", 6=>"c"}, {7=>"a"}]
Merge all hashes within the mapped array into a single hash:
final = mapped.reduce Hash.new, :merge
=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
Here's a fun answer.
longer = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']
h = Hash.new do |_hash, k|
  # Computed on every lookup; this default block does not store the result.
  idx = longer.index(k)
  idx ? shorter[idx % shorter.size] : nil
end
#=> {}
h[1] #=> "a"
h[2] #=> "b"
h[3] #=> "c"
h[4] #=> "a"
h[5] #=> "b"
h[6] #=> "c"
h[7] #=> "a"
h[8] #=> nil
h #=> {}
h.values_at(3,5) #=> ["c", "b"]
If this is not good enough (e.g., if you wish to use Hash methods such as keys, key?, merge, to_a and so on), you could create the associated hash quite easily:
longer.each { |n| h[n] = h[n] }
h #=> {1=>"a", 2=>"b", 3=>"c", 4=>"a", 5=>"b", 6=>"c", 7=>"a"}
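The same modulo idea can also build the hash eagerly in one pass, without the repeated longer.index lookups of the default block. A sketch assuming Ruby 2.6+ (for to_h with a block); if longer contained duplicate keys, the later one would win:

```ruby
longer  = [1, 2, 3, 4, 5, 6, 7]
shorter = ['a', 'b', 'c']

# Pair each key with the value at (its index modulo shorter.size),
# so the values wrap around when the keys outnumber them.
h = longer.each_with_index.to_h { |key, i| [key, shorter[i % shorter.size]] }

p h.values # => ["a", "b", "c", "a", "b", "c", "a"]
```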

Three ways to create a range, hash, array in ruby

I am doing a tutorial course on Ruby, and it asks for 3 ways to create a range, a hash, and an array.
I can only think of 2: (1..3) and Range.new(1,3) (and similarly for hash and array).
What is the third way?
The tutorial in question is The Odin Project
Ranges may be constructed using the s..e and s...e literals, or with ::new.
Ranges constructed using .. run from the beginning to the end inclusively.
Those created using ... exclude the end value. When used as an iterator, ranges return each value in the sequence.
(0..2) == (0..2) #=> true
(0..2) == Range.new(0,2) #=> true
(0..2) == (0...2) #=> false
For Arrays there's Array::[] (example taken directly from the docs):
Array.[]( 1, 'a', /^A/ ) # => [1, "a", /^A/]
Array[ 1, 'a', /^A/ ] # => [1, "a", /^A/]
[ 1, 'a', /^A/ ] # => [1, "a", /^A/]
Similarly there's Hash::[]. Not sure about Ranges; in fact, the docs (as far as I can tell) only mention literals and Range::new.
I can't see why you'd use these over a literal, but there you go.
You can also make an exclusive range, using (1...4), which if turned into an array would become [1, 2, 3].
(1..3) is an inclusive range, so it contains all numbers, from 1 to 3, but if you used (1...3), having 3 dots instead of 2 makes it exclusive, so it contains all numbers from 1, up to but not including 3.
As for arrays and hashes, Range#to_a, Array::[], Array#to_h, and Hash::[] will work.
(1..3).to_a
=> [1, 2, 3]
Array[1, 2, 3]
=> [1, 2, 3]
[[1, 2], [3, 4], [5, 6]].to_h
=> {1=>2, 3=>4, 5=>6}
Hash[ [[1, 2], [3, 4], [5, 6]] ]
=> {1=>2, 3=>4, 5=>6}
But they are probably looking for Array::[] and Hash::[] for the array and hash parts.
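The forms mentioned in these answers can be collected into one runnable snippet. This is a sketch of several construction styles per type, not an official list of the "three ways" the tutorial has in mind; Array.new with a block is an extra form not mentioned above.

```ruby
# Ranges: inclusive literal, exclusive literal, and Range.new
# (its optional third argument, true, excludes the end value).
r1 = (1..3)
r2 = (1...4)
r3 = Range.new(1, 3)
r4 = Range.new(1, 4, true) # same as (1...4)

# Arrays: literal, Array.[] and Array.new with a block.
a1 = [1, 2, 3]
a2 = Array[1, 2, 3]
a3 = Array.new(3) { |i| i + 1 }

# Hashes: literal, Hash.[] and Array#to_h.
h1 = { 1 => 2, 3 => 4 }
h2 = Hash[[[1, 2], [3, 4]]]
h3 = [[1, 2], [3, 4]].to_h

p [r1 == r3, r2 == r4, r1.to_a == r2.to_a] # => [true, true, true]
p a1 == a2 && a2 == a3                     # => true
p h1 == h2 && h2 == h3                     # => true
```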

Ruby each and collect change array of arrays

I expected that Array#each and Array#collect would never change an object, as in this example:
a = [1, 2, 3]
a.each { |x| x = 5 }
a #output => [1, 2, 3]
But this doesn't seem to be the case when you are working with an array of arrays or an array of hashes:
a = [[1, 2, 3], [10, 20], ["a"]]
a.each { |x| x[0]=5 }
a #output => [[5, 2, 3], [5, 20], [5]]
Is this behaviour expected or am I doing something wrong?
Doesn't this make ruby behaviour a little unexpected? For example, in C++ if a function argument is declared const, one can be confident the function won't mess with it (ok, it can be mutable, but you got the point).
a = [[1, 2, 3], [10, 20], ["a"]]
a.each { |x| x[0]=5 }
In the example above, x is one of the inner arrays (passed to the block on each iteration); you access its element at index 0 and update it. Because arrays are mutable objects, the inner array itself changes. Here a is an array of arrays.
In the first iteration x is [1, 2, 3], and you call the Array#[]= method to update its element at index 0.
In the second iteration x is [10, 20]; the same thing happens.
...and so on. Thus, after #each has completed its iterations, a has been modified.
a = [1, 2, 3]
a.each { |x| x = 5 }
In the example above, each element passed to the block is an Integer (a Fixnum in older Rubies), which is immutable. x = 5 only rebinds the block's local variable x to a different object; it does not touch the array. Here a is an array of elements, and you are just accessing those elements.
Update (to address the OP's comment):
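To see the difference side by side, here is a small sketch (the sample arrays are made up) contrasting rebinding the block variable with mutating the object it refers to. Array#replace is used here simply as one example of a mutating method.

```ruby
a = [[1, 2, 3], [10, 20]]

# Assigning to the block variable only rebinds the local name x;
# the arrays stored in `a` are untouched.
a.each { |x| x = [0] }
p a # => [[1, 2, 3], [10, 20]]

# Calling a mutating method changes the object x refers to,
# which is the very object stored in `a`.
a.each { |x| x.replace([0]) }
p a # => [[0], [0]]
```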
a = [[1, 2, 3], [10, 20], ["a"]]
a.each do |x|
# here x is holding the object from the source array `a`.
x # => [1, 2, 3]
x.object_id # => 72635790
# here you assigned a new array object, which has the same content as the
# inner array element [1, 2, 3]. But strictly these are 2 different objects. Check
# out the object_id of those two (the exact ids differ from run to run).
x = [1, 2, 3]
x # => [1, 2, 3]
x.object_id # => 72635250
break # used break to stop iteration after 1st one.
end
Using each or map does not change the array itself, but it might look like it changes elements in the array. In fact, when an array holds references to other objects, those references stay unchanged, but the referenced objects themselves can change. I agree it is surprising when you first learn it.
What you noticed:
a = ['a', 'b', 'c']
a.each { |x| x[0] = 'x' }
p a # => ["x", "x", "x"]
Here each array element still references the same string, but the string itself has changed.
Why is it important to understand these references?
array = ['a', 'b', 'c']
a = array # a and b are two names for the same array object
b = array
p b # => ["a", "b", "c"]
a[0] = 'x'
p b # => ["x", "b", "c"]
Does freeze protect us from changes?
a = ['a', 'b', 'c'].freeze
a << 'd' # raises FrozenError (RuntimeError before Ruby 2.5): can't modify frozen Array
Seems so. But again, only for the array itself: freeze does not deep-freeze the elements.
a[0][0] = 'x'
p a # => ["x", "b", "c"]
I suggest reading about topics such as object references, pointers, and call by value vs. call by reference.
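If you do want the elements protected as well, one option is to freeze each element before freezing the array. A minimal sketch, assuming one level of nesting is enough (deeper structures would need a recursive helper, not shown here) and Ruby 2.5+ for FrozenError:

```ruby
# Freeze every element, then the array itself. This is still only one
# level deep: objects nested inside the elements are not frozen.
a = ['a', 'b', 'c'].map(&:freeze).freeze

begin
  a[0][0] = 'x' # String#[]= on a frozen string
rescue FrozenError => e
  puts "blocked: #{e.class}" # prints "blocked: FrozenError"
end

p a # => ["a", "b", "c"]
```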

Basic Array Iteration in Ruby

What's a better way to traverse an array while iterating through another array? For example, if I have two arrays like the following:
names = [ "Rover", "Fido", "Lassie", "Calypso"]
breeds = [ "Terrier", "Lhasa Apso", "Collie", "Bulldog"]
Assuming the arrays correspond with one another - that is, Rover is a Terrier, Fido is a Lhasa Apso, etc. - I'd like to create a dog class, and a new dog object for each item:
class Dog
attr_reader :name, :breed
def initialize(name, breed)
@name = name
@breed = breed
end
end
I can iterate through names and breeds with the following:
index = 0
names.each do |name|
Dog.new("#{name}", "#{breeds[index]}")
index = index.next
end
However, I get the feeling that using the index variable is the wrong way to go about it. What would be a better way?
dogs = names.zip(breeds).map { |name, breed| Dog.new(name, breed) }
Array#zip interleaves the target array with elements of the arguments, so
irb> [1, 2, 3].zip(['a', 'b', 'c'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
You can use arrays of different lengths (in which case the target array determines the length of the resulting array, with the extra entries filled in with nil).
irb> [1, 2, 3, 4, 5].zip(['a', 'b', 'c'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'], [4, nil], [5, nil] ]
irb> [1, 2, 3].zip(['a', 'b', 'c', 'd', 'e'])
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
You can also zip more than two arrays together:
irb> [1,2,3].zip(['a', 'b', 'c'], [:alpha, :beta, :gamma])
#=> [ [1, 'a', :alpha], [2, 'b', :beta], [3, 'c', :gamma] ]
Array#map is a great way to transform an array, since it returns an array where each entry is the result of running the block on the corresponding entry in the target array.
irb> [1,2,3].map { |n| 10 - n }
#=> [ 9, 8, 7 ]
When using iterators over arrays of arrays, if you give a multiple parameter block, the array entries will be automatically broken into those parameters:
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each { |array| p array }
[1, "a"]
[2, "b"]
[3, "c"]
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each do |num, char|
...> puts "number: #{num}, character: #{char}"
...> end
number: 1, character: a
number: 2, character: b
number: 3, character: c
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
Like Matt Briggs mentioned, #each_with_index is another good tool to know about. It iterates through the elements of an array, passing each element and its index to the block in turn.
irb> ['a', 'b', 'c'].each_with_index do |char, index|
...> puts "character #{char} at index #{index}"
...> end
character a at index 0
character b at index 1
character c at index 2
#=> [ 'a', 'b', 'c' ]
When using an iterator like #each_with_index you can use parentheses to break up array elements into their constituent parts:
irb> [ [1, 'a'], [2, 'b'], [3, 'c'] ].each_with_index do |(num, char), index|
...> puts "number: #{num}, character: #{char} at index #{index}"
...> end
number: 1, character: a at index 0
number: 2, character: b at index 1
number: 3, character: c at index 2
#=> [ [1, 'a'], [2, 'b'], [3, 'c'] ]
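Putting zip and map together against the question's data gives a complete, runnable version. This sketch assumes the Dog class sets @name and @breed in initialize (as the question clearly intends):

```ruby
# Dog as in the question, with the instance variables set in initialize.
class Dog
  attr_reader :name, :breed

  def initialize(name, breed)
    @name = name
    @breed = breed
  end
end

names  = ['Rover', 'Fido', 'Lassie', 'Calypso']
breeds = ['Terrier', 'Lhasa Apso', 'Collie', 'Bulldog']

# zip pairs each name with the breed at the same index;
# map turns each [name, breed] pair into a Dog.
dogs = names.zip(breeds).map { |name, breed| Dog.new(name, breed) }

dogs.each { |dog| puts "#{dog.name}: #{dog.breed}" }
```

Unlike the index loop in the question, map keeps the created Dog objects in dogs instead of discarding them.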
each_with_index springs to mind; it is a better way to do what you are doing. rampion has the better overall answer, though: this situation is exactly what zip is for.
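For comparison, here is what the each_with_index version might look like. It is a sketch that assumes the same Dog class with @name and @breed set in initialize; calling each_with_index without a block returns an enumerator, so map can collect the results.

```ruby
class Dog
  attr_reader :name, :breed

  def initialize(name, breed)
    @name = name
    @breed = breed
  end
end

names  = ['Rover', 'Fido', 'Lassie', 'Calypso']
breeds = ['Terrier', 'Lhasa Apso', 'Collie', 'Bulldog']

# each_with_index supplies the index, so no manual counter is needed;
# chaining .map collects the Dog objects instead of discarding them.
dogs = names.each_with_index.map { |name, i| Dog.new(name, breeds[i]) }

p dogs.map(&:breed) # => ["Terrier", "Lhasa Apso", "Collie", "Bulldog"]
```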
This is adapted from Flanagan and Matz, "The Ruby Programming Language", 5.3.5 "External Iterators", Example 5-1, p. 139:
require 'enumerator' # needed for Ruby 1.8
names = ["Rover", "Fido", "Lassie", "Calypso"]
breeds = ["Terrier", "Lhasa Apso", "Collie", "Bulldog"]
class Dog
  attr_reader :name, :breed
  def initialize(name, breed)
    @name = name
    @breed = breed
  end
end
# Walk several enumerables in lockstep using external iterators.
# Kernel#loop rescues StopIteration, so the loop ends as soon as
# any enumerator runs out of elements.
def bundle(*enumerables)
  enumerators = enumerables.map { |e| e.to_enum }
  loop { yield enumerators.map { |e| e.next } }
end
bundle(names, breeds) { |x| p Dog.new(*x) }
Output:
#<Dog:0x10014b648 @name="Rover", @breed="Terrier">
#<Dog:0x10014b0d0 @name="Fido", @breed="Lhasa Apso">
#<Dog:0x10014ab80 @name="Lassie", @breed="Collie">
#<Dog:0x10014a770 @name="Calypso", @breed="Bulldog">
which I think is what we wanted!
As well as each_with_index (mentioned by Matt), there's each_index. I sometimes use this because it makes the program more symmetrical, and therefore wrong code will look wrong.
names.each_index do |i|
name, breed = names[i], breeds[i] # Can also use names.fetch(i) if you want to fail fast
Dog.new(name, breed)
end
