Determine if one array contains all elements of another array - ruby

I need to tell if an array contains all of the elements of another array with duplicates.
[1,2,3].contains_all? [1,2] #=> true
[1,2,3].contains_all? [1,2,2] #=> false (this is where (a1-a2).empty? fails)
[2,1,2,3].contains_all? [1,2,2] #=> true
So the first array must contain at least as many of each unique element as the second array does.
This question answers it for those using an array as a set, but I need to control for duplicates.
Update: Benchmarks
On Ruby 1.9.3p194
def bench
puts Benchmark.measure {
10000.times do
[1,2,3].contains_all? [1,2]
[1,2,3].contains_all? [1,2,2]
[2,1,2,3].contains_all? [1,2,2]
end
}
end
Results in:
Rohit 0.100000 0.000000 0.100000 ( 0.104486)
Chris 0.040000 0.000000 0.040000 ( 0.040178)
Sergio 0.160000 0.020000 0.180000 ( 0.173940)
sawa 0.030000 0.000000 0.030000 ( 0.032393)
Update 2: Larger Arrays
@a1 = (1..10000).to_a
@a2 = (1..1000).to_a
@a3 = (1..2000).to_a
def bench
puts Benchmark.measure {
1000.times do
@a1.contains_all? @a2
@a1.contains_all? @a3
@a3.contains_all? @a2
end
}
end
Results in:
Rohit 9.750000 0.410000 10.160000 ( 10.158182)
Chris 10.250000 0.180000 10.430000 ( 10.433797)
Sergio 14.570000 0.070000 14.640000 ( 14.637870)
sawa 3.460000 0.020000 3.480000 ( 3.475513)

class Array
def contains_all? other
other = other.dup
each{|e| if i = other.index(e) then other.delete_at(i) end}
other.empty?
end
end

Here's a naive and straightforward implementation (not the most efficient one, likely). Just count the elements and compare both elements and their occurrence counts.
class Array
def contains_all? ary
# group the arrays, so that
# [2, 1, 1, 3] becomes {1 => 2, 2 => 1, 3 => 1}
my_groups = group_and_count self
their_groups = group_and_count ary
their_groups.each do |el, cnt|
if !my_groups[el] || my_groups[el] < cnt
return false
end
end
true
end
private
def group_and_count ary
ary.reduce({}) do |memo, el|
memo[el] ||= 0
memo[el] += 1
memo
end
end
end
[1, 2, 3].contains_all? [1, 2] # => true
[1, 2, 3].contains_all? [1, 2, 2] # => false
[2, 1, 2, 3].contains_all? [1, 2, 2] # => true
[1, 2, 3].contains_all? [] # => true
[].contains_all? [1, 2] # => false

It seems you need a multiset. Check out this gem, I think it does what you need.
You can use it and do something like this (if the intersection is equal to the second multiset, then the first one includes all of its elements):
@ms1 & @ms2 == @ms2
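
The gem's API is not reproduced here. As a rough sketch of the same idea using only core Ruby (assuming Ruby 2.7+ for tally), a multiset can be modeled as a Hash of counts, with the intersection keeping the smaller count for each shared element. The names multiset_intersection and multiset_contains_all? are hypothetical helpers, not part of any gem.

```ruby
# Model a multiset as a Hash of element => count (needs Ruby 2.7+ for tally).
def multiset_intersection(a, b)
  counts_a = a.tally
  counts_b = b.tally
  # Keep each shared element with the smaller of its two counts.
  counts_a.each_with_object({}) do |(elem, count), result|
    shared = [count, counts_b.fetch(elem, 0)].min
    result[elem] = shared if shared > 0
  end
end

# a contains all of b when intersecting the two loses nothing from b.
def multiset_contains_all?(a, b)
  multiset_intersection(a, b) == b.tally
end

multiset_contains_all?([1, 2, 3], [1, 2])       # => true
multiset_contains_all?([1, 2, 3], [1, 2, 2])    # => false
multiset_contains_all?([2, 1, 2, 3], [1, 2, 2]) # => true
```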

Counting the number of occurrences and comparing them seems to be the obvious way to go.
class Array
def contains_all? arr
h = self.inject(Hash.new(0)) {|h, i| h[i] += 1; h}
arr.each do |i|
return false unless h.has_key?(i)
return false if h[i] == 0
h[i] -= 1
end
true
end
end
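
On Ruby 2.7 or later, the same counting approach can be written more compactly with Enumerable#tally. This is a minimal sketch, not a drop-in replacement for the method above; the name covers_with_counts? and its two-argument form are assumptions made to avoid reopening Array.

```ruby
# Requires Ruby 2.7+ for Enumerable#tally.
# Returns true when haystack holds at least as many of each element as needles.
def covers_with_counts?(haystack, needles)
  available = haystack.tally
  needles.tally.all? { |elem, needed| available.fetch(elem, 0) >= needed }
end

covers_with_counts?([1, 2, 3], [1, 2])       # => true
covers_with_counts?([1, 2, 3], [1, 2, 2])    # => false
covers_with_counts?([2, 1, 2, 3], [1, 2, 2]) # => true
```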

class Array
def contains_all?(ary)
ary.uniq.all? { |x| count(x) >= ary.count(x) }
end
end
test
irb(main):131:0> %w[a b c c].contains_all? %w[a b c]
=> true
irb(main):132:0> %w[a b c c].contains_all? %w[a b c c]
=> true
irb(main):133:0> %w[a b c c].contains_all? %w[a b c c c]
=> false
irb(main):134:0> %w[a b c c].contains_all? %w[a]
=> true
irb(main):135:0> %w[a b c c].contains_all? %w[x]
=> false
irb(main):136:0> %w[a b c c].contains_all? %w[]
=> true
The following version is faster and shorter in code.
class Array
def contains_all?(ary)
ary.all? { |x| count(x) >= ary.count(x) }
end
end

Answering with my own implementation, but definitely want to see if someone can come up with a more efficient way. (I won't accept my own answer)
class Array
def contains_all?(a2)
a2.inject(self.dup) do |copy, el|
if copy.include? el
index = copy.index el
copy.delete_at index
else
return false
end
copy
end
true
end
end
And the tests:
1.9.3p194 :016 > [1,2,3].contains_all? [1,2] #=> true
=> true
1.9.3p194 :017 > [1,2,3].contains_all? [1,2,2] #=> false (this is where (a1-a2).empty? fails)
=> false
1.9.3p194 :018 > [2,1,2,3].contains_all? [1,2,2] #=> true
=> true

This solution will only iterate through both lists once, and hence run in linear time. It might however be too much overhead if the lists are expected to be very small.
class Array
def contains_all?(other)
return true if other.empty? # otherwise [].contains_all?([]) falls through to false
return false if other.size > size
elem_counts = other.each_with_object(Hash.new(0)) { |elem,hash| hash[elem] += 1 }
each do |elem|
elem_counts.delete(elem) if (elem_counts[elem] -= 1) <= 0
return true if elem_counts.empty?
end
false
end
end

If you can't find a built-in method, you can build one using Ruby's include? method.
Official documentation: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-include-3F
Usage:
array = [1, 2, 3, 4]
array.include? 3 #=> true
Then, you can do a loop:
def array_includes_all?( array, comparison_array )
contains = true
for i in comparison_array do
unless array.include? i
contains = false
end
end
return contains
end
array_includes_all?( [1,2,3,2], [1,2,2] ) #=> true
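
Note that include? only tests membership, so a loop like the one above treats the second array as a set. The sketch below (an equivalent one-line rewrite of that loop, plus a count-aware check added for comparison) shows where the two differ on the question's own example.

```ruby
# Same behavior as the loop above: membership only, duplicates ignored.
def array_includes_all?(array, comparison_array)
  comparison_array.all? { |item| array.include?(item) }
end

have   = [1, 2, 3]
needed = [1, 2, 2]

array_includes_all?(have, needed)                         # => true, although only one 2 is available
needed.uniq.all? { |x| have.count(x) >= needed.count(x) } # => false, the result the question asks for
```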

Related

Detect if nested array contains similar elements

I have a method that takes an array of arrays and detects whether any sub-array occurs more than once, regardless of the order of its elements:
def has_similar_content?(array)
array.each.with_index do |prop1, index1|
array.each.with_index do |prop2, index2|
next if index1 == index2
return true if prop1.sort == prop2.sort
end
end
false
end
> has_similar_content?([%w[white xl], %w[red xl]])
=> false
> has_similar_content?([%w[blue xl], %w[xl blue cotton]])
=> false
> has_similar_content?([%w[blue xl], %w[xl blue]])
=> true
> has_similar_content?([%w[xl green], %w[red xl], %w[green xl]])
=> true
My problem is the runtime of this method: it has quadratic complexity and needs an additional sort of the arrays to detect whether the elements are the same.
Is there a more efficient way to do this?
I have assumed the question is as stated in my comment on the question.
Code
def disregarding_order_any_dups?(arr)
arr.map do |a|
a.each_with_object(Hash.new(0)) do |k,h|
h[k] += 1
end
end.uniq.size < arr.size
end
Examples
disregarding_order_any_dups? [%w[white xl], %w[red xl]]
#=> false
disregarding_order_any_dups? [%w[blue xl],
%w[xl blue cotton]]
#=> false
disregarding_order_any_dups? [%w[blue xl], %w[xl blue]]
#=> true
disregarding_order_any_dups? [%w[xl green], %w[red xl],
%w[green xl]]
#=> true
disregarding_order_any_dups? [[1,2,3,2], [3,1,3,2],
[2,3,1,2]]
#=> true
Complexity
If n = arr.size and m = arr.map(&:size).max, the computational complexity is O(n*m). The single statement within map's block could be replaced with a.sort, but that would increase the computational complexity to O(n*m*log(m)).
Explanation
For the last example the steps are as follows.
arr = [[1,2,3,2], [3,1,3,2], [2,3,1,2]]
b = arr.map do |a|
a.each_with_object(Hash.new(0)) do |k,h|
h[k] += 1
end
end
#=> [{1=>1, 2=>2, 3=>1}, {3=>2, 1=>1, 2=>1},
# {2=>2, 3=>1, 1=>1}]
c = b.uniq
#=> [{1=>1, 2=>2, 3=>1}, {3=>2, 1=>1, 2=>1}]
d = c.size
#=> 2
e = arr.size
#=> 3
d < e
#=> true
The expression
h = Hash.new(0)
creates a counting hash. Ruby expands h[k] += 1 to
h[k] = h[k] + 1
The hash instance methods are :[]= on the left, :[] on the right. If h does not have a key k, h[k] on the right is replaced with h's default value, which has been defined to equal zero, resulting in:
h[k] = 0 + 1
If h has a key k, h[k] on the right returns the value stored for k, and the default value is not used. See the form of Hash::new that takes an argument equal to the hash's default value.
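
The default-value behavior described above can be checked with a short session. This is only an illustration; the keys :a and :missing are arbitrary.

```ruby
h = Hash.new(0)

h[:a] += 1   # expands to h[:a] = h[:a] + 1, i.e. 0 + 1
h[:a] += 1   # the key now exists, so this is 1 + 1

h[:a]              # => 2
h[:missing]        # => 0, the default; reading does not store the key
h.key?(:missing)   # => false
h.size             # => 1
```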
This way is simpler:
array.
group_by(&:sort).
transform_values(&:length).
values.any? { |count| count > 1 }
This is still quadratic, but it is faster:
def has_similar_content?(array)
# sort subarray only once. O( n * m * log(m) )
sorted_array= array.map(&:sort)
# if you can change the input array, this prevent object allocation :
# array.map!(&:sort!)
# compare each pair only once O( n * n/2 )
nb_elements= sorted_array.size
0.upto(nb_elements - 1).each do |i|
(i + 1).upto(nb_elements - 1).each do |j|
return true if sorted_array[i] == sorted_array[j]
end
end
return false
end
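
The pairwise comparison can also be avoided entirely. The following is a sketch, assuming sub-array elements are mutually comparable (so that sort works) and hashable: each sorted sub-array is looked up in a Set, and the method returns as soon as one has been seen before. The name similar_content_with_set? is hypothetical.

```ruby
require 'set'

# Expected cost is roughly O(n * m * log(m)): one sort per sub-array
# plus one hash lookup, with an early exit on the first repeat.
def similar_content_with_set?(array)
  seen = Set.new
  array.each do |sub|
    # Set#add? returns nil when the element is already present.
    return true unless seen.add?(sub.sort)
  end
  false
end

similar_content_with_set?([%w[white xl], %w[red xl]])               # => false
similar_content_with_set?([%w[blue xl], %w[xl blue]])               # => true
similar_content_with_set?([%w[xl green], %w[red xl], %w[green xl]]) # => true
```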

How to return the index of the array items occurring under a certain threshold

In Ruby, given an array of elements, what is the easiest way to return the indices of the elements that are not identical?
array = ['a','b','a','a','a','c'] #=> [1,5]
Expanded question:
Assuming that the identity threshold is based on the most frequent element in the array.
array = ['a','c','a','a','a','d','d'] #=> [1,5,6]
For an array with two equally frequent elements, return the indices of either of the 2 elements. e.g.
array = ['a','a','a','b','b','b'] #=>[0,1,2] or #=> [3,4,5]
Answer edited after question edit:
def idx_by_th(arr)
idx = []
occur = arr.inject(Hash.new(0)) { |k,v| k[v] += 1; k }
th = arr.sort_by { |v| occur[v] }.last
arr.each_index {|i| idx << i if arr[i]!=th}
idx
end
idx_by_th ['a','b','a','a','a','c'] # => [1, 5]
idx_by_th ['a','c','a','a','a','d','d'] # => [1, 5, 6]
idx_by_th ['a','a','a','b','b','b'] # => [0, 1, 2]
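On Ruby 2.7+, a similar result can be sketched with tally and max_by, which avoids sorting the whole array. The name indices_off_mode is hypothetical, and for ties the choice of the most frequent element is whatever max_by returns, which the question allows.

```ruby
# Returns the indices of every element that differs from the most frequent one.
# Requires Ruby 2.7+ for Enumerable#tally.
def indices_off_mode(arr)
  return [] if arr.empty?
  mode, _count = arr.tally.max_by { |_elem, count| count }
  arr.each_index.reject { |i| arr[i] == mode }
end

indices_off_mode(['a','b','a','a','a','c'])     # => [1, 5]
indices_off_mode(['a','c','a','a','a','d','d']) # => [1, 5, 6]
```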
These answers are valid for the first version of the question:
ruby < 1.8.7
def get_uniq_idx(arr)
test=[]; idx=[]
arr.each_index do |i|
idx << i if !(arr[i+1..arr.length-1] + test).include?(arr[i])
test << arr[i]
end
return idx
end
puts get_uniq_idx(['a','b','a','a','a','c']).inspect # => [1, 5]
ruby >= 1.8.7:
idxs=[]
array.each_index {|i| idxs<<i if !(array.count(array[i]) > 1)}
puts idxs.inspect # => [1, 5]
It's not quite clear what you're looking for, but is something like this what you want?
array = ['a','b','a','a','a','c']
array.uniq.inject([]) do |arr, elem|
if array.count(elem) == 1
arr << array.index(elem)
end
arr
end
# => [1,5]

How to find the unique elements, and the occurrence count of each one, in an array in Ruby

I have an array with some elements. How can I get the number of occurrences of each element in the array?
For example, given:
a = ['cat', 'dog', 'fish', 'fish']
The result should be:
a2 #=> {'cat' => 1, 'dog' => 1, 'fish' => 2}
How can I do that?
You can use Enumerable#group_by to do this:
res = Hash[a.group_by {|x| x}.map {|k,v| [k,v.count]}]
#=> {"cat"=>1, "dog"=>1, "fish"=>2}
a2 = a.reduce(Hash.new(0)) { |a, b| a[b] += 1; a }
#=> {"cat"=>1, "fish"=>2, "dog"=>1}
Ruby 2.7 has the tally method for this.
tally → a_hash
Tallies the collection, i.e., counts the occurrences of each element. Returns a hash with the elements of the collection as keys and the corresponding counts as values.
['cat', 'dog', 'fish', 'fish'].tally
=> {"cat"=>1, "dog"=>1, "fish"=>2}
a2 = {}
a.uniq.each{|e| a2[e]= a.count(e)}
In 1.9.2 you can do it like this, from my experience quite a lot of people find each_with_object more readable than reduce/inject (the ones who know about it at least):
a = ['cat','dog','fish','fish']
#=> ["cat", "dog", "fish", "fish"]
a2 = a.each_with_object(Hash.new(0)) { |animal, hash| hash[animal] += 1 }
#=> {"cat"=>1, "dog"=>1, "fish"=>2}
Use the count method of Array to get the count.
a.count('cat')
m = {}
a.each do |e|
m[e] = 0 if m[e].nil?
m[e] = m[e] + 1
end
puts m
a.inject({}){|h, e| h[e] = h[e].to_i+1; h }
#=> {"cat"=>1, "fish"=>2, "dog"=>1}
Or an O(n²) solution:
a.uniq.inject({}){|h, e| h[e] = a.count(e); h }
#=> {"cat"=>1, "fish"=>2, "dog"=>1}
a = ['cat','dog','fish','fish']
a2 = Hash[a.uniq.map {|i| [i, a.count(i)]}]
['cat','dog','fish','fish'].group_by(&:itself).transform_values(&:count)
=> {
"cat" => 1,
"dog" => 1,
"fish" => 2
}
a = ['cat','dog','fish','fish']
a2 = Hash.new(0)
a.each do |e|
a2[e] += 1
end
a2
ruby fu!
count = Hash[Hash[rows.group_by{|x| x}.map {|k,v| [k, v.count]}].sort_by{|k,v| v}.reverse]
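
For comparison, on Ruby 2.7+ the same "count, then order by frequency" result can be sketched with tally. The rows array here is an assumption (the one-liner above does not show its input), and the order of elements with equal counts is not guaranteed because sort_by is not stable.

```ruby
# Assumed sample input; the original snippet does not define rows.
rows = ['cat', 'dog', 'fish', 'fish']

# Count occurrences, then order the pairs from most to least frequent.
count = rows.tally.sort_by { |_elem, n| -n }.to_h

count.keys.first  # => "fish"
count['fish']     # => 2
```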

How to get the number which is repeated the fewest times in an array?

How can I get the number which is repeated the fewest times?
For example:
from [1,2,3,4,5,6,6,2,3,4,6] return [1] because "1" is only repeated once while the others are repeated 2 or more times.
from [1,1,1,2,3,3,4,4,5,6,6,2,3,4] return [2,6] because both "2" and "6" are only repeated twice, while the other numbers are repeated three or more times.
This should work:
a.group_by{|e| a.count(e)}.min[1].uniq
ruby-1.9.2-p136 :040 > a = [1,1,1,2,3,3,4,4,6,6,2,3,4]
ruby-1.9.2-p136 :041 > a.group_by{|e| a.count(e)}.min[1].uniq
=> [2, 6]
ruby-1.9.2-p136 :044 > a = [1,2,3,4,6,6,2,3,4,6]
ruby-1.9.2-p136 :045 > a.group_by{|e| a.count(e)}.min[1].uniq
=> [1]
Update: O(n) time
def least_frequent(a)
counts = Hash.new(0)
a.each{|e| counts[e] += 1}
least =[nil, []]
counts.each do |k,v|
if least[0].nil?
least[0] = v
least[1] = [k] # must be an array so that later ties can be appended with <<
elsif v < least[0]
least[0] = v
least[1] = [k]
elsif v == least[0]
least[1] << k
end
end
least[1]
end
Here are benchmarks (running this test 10,000 times) comparing the first and second methods:
user system total real
first 10.950000 0.020000 10.970000 ( 10.973345)
better 0.510000 0.000000 0.510000 ( 0.511417)
with an array set to:
a = [1,1,1,2,3,3,4,4,6,6,2,3,4] * 10
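The same single-pass counting idea can be written more briefly on Ruby 2.7+ with tally. This is a sketch with a hypothetical name, least_frequent_tally; it was not part of the benchmark above.

```ruby
# Count every element once, find the smallest count, keep the matching keys.
# Requires Ruby 2.7+ for Enumerable#tally.
def least_frequent_tally(a)
  counts = a.tally
  fewest = counts.values.min
  counts.select { |_elem, n| n == fewest }.keys
end

least_frequent_tally([1,1,1,2,3,3,4,4,6,6,2,3,4]) # => [2, 6]
least_frequent_tally([1,2,3,4,6,6,2,3,4,6])       # => [1]
```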
You can do:
a = [1,1,1,2,3,3,4,4,5,6,6,2,3,4]
a.group_by{|i| a.count(i) }
#=> {1=>[5], 2=>[2, 2, 6, 6], 3=>[1, 1, 1, 3, 3, 3, 4, 4, 4]}
And then pick from that Hash as to what you want (the hash's key is the number of items)
>> h = [1,1,1,2,3,3,4,4,5,6,6,2,3,4].inject(Hash.new(0)) { |x,y| x[y]+=1;x }.select{|x,y| y>1 }
=> {1=>3, 2=>2, 3=>3, 4=>3, 6=>2}
>> h.values.min
=> 2
>> h.each{|x,y| puts "#{x} #{y}" if y==h.values.min }
2 2
6 2
=> {1=>3, 2=>2, 3=>3, 4=>3, 6=>2}
>>

Split array into sub-arrays based on value

I was looking for an Array equivalent of String#split in Ruby Core, and was surprised to find that it did not exist. Is there a more elegant way than the following to split an array into sub-arrays based on a value?
class Array
def split( split_on=nil )
inject([[]]) do |a,v|
a.tap{
if block_given? ? yield(v) : v==split_on
a << []
else
a.last << v
end
}
end.tap{ |a| a.pop if a.last.empty? }
end
end
p (1..9 ).to_a.split{ |i| i%3==0 },
(1..10).to_a.split{ |i| i%3==0 }
#=> [[1, 2], [4, 5], [7, 8]]
#=> [[1, 2], [4, 5], [7, 8], [10]]
Edit: For those interested, the "real-world" problem which sparked this request can be seen in this answer, where I've used @fd's answer below for the implementation.
Sometimes partition is a good way to do things like that:
(1..6).partition { |v| v.even? }
#=> [[2, 4, 6], [1, 3, 5]]
I tried golfing it a bit, still not a single method though:
(1..9).chunk{|i|i%3==0}.reject{|sep,ans| sep}.map{|sep,ans| ans}
Or faster:
(1..9).chunk{|i|i%3==0 || nil}.map{|sep,ans| sep&&ans}.compact
Also, Enumerable#chunk seems to be Ruby 1.9+, but it is very close to what you want.
For example, the raw output would be:
(1..9).chunk{ |i|i%3==0 }.to_a
=> [[false, [1, 2]], [true, [3]], [false, [4, 5]], [true, [6]], [false, [7, 8]], [true, [9]]]
(The to_a is to make irb print something nice, since chunk gives you an enumerator rather than an Array)
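Enumerable#chunk also drops any element for which the block returns nil or the symbol :_separator, so the separators never reach the output. A minimal sketch wrapping that in a helper follows; the name split_on is hypothetical. Unlike the inject-based split in the question, consecutive separators do not produce empty sub-arrays here.

```ruby
# Split an enumerable wherever the block is truthy, discarding the separators.
def split_on(enum)
  # :_separator tells chunk to drop the element and end the current chunk.
  enum.chunk { |item| yield(item) ? :_separator : true }.map(&:last)
end

split_on(1..9)  { |i| i % 3 == 0 } # => [[1, 2], [4, 5], [7, 8]]
split_on(1..10) { |i| i % 3 == 0 } # => [[1, 2], [4, 5], [7, 8], [10]]
```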
Edit: Note that the above elegant solutions are 2-3x slower than the fastest implementation:
module Enumerable
def split_by
result = [a=[]]
each{ |o| yield(o) ? (result << a=[]) : (a << o) }
result.pop if a.empty?
result
end
end
Here are benchmarks aggregating the answers (I'll not be accepting this answer):
require 'benchmark'
a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
%w[ split_with_inject split_with_inject_no_tap split_with_each
split_with_chunk split_with_chunk2 split_with_chunk3 ].each do |method|
x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
end
end
#=> user system total real
#=> split_with_inject 1.857000 0.015000 1.872000 ( 1.879188)
#=> split_with_inject_no_tap 1.357000 0.000000 1.357000 ( 1.353135)
#=> split_with_each 1.123000 0.000000 1.123000 ( 1.123113)
#=> split_with_chunk 3.962000 0.000000 3.962000 ( 3.984398)
#=> split_with_chunk2 3.682000 0.000000 3.682000 ( 3.687369)
#=> split_with_chunk3 2.278000 0.000000 2.278000 ( 2.281228)
The implementations being tested (on Ruby 1.9.2):
class Array
def split_with_inject
inject([[]]) do |a,v|
a.tap{ yield(v) ? (a << []) : (a.last << v) }
end.tap{ |a| a.pop if a.last.empty? }
end
def split_with_inject_no_tap
result = inject([[]]) do |a,v|
yield(v) ? (a << []) : (a.last << v)
a
end
result.pop if result.last.empty?
result
end
def split_with_each
result = [a=[]]
each{ |o| yield(o) ? (result << a=[]) : (a << o) }
result.pop if a.empty?
result
end
def split_with_chunk
chunk{ |o| !!yield(o) }.reject{ |b,a| b }.map{ |b,a| a }
end
def split_with_chunk2
chunk{ |o| !!yield(o) }.map{ |b,a| b ? nil : a }.compact
end
def split_with_chunk3
chunk{ |o| yield(o) || nil }.map{ |b,a| b && a }.compact
end
end
Other Enumerable methods you might want to consider are each_slice and each_cons.
I don't know how general you want it to be, here's one way
>> (1..9).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
=> nil
>> (1..10).each_slice(3) {|a| p a.size>1?a[0..-2]:a}
[1, 2]
[4, 5]
[7, 8]
[10]
Here is another one (with a benchmark comparing it to the fastest split_with_each from https://stackoverflow.com/a/4801483/410102):
require 'benchmark'
class Array
def split_with_each
result = [a=[]]
each{ |o| yield(o) ? (result << a=[]) : (a << o) }
result.pop if a.empty?
result
end
def split_with_each_2
u, v = [], []
each{ |x| (yield x) ? (u << x) : (v << x) }
[u, v]
end
end
a = *(1..5000); N = 1000
Benchmark.bmbm do |x|
%w[ split_with_each split_with_each_2 ].each do |method|
x.report( method ){ N.times{ a.send(method){ |i| i%3==0 || i%5==0 } } }
end
end
user system total real
split_with_each 2.730000 0.000000 2.730000 ( 2.742135)
split_with_each_2 2.270000 0.040000 2.310000 ( 2.309600)
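
One caveat when reading these numbers: split_with_each_2 does not split on separators. It sorts elements into a "matched" and an "unmatched" array, which is what Enumerable#partition does, so the two benchmarked methods return different results. The sketch below uses a standalone copy of the method (taking the array as an argument, an adaptation made here to avoid reopening Array) to show the equivalence.

```ruby
# Standalone copy of split_with_each_2, taking the array as a parameter.
def split_with_each_2(array)
  u, v = [], []
  array.each { |x| (yield x) ? (u << x) : (v << x) }
  [u, v]
end

numbers  = (1..10).to_a
by_hand  = split_with_each_2(numbers) { |i| i % 3 == 0 }
built_in = numbers.partition { |i| i % 3 == 0 }

by_hand             # => [[3, 6, 9], [1, 2, 4, 5, 7, 8, 10]]
by_hand == built_in # => true
```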
