How to determine whether an array is contained in another array - ruby

The question is, given [1,2,3,4,5] and [2,4,5], to determine whether (every element in) the second array is contained in the first one. The answer is true.
What's the most succinct and efficient way to do better than:
arr2.reject { |e| arr1.include?(e) } .empty?

Array subtraction should work, as in
(arr2 - arr1).empty?
Description of method:
Returns a new array that is a copy of the original array, removing any
items that also appear in [the second array]. The order is preserved from the
original array.
It compares elements using their hash and eql? methods for efficiency.
I don't consider myself an expert on efficiency, but @Ryan indicated in the comments on his answer that it's reasonably efficient at scale.
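A minimal sketch of how the subtraction check behaves on a few edge cases. The helper name contained_in? is made up for illustration; the body is just the one-liner from this answer.

```ruby
# Hypothetical helper name; wraps the (arr2 - arr1).empty? check.
def contained_in?(inner, outer)
  (inner - outer).empty?
end

contained_in?([2, 4, 5], [1, 2, 3, 4, 5])  # => true
contained_in?([2, 6], [1, 2, 3, 4, 5])     # => false
contained_in?([2, 2, 2], [1, 2])           # => true, duplicates are ignored
contained_in?([], [1, 2])                  # => true, nothing is left to check
```

Note that repeated elements in the second array do not need to be repeated in the first one for the check to pass.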

The bad O(n²) one-liner would look like this:
arr2.all? { |x| arr1.include? x }
arr2.all? &arr1.method(:include?) # alternative
If your objects are hashable, you can make this O(n) by making a set out of the first array:
require 'set'
arr2.all? &Set.new(arr1).method(:include?)
If your objects are totally, like, ordered, you can make it O(n log n) with a sort and a binary search:
arr1.sort!
arr2.all? { |x| arr1.bsearch { |y| x <=> y } }
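For comparison, here is a rough sketch of the set and binary-search variants wrapped in methods. The method names are hypothetical, and the second one sorts a copy rather than calling sort!, on the assumption that you don't want the original array reordered.

```ruby
require 'set'

# Hypothetical helper names, for illustration only.

# O(n) on average: build the set once, then do constant-time lookups.
def subset_by_set?(inner, outer)
  lookup = outer.to_set
  inner.all? { |x| lookup.include?(x) }
end

# O(n log n): sort a copy so the caller's array is left untouched.
def subset_by_bsearch?(inner, outer)
  sorted = outer.sort
  inner.all? { |x| !sorted.bsearch { |y| x <=> y }.nil? }
end

subset_by_set?([2, 4, 5], [5, 3, 1, 4, 2])      # => true
subset_by_bsearch?([2, 4, 5], [5, 3, 1, 4, 2])  # => true
subset_by_bsearch?([2, 6], [5, 3, 1, 4, 2])     # => false
```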

As mentioned by @Ryan, you can use sets, in which case Set#subset? is available to you, and it is pretty readable (note the two different ways of defining a set from an array):
require 'set'
s1 = Set.new([1, 2, 3])
s2 = [1, 2].to_set
s3 = [1, 3].to_set
s4 = [1, 4].to_set
s1.subset? s1 #=> true
s2.subset? s1 #=> true
s3.subset? s1 #=> true
s4.subset? s1 #=> false
Also consider using Set#proper_subset? if required.
s1.proper_subset? s1 #=> false
s2.proper_subset? s1 #=> true
NB A set contains no duplicate elements e.g. Set.new([1,2,3,3]) #=> #<Set: {1, 2, 3}>

Related

Find indices of array elements that fulfill a condition

I have an array, and I need an array of subscripts of the original array's elements that satisfy a certain condition.
map doesn't do because it yields an array of the same size. select doesn't do because it yields references to the individual array elements, not their indices. I came up with the following solution:
my_array.map.with_index {|elem,i| cond(elem) ? i : nil}.compact
If the array is large and only a few elements fulfill the conditions, another possibility would be
index_array=[]
my_array.each_with_index {|elem,i| index_array << i if cond(elem)}
Both work, but isn't there a simpler way?
Nope, there is nothing built in or much simpler than what you already have.
Variation:
my_array.each_with_index.with_object([]) do |(elem, idx), indices|
  indices << idx if cond(elem)
end
Another possible alternative:
my_array.each_with_index.select { |elem, _| cond(elem) }.map(&:last)
You can use Array#each_index with select
arr = [1, 2, 3, 4]
arr.each_index.select {|i| arr[i].odd? }
# => [0, 2]
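The each_index approach can be wrapped in a small method that takes the condition as a block. This is only a sketch: both method names are hypothetical, and the second variant assumes Ruby 2.7 or later, where filter_map is available.

```ruby
# Hypothetical helper: indices of the elements for which the block is truthy.
def indices_where(array)
  array.each_index.select { |i| yield array[i] }
end

# Same idea in one pass with filter_map (Ruby 2.7+).
# Index 0 is truthy in Ruby, so it is not dropped.
def indices_where_fm(array)
  array.each_with_index.filter_map { |elem, i| i if yield(elem) }
end

indices_where([1, 2, 3, 4]) { |x| x.odd? }     # => [0, 2]
indices_where_fm([1, 2, 3, 4]) { |x| x.odd? }  # => [0, 2]
```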

Sort an Array Mixed With Integers and Strings - Ruby

I have an array that must be sorted with low number to high number and then alphabetical order. Must use Array#sort_by
i_want_dogs = ["I", "want", 5, "dogs", "but", "only", "have", 3]
I want it to output:
=> [3,5,"I","but","dogs","have","only","want"]
I tried:
i_want_dogs.sort_by {|x,y| x <=> y }
I know that is obviously wrong, but I can't figure it out with the integers and the strings combined.
Use the sort method with a block that defines a comparator that does what you want. I wrote a simple one that compares values when the classes are the same and class names when they are different.
def comparator(x, y)
  if x.class == y.class
    return x <=> y
  else
    return x.class.to_s <=> y.class.to_s
  end
end
Use it like this:
i_want_dogs.sort { |x, y| comparator(x, y) }
Use partition to separate numbers from strings, sort each separately and join the final result, e.g.
i_want_dogs.partition { |i| i.is_a?(Integer) }.map(&:sort).flatten
This will give you the result:
i_want_dogs.sort_by {|x| x.to_s }
UPDATE:
Thanks to @vacawama, who points out that this sorts numbers alphabetically. If you need to sort the numbers by their value, try one of the other answers.
First you need to convert the elements in the array to a string. Try this
i_want_dogs.sort_by(&:to_s)
This will return
[3,5,"I", "but", "dogs", "have", "only", "want"]
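Since the question requires Array#sort_by, one possible sketch is to sort by a two-element key: a group number first (numbers before everything else), then the value itself. The method name is hypothetical, and it assumes the array holds only numbers and strings.

```ruby
# Hypothetical helper; assumes items contains only numbers and strings.
def numbers_then_strings(items)
  # Keys are compared element by element, so values are only compared
  # with each other when they fall into the same group.
  items.sort_by { |x| [x.is_a?(Numeric) ? 0 : 1, x] }
end

i_want_dogs = ["I", "want", 5, "dogs", "but", "only", "have", 3]
numbers_then_strings(i_want_dogs)
# => [3, 5, "I", "but", "dogs", "have", "only", "want"]
```

Unlike sorting by to_s, this compares numbers by value, so 10 would sort after 9.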

Efficient way of removing similar arrays in an array of arrays

I am trying to analyze some documents and find similarities in them. After analysis, I have an array, the elements of which are arrays of data from documents considered similar. But sometimes I have two almost similar elements, and naturally I want to leave the biggest of them. For simplification:
data = [[1,2,3,4,5,6], [7,8,9,10], [1,2,3,5,6]...]
How do I efficiently process the data that I get:
data = [[1,2,3,4,5,6], [7,8,9,10]...]
I suppose I could intersect every pair of arrays, and if the intersection equals one of the two original arrays, I ignore that array. Here is some quick code I wrote:
data = [[1,2,3,4,5,6], [7,8,9,10], [1,2,3,5,6], [7,9,10]]
cleaned = []
data.each_index do |i|
  similar = false
  data.each_index do |j|
    if i == j
      next
    elsif data[i] & data[j] == data[i]
      similar = true
      break
    end
  end
  unless similar
    cleaned << data[i]
  end
end
puts cleaned.inspect
Is this an efficient way to go? Also, the current behaviour only drops arrays that are a few elements short of another one, and I might want to merge similar arrays if they occur:
[[1,2,3,4,5], [1,3,4,5,6]] => [[1,2,3,4,5,6]]
You can delete any element in the list if it is fully contained in another element:
data.delete_if do |arr|
  data.any? { |a2| !a2.equal?(arr) && arr - a2 == [] }
end
# => [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
This is a bit more efficient than your suggestion since once you decide that an element should be removed, you don't check against it in the next iterations.
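If you would rather not modify data while iterating over it, a non-mutating sketch of the same idea is to visit the arrays from largest to smallest and keep an array only if no already-kept array contains it. The method name drop_contained is made up, and the result comes back ordered by decreasing size rather than in the original order.

```ruby
require 'set'

# Hypothetical helper: returns the arrays that are not contained in a
# larger (or equal) array that was already kept.
def drop_contained(data)
  kept = []       # arrays that survive
  kept_sets = []  # the same arrays as sets, for subset checks
  data.sort_by { |arr| -arr.size }.each do |arr|
    set = arr.to_set
    next if kept_sets.any? { |big| set.subset?(big) }
    kept << arr
    kept_sets << set
  end
  kept
end

data = [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10], [1, 2, 3, 5, 6], [7, 9, 10]]
drop_contained(data)
# => [[1, 2, 3, 4, 5, 6], [7, 8, 9, 10]]
```

With this version, exact duplicates collapse to a single copy. It does not merge overlapping arrays.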

The most idiomatic way to iterate through a Ruby array, exiting when an arbitrary condition met?

I want to iterate through an array, each element of which is an array of two integers (e.g. [3,5]); for each of these elements, I want to calculate the sum of the two integers, exiting the loop when any of these sums exceeds a certain arbitrary value. The source array is quite large, and I will likely find the desired value near the beginning, so looping through all of the unneeded elements is not a good option.
I have written three loops to do this, all of which produce the desired result. My question is: which is more idiomatic Ruby? Or, better yet, is there a better way? I try not to use non-local loop variables, but break statements look kind of hackish to my (admittedly novice) eye.
# Loop A
pairs.each do |pair|
  pair_sum = pair.inject(:+)
  arr1 << pair_sum
  break if pair_sum > arr2.max
end
# Loop B - (just A condensed)
pairs.each { |pair| arr1.last <= arr2.max ? arr1 << pair.inject(:+) : break }
# Loop C
i = 0
pair_sum = 0
begin
  pair_sum = pairs[i].inject(:+)
  arr1 << pair_sum
  i += 1
end until pair_sum > arr2.max
A similar question was asked at escaping the .each { } iteration early in Ruby, but the responses were essentially that, while using .each or .each_with_index and exiting with break when the target index was reached would work, .take(num_elements).each is more idiomatic. In my situation, however, I don't know in advance how many elements I'll have to iterate through, presenting me with what appears to be a boundary case.
This is from a project Euler-type problem I've already solved, btw. Just wondering about the community-preferred syntax. Thanks in advance for your valuable time.
take and drop have variants, take_while and drop_while, which take a block instead of a fixed number of elements. take_while accumulates values from the receiver for as long as the block returns true. Your code could be rewritten as
array.take_while {|pair| pair.sum < foo}.map(&:sum)
This does mean that you calculate the sum of some of these pairs twice.
In Ruby 2.0 there's Enumerable#lazy which returns a lazy enumerator:
sums = pairs.lazy.map { |a, b| a + b }.take_while { |pair_sum| pair_sum < some_max_value }.force
This avoids calculating the sums twice.
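To see that the lazy chain really stops early, here is a small sketch that counts how many pairs get summed. The method name and the sample data are made up for illustration.

```ruby
# Hypothetical helper: returns the sums below limit, plus the number of
# pairs that were actually summed.
def sums_below(pairs, limit)
  evaluated = 0
  sums = pairs.lazy
              .map { |pair| evaluated += 1; pair.sum }
              .take_while { |sum| sum < limit }
              .to_a
  [sums, evaluated]
end

sums, evaluated = sums_below([[1, 2], [3, 4], [5, 6], [7, 8]], 10)
# sums      => [3, 7]
# evaluated => 3 (the last pair is never touched)
```

Unlike Loop A in the question, the first sum that reaches the limit is not included in the result.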
[[1, 2], [3, 4], [5, 6]].find{|x, y| x + y > 6}
# => [3, 4]
[[1, 2], [3, 4], [5, 6]].find{|x, y| x + y > 6}.inject(:+)
#=> 7

Get index of array element faster than O(n)

Suppose I have a HUGE array and a value from it, and I want to get the index of that value in the array. Is there any other way to get it, rather than calling Array#index? The problem comes from the need to keep a really huge array and call Array#index an enormous number of times.
After a couple of tries I found that caching indexes inside elements, by storing structs with (value, index) fields instead of the value itself, gives a huge boost in performance (a 20x win).
Still, I wonder if there's a more convenient way of finding the index of an element without caching (or a good caching technique that will boost performance).
Why not use index or rindex?
array = %w( a b c d e)
# get FIRST index of element searched
puts array.index('a')
# get LAST index of element searched
puts array.rindex('a')
index: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-index
rindex: http://www.ruby-doc.org/core-1.9.3/Array.html#method-i-rindex
Convert the array into a hash. Then look for the key.
array = ['a', 'b', 'c']
hash = Hash[array.map.with_index.to_a] # => {"a"=>0, "b"=>1, "c"=>2}
hash['b'] # => 1
Other answers don't take into account the possibility of an entry listed multiple times in an array. This will return a hash where each key is a unique object in the array and each value is an array of indices that corresponds to where the object lives:
a = [1, 2, 3, 1, 2, 3, 4]
=> [1, 2, 3, 1, 2, 3, 4]
indices = a.each_with_index.inject(Hash.new { Array.new }) do |hash, (obj, i)|
  hash[obj] += [i]
  hash
end
=> { 1 => [0, 3], 2 => [1, 4], 3 => [2, 5], 4 => [6] }
This allows for a quick search for duplicate entries:
indices.select { |k, v| v.size > 1 }
=> { 1 => [0, 3], 2 => [1, 4], 3 => [2, 5] }
Is there a good reason not to use a hash? Lookups are O(1) vs. O(n) for the array.
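A sketch of that idea: build the value-to-index hash once, then answer every later lookup from it. The method name build_index is hypothetical. It keeps the first position of each value, to match what Array#index would return.

```ruby
# Hypothetical helper: one O(n) pass, after which lookups are O(1) on average.
def build_index(array)
  array.each_with_index.with_object({}) do |(value, i), index|
    index[value] ||= i  # keep the first position, like Array#index
  end
end

positions = build_index(%w[a b c a])
positions['a']  # => 0
positions['c']  # => 2
positions['z']  # => nil
```

The hash has to be rebuilt or updated whenever the array changes, so this pays off when lookups greatly outnumber modifications.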
If your array has a natural order use binary search.
Use binary search.
Binary search has O(log n) access time.
Here are the steps for using binary search:
What is the ordering of your array? For example, is it sorted by name?
Use bsearch to find elements or indices.
Code example
# assume array is sorted by name!
array.bsearch { |each| "Jamie" <=> each.name } # returns element
(0...array.size).bsearch { |n| "Jamie" <=> array[n].name } # returns index
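Here is a self-contained sketch of both lookups. The Person struct, the sample names, and the method names are all made up for illustration. The index version uses an exclusive range so the block is never called with an index past the end of the array.

```ruby
# Hypothetical record type and data, already sorted by name.
Person = Struct.new(:name)
PEOPLE = %w[Alice Bob Jamie Zoe].map { |n| Person.new(n) }

# Returns the matching element, or nil if the name is absent.
def find_by_name(sorted_people, name)
  sorted_people.bsearch { |person| name <=> person.name }
end

# Returns the index of the matching element, or nil if the name is absent.
def index_by_name(sorted_people, name)
  (0...sorted_people.size).bsearch { |i| name <=> sorted_people[i].name }
end

find_by_name(PEOPLE, "Jamie").name  # => "Jamie"
index_by_name(PEOPLE, "Jamie")      # => 2
index_by_name(PEOPLE, "Nobody")     # => nil
```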
If it's a sorted array you could use a Binary search algorithm (O(log n)). For example, extending the Array-class with this functionality:
class Array
  # Returns the index of value in a sorted array, or nil if it is absent.
  def b_search(value, lower_index = 0, upper_index = length - 1)
    return if lower_index > upper_index
    midpoint_index = (lower_index + upper_index) / 2
    return midpoint_index if self[midpoint_index] == value
    if value < self[midpoint_index]
      b_search(value, lower_index, midpoint_index - 1)
    else
      b_search(value, midpoint_index + 1, upper_index)
    end
  end
end
Taking a combination of @sawa's answer and the comment listed there, you could implement a "quick" index and rindex on the Array class.
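class Array
  def quick_index el
    # Later pairs overwrite earlier ones, so reverse the pairs
    # to make the first occurrence win.
    hash = Hash[self.map.with_index.to_a.reverse]
    hash[el]
  end
  def quick_rindex el
    # The last occurrence is written last, so it wins.
    hash = Hash[self.map.with_index.to_a]
    hash[el]
  end
end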
Still, I wonder if there's a more convenient way of finding the index of an element without caching (or a good caching technique that will boost performance).
You can use binary search (if your array is ordered and the values you store in the array are comparable in some way). For that to work you need to be able to tell the binary search whether it should be looking "to the left" or "to the right" of the current element. But I believe there is nothing wrong with storing the index at insertion time and then using it if you are getting the element from the same array.
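Storing the index at insertion time could look roughly like this. The IndexedList class is hypothetical, and the sketch assumes values are only ever appended; deleting or inserting in the middle would leave the stored positions stale.

```ruby
# Hypothetical append-only list that remembers where each value first went.
class IndexedList
  def initialize
    @items = []
    @positions = {}
  end

  # Appends value and records its position if it has not been seen before.
  def <<(value)
    @positions[value] ||= @items.size
    @items << value
    self
  end

  # O(1) on average, instead of the O(n) scan done by Array#index.
  def index(value)
    @positions[value]
  end

  def [](i)
    @items[i]
  end
end

list = IndexedList.new
list << "a" << "b" << "a"
list.index("b")  # => 1
list.index("a")  # => 0
```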
