I am working on a project that involves checking whether the input is an n-dimensional matrix (and finding its dimensions), and raising an error if it is not. For example,
arr = [ [[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]] ]
is a matrix of dimensions [3, 2, 2]. What would be the simplest generic way to do that?
A recursive solution, though not the easiest to understand.
arr1 = [[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
arr2 = [[[1, 2], [4]], [6, [7, 8]]]
def dimensions(m)
if m.any? { |e| e.is_a?(Array) }
d = m.group_by { |e| e.is_a?(Array) && dimensions(e) }.keys
[m.size] + d.first if d.size == 1 && d.first
else
[m.size]
end
end
dimensions(arr1) #=> [3, 2, 2]
dimensions(arr2) #=> nil
Explanation
The algorithm checks first for nested arrays, m.any? { |e| e.is_a?(Array) }.
If there aren't nested arrays then you have just one dimension and it returns the size of the given array via [m.size] within the else block.
dimensions([1,2,3]) #=> [3]
If there is at least one nested array, then you have to ensure that all elements are arrays and that the arrays have the same dimensions. This check is done via d = m.group_by { |e| e.is_a?(Array) && dimensions(e) }.keys, which groups all elements by their dimensions.
[[5, 6], [7, 8]].group_by { |e| ... }.keys
#=> [[2]], all nested array dimensions are equal [2]
[[1, 2], [4]].group_by { |e| ... }.keys
#=> [[1], [2]], different dimensions
[6, [7, 8]].group_by { |e| ... }.keys
#=> [false, [2]], an element isn't an array
The algorithm takes only the valid results of the group_by with if d.size == 1 && d.first and adds the dimensions of the nested arrays to the result via [m.size] + d.first.
If there is more than one key, or the only key is nil (which means all nested arrays are invalid), it returns nil implicitly.
That's all.
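The question also asks for an error when the input is not a matrix. A minimal sketch of that, reusing the dimensions method above; the name dimensions! and the choice of ArgumentError are assumptions for illustration, not part of the original answer:

```ruby
# Recursive check from the answer above: returns the dimensions or nil.
def dimensions(m)
  if m.any? { |e| e.is_a?(Array) }
    d = m.group_by { |e| e.is_a?(Array) && dimensions(e) }.keys
    [m.size] + d.first if d.size == 1 && d.first
  else
    [m.size]
  end
end

# Hypothetical wrapper: raise instead of returning nil.
def dimensions!(m)
  dimensions(m) || raise(ArgumentError, "not an n-dimensional matrix: #{m.inspect}")
end

p dimensions!([[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]) #=> [3, 2, 2]

begin
  dimensions!([[[1, 2], [4]], [6, [7, 8]]])
rescue ArgumentError => e
  puts e.message # the ragged input is reported instead of silently yielding nil
end
```

Keeping the nil-returning method and layering the raising variant on top lets callers pick whichever behaviour suits them.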
Ruby actually has a Matrix class (load it with require 'matrix'), maybe use that?
Matrix[[[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]]]
#=> Matrix[[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]
Matrix[[1,2], [3]]
# ExceptionForMatrix::ErrDimensionMismatch: row size differs (1 should be 2)
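One caveat worth checking before relying on this: Matrix models two-dimensional matrices, so deeper nesting is stored as Array elements rather than as extra dimensions. A small sketch, assuming the matrix library is available (it is a bundled gem on Ruby 3.1 and later):

```ruby
require 'matrix'

m = Matrix[[[1, 2], [3, 4]], [[5, 6], [7, 8]], [[9, 10], [11, 12]]]

p m.row_count    #=> 3
p m.column_count #=> 2
p m[0, 0]        #=> [1, 2]  (an Array stored as a single matrix element)
```

So the third dimension of the example is not validated by Matrix; only the row lengths are.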
Look for patterns in your data
If you look at your example of
[[[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]]]
and dimension [3, 2, 2] you can read the dimension element-by-element in the following way:
An array of 3 items and each of the items is ...
an array of 2 items and each of the subitems is ...
an array of 2 items.
This suggests that a dimension can be computed by calling Array#size on each level of depth.
Compute the dimension
The method above can be implemented as:
def unchecked_matrix_dimension(matrix)
dimension = []
while matrix.is_a?(Array)
dimension << matrix.size
matrix = matrix[0]
end
dimension
end
This code looks at elements in the first position only so [[1], []] is reported as having dimension of [2, 1] but it's not a valid matrix at all.
Wishful coding
Assume for a moment that we have a function matrix_dimension?(matrix, dimension) that returns true if matrix is of the specified dimension and false otherwise. We can use it to detect invalid matrices like this:
def matrix_dimension(matrix)
dimension = unchecked_matrix_dimension(matrix)
if matrix_dimension?(matrix, dimension)
dimension
else
nil
end
end
It turns out that writing matrix_dimension? is easy!
Wishes come true
We can define matrix_dimension? in a recursive fashion:
If dimension == [] then we expect a scalar value.
If dimension == [d_1] then we expect an array of d_1 submatrices of dimension [] (i.e. scalars).
If dimension == [d_1, d_2] then we expect an array of d_1 submatrices of dimension [d_2] (i.e. arrays of d_2 scalars).
In general, if dimension == [d_1, ..., d_n] then we expect an array of d_1 elements and each of these elements should be of dimension [d_2, ..., d_n]. In Ruby:
def matrix_dimension?(matrix, dimension)
if dimension == []
!matrix.is_a?(Array)
else
matrix.size == dimension[0] &&
matrix.all? { |submatrix| matrix_dimension?(submatrix, dimension[1..-1]) }
end
end
With this definition of matrix_dimension?, our matrix_dimension function returns the dimension if the argument is a valid n-dimensional matrix, or nil otherwise.
Complete code
def unchecked_matrix_dimension(matrix)
dimension = []
while matrix.is_a?(Array)
dimension << matrix.size
matrix = matrix[0]
end
dimension
end
def matrix_dimension(matrix)
dimension = unchecked_matrix_dimension(matrix)
if matrix_dimension?(matrix, dimension)
dimension
else
nil
end
end
def matrix_dimension?(matrix, dimension)
if dimension == []
!matrix.is_a?(Array)
else
matrix.size == dimension[0] &&
matrix.all? { |submatrix| matrix_dimension?(submatrix, dimension[1..-1]) }
end
end
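One edge case may be worth guarding. matrix_dimension? calls size and all? on whatever it finds, so a non-array element sitting where a submatrix is expected (a string of the right length, for example) can raise NoMethodError instead of yielding false. A hedged sketch of a stricter variant; the name strict_matrix_dimension? is hypothetical:

```ruby
# Same recursion as matrix_dimension?, but checks the type before
# calling Array methods on the value.
def strict_matrix_dimension?(matrix, dimension)
  if dimension.empty?
    !matrix.is_a?(Array)
  else
    matrix.is_a?(Array) &&
      matrix.size == dimension[0] &&
      matrix.all? { |sub| strict_matrix_dimension?(sub, dimension[1..-1]) }
  end
end

p strict_matrix_dimension?([[1, 2], [3, 4]], [2, 2]) #=> true
p strict_matrix_dimension?([[1, 2], "ab"], [2, 2])   #=> false
p strict_matrix_dimension?([[1], []], [2, 1])        #=> false
```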
Without use of Matrix class:
input = [ [[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]] ]
m3 = input.map { |a| a.map(&:size) }
m2 = input.map(&:size)
m1 = input.size
checker = ->(e, memo) { raise unless e == memo; e }
[ m1, m2.reduce(&checker), m3.reduce(&checker).reduce(&checker) ]
#⇒ [3, 2, 2]
I've solved this using recursion. If the array represents an n-dimensional matrix, an array of dimensions is returned; else false is returned.
Code
def ndim_matrix(arr)
return false if arr.map(&:size).uniq != [arr.first.size]
arrays, literals = arr.partition { |e| e.is_a? Array }
return [arr.size] if arrays.empty?
return false unless literals.empty?
res = arr.map { |e| ndim_matrix(e) }.uniq
return false if res.size > 1 or res == [false]
[arr.size, *res.first]
end
Examples
arr = [1,2]
ndim_matrix(arr)
#=> [2]
arr = [ [1,2,3],[4,5,6] ]
ndim_matrix(arr)
#=> [2,3]
arr = [ [1,2,3],[4,5,6,7] ]
ndim_matrix(arr)
#=> false
arr = [ [[1,2],[3,4]], [[5,6],[7,8]], [[9,10],[11,12]] ]
ndim_matrix(arr)
#=> [3,2,2]
arr = [ [[1,2],[3,4]], [[5,6],[7,8]], [[9,10]] ]
ndim_matrix(arr)
#=> false
arr = [ [[1,2],[3,4]], [[5,6,7],[7,8]], [[9,10],[11,12]] ]
ndim_matrix(arr)
#=> false
arr = [ [[[1,2,3],[2,1,3]],[[3,4,5],[4,3,2]]],
[[[5,6,7],[6,5,7]],[[7,8,9],[8,7,6]]],
[[[9,10,11],[10,9,8]],[[11,12,13],[12,11,10]]] ]
ndim_matrix(arr)
#=> [3, 2, 2, 3]
arr = [ [[[1,2,3],[2,1,3]],[[3,4],[4,3]]],
[[[5,6,7],[6,5,7]],[[7,8,9],[8,7,6]]],
[[[9,10,11],[10,9,8]],[[11,12,13],[12,11,10]]] ]
ndim_matrix(arr)
#=> false
Here's a function in Ruby to find whether two distinct numbers in an array add up to a given sum:
def sum_eq_n? (arr, n)
return true if arr.empty? && n == 0
p "first part array:" + String(arr.product(arr).reject { |a,b| a == b })
puts "\n"
p "first part bool:" + String(arr.product(arr).reject { |a,b| a == b }.any?)
puts "\n"
p "second part:" + String(arr.product(arr).reject { |a,b| a + b == n } )
puts "\n"
result = arr.product(arr).reject { |a,b| a == b }.any? { |a,b| a + b == n }
return result
end
#define inputs
l1 = [1, 2, 3, 4, 5, 5]
n = 10
#run function
print "Result is: " + String(sum_eq_n?(l1, n))
I'm confused about how the calculation works to produce the result. As you can see, I've broken the function down into a few parts to visualize this. I've researched and understand the .reject and the .any? methods individually.
However, I'm still confused about how it all fits together in the one-liner. How are the two blocks evaluated in combination? I've only found examples of .reject with one code block afterwards. Is .reject applied to both? I also thought there might be an implicit AND between the two code blocks, but I tried adding a third dummy block and it failed, so at this point I'm not sure how it works at all.
You can interpret the expression via these equivalent substitutions:
# orig
arr.product(arr).reject { |a,b| a == b }.any? { |a,b| a + b == n }
# same as
pairs = arr.product(arr)
pairs.reject { |a,b| a == b }.any? { |a,b| a + b == n }
# same as
pairs = arr.product(arr)
different_pairs = pairs.reject { |a,b| a == b }
different_pairs.any? { |a,b| a + b == n }
Each block is an argument for the respective method -- one for reject, and one for any?. They are evaluated in order, and are not combined. The parts that make up the expression can be wrapped in parentheses to show this:
((arr.product(arr)).reject { |a,b| a == b }).any? { |a,b| a + b == n }
# broken up lines:
(
(
arr.product(arr) # pairs
).reject { |a,b| a == b } # different_pairs
).any? { |a,b| a + b == n }
Blocks in Ruby Are Method Arguments
Blocks in Ruby are first-class syntax structures for passing closures as arguments to methods. If you're more familiar with object-oriented concepts than functional ones, here is an example of an object (kind of) acting as a closure:
class MultiplyPairStrategy
def perform(a, b)
a * b
end
end
def convert_using_strategy(pairs, strategy)
new_array = []
for pair in pairs do
new_array << strategy.perform(*pair)
end
new_array
end
pairs = [
[2, 3],
[5, 4],
]
multiply_pair = MultiplyPairStrategy.new
convert_using_strategy(pairs, multiply_pair) # => [6, 20]
Which is the same as:
multiply_pair = Proc.new { |a, b| a * b }
pairs.map(&multiply_pair)
Which is the same as the most idiomatic:
pairs.map { |a, b| a * b }
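To make "each block belongs to its own method" concrete, here is a rough sketch of how reject and any? could be written by hand with yield. The helpers my_reject and my_any? are hypothetical stand-ins for illustration, not how Ruby implements the real methods:

```ruby
# Hypothetical re-implementation: keeps the items for which the block is falsy.
def my_reject(items)
  kept = []
  items.each { |item| kept << item unless yield(item) }
  kept
end

# Hypothetical re-implementation: true as soon as the block is truthy for one item.
def my_any?(items)
  items.each { |item| return true if yield(item) }
  false
end

arr = [1, 2, 3, 4, 5, 5]
n = 9
pairs = arr.product(arr)
different_pairs = my_reject(pairs) { |a, b| a == b } # first block, used only here
p my_any?(different_pairs) { |a, b| a + b == n }     #=> true (4 + 5)
```

Each method receives exactly one block and calls it through yield; the second method only ever sees the array returned by the first.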
The return value of the first method call becomes the receiver of the second method call.
This:
result = arr.product(arr).reject { |a,b| a == b }.any? { |a,b| a + b == n }
is functionally equivalent to:
results = arr.product(arr).reject { |a,b| a == b} # matrix of array pairs with identical values rejected
result = results.any? { |a,b| a + b == n } #true/false
This might be best visualized in pry (comments mine)
[1] pry(main)> arr = [1, 2, 3, 4, 5, 5]
=> [1, 2, 3, 4, 5, 5]
[2] pry(main)> n = 10
=> 10
[3] pry(main)> result_reject = arr.product(arr).reject { |a,b| a == b } # all combinations of array elements, with identical ones removed
=> [[1, 2],
[1, 3],
[1, 4],
[1, 5],
[1, 5],
[2, 1],
[2, 3],
[2, 4],
[2, 5],
[2, 5],
[3, 1],
[3, 2],
[3, 4],
[3, 5],
[3, 5],
[4, 1],
[4, 2],
[4, 3],
[4, 5],
[4, 5],
[5, 1],
[5, 2],
[5, 3],
[5, 4],
[5, 1],
[5, 2],
[5, 3],
[5, 4]]
[4] pry(main)> result_reject.any? { |a,b| a + b == n } # do any of the pairs of elements add together to equal `n`?
=> false
[5] pry(main)> arr.product(arr).reject { |a,b| a == b }.any? { |a,b| a + b == n } # the one liner
=> false
Each operation "chains" into the next, which visualized looks like:
arr.product(arr).reject { |a,b| a == b }.any? { |a,b| a + b == n }
|--|------A----->-----------B----------->-------------C----------|
Where part A, calling .product(arr), evaluates to an object. This object has a reject method that's called subsequently, and this object has an any? method that's called in turn. It's a fancy version of a.b.c.d where one call is used to generate an object for a subsequent call.
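What's not apparent from that is what kind of object each step returns. Here product and reject (called with a block) each return a complete Array before the next call runs. Some methods, such as repeated_permutation called without a block, instead return an Enumerator, which is an object that can be used to fetch the results but is not the actual results per se. It's more like an intent to return the results, and an ability to fetch them in a multitude of ways. These can be chained together to get the desired end product.
As a note, this code can be reduced to something close to:
arr.repeated_permutation(2).map(&:sum).include?(n)
Here repeated_permutation(2) yields every ordered pair of elements, including each element paired with itself, so unlike the original it does not reject pairs where a == b. If an element must not be paired with itself, arr.combination(2) pairs distinct positions instead (two equal values at different positions, such as the two 5s in the question, still form a pair). Either call scales to N elements by changing the parameter. include? tests whether the target is present.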
If you're working with large arrays you may want to slightly optimize this:
arr.repeated_permutation(2).lazy.map(&:sum).include?(n)
Where that will stop on the first match found and avoid further sums. The lazy call has the effect of propagating individual values through to the end of the chain instead of each stage of the chain running to completion before forwarding to the next.
The idea of lazy is one of the interesting things about Enumerable. You can control how the values flow through those chains.
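The early exit can be observed by counting how many sums are actually computed. A small sketch; the counters eager_sums and lazy_sums are names introduced here for illustration:

```ruby
arr = [1, 2, 3, 4, 5, 5]
n = 3

# Eager: map builds the whole array of sums before include? runs.
eager_sums = 0
arr.repeated_permutation(2).map { |pair| eager_sums += 1; pair.sum }.include?(n)

# Lazy: each pair flows through map and into include? one at a time.
lazy_sums = 0
arr.repeated_permutation(2).lazy.map { |pair| lazy_sums += 1; pair.sum }.include?(n)

p eager_sums #=> 36
p lazy_sums  #=> 2  (stops at [1, 2], the second pair generated)
```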
I have an array and I want to create a hash whose keys are the elements of the array and whose values are (an array of) the indices of the array. I want to get something like:
array = [1,3,4,5]
... # => {1=>0, 3=>1, 4=>2, 5=>3}
array = [1,3,4,5,6,6,6]
... # => {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4,5,6]}
This code:
hash = Hash.new 0
array.each_with_index do |x, y|
hash[x] = y
end
works fine only if I don't have duplicate elements. When I have duplicate elements, it does not.
Any idea on how I can get something like this?
You can change the logic to special-case the situation when the key already exists, turning it into an array and pushing the new index:
arr = %i{a a b a c}
result = arr.each.with_object({}).with_index do |(elem, memo), idx|
memo[elem] = memo.key?(elem) ? [*memo[elem], idx] : idx
end
puts result
# => {:a=>[0, 1, 3], :b=>2, :c=>4}
It's worth mentioning, though, that whatever you're trying to do here could possibly be accomplished in a different way ... we have no context. In general, it's a good idea to keep key-val data types uniform, e.g. the fact that values here can be numbers or arrays is a bit of a code smell.
Also note that it doesn't make sense to use Hash.new(0) here unless you're intentionally setting a default value (which there's no reason to do). Use {} instead.
I'm adding my two cents:
array = [1,3,4,5,6,6,6,8,8,8,9,7,7,7]
hash = {}
array.map.with_index {|val, idx| [val, idx]}.group_by(&:first).map do |k, v|
hash[k] = v[0][1] if v.size == 1
hash[k] = v.map(&:last) if v.size > 1
end
p hash #=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4, 5, 6], 8=>[7, 8, 9], 9=>10, 7=>[11, 12, 13]}
Because group_by collects equal values regardless of position, this also works when the duplicated elements are not adjacent.
This is the expanded version, step by step, to show how it works.
The basic idea is to build a temporary array with pairs of value and index, then work on it.
array = [1,3,4,5,6,6,6]
tmp_array = []
array.each_with_index do |val, idx|
tmp_array << [val, idx]
end
p tmp_array #=> [[1, 0], [3, 1], [4, 2], [5, 3], [6, 4], [6, 5], [6, 6]]
tmp_hash = tmp_array.group_by { |e| e[0] }
p tmp_hash #=> {1=>[[1, 0]], 3=>[[3, 1]], 4=>[[4, 2]], 5=>[[5, 3]], 6=>[[6, 4], [6, 5], [6, 6]]}
hash = {}
tmp_hash.map do |k, v|
hash[k] = v[0][1] if v.size == 1
hash[k] = v.map {|e| e[1]} if v.size > 1
end
p hash #=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4, 5, 6]}
It can be written as one line as:
hash = {}
array.map.with_index.group_by(&:first).map { |k, v| v.size == 1 ? hash[k] = v[0][1] : hash[k] = v.map(&:last) }
p hash
If you are prepared to accept
{ 1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4,5,6] }
as the return value you may write the following.
array.each_with_index.group_by(&:first).transform_values { |v| v.map(&:last) }
#=> {1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4, 5, 6]}
The first step in this calculation is the following.
array.each_with_index.group_by(&:first)
#=> {1=>[[1, 0]], 3=>[[3, 1]], 4=>[[4, 2]], 5=>[[5, 3]], 6=>[[6, 4], [6, 5], [6, 6]]}
This may help readers to follow the subsequent calculations.
I think you will find this return value generally more convenient to use than the one given in the question.
Here are a couple of examples where it's clearly preferable for all values to be arrays. Let:
h_orig = { 1=>0, 3=>1, 4=>2, 5=>3, 6=>[4,5,6] }
h_mod = { 1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4,5,6] }
Create a hash h whose keys are unique elements of array and whose values are the numbers of times the key appears in the array.
h_mod.transform_values(&:count)
#=> {1=>1, 3=>1, 4=>1, 5=>1, 6=>3}
h_orig.transform_values { |v| v.is_a?(Array) ? v.count : 1 }
Create a hash h whose keys are unique elements of array and whose values equal the index of the first instance of the element in the array.
h_mod.transform_values(&:min)
#=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>4}
h_orig.transform_values { |v| v.is_a?(Array) ? v.min : v }
In these examples, given h_orig, we could alternatively convert values that are indices to arrays containing a single index.
h_orig.transform_values { |v| [*v].count }
h_orig.transform_values { |v| [*v].min }
This is hardly proof that it is generally more convenient for all values to be arrays, but that has been my experience and the experience of many others.
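If the mixed shape from the question is still required at the very end, one option is to work with the uniform hash throughout and collapse single-element arrays only as a last step. A minimal sketch, assuming Ruby 2.4 or later for transform_values:

```ruby
array = [1, 3, 4, 5, 6, 6, 6]

# Uniform form: every value is an array of indices.
h_mod = array.each_with_index.group_by(&:first).transform_values { |v| v.map(&:last) }

# Collapse single-index arrays only where the mixed shape is really needed.
h_orig = h_mod.transform_values { |v| v.size == 1 ? v.first : v }

p h_mod  #=> {1=>[0], 3=>[1], 4=>[2], 5=>[3], 6=>[4, 5, 6]}
p h_orig #=> {1=>0, 3=>1, 4=>2, 5=>3, 6=>[4, 5, 6]}
```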
I need to merge the values of hash a into out, with each value array ordered by the sorted inner keys of a (and 0 where a key is missing).
a = {"X"=>{12=>1, 11=>4}, "Y"=>{11=>5}, "Z"=>{12=>5}}
out = [
{"X": [4, 1]},
{"Y": [5, 0]},
{"Z": [0, 5]},
]
I would do something like this:
a = {"X"=>{12=>1, 11=>4}, "Y"=>{11=>5}, "Z"=>{12=>5}}
sorted_keys = a.values.flat_map(&:keys).uniq.sort
#=> [11, 12]
a.map { |k, v| { k => v.values_at(*sorted_keys).map(&:to_i) } }
#=> [ { "X" => [4, 1] }, { "Y" => [5, 0] }, { "Z" => [0, 5] }]
Code
def modify_values(g)
sorted_keys = g.reduce([]) {|arr,(_,v)| arr | v.keys}.sort
g.each_with_object({}) {|(k,v),h| h[k] = Hash.new(0).merge(v).values_at(*sorted_keys)}
end
Example
g = {"X"=>{12=>1, 11=>4}, "Y"=>{11=>5}, "Z"=>{12=>5}}
modify_values(g)
#=> {"X"=>[4, 1], "Y"=>[5, 0], "Z"=>[0, 5]}
Explanation
The steps are as follows for the hash in the example, which is called a below (note that the name g is reused further down for an intermediate hash). First obtain an array of the unique keys from a's values (see Enumerable#reduce and Array#|), then sort that array.
b = a.reduce([]) {|arr,(_,v)| arr | v.keys}
#=> [12, 11]
sorted_keys = b.sort
#=> [11, 12]
The first key-value pair of a, together with an empty hash, is passed to each_with_object's block. The block variables are computed using parallel assignment:
(k,v),h = [["X", {12=>1, 11=>4}], {}]
k #=> "X"
v #=> {12=>1, 11=>4}
h #=> {}
The block calculation is then performed. First an empty hash with a default value 0 is created:
f = Hash.new(0)
#=> {}
The hash v is then merged into f. The result is hash with the same key-value pairs as v but with a default value of 0. The significance of the default value is that if f does not have a key k, f[k] returns the default value. See Hash::new.
g = f.merge(v)
#=> {12=>1, 11=>4}
g.default
#=> 0 (yup)
Then extract the values corresponding to sorted_keys:
h[k] = g.values_at(*sorted_keys)
#=> {12=>1, 11=>4}.values_at(11, 12)
#=> [4, 1]
When a's next key-value pair is passed to the block, the calculations are as follows.
(k,v),h = [["Y", {11=>5}], {"X"=>[4, 1]}] # Note `h` has been updated
k #=> "Y"
v #=> {11=>5}
h #=> {"X"=>[4, 1]}
f = Hash.new(0)
#=> {}
g = f.merge(v)
#=> {11=>5}
h[k] = g.values_at(*sorted_keys)
#=> {11=>5}.values_at(11, 12)
#=> [5, 0] (Note g[12] equals g's default value)
and now
h #=> {"X"=>[4, 1], "Y"=>[5, 0]}
The calculation for the third key-value pair of a is similar.
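The default-value step can also be expressed with Hash#fetch, which takes the fallback directly. This is a sketch of an alternative, not the method described above; it assumes Ruby 2.4 or later for transform_values:

```ruby
a = { "X" => { 12 => 1, 11 => 4 }, "Y" => { 11 => 5 }, "Z" => { 12 => 5 } }

# All inner keys, de-duplicated and sorted.
sorted_keys = a.values.flat_map(&:keys).uniq.sort

# For each outer key, look up every sorted inner key, falling back to 0.
result = a.transform_values do |inner|
  sorted_keys.map { |key| inner.fetch(key, 0) }
end

p sorted_keys #=> [11, 12]
p result      #=> {"X"=>[4, 1], "Y"=>[5, 0], "Z"=>[0, 5]}
```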
I have this program with a class DNA. The program finds the most frequent k-mer in a string, that is, the most common substring of length k.
An example would be creating a dna1 object with the string AACCAATCCG. The count_kmer method looks for substrings of length k and outputs the most common ones. So, if we set k = 1, then 'A' and 'C' are the most frequent substrings because each appears four times. See the example below:
dna1 = DNA.new('AACCAATCCG')
=> AACCAATCCG
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]
Here is my DNA class :
class DNA
def initialize (nucleotide)
@nucleotide = nucleotide
end
def length
@nucleotide.length
end
protected
attr_reader :nucleotide
end
Here is my count kmer method that I am trying to implement:
# I have k as my only parameter because I want to pass the nucleotide string in the method
def count_kmer(k)
# I created an array as it seems like a good way to split up the nucleotide string.
counts = []
#this tries to count how many kmers of length k there are
num_kmers = self.nucleotide.length- k + 1
#this should try and look over the kmer start positions
for i in num_kmers
#Slice the string, so that way we can get the kmer
kmer = self.nucleotide.split('')
end
#add kmer if its not present
if !kmer = counts
counts[kmer] = 0
#increment the count for kmer
counts[kmer] +=1
end
#return the final count
return counts
end
#end dna class
end
I'm not sure where my method went wrong.
Something like this?
require 'set'
def count_kmer(k)
max_kmers = kmers(k)
.each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
.group_by { |_,v| v }
.max
[Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
end
def kmers(k)
nucleotide.chars.each_cons(k).map(&:join)
end
EDIT: Here's the full text of the class:
require 'set'
class DNA
def initialize (nucleotide)
@nucleotide = nucleotide
end
def length
@nucleotide.length
end
def count_kmer(k)
max_kmers = kmers(k)
.each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
.group_by { |_,v| v }
.max
[Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
end
def kmers(k)
nucleotide.chars.each_cons(k).map(&:join)
end
protected
attr_reader :nucleotide
end
This produces the following output, using Ruby 2.2.1, using the class and method you specified:
>> dna1 = DNA.new('AACCAATCCG')
=> #<DNA:0x007fe15205bc30 @nucleotide="AACCAATCCG">
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]
As a bonus, you can also do:
>> dna1.kmers(2)
=> ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]
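On Ruby 2.7 or later, the counting step can be handed to Enumerable#tally. A hedged sketch of the same idea; most_frequent_kmers is a hypothetical standalone helper that takes the string as an argument rather than reading it from a DNA instance:

```ruby
require 'set'

# Hypothetical helper: returns [set of most frequent k-mers, their count].
def most_frequent_kmers(nucleotide, k)
  counts = nucleotide.chars.each_cons(k).map(&:join).tally
  max = counts.values.max
  [Set.new(counts.select { |_, count| count == max }.keys), max]
end

kmers, count = most_frequent_kmers('AACCAATCCG', 1)
p kmers.to_a #=> ["A", "C"]
p count      #=> 4

kmers, count = most_frequent_kmers('AACCAATCCG', 2)
p kmers.to_a #=> ["AA", "CC"]
p count      #=> 2
```

When k is larger than the string, there are no k-mers and this sketch returns an empty set with a nil count.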
Code
def most_frequent_substrings(str, k)
(0..str.size-k).each_with_object({}) do |i,h|
b = []
str[i..-1].scan(Regexp.new str[i,k]) { b << Regexp.last_match.begin(0) + i }
(h[b.size] ||= []) << b
end.max_by(&:first).last.each_with_object({}) { |a,h| h[str[a.first,k]] = a }
end
Example
str = "ABBABABBABCATSABBABB"
most_frequent_substrings(str, 4)
#=> {"ABBA"=>[0, 5, 14], "BBAB"=>[1, 6, 15]}
This shows that the most frequently-occurring 4-character substring of str appears 3 times. There are two such substrings: "ABBA" and "BBAB". "ABBA" begins at offsets (into str) 0, 5 and 14, "BBAB" substrings begin at offsets 1, 6 and 15.
Explanation
For the example above the steps are as follows.
k = 4
n = str.size - k
#=> 20 - 4 => 16
e = (0..n).each_with_object({})
#<Enumerator: 0..16:each_with_object({})>
We can see the values that will be generated by this enumerator by converting it to an array.
e.to_a
#=> [[0, {}], [1, {}], [2, {}], [3, {}], [4, {}], [5, {}], [6, {}], [7, {}], [8, {}],
# [9, {}], [10, {}], [11, {}], [12, {}], [13, {}], [14, {}], [15, {}], [16, {}]]
Note the empty hash contained in each element is the same object, and it will be modified as the calculation proceeds. Continuing, the first element of e is passed to the block and the block variables are assigned using parallel assignment:
i,h = e.next
#=> [0, {}]
i #=> 0
h #=> {}
We are now considering the substring of size 4 that begins at str offset i #=> 0, which is seen to be "ABBA". Now the block calculation is performed.
b = []
r = Regexp.new str[i,k]
#=> Regexp.new str[0,4]
#=> Regexp.new "ABBA"
#=> /ABBA/
str[i..-1].scan(r) { b << Regexp.last_match.begin(0) + i }
#=> "ABBABABBABCATSABBABB".scan(r) { b << Regexp.last_match.begin(0) + i }
b #=> [0, 5, 14]
We next have
(h[b.size] ||= []) << b
which becomes
(h[b.size] = h[b.size] || []) << b
#=> (h[3] = h[3] || []) << [0, 5, 14]
Since h has no key 3, h[3] on the right side equals nil. Continuing,
#=> (h[3] = nil || []) << [0, 5, 14]
#=> (h[3] = []) << [0, 5, 14]
h #=> { 3=>[[0, 5, 14]] }
Notice that we throw away scan's return value. All we need is b.
This tells us the "ABBA" appears thrice in str, beginning at offsets 0, 5 and 14.
Now observe
e.to_a
#=> [[0, {3=>[[0, 5, 14]]}], [1, {3=>[[0, 5, 14]]}], [2, {3=>[[0, 5, 14]]}],
# ...
# [16, {3=>[[0, 5, 14]]}]]
After all elements of e have been passed to the block, each_with_object returns
h #=> {3=>[[0, 5, 14], [1, 6, 15]],
# 1=>[[2], [3], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]],
# 2=>[[4, 16], [5, 14], [6, 15]]}
Consider substrings that appear just once: h[1]. One of those is [2]. This pertains to the 4-character substring beginning at str offset 2:
str[2,4]
#=> "BABA"
That is found to be the only instance of that substring. Similarly, among the substrings that appear twice is str[4,4] = str[16,4] #=> "BABB", given by h[2][0] #=> [4, 16].
Next we determine the greatest frequency of a substring of length 4:
c = h.max_by(&:first)
#=> [3, [[0, 5, 14], [1, 6, 15]]]
(which could also be written c = h.max_by { |k,_| k }).
d = c.last
#=> [[0, 5, 14], [1, 6, 15]]
For convenience, convert d to a hash:
d.each_with_object({}) { |a,h| h[str[a.first,k]] = a }
#=> {"ABBA"=>[0, 5, 14], "BBAB"=>[1, 6, 15]}
and return that hash from the method.
There is one detail that deserves mention. It is possible that d will contain two or more arrays that reference the same substring, in which case the value of the associated key (the substring) will equal the last of those arrays. Here's a simple example.
str = "AAA"
k = 2
In this case the array d above will equal
d = [[0], [1]]
Both of these reference str[0,2] #=> str[1,2] #=> "AA". In building the hash the first is overwritten by the second:
d.each_with_object({}) { |a,h| h[str[a.first,k]] = a }
#=> {"AA"=>[1]}
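If overlapping occurrences should be counted and the overwriting avoided, one alternative is to group every start offset by the substring found there. This is a different technique from the scan-based method above, sketched under the hypothetical name most_frequent_substring_offsets:

```ruby
# Hypothetical variant: one group per distinct substring, holding all of
# its start offsets (overlapping occurrences included).
def most_frequent_substring_offsets(str, k)
  groups = (0..str.size - k).group_by { |i| str[i, k] }
  max = groups.values.map(&:size).max
  groups.select { |_, offsets| offsets.size == max }
end

p most_frequent_substring_offsets("ABBABABBABCATSABBABB", 4)
#=> {"ABBA"=>[0, 5, 14], "BBAB"=>[1, 6, 15]}
p most_frequent_substring_offsets("AAA", 2)
#=> {"AA"=>[0, 1]}
```

For "AAA" this reports both offsets of "AA", where the scan-based version keeps only the last one.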
Is there an easy way or a method to partition an array into arrays of contiguous numbers in Ruby?
[1,2,3,5,6,8,10] => [[1,2,3],[5,6],[8],[10]]
I can make some routine for that but wonder if there's a quick way.
Sam
I like to inject:
numbers = [1, 2, 3, 5, 6, 8, 10]
contiguous_arrays = []
contiguous_arrays << numbers[1..-1].inject([numbers.first]) do |contiguous, n|
if n == contiguous.last.succ
contiguous << n
else
contiguous_arrays << contiguous
[n]
end
end
#=> [[1, 2, 3], [5, 6], [8], [10]]
A smörgåsbord of approaches, with:
arr = [1,2,3,5,6,8,10]
#1
# If subarray is empty or the current value n is not the last value + 1,
# add the subarray [n] to the collection; else append the current value
# to the last subarray that was added to the collection.
arr.each_with_object([]) { |n,a|
(a.empty? || n != a.last.last+1) ? a << [n] : a[-1] << n }
#=> [[1, 2, 3], [5, 6], [8], [10]]
#2
# Change the value of 'group' to the current value n if it is the first
# element in arr or it is not equal to the previous element in arr + 1,
# then 'chunk' on 'group' and extract the result from the resulting chunked
# array.
group = nil # declared outside the block so its value persists between iterations
arr.map.with_index do |n,i|
group = n if i == 0 || n != arr[i-1] + 1
[n, group]
end.chunk(&:last)
.map { |_,c| c.map(&:first) }
#=> [[1, 2, 3], [5, 6], [8], [10]]
#3
# If n is the last element of arr, append any number other than n+1 to
# a copy of arr and convert to an enumerator. Step through the enumerator
# arr.size times, adding the current value to a subarray b, and using
# 'peek' to see if the next value of 'arr' equals the current value plus 1.
# If it does not, add the subarray b to the collection a and set b => [].
enum = (arr+[arr.last]).to_enum
a, b = [], []
arr.size.times do
curr = enum.next
b << curr
(a << b; b = []) unless curr + 1 == enum.peek
end
a
#=> [[1, 2, 3], [5, 6], [8], [10]]
#4
# Add elements n of arr sequentially to an array a, each time first inserting
# an arbitrary separator string SEP when n does not equal the previous value
# of arr + 1, map each element of a to a string, join(' '), split on SEP and
# convert each resulting array of strings to an array of integers.
SEP = '+'
match_val = arr.first
arr.each_with_object([]) do |n,a|
(a << SEP) unless n == match_val
a << n
match_val = n + 1
end.map(&:to_s)
.join(' ')
.split(SEP)
.map { |s| s.split(' ').map(&:to_i) }
#=> [[1, 2, 3], [5, 6], [8], [10]]
All of the above methods work when arr contains negative integers.
arr = [1,2,3,5,6,8,10]
prev = arr[0]
result = arr.slice_before { |e|
prev, prev2 = e, prev
e != prev2.succ
}.entries
p result
Not very original, lifted right out of the Ruby docs actually.
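On newer Rubies the same split can be written without tracking the previous element by hand: Enumerable#slice_when (Ruby 2.2+) and Enumerable#chunk_while (Ruby 2.3+) both pass adjacent pairs to the block. A minimal sketch, offered as an alternative to the slice_before version above:

```ruby
arr = [1, 2, 3, 5, 6, 8, 10]

# Start a new slice wherever the next element is not the successor.
p arr.slice_when { |a, b| b != a + 1 }.to_a  #=> [[1, 2, 3], [5, 6], [8], [10]]

# Equivalent: keep chunking while the next element is the successor.
p arr.chunk_while { |a, b| b == a + 1 }.to_a #=> [[1, 2, 3], [5, 6], [8], [10]]
```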
Another method with enumerator:
module Enumerable
def split_if
enum = each
result = []
tmp = [enum.peek]
loop do
v1, v2 = enum.next, enum.peek
if yield(v1, v2)
result << tmp
tmp = [enum.peek]
else
tmp << v2
end
end
result << tmp # include the final group, which the loop never appends
end
end
[1,2,3,5,6,8,10].split_if {|i,j| j-i > 1}
Or:
class Array
def split_if(&block)
prev_element = nil
inject([[]]) do |results, element|
if prev_element && block.call(prev_element, element)
results << [element]
else
results.last << element
end
prev_element = element
results
end
end
end
Just do it iteratively.
x = [1,2,3,5,6,8,10]
y = []; z = []
(1..x.length - 1).each do |i|
y << x[i - 1]
if x[i] != x[i-1] + 1
z << y
y = []
end
end
y << x[x.length - 1]
z << y
z
# => [[1, 2, 3], [5, 6], [8], [10]]