How to use collect and include for multidimensional array - ruby

I have:
array1 = [[1,2,3,4,5],[7,8,9,10],[11,12,13,14]]
@student_ids = [1,2,3]
I want to replace the elements in array1 that are included in @student_ids with 'X'. I want to see:
[['X','X','X',4,5],[7,8,9,10],[11,12,13,14]]
I have code that is intended to do this:
array1.collect! do |i|
  if i.include?(@student_ids)
    i[i.index(@student_ids)] = 'X'; i # I want to replace all with X
  else
    i
  end
end
If @student_ids is 1, then it works, but if @student_ids has more than one element, such as 1,2,3, it raises errors. Any help?

It's faster to use a hash or a set than to repeatedly test [1,2,3].include?(n).
arr = [[1,2,3,4,5],[7,8,9,10],[11,12,13,14]]
ids = [1,2,3]
Use a hash
h = ids.product(["X"]).to_h
  #=> {1=>"X", 2=>"X", 3=>"X"}
arr.map { |a| a.map { |n| h.fetch(n, n) } }
#=> [["X", "X", "X", 4, 5], [7, 8, 9, 10], [11, 12, 13, 14]]
See Hash#fetch.
Use a set
require 'set'
ids = ids.to_set
  #=> #<Set: {1, 2, 3}>
arr.map { |a| a.map { |n| ids.include?(n) ? "X" : n } }
#=> [["X", "X", "X", 4, 5], [7, 8, 9, 10], [11, 12, 13, 14]]
Replace both maps with map! if the array is to be modified in place (mutated).
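As a rough sketch of the set-based approach, the two nested maps can be wrapped in a small helper. The method name mask_ids and its mask parameter are invented here for illustration; they are not part of the question or of any library.

```ruby
require 'set'

# Hypothetical helper: returns a new nested array in which every element
# found in ids is replaced by mask. The input rows are left untouched.
def mask_ids(rows, ids, mask = 'X')
  lookup = ids.to_set # build the set once, outside the loops
  rows.map { |row| row.map { |n| lookup.include?(n) ? mask : n } }
end

rows = [[1, 2, 3, 4, 5], [7, 8, 9, 10], [11, 12, 13, 14]]
p mask_ids(rows, [1, 2, 3])
```

Because this uses map rather than map!, the caller's array is not mutated; assign the result back if in-place behaviour is wanted.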

Try the following (taking @student_ids = [1, 2, 3]):
array1.inject([]) { |m,a| m << a.map { |x| @student_ids.include?(x) ? 'X' : x } }
# => [["X", "X", "X", 4, 5], [7, 8, 9, 10], [11, 12, 13, 14]]

You can use each_with_index and replace the items you want:
array1 = [[1,2,3,4,5],[7,8,9,10],[11,12,13,14]]
@student_ids = [1,2,3]
array1.each_with_index do |sub_array, index|
  sub_array.each_with_index do |item, index2|
    array1[index][index2] = 'X' if @student_ids.include?(item)
  end
end

You can do the following:
def remove_student_ids(arr)
  arr.each_with_index do |value, index|
    arr[index] = 'X' if @student_ids.include?(value)
  end
end
array1.map { |sub_arr| remove_student_ids(sub_arr) }


Ruby: How to find the most frequent substring of length n? [duplicate]

I have this program with a class DNA. The program finds the most frequent k-mer in a string, that is, the most common substring of length k.
An example would be creating a dna1 object with the string AACCAATCCG. The count_kmer method looks for substrings of length k and outputs the most common ones. So, if we set k = 1, then 'A' and 'C' are the most frequent substrings because each appears four times. See the example below:
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]
Here is my DNA class :
class DNA
  def initialize (nucleotide)
    @nucleotide = nucleotide
  end
  def length
    @nucleotide.length
  end
  attr_reader :nucleotide
end
Here is my count kmer method that I am trying to implement:
# I have k as my only parameter because I want to pass the nucleotide string in the method
def count_kmer(k)
  # I created an array as it seems like a good way to split up the nucleotide string.
  counts = []
  # this tries to count how many kmers of length k there are
  num_kmers = self.nucleotide.length - k + 1
  # this should try and look over the kmer start positions
  for i in num_kmers
    # Slice the string, so that way we can get the kmer
    kmer = self.nucleotide.split('')
    # add kmer if its not present
    if !kmer = counts
      counts[kmer] = 0
    end
    # increment the count for kmer
    counts[kmer] += 1
  end
  # return the final count
  return counts
end
# end dna class
I'm not sure where my method went wrong.
Something like this?
require 'set'
def count_kmer(k)
  max_kmers = kmers(k)
    .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
    .group_by { |_,v| v }
    .max
  [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
end
def kmers(k)
  nucleotide.chars.each_cons(k).map(&:join)
end
EDIT: Here's the full text of the class:
require 'set'
class DNA
  def initialize (nucleotide)
    @nucleotide = nucleotide
  end
  def length
    @nucleotide.length
  end
  def count_kmer(k)
    max_kmers = kmers(k)
      .each_with_object(Hash.new(0)) { |value, count| count[value] += 1 }
      .group_by { |_,v| v }
      .max
    [Set.new(max_kmers[1].map { |e| e[0] }), max_kmers[0]]
  end
  def kmers(k)
    nucleotide.chars.each_cons(k).map(&:join)
  end
  attr_reader :nucleotide
end
This produces the following output, using Ruby 2.2.1, using the class and method you specified:
>> dna1 = DNA.new('AACCAATCCG')
=> #<DNA:0x007fe15205bc30 @nucleotide="AACCAATCCG">
>> dna1.count_kmer(1)
=> [#<Set: {"A", "C"}>, 4]
>> dna1.count_kmer(2)
=> [#<Set: {"AA", "CC"}>, 2]
As a bonus, you can also do:
>> dna1.kmers(2)
=> ["AA", "AC", "CC", "CA", "AA", "AT", "TC", "CC", "CG"]
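For comparison, here is a minimal standalone sketch of the same counting idea that uses Enumerable#tally (available from Ruby 2.7). The method name most_common_kmers is made up for this example, and it takes the sequence as a plain string argument instead of reading it from a DNA object.

```ruby
require 'set'

# Hypothetical helper: returns [set of most frequent k-mers, their count].
def most_common_kmers(sequence, k)
  # Every window of k consecutive characters, joined back into a string,
  # then counted: {"AA"=>2, "AC"=>1, ...}
  counts = sequence.chars.each_cons(k).map(&:join).tally
  max = counts.values.max
  [counts.select { |_, c| c == max }.keys.to_set, max]
end

p most_common_kmers('AACCAATCCG', 2)
```

The windows overlap, exactly as in kmers(k) above, so each start position contributes one k-mer.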
def most_frequent_substrings(str, k)
  (0..str.size-k).each_with_object({}) do |i,h|
    b = []
    str[i..-1].scan(str[i,k]) { b << Regexp.last_match.begin(0) + i }
    (h[b.size] ||= []) << b
  end.max_by(&:first).last.each_with_object({}) { |a,h| h[str[a.first,k]] = a }
end
str = "ABBABABBABCATSABBABB"
most_frequent_substrings(str, 4)
  #=> {"ABBA"=>[0, 5, 14], "BBAB"=>[1, 6, 15]}
This shows that the most frequently occurring 4-character substring of str appears 3 times. There are two such substrings: "ABBA" and "BBAB". "ABBA" begins at offsets (into str) 0, 5 and 14; "BBAB" begins at offsets 1, 6 and 15.
For the example above the steps are as follows.
k = 4
n = str.size - k
#=> 20 - 4 => 16
e = (0..n).each_with_object({})
  #=> #<Enumerator: 0..16:each_with_object({})>
We can see the values that will be generated by this enumerator by converting it to an array.
e.to_a
  #=> [[0, {}], [1, {}], [2, {}], [3, {}], [4, {}], [5, {}], [6, {}], [7, {}], [8, {}],
  #    [9, {}], [10, {}], [11, {}], [12, {}], [13, {}], [14, {}], [15, {}], [16, {}]]
Note that the empty hash contained in each element is one and the same object, and it will be modified as the hash is built. Continuing, the first element of e is passed to the block and the block variables are assigned using parallel assignment:
i,h = e.next
  #=> [0, {}]
i #=> 0
h #=> {}
We are now considering the substring of size 4 that begins at str offset i #=> 0, which is seen to be "ABBA". Now the block calculation is performed.
b = []
r = str[i,k]
  #=> str[0,4]
  #=> "ABBA"
str[i..-1].scan(r) { b << Regexp.last_match.begin(0) + i }
#=> "ABBABABBABCATSABBABB".scan(r) { b << Regexp.last_match.begin(0) + i }
b #=> [0, 5, 14]
We next have
(h[b.size] ||= []) << b
which becomes
(h[b.size] = h[b.size] || []) << b
#=> (h[3] = h[3] || []) << [0, 5, 14]
Since h has no key 3, h[3] on the right side equals nil. Continuing,
#=> (h[3] = nil || []) << [0, 5, 14]
#=> (h[3] = []) << [0, 5, 14]
h #=> { 3=>[[0, 5, 14]] }
Notice that we throw away scan's return value. All we need is b.
This tells us that "ABBA" appears thrice in str, beginning at offsets 0, 5 and 14.
Now observe
e.to_a
  #=> [[0, {3=>[[0, 5, 14]]}], [1, {3=>[[0, 5, 14]]}], [2, {3=>[[0, 5, 14]]}],
  #    ...
  #    [16, {3=>[[0, 5, 14]]}]]
After all elements of e have been passed to the block, the block returns
h #=> {3=>[[0, 5, 14], [1, 6, 15]],
# 1=>[[2], [3], [7], [8], [9], [10], [11], [12], [13], [14], [15], [16]],
# 2=>[[4, 16], [5, 14], [6, 15]]}
Consider substrings that appear just once: h[1]. One of those is [2]. This pertains to the 4-character substring beginning at str offset 2:
str[2,4]
  #=> "BABA"
That is found to be the only instance of that substring. Similarly, among the substrings that appear twice is str[4,4] = str[16,4] #=> "BABB", given by h[2][0] #=> [4, 16].
Next we determine the greatest frequency of a substring of length 4:
c = h.max_by(&:first)
#=> [3, [[0, 5, 14], [1, 6, 15]]]
(which could also be written c = h.max_by { |k,_| k }).
d = c.last
#=> [[0, 5, 14], [1, 6, 15]]
For convenience, convert d to a hash:
d.each_with_object({}) { |a,h| h[str[a.first,k]] = a }
#=> {"ABBA"=>[0, 5, 14], "BBAB"=>[1, 6, 15]}
and return that hash from the method.
There is one detail that deserves mention. It is possible that d will contain two or more arrays that reference the same substring, in which case the value of the associated key (the substring) will equal the last of those arrays. Here's a simple example.
str = "AAA"
k = 2
In this case the array d above will equal
d = [[0], [1]]
Both of these reference str[0,2] #=> str[1,2] #=> "AA". In building the hash the first is overwritten by the second:
d.each_with_object({}) { |a,h| h[str[a.first,k]] = a }
#=> {"AA"=>[1]}
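One way to sidestep that overwrite is to key the hash on the substring from the start and collect every offset under it. The sketch below does that; note that it is a variation, not the method above: the name kmer_offsets is invented, and it counts overlapping occurrences (so "AA" in "AAA" is found at both 0 and 1), whereas scan only finds non-overlapping ones.

```ruby
# Hypothetical variant: maps each most-frequent substring of length k
# to the list of all offsets at which it starts (overlaps included).
def kmer_offsets(str, k)
  offsets = Hash.new { |hash, key| hash[key] = [] }
  (0..str.size - k).each { |i| offsets[str[i, k]] << i }
  max = offsets.values.map(&:size).max
  offsets.select { |_, positions| positions.size == max }
end

p kmer_offsets("ABBABABBABCATSABBABB", 4)
p kmer_offsets("AAA", 2)
```

For the 20-character example string this yields the same two substrings and offsets as before, because none of their occurrences overlap.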

Ruby reducing a number array into start end range array

I have an array of numbers as below:
[11, 12, 13, 14, 19, 20, 21, 29, 30, 33]
I would like to reduce this array to:
[[11,14], [19,21], [29,30], [33,33]]
Identify consecutive numbers in an array and push only the start and end of each range.
How to achieve this?
Exactly the same problem is solved as an example for the slice_before method in the Ruby docs:
a = [0, 2, 3, 4, 6, 7, 9]
prev = a[0]
p a.slice_before { |e|
  prev, prev2 = e, prev
  prev2 + 1 != e
}.map { |es|
  es.length <= 2 ? es.join(",") : "#{es.first}-#{es.last}"
}.join(",")
In your case you need to tweak it a little:
a = [11, 12, 13, 14, 19, 20, 21, 29, 30, 33]
prev = a[0]
p a.slice_before { |e|
  prev, prev2 = e, prev
  prev2 + 1 != e
}.map { |es|
  [es.first, es.last]
}
  #=> [[11, 14], [19, 21], [29, 30], [33, 33]]
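On Ruby 2.2 or later the same idea can be written without the external prev variable by using Enumerable#slice_when, which hands each adjacent pair to the block. This is only a sketch; the method name range_pairs is made up for the example, and it assumes the input is already sorted integers.

```ruby
# Hypothetical helper: collapses runs of consecutive integers
# into [start, end] pairs. Assumes numbers is sorted ascending.
def range_pairs(numbers)
  numbers
    .slice_when { |prev, curr| curr != prev + 1 } # cut wherever the run breaks
    .map { |run| [run.first, run.last] }
end

p range_pairs([11, 12, 13, 14, 19, 20, 21, 29, 30, 33])
```

A lone number forms a run of length one, so it comes out as a pair with equal start and end, matching the [33, 33] in the question.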
Here's another way, using an enumerator with Enumerator#next and Enumerator#peek. It works for any collection whose elements implement succ (aka next).
def group_consecs(a)
  enum = a.each
  pairs = [[]]
  loop do
    pairs.last << enum.next
    pairs << [] unless pairs.last.last.succ == enum.peek
  end
  pairs.map { |g| (g.size > 1) ? g : g*2 }
end
Note that Enumerator#peek raises a StopIteration exception if the enumerator enum is already at the end when enum.peek is invoked. That exception is handled by Kernel#loop, which breaks the loop.
a = [11, 12, 13, 14, 19, 20, 21, 29, 30, 33]
group_consecs(a)
  #=> [[11, 12, 13, 14], [19, 20, 21], [29, 30], [33, 33]]
a = ['a','b','c','f','g','i','l','m']
group_consecs(a)
  #=> [["a", "b", "c"], ["f", "g"], ["i", "i"], ["l", "m"]]
a = ['aa','ab','ac','af','ag','ai','al','am']
group_consecs(a)
  #=> [["aa", "ab", "ac"], ["af", "ag"], ["ai", "ai"], ["al", "am"]]
a = [:a,:b,:c,:f,:g,:i,:l,:m]
group_consecs(a)
  #=> [[:a, :b, :c], [:f, :g], [:i, :i], [:l, :m]]
Generate an array of seven date objects for an example, then group consecutive dates:
require 'date'
today = Date.today
a = 10.times.map { today = today.succ }.values_at(0,1,2,5,6,8,9)
#=> [#<Date: 2014-08-07 ((2456877j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-08 ((2456878j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-09 ((2456879j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-12 ((2456882j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-13 ((2456883j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-15 ((2456885j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-16 ((2456886j,0s,0n),+0s,2299161j)>]
group_consecs(a)
  #=> [[#<Date: 2014-08-07 ((2456877j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-08 ((2456878j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-09 ((2456879j,0s,0n),+0s,2299161j)>
# ],
# [#<Date: 2014-08-12 ((2456882j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-13 ((2456883j,0s,0n),+0s,2299161j)>
# ],
# [#<Date: 2014-08-15 ((2456885j,0s,0n),+0s,2299161j)>,
# #<Date: 2014-08-16 ((2456886j,0s,0n),+0s,2299161j)>
# ]]
This is some code I wrote for a project a while ago:
class Array
  # [1,2,4,5,6,7,9,13].to_ranges # => [1..2, 4..7, 9..9, 13..13]
  # [1,2,4,5,6,7,9,13].to_ranges(true) # => [1..2, 4..7, 9, 13]
  def to_ranges(non_ranges_ok=false)
    self.sort.each_with_index.chunk { |x, i| x - i }.map { |diff, pairs|
      if (non_ranges_ok)
        pairs.first[0] == pairs.last[0] ? pairs.first[0] : pairs.first[0] .. pairs.last[0]
      else
        pairs.first[0] .. pairs.last[0]
      end
    }
  end
end
if ($0 == __FILE__)
  require 'awesome_print'
  ary = [1, 2, 4, 5, 6, 7, 9, 13, 12]
  ary.to_ranges(false) # => [1..2, 4..7, 9..9, 12..13]
  ary.to_ranges(true) # => [1..2, 4..7, 9, 12..13]
  ary = [1, 2, 4, 8, 5, 6, 7, 3, 9, 11, 12, 10]
  ary.to_ranges(false) # => [1..12]
  ary.to_ranges(true) # => [1..12]
end
It's easy to change that to only return the start/end pairs:
class Array
  def to_range_pairs(non_ranges_ok=false)
    self.sort.each_with_index.chunk { |x, i| x - i }.map { |diff, pairs|
      if (non_ranges_ok)
        pairs.first[0] == pairs.last[0] ? [pairs.first[0]] : [pairs.first[0], pairs.last[0]]
      else
        [pairs.first[0], pairs.last[0]]
      end
    }
  end
end
if ($0 == __FILE__)
  require 'awesome_print'
  ary = [1, 2, 4, 5, 6, 7, 9, 13, 12]
  ary.to_range_pairs(false) # => [[1, 2], [4, 7], [9, 9], [12, 13]]
  ary.to_range_pairs(true) # => [[1, 2], [4, 7], [9], [12, 13]]
  ary = [1, 2, 4, 8, 5, 6, 7, 3, 9, 11, 12, 10]
  ary.to_range_pairs(false) # => [[1, 12]]
  ary.to_range_pairs(true) # => [[1, 12]]
end
Here's an elegant solution:
arr = [11, 12, 13, 14, 19, 20, 21, 29, 30, 33]
output = []
# Sort array
arr.sort!
# Loop through each element in the list
arr.each do |element|
  # Set defaults - for if there are no consecutive numbers in the list
  start = element
  endd = element
  # Loop through consecutive numbers and check if they are inside the list
  i = 1
  while arr.include?(element+i) do
    # Set element as endd
    endd = element+i
    # Remove element from list
    arr.delete(element+i)
    # Increment i
    i += 1
  end
  # Push [start, endd] pair to output
  output.push([start, endd])
end
[Edit: Ha! I misunderstood the question. In your example, for the array
a = [11, 12, 13, 14, 19, 20, 21, 29, 30, 33]
you showed the desired array of pairs to be:
[[11,14], [19,21], [29,30], [33,33]]
which correspond to the following offsets in a:
[[0,3], [4,6], [7,8], [9,9]]
These pairs respectively span the first 4 elements, the next 3 elements, the next 2 elements and the next element (by coincidence, evidently). I thought you wanted such pairs, each with a span one less than the previous, and the span of the first being as large as possible. If you have a quick look at my examples below, my assumption may be clearer. Looking back, I don't know why I didn't understand the question correctly (I should have looked at the answers), but there you have it.
Despite my mistake, I'll leave this up as I found it an interesting problem, and had the opportunity to use the quadratic formula in the solution.
This is how I would do it.
def pull_pairs(a)
  n = ((-1 + Math.sqrt(1.0 + 8*a.size))/2).to_i
  cum = 0
  n.downto(1).map do |i|
    first = cum
    cum += i
    [a[first], a[cum-1]]
  end
end
a = %w{a b c d e f g h i j k l}
  #=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l"]
pull_pairs(a)
  #=> [["a", "d"], ["e", "g"], ["h", "i"], ["j", "j"]]
a = [*(1..25)]
  #=> [ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13,
  #    14, 15, 16, 17, 18, 19, 20, 21, 22, 23, 24, 25]
pull_pairs(a)
  #=> [[1, 6], [7, 11], [12, 15], [16, 18], [19, 20], [21, 21]]
a = [*(1..990)]
  #=> [1, 2,..., 990]
pull_pairs(a)
  #=> [[1, 44], [45, 87],..., [988, 989], [990, 990]]
First, we'll compute the number of pairs of values in the array we will produce. We are given an array (expressed algebraically):
a = [a0,a1,...a(m-1)]
where m = a.size.
Given n > 0, the array to be produced is:
[[a0,a(n-1)], [a(n),a(2n-2)],...,[a(t),a(t)]]
These elements span the first n+(n-1)+...+1 elements of a. As this is an arithmetic progression, the sum equals n(n+1)/2. Ergo,
t = n(n+1)/2 - 1
Now t <= m-1, so we maximize the number of pairs in the output array by choosing the largest n such that
n(n+1)/2 <= m
which is the float solution for n in the quadratic:
n^2+n-2m = 0
rounded down to an integer, which is
int((-1+sqrt(1+8m))/2)
Suppose
a = %w{a b c d e f g h i j k l}
Then m (=a.size) = 12, so:
n = int((-1+sqrt(97))/2) = 4
and the desired array would be:
[["a", "d"], ["e", "g"], ["h", "i"], ["j", "j"]]
Once n has been computed, constructing the array of pairs is straightforward.
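As a small check on that formula, the sketch below computes n with integer arithmetic only, using Integer.sqrt (available from Ruby 2.5), which avoids any float rounding concerns for large m. The method name largest_n is invented for this illustration.

```ruby
# Hypothetical helper: the largest n with n(n+1)/2 <= m,
# i.e. the positive root of n^2 + n - 2m = 0, rounded down.
def largest_n(m)
  (Integer.sqrt(8 * m + 1) - 1) / 2
end

p largest_n(12)  # the 12-element example
p largest_n(990) # the 990-element example
```

For the three arrays shown earlier (sizes 12, 25 and 990) this gives 4, 6 and 44, which are the numbers of pairs in the corresponding outputs.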

Grouping consecutive numbers in an array

I need to group consecutive numbers into sub-arrays of a new array; a number that is not part of a consecutive run should go into a sub-array on its own:
old_array = [1, 2, 3, 5, 7, 8, 9, 20, 21, 23, 29]
I want to get this result:
new_array = [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
Is there an easier way to do this?
A little late to this party but:
old_array.slice_when { |prev, curr| curr != prev.next }.to_a
# => [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
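A closely related option, sketched here under the assumption of Ruby 2.3 or later, is Enumerable#chunk_while, which keeps elements together while the block is true instead of splitting when it is true. The wrapper name group_runs is made up for the example.

```ruby
# Hypothetical wrapper: groups runs of consecutive integers.
def group_runs(numbers)
  # Stay in the same chunk as long as each element is one more than the last.
  numbers.chunk_while { |prev, curr| curr == prev + 1 }.to_a
end

p group_runs([1, 2, 3, 5, 7, 8, 9, 20, 21, 23, 29])
```

The block is simply the negation of the slice_when condition, so which one reads better is a matter of taste.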
This is the official answer given in RDoc (slightly modified):
actual = old_array.first
old_array.slice_before do |e|
  expected, actual = actual.next, e
  expected != actual
end.to_a
A couple of other ways:
old_array = [1, 2, 3, 5, 7, 8, 9, 20, 21, 23, 29]
a, b = [], []
enum = old_array.each
loop do
  b << enum.next
  unless enum.peek.eql?(b.last.succ)
    a << b
    b = []
  end
end
a << b if b.any?
a #=> [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
def pull_range(arr)
  b = arr.take_while.with_index { |e,i| e-i == arr.first }
  [b, arr[b.size..-1]]
end
b, l = [], old_array
while l.any?
  f, l = pull_range(l)
  b << f
end
b #=> [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
Using chunk you could do:
old_array.chunk([old_array[0],old_array[0]]) do |item, block_data|
  if item > block_data[1]+1
    block_data[0] = item
  end
  block_data[1] = item
  block_data[0]
end.map { |_, i| i }
# => [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
Some answers seem unnecessarily long; it is possible to do this in a very compact way:
arr = [1, 2, 3, 5, 7, 8, 9, 20, 21, 23, 29]
arr.inject([]) { |a,e| (a[-1] && e == a[-1][-1] + 1) ? a[-1] << e : a << [e]; a }
# [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
Alternatively, starting with the first element to get rid of the a[-1] condition (needed for the case when a[-1] would be nil because a is empty):
arr[1..-1].inject([[arr[0]]]) { |a,e| e == a[-1][-1] + 1 ? a[-1] << e : a << [e]; a }
# [[1, 2, 3], [5], [7, 8, 9], [20, 21], [23], [29]]
Enumerable#inject iterates all elements of the enumerable, building up a result value which starts with the given object. I give it an empty Array or an Array with the first value wrapped in an Array respectively in my solutions. Then I simply check if the next element of the input Array we are iterating is equal to the last value of the last Array in the resulting Array plus 1 (i.e, if it is the next consecutive element). If it is, I append it to the last list. Otherwise, I start a new list with that element in it and append it to the resulting Array.
You could also do it like this:
old_array=[1, 2, 3, 5, 7, 8, 9, 20, 21, 23, 29]
new_array=[]
tmp=[]
prev=nil
for i in old_array.each
  if i != old_array[0]
    if i - prev == 1
      tmp << i
    else
      new_array << tmp
      tmp=[i]
    end
    if i == old_array[-1]
      new_array << tmp
      break
    end
    prev=i
  else
    prev=i
    tmp << i
  end
end
Using a Hash you can do:
counter = 0
groups = {}
old_array.each_with_index do |e, i|
  groups[counter] ||= []
  groups[counter].push old_array[i]
  counter += 1 unless old_array.include? e.next
end
new_array = groups.keys.map { |i| groups[i] }

How to quickly print Ruby hashes in a table format?

Is there a way to quickly print a ruby hash in a table format into a file?
Such as:
keyA keyB keyC ...
123 234 345
125 347
where the values of the hash are arrays of different sizes. Or is using a double loop the only way?
Try this gem I wrote (prints hashes, ruby objects, ActiveRecord objects in tables):
Here's a version of steenslag's that works when the arrays aren't the same size:
size = h.values.max_by { |a| a.length }.length
m = h.values.map { |a| a += [nil] * (size - a.length) }.transpose.insert(0, h.keys)
nil seems like a reasonable placeholder for missing values but you can, of course, use whatever makes sense.
For example:
>> h = {:a => [1, 2, 3], :b => [4, 5, 6, 7, 8], :c => [9]}
>> size = h.values.max_by { |a| a.length }.length
>> m = h.values.map { |a| a += [nil] * (size - a.length) }.transpose.insert(0, h.keys)
=> [[:a, :b, :c], [1, 4, 9], [2, 5, nil], [3, 6, nil], [nil, 7, nil], [nil, 8, nil]]
>> m.each { |r| puts r.map { |x| x.nil? ? '' : x }.inspect }
[:a, :b, :c]
[ 1, 4, 9]
[ 2, 5, ""]
[ 3, 6, ""]
["", 7, ""]
["", 8, ""]
h = {:a => [1, 2, 3], :b => [4, 5, 6], :c => [7, 8, 9]}
p h.values.transpose.insert(0, h.keys)
# [[:a, :b, :c], [1, 4, 7], [2, 5, 8], [3, 6, 9]]
No, there's no built-in function. Here's some code that would format it as you want it:
data = { :keyA => [123, 125, 4456], :keyB => [234000], :keyC => [345, 347] }
length = data.values.max_by{ |v| v.length }.length
widths = {}
data.keys.each do |key|
  widths[key] = 5 # minimum column width
  # longest string len of values
  val_len = data[key].max_by{ |v| v.to_s.length }.to_s.length
  widths[key] = (val_len > widths[key]) ? val_len : widths[key]
  # length of key
  widths[key] = (key.to_s.length > widths[key]) ? key.to_s.length : widths[key]
end
result = ""
data.keys.each {|key| result += key.to_s.ljust(widths[key]) + " " }
result += "\n"
# rows are indexed 0..length-1; going up to length would add an empty row
for i in 0.upto(length - 1)
  data.keys.each { |key| result += data[key][i].to_s.ljust(widths[key]) + " " }
  result += "\n"
end
# TODO write result to file...
Any comments and edits to refine the answer are very welcome.
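To round off the TODO above, here is one possible sketch that builds the table as a string and leaves writing it to a file as a single File.write call. The method name format_table, the min_width keyword and the file name table.txt are all assumptions made for this example.

```ruby
# Hypothetical helper: renders a hash of arrays as left-aligned columns.
# Missing values (shorter arrays) are rendered as blanks.
def format_table(data, min_width: 5)
  keys = data.keys
  row_count = data.values.map(&:length).max || 0
  # Column width: the widest of the key, its values, and the minimum width.
  widths = keys.map do |key|
    ([key.to_s.length, min_width] + data[key].map { |v| v.to_s.length }).max
  end
  lines = [keys.each_with_index.map { |key, c| key.to_s.ljust(widths[c]) }.join(" ").rstrip]
  row_count.times do |r|
    # nil.to_s is "", so a missing cell becomes an empty padded column.
    cells = keys.each_with_index.map { |key, c| data[key][r].to_s.ljust(widths[c]) }
    lines << cells.join(" ").rstrip
  end
  lines.join("\n") + "\n"
end

table = format_table({ keyA: [123, 125], keyB: [234], keyC: [345, 347] })
puts table
# File.write("table.txt", table) # uncomment to write to a (hypothetical) file
```

Returning a string rather than printing directly keeps the formatting testable and lets the caller decide between puts and File.write.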
