How to find all longest words in a string? - ruby

If I have a string with no spaces in it, just a concatenation like "hellocarworld", I want to get back an array of the largest dictionary words. so I would get ['hello','car','world']. I would not get back words such as 'a' because that belongs in 'car'.
The dictionary words can come from anywhere such as the dictionary on unix:
words = File.readlines("/usr/share/dict/words").collect{|x| x.strip}
string= "thishasmanywords"
How would you go about doing this?

I would suggest the following.
Code
For a given string and dictionary, dict:
string_arr = string.chars
string_arr.size.downto(1).with_object([]) { |n,arr|
string_arr.each_cons(n) { |a|
word = a.join
arr << word if (dict.include?(word) && !arr.any? {|w| w.include?(word) })}}
Examples
dict = File.readlines("/usr/share/dict/words").collect{|x| x.strip}
string = "hellocarworld"
#=> ["hello", "world", "loca", "car"]
string= "thishasmanywords"
#=> ["this", "hish", "many", "word", "sha", "sma", "as"]
"loca" is the plural of "locus". I'd never heard of "hish", "sha" or "sma". They all appear to be slang words, as I could only find them in something called the "Urban Dictonary".
Explanation
string_arr = "hellocarworld".chars
#=> ["h", "e", "l", "l", "o", "c", "a", "r", "w", "o", "r", "l", "d"]
string_arr.size
#=> 13
so for this string we have:
13.downto(1).with_object([]) { |n,arr|...
where arr is an initially-empty array that will be computed and returned. For n => 13,
enum = string_arr.each_cons(13)
#<Enumerator: ["h","e","l","l","o","c","a","r","w","o","r","l","d"]:each_cons(13)>
which enumerates over an array consisting of the single array string_arr:
enum.size #=> 1
enum.first == string_arr #=> true
That single array is assigned to the block variable a, so we obtain:
word = enum.first.join
#=> "hellocarworld"
We find
dict.include?(word) #=> false
so this word is not added to the array arr. It is was in the dictionary we would check to make sure it was not a substring of any word already in arr, which are all of the same size or larger (longer words).
Next we compute:
enum = string_arr.each_cons(12)
#<Enumerator: ["h","e","l","l","o","c","a","r","w","o","r","l","d"]:each_cons(12)>
which we can see enumerates two arrays:
enum = string_arr.each_cons(12).to_a
#=> [["h", "e", "l", "l", "o", "c", "a", "r", "w", "o", "r", "l"],
# ["e", "l", "l", "o", "c", "a", "r", "w", "o", "r", "l", "d"]]
corresponding to the words:
enum.first.join #=> "hellocarworl"
enum.last.join #=> "ellocarworld"
neither of which are in the dictionary. We continue in this fashion, until we reach n => 1:
string_arr.each_cons(1).to_a
#=> [["h"], ["e"], ["l"], ["l"], ["o"], ["c"],
# ["a"], ["r"], ["w"], ["o"], ["r"], ["l"], ["d"]]
We find only "a" in the dictionary, but as it is a substring of "loca" or "car", which are already elements of the array arr, we do not add it.

This can be a bit tricky if you're not familiar with the technique. I often lean heavily on regular expressions for this:
words = File.readlines("/usr/share/dict/words").collect(&:strip).reject(&:empty?)
regexp = Regexp.new(words.sort_by(&:length).reverse.join('|'))
phrase = "hellocarworld"
equiv = [ ]
while (m = phrase.match(regexp)) do
phrase.gsub!(m[0]) do
equiv << m[0]
'*'
end
end
equiv
# => ["hello", "car", "world"]
Update: Strip out blank strings which would cause the while loop to run forever.

Starting at the beginning of the input string, find the longest word in the dictionary. Chop that word off the beginning of the input string and repeat.
Once the input string is empty, you are done. If the string is not empty but no word was found, remove the first character and continue the process.

Related

How to find two elements of the same array that contain all vowels

I want to iterate a given array, for example:
["goat", "action", "tear", "impromptu", "tired", "europe"]
I want to look at all possible pairs.
The desired output is a new array, which contains all pairs, that combined contain all vowels. Also those pairs should be concatenated as one element of the output array:
["action europe", "tear impromptu"]
I tried the following code, but got an error message:
No implicit conversion of nil into string.
def all_vowel_pairs(words)
pairs = []
(0..words.length).each do |i| # iterate through words
(0..words.length).each do |j| # for every word, iterate through words again
pot_pair = words[i].to_s + words[j] # build string from pair
if check_for_vowels(pot_pair) # throw string to helper-method.
pairs << words[i] + " " + words[j] # if gets back true, concatenade and push to output array "pairs"
end
end
end
pairs
end
# helper-method to check for if a string has all vowels in it
def check_for_vowels(string)
vowels = "aeiou"
founds = []
string.each_char do |char|
if vowels.include?(char) && !founds.include?(char)
founds << char
end
end
if founds.length == 5
return true
end
false
end
The following code is intended to provide an efficient way to construct the desired array when the number of words is large. Note that, unlike the other answers, it does not make use of the method Array#combination.
The first part of the section Explanation (below) provides an overview of the approach taken by the algorithm. The details are then filled in.
Code
require 'set'
VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set
def all_vowel_pairs(words)
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
h.each_with_object([]) do |(k,v),a|
vowels_needed = VOWELS_SET-k
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
end
end
Example
words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]
all_vowel_pairs(words)
#=> ["action europe", "hear impromptu", "impromptu tear"]
Explanation
For the given example the steps are as follows.
VOWELS_SET = VOWELS.to_set
#=> #<Set: {"a", "e", "i", "o", "u"}>
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
#=> {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# #<Set: {"e", "a"}>=>["tear", "hear"],
# #<Set: {"i", "o", "u"}>=>["impromptu"],
# #<Set: {"i", "e"}>=>["tired"],
# #<Set: {"e", "u", "o"}>=>["europe"]}
It is seen that the keys of h are subsets of the five vowels. The values are arrays of elements of words (words) that contain the vowels given by the key and no others. The values therefore collectively form a partition of words. When the number of words is large one would expect h to have 31 keys (2**5 - 1).
We now loop through the key-value pairs of h. For each, with key k and value v, the set of missing vowels (vowels_needed) is determined, then we loop through those keys-value pairs [kk, vv] of h for which kk is a superset of vowels_needed. All combinations of elements of v and vv are then added to the array being returned (after an adjustment to avoid double-counting each pair of words).
Continuing,
enum = h.each_with_object([])
#=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# ...
# #<Set: {"e", "u", "o"}>=>["europe"]}:
# each_with_object([])>
The first value is generated by enum and passed to the block, and the block variables are assigned values:
(k,v), a = enum.next
#=> [[#<Set: {"o", "a"}>, ["goat"]], []]
See Enumerator#next.
The individual variables are assigned values by array decomposition:
k #=> #<Set: {"o", "a"}>
v #=> ["goat"]
a #=> []
The block calculations are now performed.
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "i", "u"}>
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
The word "goat" (v) has vowels "o" and "a", so it can only be matched with words that contain vowels "e", "i" and "u" (and possibly "o" and/or "a"). The expression
next unless kk.superset?(vowels_needed)
skips those keys of h (kk) that are not supersets of vowels_needed. See Set#superset?.
None of the words in words contain "e", "i" and "u" so the array a is unchanged.
The next element is now generated by enum, passed to the block and the block variables are assigned values:
(k,v), a = enum.next
#=> [[#<Set: {"a", "i", "o"}>, ["action"]], []]
k #=> #<Set: {"a", "i", "o"}>
v #=> ["action"]
a #=> []
The block calculation begins:
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "u"}>
We see that h has only one key-value pair for which the key is a superset of vowels_needed:
kk = %w|e u o|.to_set
#=> #<Set: {"e", "u", "o"}>
vv = ["europe"]
We therefore execute:
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
which adds one element to a:
a #=> ["action europe"]
The clause if w1 < w2 is to ensure that later in the calculations "europe action" is not added to a.
If v (words containing 'a', 'i' and 'u') and vv (words containing 'e', 'u' and 'o') had instead been:
v #=> ["action", "notification"]
vv #=> ["europe", "route"]
we would have added "action europe", "action route" and "notification route" to a. (”europe notification” would be added later, when k #=> #<Set: {"e", "u", "o"}.)
Benchmark
I benchmarked my method against others suggested using #theTinMan's Fruity benchmark code. The only differences were in the array of words to be tested and the addition of my method to the benchmark, which I named cary. For the array of words to be considered I selected 600 words at random from a file of English words on my computer:
words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
#=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
# "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]
This array was found to contain 46,436 pairs of words containing all five vowels.
The results were as shown below.
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
_cary { cary(words) }
}
Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1
I then compared cary with ttm3 for 1,000 randomly selected words. This array was found to contain 125,068 pairs of words containing all five vowels. That result was as follows:
Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0
To get a feel for the variability of the benchmark I ran this last comparison twice more, each with a new random selection of 1,000 words. That gave me the following results:
Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0
Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0
It is seen the there is considerable variation among the samples.
You said pairs so I assume it's a combination of two elements. I've made a combination of each two elements in the array using the #combination method. Then I #select-ed only those pairs that contain all vowels once they're joined. Finally, I made sure to join those pairs :
["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
.map{ |w| w.join(' ') }
#=> ["action europe", "tear impromptu"]
The regex is from "What is the regex to match the words containing all the vowels?".
Starting similarly to Viktor's, I'd use a simple test to see what vowels exist in the words and compare to whether they match "aeiou" after stripping duplicates and sorting them:
def ttm1(ary)
ary.combination(2).select { |a|
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
Breaking it down so you can see what's happening.
["goat", "action", "tear", "impromptu", "tired", "europe"] # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |a| a # => ["goat", "action"], ["goat", "tear"], ["goat", "impromptu"], ["goat", "tired"], ["goat", "europe"], ["action", "tear"], ["action", "impromptu"], ["action", "tired"], ["action", "europe"], ["tear", "impromptu"], ["tear", "tired"], ["tear", "europe"], ["impromptu", "tired"], ["impromptu", "europe"], ["tired", "europe"]
.join # => "goataction", "goattear", "goatimpromptu", "goattired", "goateurope", "actiontear", "actionimpromptu", "actiontired", "actioneurope", "tearimpromptu", "teartired", "teareurope", "impromptutired", "impromptueurope", "tiredeurope"
.scan(/[aeiou]/) # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
.uniq # => ["o", "a", "i"], ["o", "a", "e"], ["o", "a", "i", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "e", "u"], ["e", "a", "i", "o", "u"], ["e", "a", "i"], ["e", "a", "u", "o"], ["i", "o", "u", "e"], ["i", "o", "u", "e"], ["i", "e", "u", "o"]
.sort # => ["a", "i", "o"], ["a", "e", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "o", "u"], ["a", "e", "i", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "i", "o", "u"], ["a", "e", "i", "o", "u"], ["a", "e", "i"], ["a", "e", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"]
.join == 'aeiou' # => false, false, false, false, false, false, false, false, true, true, false, false, false, false, false
} # => [["action", "europe"], ["tear", "impromptu"]]
Looking at the code it was jumping through hoops to find whether all the vowels exist. Every time it checked it had to step through many methods before determining whether all the vowels were found; In other words it couldn't short-circuit and fail until the very end which isn't good.
This code will:
def ttm2(ary)
ary.combination(2).select { |a|
str = a.join
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
But I don't like using the regular expression engine this way as it's slower than doing a direct lookup, which lead to:
def ttm3(ary)
ary.combination(2).select { |a|
str = a.join
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
Here's the benchmark:
require 'fruity'
words = ["goat", "action", "tear", "impromptu", "tired", "europe"]
def viktor(ary)
ary.combination(2)
.select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
.map{ |w| w.join(' ') }
end
viktor(words) # => ["action europe", "tear impromptu"]
def ttm1(ary)
ary.combination(2).select { |a|
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
def ttm2(ary)
ary.combination(2).select { |a|
str = a.join
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
def ttm3(ary)
ary.combination(2).select { |a|
str = a.join
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
}
With the results:
# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1
Now, because this looks so much like a homework assignment, it's important to understand that schools are aware of Stack Overflow, and they look for students asking for help, so you probably don't want to reuse this code, especially not verbatim.
Your code contains two errors, one of which is causing the error message.
(0..words.length) loops from 0 to 6 . words[6] however does not exist (arrays are zero-based), so you get nil. Replacing by (0..words.length-1) (twice) should take care of that.
You will get every correct result twice, once as "action europe" and once as "europe action". This is caused by looping too much, going two times over every combination. Replace the second loop from (0..words.length-1) to (i..words.length-1).
This cumbersome bookkeeping of indexes is boring and leads to mistakes very often. This is why Ruby programmers often prefer more hassle-free methods (like combination as in other answers), avoiding indexes altogether.

How to make the array of the alphabet to rotate from "z" to "a" (ruby)

I am doing a Caesar cipher. I thought that the unless statement will work but it doesn't with or without then. Then I changed the unless with if and put ; in the place of then and it reads : undefined method `>' for nil:NilClass.
def caesar_cipher(input, key)
input.each do |x|
numbers = x.ord + key.to_i unless (numbers > 122) then numbers = x.ord + key - 26
letters = numbers.chr
print letters
end
end
puts "Write the words you want to be ciphered: "
input = gets.chomp.split(//)
puts "Write the key (1 - 26): "
key = gets.chomp
caesar_cipher(input,key)
Here are a couple of Ruby-like ways to write that:
#1
def caesar_cipher(input, key)
letters = ('a'..'z').to_a
input.each_char.map { |c| letters.include?(c) ?
letters[(letters.index(c)+key) % 26] : c }.join
end
caesar_cipher("this is your brown dog", 2)
#=> "vjku ku aqwt dtqyp fqi"
#2
def caesar_cipher(input, key)
letters = ('a'..'z').to_a
h = letters.zip(letters.rotate(key)).to_h
h.default_proc = ->(_,k) { k }
input.gsub(/./,h)
end
caesar_cipher("this is your brown dog", 2)
#=> "vjku ku aqwt dtqyp fqi"
The hash h constructed in #2 equals:
h = letters.zip(letters.rotate(key)).to_h
#=> {"a"=>"c", "b"=>"d", "c"=>"e", "d"=>"f", "e"=>"g", "f"=>"h",
# ...
# "u"=>"w", "v"=>"x", "w"=>"y", "x"=>"z", "y"=>"a", "z"=>"b"}
h.default_proc = ->(_,k) { k } causes
h[c] #=> c
if c is not a lowercase letter (e.g., a space, capital letter, number, punctuation, etc.)
If you write a branch with condition (if or unless) at the end of a line, after an initial statement, there are two things that apply and affect you:
The condition is assessed before the statement on its left. In your case that means numbers has not been assigned yet so it is nil.
The branch decision is whether or not to run the initial statement, you do not branch to the statement after the then.
You can solve this simply by converting your condition to an if and moving it to a separate line:
def caesar_cipher(input, key)
input.each do |x|
numbers = x.ord + key.to_i
if (numbers > 122)
numbers = x.ord + key - 26
end
letters = numbers.chr
print letters
end
end
There are arguably better ways of coding this cipher in Ruby, but this should solve your immediate problem.
There is a more elegant way to loop repeating sequences in ruby. Meet Enumerable#cycle.
('a'..'z').cycle.take(50)
# => ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j",
# "k", "l", "m", "n", "o", "p", "q", "r", "s", "t",
# "u", "v", "w", "x", "y", "z", "a", "b", "c", "d",
# "e", "f", "g", "h", "i", "j", "k", "l", "m", "n",
# "o", "p", "q", "r", "s", "t", "u", "v", "w", "x"]
Therefore, translating a single letter given a key can be written as:
('a'..'z').cycle.take(letter.ord + key.to_i - 'a'.ord.pred).last
And the entire method can look prettier:
def caesar_cipher(phrase, key)
phrase.each_char.map do |letter|
('a'..'z').cycle.take(letter.ord + key.to_i - 'a'.ord.pred).last
end.join
end
puts caesar_cipher('abcxyz', 3) # => defabc
Note that this is slower than the alternative, but it also has the benefit that it's easier to read and the key can be any number.

How to join every X amount of characters together in an Array - Ruby

If I want to join every X amount of letters together in an array how could I implement this?
In this case I want to join every two letters together
Input: array = ["b", "i", "e", "t", "r", "o"]
Output: array = ["bi", "et", "ro"]
each_slice (docs):
arr = 'bietro'.split ''
# grab each slice of 2 elements
p arr.each_slice(2).to_a
#=> [["b", "i"], ["e", "t"], ["r", "o"]]
# map `join' over each of the slices
p arr.each_slice(2).map(&:join)
#=> ["bi", "et", "ro"]
#Doorknow shows the best way, but here are two (among many, many) other ways:
def bunch_em(arr,n)
((arr.size+n-1)/n).times.map { |i| arr.slice(i*n,n).join }
end
arr = ["b", "i", "e", "t", "r", "o"]
bunch_em(arr,2) #=> ["bi", "et", "ro"]

How to match bar, b-a-r, b--a--r etc in a string by Regexp

Given a string, I want to find a word bar, b-a-r, b--a--r etc. where - can be any letter. But interval between letters must be the same.
All letters are lower case and there is no gap betweens.
For example bar, beayr, qbowarprr, wbxxxxxayyyyyrzzz should match this.
I tried /b[a-z]*a[a-z]*r/ but this matches bxar which is wrong.
I am wondering if I achieve this with regexp?
Here's is one way to get all matches.
Code
def all_matches_with_spacers(word, str)
word_size = word.size
word_arr = word.chars
str_arr = str.chars
(0..(str.size - word_size)/(word_size-1)).each_with_object([]) do |n, arr|
regex = Regexp.new(word_arr.join(".{#{n}}"))
str_arr.each_cons(word_size + n * (word_size - 1))
.map(&:join)
.each { |substring| arr << substring if substring =~ regex }
end
end
This requires word.size > 1.
Example
all_matches_with_spacers('bar', 'bar') #=> ["bar"]
all_matches_with_spacers('bar', 'beayr') #=> ["beayr"]
all_matches_with_spacers('bar', 'qbowarprr') #=> ["bowarpr"]
all_matches_with_spacers('bar', 'wbxxxxxayyyyyrzzz') #=> ["bxxxxxayyyyyr"]
all_matches_with_spacers('bobo', 'bobobocbcbocbcobcodbddoddbddobddoddbddob')
#=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
Explanation
Suppose
word = 'bobo'
str = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
then
word_size = word.size #=> 4
word_arr = word.chars #=> ["b", "o", "b", "o"]
str_arr = str.chars
#=> ["b", "o", "b", "o", "b", "o", "c", "b", "c", "b", "o", "c", "b", "c",
# "o", "b", "c", "o", "d", "b", "d", "d", "o", "d", "d", "b", "d", "d",
# "o", "b", "d", "d", "o", "d", "d", "b", "d", "d", "o", "b"]
If n is the number of spacers between each letter of word, we require
word.size + n * (word.size - 1) <= str.size
Hence (since str.size => 40),
n <= (str.size - word_size)/(word_size-1) #=> (40-4)/(4-1) => 12
We therefore will iterate over zero to 12 spacers:
(0..12).each_with_object([]) do |n, arr| .. end
Enumerable#each_with_object creates an initially-empty array denoted by the block variable arr. The first value passed to block is zero (spacers), assigned to the block variable n.
We then have
regex = Regexp.new(word_arr.join(".{#{0}}")) #=> /b.{0}o.{0}b.{0}o/
which is the same as /bar/. word with n spacers has length
word_size + n * (word_size - 1) #=> 19
To extract all sub-arrays of str_arr with this length, we invoke:
str_arr.each_cons(word_size + n * (word_size - 1))
Here, with n = 0, this is:
enum = str_arr.each_cons(4)
#=> #<Enumerator: ["b", "o", "b", "o", "b", "o",...,"b"]:each_cons(4)>
This enumerator will pass the following into its block:
enum.to_a
#=> [["b", "o", "b", "o"], ["o", "b", "o", "b"], ["b", "o", "b", "o"],
# ["o", "b", "o", "c"], ["b", "o", "c", "b"], ["o", "c", "b", "c"],
# ["c", "b", "c", "b"], ["b", "c", "b", "o"], ["c", "b", "o", "c"],
# ["b", "o", "c", "b"], ["o", "c", "b", "c"], ["c", "b", "c", "o"],
# ["b", "c", "o", "b"], ["c", "o", "b", "c"], ["o", "b", "c", "o"]]
We next convert these to strings:
ar = enum.map(&:join)
#=> ["bobo", "obob", "bobo", "oboc", "bocb", "ocbc", "cbcb", "bcbo",
# "cboc", "bocb", "ocbc", "cbco", "bcob", "cobc", "obco"]
and add each (assigned to the block variable substring) to the array arr for which:
substring =~ regex
ar.each { |substring| arr << substring if substring =~ regex }
arr => ["bobo", "bobo"]
Next we increment the number of spacers to n = 1. This has the following effect:
regex = Regexp.new(word_arr.join(".{#{1}}")) #=> /b.{1}o.{1}b.{1}o/
str_arr.each_cons(4 + 1 * (4 - 1)) #=> str_arr.each_cons(7)
so we now examine the strings
ar = str_arr.each_cons(7).map(&:join)
#=> ["boboboc", "obobocb", "bobocbc", "obocbcb", "bocbcbo", "ocbcboc",
# "cbcbocb", "bcbocbc", "cbocbco", "bocbcob", "ocbcobc", "cbcobco",
# "bcobcod", "cobcodb", "obcodbd", "bcodbdd", "codbddo", "odbddod",
# "dbddodd", "bddoddb", "ddoddbd", "doddbdd", "oddbddo", "ddbddob",
# "dbddobd", "bddobdd", "ddobddo", "dobddod", "obddodd", "bddoddb",
# "ddoddbd", "doddbdd", "oddbddo", "ddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
There are no matches with one spacer, so arr remains unchanged:
arr #=> ["bobo", "bobo"]
For n = 2 spacers:
regex = Regexp.new(word_arr.join(".{#{2}}")) #=> /b.{2}o.{2}b.{2}o/
str_arr.each_cons(4 + 2 * (4 - 1)) #=> str_arr.each_cons(10)
ar = str_arr.each_cons(10).map(&:join)
#=> ["bobobocbcb", "obobocbcbo", "bobocbcboc", "obocbcbocb", "bocbcbocbc",
# "ocbcbocbco", "cbcbocbcob", "bcbocbcobc", "cbocbcobco", "bocbcobcod",
# ...
# "ddoddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
No matches are found for more than two spacers, so the method returns
["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
For reference, there is a beautiful solution to the overall problem that is available in regex flavors that allow a capturing group to refer to itself:
^[^b]*bar|b(?:[^a](?=[^a]*a(\1?+.)))+a\1r
Sadly, Ruby doesn't allow this.
The interesting bit is on the right side of the alternation. After matching the initial b, we define a non-capturing group for the characters between b and a. This group will be repeated with the +. Between the a and r, we will inject capture group 1 with \1`. This group was captured one character at a time, overwriting itself with each pass, as each character between b and a was added.
See Quantifier Capture where the solution was demonstrated by #CasimiretHippolyte who refers to the idea behind the technique the "qtax trick".

Generate array of all letters and digits

Using ruby, is it possible to make an array of each letter in the alphabet and 0-9 easily?
[*('a'..'z'), *('0'..'9')] # doesn't work in Ruby 1.8
or
('a'..'z').to_a + ('0'..'9').to_a
or
(0...36).map{ |i| i.to_s 36 }
(the Integer#to_s method converts a number to a string representing it in a desired numeral system)
for letters or numbers you can form ranges and iterate over them. try this to get a general idea:
("a".."z").each { |letter| p letter }
to get an array out of it, just try the following:
("a".."z").to_a
You can also do it this way:
'a'.upto('z').to_a + 0.upto(9).to_a
Try this:
alphabet_array = [*'a'..'z', *'A'..'Z', *'0'..'9']
Or as string:
alphabet_string = alphabet_array.join
myarr = [*?a..?z] #generates an array of strings for each letter a to z
myarr = [*?a..?z] + [*?0..?9] # array of strings a-z and 0-9
letters = *('a'..'z')
=> ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
You can just do this:
("0".."Z").map { |i| i }

Resources