How to match bar, b-a-r, b--a--r etc in a string by Regexp - ruby

Given a string, I want to find the word bar, b-a-r, b--a--r, etc., where - can be any letter, but the interval between consecutive letters must be the same.
All letters are lowercase and there are no spaces between them.
For example bar, beayr, qbowarprr, wbxxxxxayyyyyrzzz should match this.
I tried /b[a-z]*a[a-z]*r/ but this matches bxar which is wrong.
Can I achieve this with a regexp?

Here is one way to get all matches.
Code
def all_matches_with_spacers(word, str)
  word_size = word.size
  word_arr = word.chars
  str_arr = str.chars
  (0..(str.size - word_size)/(word_size-1)).each_with_object([]) do |n, arr|
    regex = Regexp.new(word_arr.join(".{#{n}}"))
    str_arr.each_cons(word_size + n * (word_size - 1))
           .map(&:join)
           .each { |substring| arr << substring if substring =~ regex }
  end
end
This requires word.size > 1.
Example
all_matches_with_spacers('bar', 'bar') #=> ["bar"]
all_matches_with_spacers('bar', 'beayr') #=> ["beayr"]
all_matches_with_spacers('bar', 'qbowarprr') #=> ["bowarpr"]
all_matches_with_spacers('bar', 'wbxxxxxayyyyyrzzz') #=> ["bxxxxxayyyyyr"]
all_matches_with_spacers('bobo', 'bobobocbcbocbcobcodbddoddbddobddoddbddob')
#=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
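An alternative sketch, not part of the original answer: wrapping each spacer pattern in a zero-width lookahead `(?=(...))` lets String#scan report overlapping matches, so the fixed-length substrings need not be built by hand. The method name all_matches_with_lookahead is hypothetical, and like the method above it assumes word.size > 1.

```ruby
# Sketch: one regex per spacer count n. Because the lookahead consumes no
# characters, scan also finds overlapping matches (e.g. the two "bobo"s
# starting at offsets 0 and 2).
def all_matches_with_lookahead(word, str)
  max_n = (str.size - word.size) / (word.size - 1) # requires word.size > 1
  letters = word.chars.map { |c| Regexp.escape(c) }
  (0..max_n).flat_map do |n|
    # For 'bar' and n = 2 this builds /(?=(b.{2}a.{2}r))/
    spaced = Regexp.new("(?=(#{letters.join(".{#{n}}")}))")
    # scan returns one single-element array per match, so flatten them
    str.scan(spaced).flatten
  end
end

p all_matches_with_lookahead('bar', 'qbowarprr') #=> ["bowarpr"]
```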
Explanation
Suppose
word = 'bobo'
str = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
then
word_size = word.size #=> 4
word_arr = word.chars #=> ["b", "o", "b", "o"]
str_arr = str.chars
#=> ["b", "o", "b", "o", "b", "o", "c", "b", "c", "b", "o", "c", "b", "c",
# "o", "b", "c", "o", "d", "b", "d", "d", "o", "d", "d", "b", "d", "d",
# "o", "b", "d", "d", "o", "d", "d", "b", "d", "d", "o", "b"]
If n is the number of spacers between consecutive letters of word, we require
word.size + n * (word.size - 1) <= str.size
Hence (since str.size #=> 40),
n <= (str.size - word_size)/(word_size-1) #=> (40-4)/(4-1) #=> 12
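The bound is easy to check numerically. In this sketch the constants 4 and 40 are simply this example's word_size and str.size, hard-coded for illustration:

```ruby
word_size = 4  # 'bobo'.size
str_size  = 40 # str.size in this example
# Integer division gives the largest n with word_size + n*(word_size-1) <= str_size
max_n = (str_size - word_size) / (word_size - 1)
# Length of the window examined for each spacer count n
windows = (0..max_n).map { |n| word_size + n * (word_size - 1) }
p max_n            #=> 12
p windows.first(3) #=> [4, 7, 10]
p windows.last     #=> 40
```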
We therefore iterate over 0 to 12 spacers:
(0..12).each_with_object([]) do |n, arr| ... end
Enumerable#each_with_object creates an initially empty array, denoted by the block variable arr. The first value passed to the block is 0 (spacers), which is assigned to the block variable n.
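A tiny standalone illustration of each_with_object, using made-up values:

```ruby
# each_with_object passes each element together with the same memo object
# (here an empty array) to the block, and returns that memo object at the end.
squares = (0..3).each_with_object([]) { |n, arr| arr << n * n }
p squares #=> [0, 1, 4, 9]
```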
We then have
regex = Regexp.new(word_arr.join(".{#{0}}")) #=> /b.{0}o.{0}b.{0}o/
which is the same as /bobo/. In general, word with n spacers has length
word_size + n * (word_size - 1) #=> 4 when n = 0
To extract all sub-arrays of str_arr with this length, we invoke:
str_arr.each_cons(word_size + n * (word_size - 1))
Here, with n = 0, this is:
enum = str_arr.each_cons(4)
#=> #<Enumerator: ["b", "o", "b", "o", "b", "o",...,"b"]:each_cons(4)>
This enumerator will pass the following into its block:
enum.to_a
#=> [["b", "o", "b", "o"], ["o", "b", "o", "b"], ["b", "o", "b", "o"],
# ["o", "b", "o", "c"], ["b", "o", "c", "b"], ["o", "c", "b", "c"],
# ["c", "b", "c", "b"], ["b", "c", "b", "o"], ["c", "b", "o", "c"],
# ["b", "o", "c", "b"], ["o", "c", "b", "c"], ["c", "b", "c", "o"],
# ["b", "c", "o", "b"], ["c", "o", "b", "c"], ["o", "b", "c", "o"], ...]
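On a shorter array the sliding-window behaviour of each_cons is easier to see. An array of length L yields L - k + 1 windows of size k, which for the 40-element str_arr and k = 4 is 37. The letters here are made up for illustration:

```ruby
# each_cons(k) yields every run of k consecutive elements
runs = %w[a b c d e].each_cons(3).to_a
p runs      #=> [["a", "b", "c"], ["b", "c", "d"], ["c", "d", "e"]]
p runs.size #=> 3, i.e. 5 - 3 + 1
```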
We next convert these to strings:
ar = enum.map(&:join)
#=> ["bobo", "obob", "bobo", "oboc", "bocb", "ocbc", "cbcb", "bcbo",
# "cboc", "bocb", "ocbc", "cbco", "bcob", "cobc", "obco", ...]
and add to the array arr each element (assigned to the block variable substring) for which
substring =~ regex
is truthy:
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo"]
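The each/<< filtering step is equivalent to a select. A minimal sketch using the first few strings from the n = 0 step (String#match? assumes Ruby 2.4 or later):

```ruby
regex = /b.{0}o.{0}b.{0}o/
ar = ["bobo", "obob", "bobo", "oboc", "bocb"]
# select keeps the elements for which the block is truthy
matches = ar.select { |substring| substring.match?(regex) }
p matches #=> ["bobo", "bobo"]
```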
Next we increment the number of spacers to n = 1. This has the following effect:
regex = Regexp.new(word_arr.join(".{#{1}}")) #=> /b.{1}o.{1}b.{1}o/
str_arr.each_cons(4 + 1 * (4 - 1)) #=> str_arr.each_cons(7)
so we now examine the strings
ar = str_arr.each_cons(7).map(&:join)
#=> ["boboboc", "obobocb", "bobocbc", "obocbcb", "bocbcbo", "ocbcboc",
# "cbcbocb", "bcbocbc", "cbocbco", "bocbcob", "ocbcobc", "cbcobco",
# "bcobcod", "cobcodb", "obcodbd", "bcodbdd", "codbddo", "odbddod",
# "dbddodd", "bddoddb", "ddoddbd", "doddbdd", "oddbddo", "ddbddob",
# "dbddobd", "bddobdd", "ddobddo", "dobddod", "obddodd", "bddoddb",
# "ddoddbd", "doddbdd", "oddbddo", "ddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
There are no matches with one spacer, so arr remains unchanged:
arr #=> ["bobo", "bobo"]
For n = 2 spacers:
regex = Regexp.new(word_arr.join(".{#{2}}")) #=> /b.{2}o.{2}b.{2}o/
str_arr.each_cons(4 + 2 * (4 - 1)) #=> str_arr.each_cons(10)
ar = str_arr.each_cons(10).map(&:join)
#=> ["bobobocbcb", "obobocbcbo", "bobocbcboc", "obocbcbocb", "bocbcbocbc",
# "ocbcbocbco", "cbcbocbcob", "bcbocbcobc", "cbocbcobco", "bocbcobcod",
# ...
# "ddoddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
No matches are found for more than two spacers, so the method returns
["bobo", "bobo", "bddoddbddo", "bddoddbddo"]

For reference, there is a beautiful solution to the overall problem that is available in regex flavors that allow a capturing group to refer to itself:
^[^b]*bar|b(?:[^a](?=[^a]*a(\1?+.)))+a\1r
Sadly, Ruby doesn't allow this.
The interesting bit is on the right side of the alternation. After matching the initial b, we define a non-capturing group for the characters between b and a; this group is repeated with the +. Between the a and the r, we inject capture group 1 with \1. On each pass, the lookahead recaptures group 1 as its previous contents plus one more character, so when the loop ends group 1 holds exactly as many characters as there were between b and a.
See Quantifier Capture, where the solution was demonstrated by @CasimiretHippolyte, who calls the idea behind the technique the "qtax trick".
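Since Ruby does not allow that self-referencing group, one workaround (a suggestion of mine, not part of the answer above) is to spell out one alternative per spacer count and combine them with Regexp.union. The helper name spaced_word_regex and its max_spacers parameter are hypothetical. Note that String#[] returns only the leftmost match.

```ruby
# Builds b.{0}a.{0}r | b.{1}a.{1}r | ... | b.{k}a.{k}r with k = max_spacers
def spaced_word_regex(word, max_spacers)
  letters = word.chars.map { |c| Regexp.escape(c) }
  alternatives = (0..max_spacers).map { |n| Regexp.new(letters.join(".{#{n}}")) }
  Regexp.union(alternatives)
end

re = spaced_word_regex('bar', 10)
p 'wbxxxxxayyyyyrzzz'[re] #=> "bxxxxxayyyyyr"
p 'qbowarprr'[re]         #=> "bowarpr"
p 'bxar'.match?(re)       #=> false
```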


How to find two elements of the same array that contain all vowels

I want to iterate a given array, for example:
["goat", "action", "tear", "impromptu", "tired", "europe"]
I want to look at all possible pairs.
The desired output is a new array containing every pair whose two words, combined, contain all the vowels. Each pair should be joined into a single string element of the output array:
["action europe", "tear impromptu"]
I tried the following code, but got an error message:
no implicit conversion of nil into String
def all_vowel_pairs(words)
  pairs = []
  (0..words.length).each do |i| # iterate through words
    (0..words.length).each do |j| # for every word, iterate through words again
      pot_pair = words[i].to_s + words[j] # build string from pair
      if check_for_vowels(pot_pair) # pass the string to the helper method
        pairs << words[i] + " " + words[j] # if it returns true, concatenate and push to the output array "pairs"
      end
    end
  end
  pairs
end
# helper method to check whether a string contains all the vowels
def check_for_vowels(string)
  vowels = "aeiou"
  founds = []
  string.each_char do |char|
    if vowels.include?(char) && !founds.include?(char)
      founds << char
    end
  end
  if founds.length == 5
    return true
  end
  false
end
The following code is intended to provide an efficient way to construct the desired array when the number of words is large. Note that, unlike the other answers, it does not make use of the method Array#combination.
The first part of the section Explanation (below) provides an overview of the approach taken by the algorithm. The details are then filled in.
Code
require 'set'
VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set
def all_vowel_pairs(words)
  h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
  h.each_with_object([]) do |(k,v),a|
    vowels_needed = VOWELS_SET-k
    h.each do |kk,vv|
      next unless kk.superset?(vowels_needed)
      v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
    end
  end
end
Example
words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]
all_vowel_pairs(words)
#=> ["action europe", "hear impromptu", "impromptu tear"]
Explanation
For the given example the steps are as follows.
VOWELS_SET = VOWELS.to_set
#=> #<Set: {"a", "e", "i", "o", "u"}>
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
#=> {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# #<Set: {"e", "a"}>=>["tear", "hear"],
# #<Set: {"i", "o", "u"}>=>["impromptu"],
# #<Set: {"i", "e"}>=>["tired"],
# #<Set: {"e", "u", "o"}>=>["europe"]}
It is seen that the keys of h are subsets of the five vowels. The values are arrays of elements of words that contain the vowels given by the key and no others. The values therefore collectively form a partition of words. When the number of words is large, one would expect h to have 31 keys (2**5 - 1), one for each nonempty subset of the vowels, plus possibly the empty set for words such as "rhythm" that contain none of the five vowels.
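The same partition could also be built with Enumerable#group_by. A minimal sketch on a subset of the example words (Hash preserves insertion order, so the groups appear in the order of their first word):

```ruby
require 'set'

VOWELS = ["a", "e", "i", "o", "u"]
words = ["goat", "action", "tear", "hear"]
# Key each word by the set of vowels it contains; group_by collects the
# words that share a key into one array.
h = words.group_by { |w| (w.chars & VOWELS).to_set }
p h.values #=> [["goat"], ["action"], ["tear", "hear"]]
```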
We now loop through the key-value pairs of h. For each, with key k and value v, the set of missing vowels (vowels_needed) is determined; we then loop through those key-value pairs [kk, vv] of h for which kk is a superset of vowels_needed. All pairings of elements of v and vv are then added to the array being returned (after an adjustment to avoid counting each pair of words twice).
Continuing,
enum = h.each_with_object([])
#=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# ...
# #<Set: {"e", "u", "o"}>=>["europe"]}:
# each_with_object([])>
The first value is generated by enum and passed to the block, and the block variables are assigned values:
(k,v), a = enum.next
#=> [[#<Set: {"o", "a"}>, ["goat"]], []]
See Enumerator#next.
The individual variables are assigned values by array decomposition:
k #=> #<Set: {"o", "a"}>
v #=> ["goat"]
a #=> []
The block calculations are now performed.
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "i", "u"}>
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
The word "goat" (v) has vowels "o" and "a", so it can only be matched with words that contain vowels "e", "i" and "u" (and possibly "o" and/or "a"). The expression
next unless kk.superset?(vowels_needed)
skips those keys of h (kk) that are not supersets of vowels_needed. See Set#superset?.
None of the words in words contains all of "e", "i" and "u", so the array a is unchanged.
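Set#superset? does all of the filtering at this step. A few quick checks with sets of the kind used in this example:

```ruby
require 'set'

vowels_needed = Set["e", "u"]
covers = Set["e", "u", "o"].superset?(vowels_needed)
misses = Set["e", "a"].superset?(vowels_needed)
# Every set is a superset of the empty set, so a word that already contains
# all five vowels (vowels_needed empty) passes the test for every key.
trivial = Set["a"].superset?(Set.new)
p [covers, misses, trivial] #=> [true, false, true]
```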
The next element is now generated by enum, passed to the block and the block variables are assigned values:
(k,v), a = enum.next
#=> [[#<Set: {"a", "i", "o"}>, ["action"]], []]
k #=> #<Set: {"a", "i", "o"}>
v #=> ["action"]
a #=> []
The block calculation begins:
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "u"}>
We see that h has only one key-value pair for which the key is a superset of vowels_needed:
kk = %w|e u o|.to_set
#=> #<Set: {"e", "u", "o"}>
vv = ["europe"]
We therefore execute:
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
which adds one element to a:
a #=> ["action europe"]
The clause if w1 < w2 is to ensure that later in the calculations "europe action" is not added to a.
If v (words containing 'a', 'i' and 'o') and vv (words containing 'e', 'u' and 'o') had instead been:
v #=> ["action", "notification"]
vv #=> ["europe", "route"]
we would have added "action europe", "action route" and "notification route" to a. ("europe notification" would be added later, when k #=> #<Set: {"e", "u", "o"}>.)
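Array#product generates the same pairings as the nested each calls, which makes the claim about the hypothetical v and vv easy to check:

```ruby
v  = ["action", "notification"]
vv = ["europe", "route"]
# product yields every [w1, w2] with w1 from v and w2 from vv;
# the w1 < w2 test keeps only alphabetically ordered pairs.
pairs = v.product(vv).select { |w1, w2| w1 < w2 }.map { |w1, w2| "%s %s" % [w1, w2] }
p pairs #=> ["action europe", "action route", "notification route"]
```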
Benchmark
I benchmarked my method against the others suggested, using @theTinMan's Fruity benchmark code. The only differences were the array of words to be tested and the addition of my method, which I named cary, to the benchmark. For the array of words, I selected 600 words at random from a file of English words on my computer:
words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
#=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
# "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]
This array was found to contain 46,436 pairs of words containing all five vowels.
The results were as shown below.
compare {
  _viktor { viktor(words) }
  _ttm1 { ttm1(words) }
  _ttm2 { ttm2(words) }
  _ttm3 { ttm3(words) }
  _cary { cary(words) }
}
Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1
I then compared cary with ttm3 for 1,000 randomly selected words. This array was found to contain 125,068 pairs of words containing all five vowels. That result was as follows:
Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0
To get a feel for the variability of the benchmark I ran this last comparison twice more, each with a new random selection of 1,000 words. That gave me the following results:
Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0
Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0
It is seen that there is considerable variation among the samples.
You said pairs, so I assume you want combinations of two elements. I generated every two-element combination of the array using the #combination method, then #select-ed only those pairs that contain all vowels once joined. Finally, I joined each remaining pair with a space:
["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
.map{ |w| w.join(' ') }
#=> ["action europe", "tear impromptu"]
The regex is from "What is the regex to match the words containing all the vowels?".
Starting similarly to Viktor's answer, I'd use a simple test to see which vowels exist in the words, and check whether they equal "aeiou" after removing duplicates and sorting them:
def ttm1(ary)
  ary.combination(2).select { |a|
    a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
  }.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
Breaking it down so you can see what's happening.
["goat", "action", "tear", "impromptu", "tired", "europe"] # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |a| a # => ["goat", "action"], ["goat", "tear"], ["goat", "impromptu"], ["goat", "tired"], ["goat", "europe"], ["action", "tear"], ["action", "impromptu"], ["action", "tired"], ["action", "europe"], ["tear", "impromptu"], ["tear", "tired"], ["tear", "europe"], ["impromptu", "tired"], ["impromptu", "europe"], ["tired", "europe"]
.join # => "goataction", "goattear", "goatimpromptu", "goattired", "goateurope", "actiontear", "actionimpromptu", "actiontired", "actioneurope", "tearimpromptu", "teartired", "teareurope", "impromptutired", "impromptueurope", "tiredeurope"
.scan(/[aeiou]/) # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
.uniq # => ["o", "a", "i"], ["o", "a", "e"], ["o", "a", "i", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "e", "u"], ["e", "a", "i", "o", "u"], ["e", "a", "i"], ["e", "a", "u", "o"], ["i", "o", "u", "e"], ["i", "o", "u", "e"], ["i", "e", "u", "o"]
.sort # => ["a", "i", "o"], ["a", "e", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "o", "u"], ["a", "e", "i", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "i", "o", "u"], ["a", "e", "i", "o", "u"], ["a", "e", "i"], ["a", "e", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"]
.join == 'aeiou' # => false, false, false, false, false, false, false, false, true, true, false, false, false, false, false
} # => [["action", "europe"], ["tear", "impromptu"]]
Looking at that code, it jumps through hoops to find out whether all the vowels exist. Every check has to step through several methods before determining whether all the vowels were found; in other words, it can't short-circuit and fail early, which isn't good.
This code can:
def ttm2(ary)
  ary.combination(2).select { |a|
    str = a.join
    str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
  }.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
But I don't like using the regular expression engine this way, as it's slower than doing a direct substring lookup, which led to:
def ttm3(ary)
  ary.combination(2).select { |a|
    str = a.join
    str['a'] && str['e'] && str['i'] && str['o'] && str['u']
  }.map { |a| a.join(' ') }
end
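Another variant that also stops at the first missing vowel, not part of this answer or its benchmark, uses Enumerable#all? with String#include?. The method name ttm_all is hypothetical:

```ruby
def ttm_all(ary)
  ary.combination(2).select { |a|
    str = a.join
    # all? returns false as soon as one vowel is missing
    "aeiou".each_char.all? { |v| str.include?(v) }
  }.map { |a| a.join(' ') }
end

words = ["goat", "action", "tear", "impromptu", "tired", "europe"]
p ttm_all(words) #=> ["action europe", "tear impromptu"]
```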
Here's the benchmark:
require 'fruity'
words = ["goat", "action", "tear", "impromptu", "tired", "europe"]
def viktor(ary)
  ary.combination(2)
     .select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
     .map{ |w| w.join(' ') }
end
viktor(words) # => ["action europe", "tear impromptu"]
def ttm1(ary)
  ary.combination(2).select { |a|
    a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
  }.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
def ttm2(ary)
  ary.combination(2).select { |a|
    str = a.join
    str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
  }.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
def ttm3(ary)
  ary.combination(2).select { |a|
    str = a.join
    str['a'] && str['e'] && str['i'] && str['o'] && str['u']
  }.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]
compare {
  _viktor { viktor(words) }
  _ttm1 { ttm1(words) }
  _ttm2 { ttm2(words) }
  _ttm3 { ttm3(words) }
}
With the results:
# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1
Now, because this looks so much like a homework assignment, it's important to understand that schools are aware of Stack Overflow, and they look for students asking for help, so you probably don't want to reuse this code, especially not verbatim.
Your code contains two errors, one of which is causing the error message.
(0..words.length) loops from 0 to 6, but words[6] does not exist (arrays are zero-based, so the last index is 5), so you get nil, and words[i].to_s + words[j] raises the error when j is 6. Replacing it with (0..words.length-1), or equivalently (0...words.length), in both loops takes care of that.
You will also get every correct result twice, once as "action europe" and once as "europe action". This is caused by looping too much, visiting every combination twice. Change the second loop from (0..words.length-1) to (i+1..words.length-1), which also keeps a word from being paired with itself.
This cumbersome bookkeeping of indexes is boring and leads to mistakes very often. This is why Ruby programmers often prefer more hassle-free methods (like combination as in other answers), avoiding indexes altogether.
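For reference, a sketch of the question's approach with the index problems addressed: the outer loop uses the exclusive range 0...words.length and the inner loop runs over (i + 1)...words.length, so each pair is visited once and no word is paired with itself. The shorter check_for_vowels based on all? is a simplification of mine, not the question's version:

```ruby
# true if string contains each of a, e, i, o, u at least once
def check_for_vowels(string)
  "aeiou".each_char.all? { |v| string.include?(v) }
end

def all_vowel_pairs(words)
  pairs = []
  (0...words.length).each do |i|          # exclusive range: last index is words.length - 1
    ((i + 1)...words.length).each do |j|  # only j > i, so each pair appears once
      pairs << "#{words[i]} #{words[j]}" if check_for_vowels(words[i] + words[j])
    end
  end
  pairs
end

p all_vowel_pairs(["goat", "action", "tear", "impromptu", "tired", "europe"])
#=> ["action europe", "tear impromptu"]
```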

Intersections and Unions in Ruby for sets with repeated elements

How do we get intersections and unions in Ruby for sets that contain repeated elements?
# given the sets
a = ["A", "B", "B", "C", "D", "D"]
b = ["B", "C", "D", "D", "D", "E"]
# A union function that adds repetitions
union(a, b)
=> ["A", "B", "B", "C", "D", "D", "D", "E"]
# An intersection function that adds repetitions
intersection(a, b)
=> ["B", "C", "D", "D"]
The & and | operators ignore repetitions and remove duplicates, as stated in the documentation.
# union without duplicates
a | b
=> ["A", "B", "C", "D", "E"]
# intersections without duplicates
a & b
=> ["B", "C", "D"]
def union(a,b)
  (a|b).flat_map { |s| [s]*[a.count(s), b.count(s)].max }
end
union(a,b)
# => ["A", "B", "B", "C", "D", "D", "D", "E"]
def intersection(a,b)
  (a|b).flat_map { |s| [s]*[a.count(s), b.count(s)].min }
end
intersection(a,b)
#=> ["B", "C", "D", "D"]
Building upon Cary Swoveland's answer, you could create a temporary hash that counts the number of occurrences of each member in each array (I've generalized the number of arguments):
def multiplicities(*arrays)
  m = Hash.new { |h, k| h[k] = Array.new(arrays.size, 0) }
  arrays.each_with_index { |ary, idx| ary.each { |x| m[x][idx] += 1 } }
  m
end
multiplicities(a, b)
#=> {"A"=>[1, 0], "B"=>[2, 1], "C"=>[1, 1], "D"=>[2, 3], "E"=>[0, 1]}
Implementing union and intersection is straightforward:
def union(*arrays)
  multiplicities(*arrays).flat_map { |x, m| Array.new(m.max, x) }
end
def intersection(*arrays)
  multiplicities(*arrays).flat_map { |x, m| Array.new(m.min, x) }
end
union(a, b) #=> ["A", "B", "B", "C", "D", "D", "D", "E"]
intersection(a, b) #=> ["B", "C", "D", "D"]
With this approach each array has to be traversed only once.
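On Ruby 2.7 or later, Enumerable#tally produces each array's counts directly, which keeps the shape of the first answer while avoiding a count call per distinct element. The method names multiset_union and multiset_intersection are hypothetical:

```ruby
# Requires Ruby 2.7+ for Enumerable#tally.
def multiset_union(a, b)
  ca, cb = a.tally, b.tally
  (a | b).flat_map { |x| [x] * [ca.fetch(x, 0), cb.fetch(x, 0)].max }
end

def multiset_intersection(a, b)
  ca, cb = a.tally, b.tally
  (a | b).flat_map { |x| [x] * [ca.fetch(x, 0), cb.fetch(x, 0)].min }
end

a = ["A", "B", "B", "C", "D", "D"]
b = ["B", "C", "D", "D", "D", "E"]
p multiset_union(a, b)        #=> ["A", "B", "B", "C", "D", "D", "D", "E"]
p multiset_intersection(a, b) #=> ["B", "C", "D", "D"]
```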

palindrome partition ruby no output

Hi, I'm working on the palindrome partitioning problem using recursion. The problem is to return all possible palindrome partitions of a given input string.
Input: "aab"
Output: [["aa", "b"], ["a", "a", "b"]]
A palindrome partition is defined as follows: given a string S, a partition is a set of substrings, each containing one or more characters, such that every substring is a palindrome.
My code is below. The issue I'm having is that the result array never gets correctly populated. From a high level I feel like my logic makes sense, but when I try to debug it I'm not really sure what is going on.
def partition(string)
  result = []
  output = []
  dfs(string, 0, output, result)
  result
end
def dfs(string, start, output, result)
  if start == string.length
    result << output
    return
  end
  (start..string.length-1).to_a.each do |i|
    if is_palindrome(string, start, i)
      output << string[start..(i-start+1)]
      dfs(string, i+1, output, result)
      output.pop
    end
  end
end
def is_palindrome(string, start, end_value)
  result = true
  while start < end_value do
    result = false if string[start] != string[end_value]
    start += 1
    end_value -= 1
  end
  result
end
puts partition("aab")
Yes, you do want to use recursion. I haven't analyzed your code carefully, but one problem I see is the following, in the method dfs:
if start == string.length
  result << output
  return
end
If the if condition is satisfied, return without an argument returns nil. Perhaps you want return result.
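Looking a little further, two other things keep result from being populated as expected. First, result << output stores a reference to the very array that output.pop later empties, so every entry of result ends up as the same, eventually empty, array; storing output.dup avoids that. Second, string[start..(i-start+1)] uses the wrong end index: the piece should run from start through i, i.e. string[start..i]. A minimal sketch with both changes (the palindrome test is reduced to a comparison with reverse, which is my simplification):

```ruby
def partition(string)
  result = []
  dfs(string, 0, [], result)
  result
end

def dfs(string, start, output, result)
  if start == string.length
    result << output.dup # copy: output keeps changing as we backtrack
    return
  end
  (start...string.length).each do |i|
    piece = string[start..i]           # characters start through i
    next unless piece == piece.reverse # only extend with palindromes
    output << piece
    dfs(string, i + 1, output, result)
    output.pop                         # backtrack
  end
end

p partition("aab") #=> [["a", "a", "b"], ["aa", "b"]]
```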
Here is a relatively compact, Ruby-like way of writing it.
def pps(str)
  return [[]] if str.empty?
  (1..str.size).each_with_object([]) do |i,a|
    s = str[0,i]
    next unless is_pal?(s)
    pps(str[i..-1]).each { |b| a << [s, *b] }
  end
end
def is_pal?(str)
  str == str.reverse
end
pps "aab"
#=> [["a", "a", "b"],
# ["aa", "b"]]
pps "aabbaa"
#=> [["a", "a", "b", "b", "a", "a"],
# ["a", "a", "b", "b", "aa"],
# ["a", "a", "bb", "a", "a"],
# ["a", "a", "bb", "aa"],
# ["a", "abba", "a"],
# ["aa", "b", "b", "a", "a"],
# ["aa", "b", "b", "aa"],
# ["aa", "bb", "a", "a"],
# ["aa", "bb", "aa"],
# ["aabbaa"]]
pps "aabbbxaa"
#=> [["a", "a", "b", "b", "b", "x", "a", "a"],
# ["a", "a", "b", "b", "b", "x", "aa"],
# ["a", "a", "b", "bb", "x", "a", "a"],
# ["a", "a", "b", "bb", "x", "aa"],
# ["a", "a", "bb", "b", "x", "a", "a"],
# ["a", "a", "bb", "b", "x", "aa"],
# ["a", "a", "bbb", "x", "a", "a"],
# ["a", "a", "bbb", "x", "aa"],
# ["aa", "b", "b", "b", "x", "a", "a"],
# ["aa", "b", "b", "b", "x", "aa"],
# ["aa", "b", "bb", "x", "a", "a"],
# ["aa", "b", "bb", "x", "aa"],
# ["aa", "bb", "b", "x", "a", "a"],
# ["aa", "bb", "b", "x", "aa"],
# ["aa", "bbb", "x", "a", "a"],
# ["aa", "bbb", "x", "aa"]]
pps "abcdefghijklmnopqrstuvwxyz"
#=> [["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m",
# "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]]
The best way to understand how this recursion works is to add some puts statements and re-run it.
def pps(str)
  puts "\nstr=#{str}"
  return [[]] if str.empty?
  rv = (1..str.size).each_with_object([]) do |i,a|
    s = str[0,i]
    puts "i=#{i}, a=#{a}, s=#{s}, is_pal?(s)=#{is_pal?(s)}"
    next unless is_pal?(s)
    pps(str[i..-1]).each { |b| puts "b=#{b}, [s,*b]=#{[s,*b]}"; a << [s, *b] }
    puts "a after calling pps=#{a}"
  end
  puts "rv=#{rv}"
  rv
end
pps "aab"
str=aab
i=1, a=[], s=a, is_pal?(s)=true
str=ab
i=1, a=[], s=a, is_pal?(s)=true
str=b
i=1, a=[], s=b, is_pal?(s)=true
str=
b=[], [s,*b]=["b"]
a after calling pps=[["b"]]
rv=[["b"]]
b=["b"], [s,*b]=["a", "b"]
a after calling pps=[["a", "b"]]
i=2, a=[["a", "b"]], s=ab, is_pal?(s)=false
rv=[["a", "b"]]
b=["a", "b"], [s,*b]=["a", "a", "b"]
a after calling pps=[["a", "a", "b"]]
i=2, a=[["a", "a", "b"]], s=aa, is_pal?(s)=true
str=b
i=1, a=[], s=b, is_pal?(s)=true
str=
b=[], [s,*b]=["b"]
a after calling pps=[["b"]]
rv=[["b"]]
b=["b"], [s,*b]=["aa", "b"]
a after calling pps=[["a", "a", "b"], ["aa", "b"]]
i=3, a=[["a", "a", "b"], ["aa", "b"]], s=aab, is_pal?(s)=false
rv=[["a", "a", "b"], ["aa", "b"]]
#=> [["a", "a", "b"], ["aa", "b"]]

Why can't I sort an array of strings by `count`?

With this code:
line = ("Ignore punctuation, please :)")
string = line.strip.downcase.split(//)
string.select! {|x| /[a-z]/.match(x) }
string.sort_by!{ |x| string.count(x)}
the result is:
["r", "g", "s", "l", "c", "o", "o", "p", "u", "i", "t", "u", "a", "t", "i", "a", "p", "n", "e", "e", "n", "n", "e"]
Does sorting by count not work in this case? Why? Is there a better way to isolate the words by frequency?
From your comment, I gather that you want to sort the characters by frequency and then alphabetically. When the only sort_by! criterion is string.count(x), characters with the same frequency can appear mixed together within their group. To sort each group alphabetically, you have to add a second criterion to the sort_by! block:
line = ("Ignore punctuation, please :)")
string = line.strip.downcase.split(//)
string.select! {|x| /[a-z]/.match(x) }
string.sort_by!{ |x| [string.count(x), x]}
Then the output will be
["c", "g", "l", "r", "s", "a", "a", "i", "i", "o", "o", "p", "p", "t", "t", "u", "u", "e", "e", "e", "n", "n", "n"]
Let's look at your code line-by-line.
line = ("Ignore punctuation, please :)")
s = line.strip.downcase
#=> "ignore punctuation, please :)"
There's no particular reason to strip here, as you will be removing spaces and punctuation later anyway.
string = s.split(//)
#=> ["i", "g", "n", "o", "r", "e", " ", "p", "u", "n", "c", "t",
# "u", "a", "t", "i", "o", "n", ",", " ", "p", "l", "e", "a",
# "s", "e", " ", ":", ")"]
You've chosen to split the sentence into characters, which is fine, but as I'll mention at the end, you could just use String methods. In any case,
string = s.chars
does the same thing and is arguably more clear. What you have now is an array named string. Isn't that a bit confusing? Let's instead call it arr:
arr = s.chars
(One often sees s and str for names of strings, a and arr for names of arrays, h and hash for names of hashes, and so on.)
arr.select! {|x| /[a-z]/.match(x) }
#=> ["i", "g", "n", "o", "r", "e", "p", "u", "n", "c", "t", "u",
# "a", "t", "i", "o", "n", "p", "l", "e", "a", "s", "e"]
Now you've eliminated all but lowercase letters. You could also write that:
arr.select! {|x| x =~ /[a-z]/ }
or
arr.select! {|x| x[/[a-z]/] }
You are now ready to sort.
arr.sort_by!{ |x| arr.count(x) }
#=> ["l", "g", "s", "c", "r", "i", "p", "u", "a", "o", "t", "p",
# "a", "t", "i", "o", "u", "n", "n", "e", "e", "n", "e"]
This is OK, but it's not good practice to be sorting an array in place and counting the frequency of its elements at the same time. Better would be:
arr1 = arr.sort_by{ |x| arr.count(x) }
which gives the same ordering. Is the resulting sorted array correct? Let's count the number of times each letter appears in the string.
I will create a hash whose keys are the unique elements of arr and whose values are the number of times the associated key appears in arr. There are a few ways to do this. A simple but not very efficient way is as follows:
h = {}
a = arr.uniq
#=> ["l", "g", "s", "c", "r", "i", "p", "u", "a", "o", "t", "n", "e"]
a.each { |c| h[c] = arr.count(c) }
h #=> {"l"=>1, "g"=>1, "s"=>1, "c"=>1, "r"=>1, "i"=>2, "p"=>2,
# "u"=>2, "a"=>2, "o"=>2, "t"=>2, "n"=>3, "e"=>3}
This would normally be written:
h = arr.uniq.each_with_object({}) { |c,h| h[c] = arr.count(c) }
The elements of h are in increasing order of value only because arr was already sorted in place by the earlier sort_by!. To guarantee that order regardless (which makes the counts easier to read), we would need to construct an array, sort it, then convert it to a hash:
a = arr.uniq.map { |c| [c, arr.count(c)] }
#=> [["l", 1], ["g", 1], ["s", 1], ["c", 1], ["r", 1], ["i", 2], ["p", 2],
# ["u", 2], ["a", 2], ["o", 2], ["t", 2], ["n", 3], ["e", 3]]
a = a.sort_by { |_,count| count }
#=> [["l", 1], ["g", 1], ["s", 1], ["c", 1], ["r", 1], ["a", 2], ["t", 2],
# ["u", 2], ["i", 2], ["o", 2], ["p", 2], ["n", 3], ["e", 3]]
h = Hash[a]
#=> {"l"=>1, "g"=>1, "s"=>1, "c"=>1, "r"=>1, "a"=>2, "t"=>2,
# "u"=>2, "i"=>2, "o"=>2, "p"=>2, "n"=>3, "e"=>3}
One would normally see this written:
h = Hash[arr.uniq.map { |c| [c, arr.count(c)] }.sort_by(&:last)]
or, in Ruby v2.1+:
h = arr.uniq.map { |c| [c, arr.count(c)] }.sort_by(&:last).to_h
Note that, prior to Ruby 1.9, there was no concept of key ordering in hashes.
The values of h's key-value pairs show that your sort is correct. It is not, however, very efficient. That's because in:
arr.sort_by { |x| arr.count(x) }
you repeatedly traverse arr, counting frequencies of elements. It's better to construct the hash above:
h = arr.uniq.each_with_object({}) { |c,h| h[c] = arr.count(c) }
before performing the sort, then:
arr.sort_by { |x| h[x] }
As an aside, let me mention a more efficient way to construct the hash h, one which requires only a single pass through arr:
h = Hash.new(0)
arr.each { |x| h[x] += 1 }
h #=> {"l"=>1, "g"=>1, "s"=>1, "c"=>1, "r"=>1, "i"=>2, "p"=>2,
# "u"=>2, "a"=>2, "o"=>2, "t"=>2, "n"=>3, "e"=>3}
or, more succinctly:
h = arr.each_with_object(Hash.new(0)) { |x,h| h[x] += 1 }
Here h is called a counting hash:
h = Hash.new(0)
creates an empty hash whose default value is zero. This means that if h does not have a key k, h[k] will return zero. The abbreviated assignment h[c] += 1 expands to:
h[c] = h[c] + 1
and if h does not have a key c, the default value is assigned to h[c] on the right side:
h[c] = 0 + 1 #=> 1
but the next time c is encountered:
h[c] = h[c] + 1
# h[c] = 1 + 1 #=> 2
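Putting the counting hash together with the [count, character] sort key from the first answer gives a deterministic frequency sort that counts each element only once. A sketch reusing the letters of the example sentence (on Ruby 2.7+, arr.tally builds the same hash):

```ruby
arr = "ignorepunctuationplease".chars
# Counting hash built in a single pass over arr
h = arr.each_with_object(Hash.new(0)) { |x, cnt| cnt[x] += 1 }
# Sort by frequency, breaking ties alphabetically so the order is deterministic
sorted = arr.sort_by { |x| [h[x], x] }
p sorted
#=> ["c", "g", "l", "r", "s", "a", "a", "i", "i", "o", "o", "p", "p",
#    "t", "t", "u", "u", "e", "e", "e", "n", "n", "n"]
```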
Lastly, let's start over and do as much as we can with String methods:
line = ("Ignore punctuation, please :)")
s = line.strip.downcase.gsub(/./) { |c| (c =~ /[a-z]/) ? c : '' }
#=> "ignorepunctuationplease"
h = s.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
#=> {"i"=>2, "g"=>1, "n"=>3, "o"=>2, "r"=>1, "e"=>3, "p"=>2,
# "u"=>2, "c"=>1, "t"=>2, "a"=>2, "l"=>1, "s"=>1}
s.each_char.sort_by { |c| h[c] }
#=> ["l", "g", "s", "c", "r", "i", "p", "u", "a", "o", "t", "p",
# "a", "t", "i", "o", "u", "n", "n", "e", "e", "n", "e"]
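String#delete with a negated character set performs the same filtering as the gsub call above, without a block:

```ruby
line = "Ignore punctuation, please :)"
# "^a-z" is a negated character set: delete every character except a-z
s = line.downcase.delete("^a-z")
p s #=> "ignorepunctuationplease"
```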

How to join every X amount of characters together in an Array - Ruby

If I want to join every X letters of an array together, how can I implement this?
In this case, I want to join every two letters together.
Input: array = ["b", "i", "e", "t", "r", "o"]
Output: array = ["bi", "et", "ro"]
each_slice (docs):
arr = 'bietro'.split ''
# grab each slice of 2 elements
p arr.each_slice(2).to_a
#=> [["b", "i"], ["e", "t"], ["r", "o"]]
# map `join' over each of the slices
p arr.each_slice(2).map(&:join)
#=> ["bi", "et", "ro"]
@Doorknob shows the best way, but here is one other way (among many, many):
def bunch_em(arr,n)
  ((arr.size+n-1)/n).times.map { |i| arr.slice(i*n,n).join }
end
arr = ["b", "i", "e", "t", "r", "o"]
bunch_em(arr,2) #=> ["bi", "et", "ro"]
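When the array length is not a multiple of n, both approaches keep the leftover elements as a shorter final group. A quick check with an odd-length array (the extra "x" is made up for illustration), plus a String#scan variant that is my own addition:

```ruby
def bunch_em(arr, n)
  ((arr.size + n - 1) / n).times.map { |i| arr.slice(i * n, n).join }
end

arr = ["b", "i", "e", "t", "r", "o", "x"] # odd length
p arr.each_slice(2).map(&:join) #=> ["bi", "et", "ro", "x"]
p bunch_em(arr, 2)              #=> ["bi", "et", "ro", "x"]
# Regex alternative: grab up to 2 characters at a time from the joined string
p arr.join.scan(/.{1,2}/)       #=> ["bi", "et", "ro", "x"]
```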
