Why can't I sort an array of strings by `count`? - ruby

With this code:
line = ("Ignore punctuation, please :)")
string = line.strip.downcase.split(//)
string.select! {|x| /[a-z]/.match(x) }
string.sort_by!{ |x| string.count(x)}
the result is:
["r", "g", "s", "l", "c", "o", "o", "p", "u", "i", "t", "u", "a", "t", "i", "a", "p", "n", "e", "e", "n", "n", "e"]
Does sorting by count not work in this case? Why? Is there a better way to isolate the words by frequency?

From your comment, I assume you want to sort the characters by frequency and, within each frequency, alphabetically. When the only sort_by! criterion is string.count(x), letters with the same frequency can appear interleaved with each other, because sort_by does not guarantee any order among elements with equal keys. To sort each group alphabetically you have to add a second criterion to the sort_by! block:
line = ("Ignore punctuation, please :)")
string = line.strip.downcase.split(//)
string.select! {|x| /[a-z]/.match(x) }
string.sort_by!{ |x| [string.count(x), x]}
Then the output will be
["c", "g", "l", "r", "s", "a", "a", "i", "i", "o", "o", "p", "p", "t", "t", "u", "u", "e", "e", "e", "n", "n", "n"]
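A small variation, assuming Ruby 2.7 or later for Enumerable#tally: count each letter once up front instead of calling string.count inside the block, and let scan(/[a-z]/) do the split-and-filter step. This is a sketch, not the answerer's code, and the variable names are illustrative.

```ruby
line = "Ignore punctuation, please :)"

letters = line.downcase.scan(/[a-z]/)            # split and keep only a-z in one step
counts  = letters.tally                           # letter => frequency (Ruby 2.7+)
sorted  = letters.sort_by { |c| [counts[c], c] } # frequency first, then alphabetical

p sorted
```

Arrays compare element by element, so [1, "c"] sorts before [1, "g"], which sorts before [2, "a"].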

Let's look at your code line-by-line.
line = ("Ignore punctuation, please :)")
s = line.strip.downcase
#=> "ignore punctuation, please :)"
There's no particular reason to strip here, as you will be removing spaces and punctuation later anyway.
string = s.split(//)
#=> ["i", "g", "n", "o", "r", "e", " ", "p", "u", "n", "c", "t",
# "u", "a", "t", "i", "o", "n", ",", " ", "p", "l", "e", "a",
# "s", "e", " ", ":", ")"]
You've chosen to split the sentence into characters, which is fine, but as I'll mention at the end, you could just use String methods. In any case,
string = s.chars
does the same thing and is arguably clearer. What you have now is an array named string. Isn't that a bit confusing? Let's instead call it arr:
arr = s.chars
(One often sees s and str for names of strings, a and arr for names of arrays, h and hash for names of hashes, and so on.)
arr.select! {|x| /[a-z]/.match(x) }
#=> ["i", "g", "n", "o", "r", "e", "p", "u", "n", "c", "t", "u",
# "a", "t", "i", "o", "n", "p", "l", "e", "a", "s", "e"]
Now you've eliminated all but lowercase letters. You could also write that:
arr.select! {|x| x =~ /[a-z]/ }
or
arr.select! {|x| x[/[a-z]/] }
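Another option, offered as an alternative rather than a correction: Enumerable#grep filters with Regexp#===, so it does the same job in one step without mutating arr.

```ruby
s = "ignore punctuation, please :)"

# grep keeps each element for which /[a-z]/ === element is true
arr = s.chars.grep(/[a-z]/)

p arr.join   # prints "ignorepunctuationplease"
```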
You are now ready to sort.
arr.sort_by!{ |x| arr.count(x) }
#=> ["l", "g", "s", "c", "r", "i", "p", "u", "a", "o", "t", "p",
# "a", "t", "i", "o", "u", "n", "n", "e", "e", "n", "e"]
This is OK, but it's not good practice to be sorting an array in place and counting the frequency of its elements at the same time. Better would be:
arr1 = arr.sort_by{ |x| arr.count(x) }
which gives the same ordering. Is the resulting sorted array correct? Let's count the number of times each letter appears in the string.
I will create a hash whose keys are the unique elements of arr and whose values are the number of times the associated key appears in arr. There are a few ways to do this. A simple but not very efficient way is as follows:
h = {}
a = arr.uniq
#=> ["l", "g", "s", "c", "r", "i", "p", "u", "a", "o", "t", "n", "e"]
a.each { |c| h[c] = arr.count(c) }
h #=> {"l"=>1, "g"=>1, "s"=>1, "c"=>1, "r"=>1, "i"=>2, "p"=>2,
# "u"=>2, "a"=>2, "o"=>2, "t"=>2, "n"=>3, "e"=>3}
This would normally be written:
h = arr.uniq.each_with_object({}) { |c,h| h[c] = arr.count(c) }
The elements of h are in increasing order of value, but that's just coincidence. To ensure they are in that order (to make it easier to see the order), we would need to construct an array, sort it, then convert it to a hash:
a = arr.uniq.map { |c| [c, arr.count(c)] }
#=> [["l", 1], ["g", 1], ["s", 1], ["c", 1], ["r", 1], ["i", 2], ["p", 2],
# ["u", 2], ["a", 2], ["o", 2], ["t", 2], ["n", 3], ["e", 3]]
a = a.sort_by { |_,count| count }
#=> [["l", 1], ["g", 1], ["s", 1], ["c", 1], ["r", 1], ["a", 2], ["t", 2],
# ["u", 2], ["i", 2], ["o", 2], ["p", 2], ["n", 3], ["e", 3]]
h = Hash[a]
#=> {"l"=>1, "g"=>1, "s"=>1, "c"=>1, "r"=>1, "a"=>2, "t"=>2,
# "u"=>2, "i"=>2, "o"=>2, "p"=>2, "n"=>3, "e"=>3}
One would normally see this written:
h = Hash[arr.uniq.map { |c| [c, arr.count(c)] }.sort_by(&:last)]
or, in Ruby v2.1+ (where Array#to_h was introduced):
h = arr.uniq.map { |c| [c, arr.count(c)] }.sort_by(&:last).to_h
Note that, prior to Ruby 1.9, there was no concept of key ordering in hashes.
The values of h's key-value pairs show that your sort is correct. It is not, however, very efficient. That's because in:
arr.sort_by { |x| arr.count(x) }
you repeatedly traverse arr, counting frequencies of elements. It's better to construct the hash above:
h = arr.uniq.each_with_object({}) { |c,h| h[c] = arr.count(c) }
before performing the sort, then:
arr.sort_by { |x| h[x] }
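Put together, the count-first approach looks roughly like this (a sketch; the block variable is named counts only to avoid shadowing h):

```ruby
arr = "ignorepunctuationplease".chars

# arr.count(c) still scans arr, but once per unique letter rather than once per element
h = arr.uniq.each_with_object({}) { |c, counts| counts[c] = arr.count(c) }

# the sort itself now only does hash lookups
sorted = arr.sort_by { |x| h[x] }

p sorted.map { |x| h[x] }   # five 1s, then twelve 2s, then six 3s
```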
As an aside, let me mention a more efficient way to construct the hash h, one which requires only a single pass through arr:
h = Hash.new(0)
arr.each { |x| h[x] += 1 }
h #=> {"l"=>1, "g"=>1, "s"=>1, "c"=>1, "r"=>1, "i"=>2, "p"=>2,
# "u"=>2, "a"=>2, "o"=>2, "t"=>2, "n"=>3, "e"=>3}
or, more succinctly:
h = arr.each_with_object(Hash.new(0)) { |x,h| h[x] += 1 }
Here h is called a counting hash:
h = Hash.new(0)
creates an empty hash whose default value is zero. This means that if h does not have a key k, h[k] will return zero. The abbreviated assignment h[c] += 1 expands to:
h[c] = h[c] + 1
and if h does not have a key c, the default value is assigned to h[c] on the right side:
h[c] = 0 + 1 #=> 1
but the next time c is encountered:
h[c] = h[c] + 1
#=> 1 + 1 => 2
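The default-value behaviour can be checked directly. Reading a missing key returns the default without storing the key; only the assignment adds it.

```ruby
h = Hash.new(0)

p h["z"]        # prints 0: the default for a missing key
p h.key?("z")   # prints false: reading the default did not add "z"

h["z"] += 1     # h["z"] = h["z"] + 1, i.e. 0 + 1
h["z"] += 1     # now 1 + 1
p h["z"]        # prints 2
```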
Lastly, let's start over and do as much as we can with String methods:
line = ("Ignore punctuation, please :)")
s = line.strip.downcase.gsub(/./) { |c| (c =~ /[a-z]/) ? c : '' }
#=> "ignorepunctuationplease"
h = s.each_char.with_object(Hash.new(0)) { |c,h| h[c] += 1 }
#=> {"i"=>2, "g"=>1, "n"=>3, "o"=>2, "r"=>1, "e"=>3, "p"=>2,
# "u"=>2, "c"=>1, "t"=>2, "a"=>2, "l"=>1, "s"=>1}
s.each_char.sort_by { |c| h[c] }
#=> ["l", "g", "s", "c", "r", "i", "p", "u", "a", "o", "t", "p",
# "a", "t", "i", "o", "u", "n", "n", "e", "e", "n", "e"]
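On Ruby 2.7 or later the String-only version can be shortened further. This sketch swaps the gsub block for String#delete with a negated character range, and uses Enumerable#tally to build the counting hash:

```ruby
line = "Ignore punctuation, please :)"

s = line.downcase.delete("^a-z")   # delete every character that is not a-z
h = s.each_char.tally              # same counting hash as above (Ruby 2.7+)
sorted = s.each_char.sort_by { |c| h[c] }

p s   # prints "ignorepunctuationplease"
```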


How to find two elements of the same array that contain all vowels

I want to iterate a given array, for example:
["goat", "action", "tear", "impromptu", "tired", "europe"]
I want to look at all possible pairs.
The desired output is a new array containing every pair whose combined letters include all the vowels, with each pair joined into a single string element:
["action europe", "tear impromptu"]
I tried the following code, but got an error message:
No implicit conversion of nil into string.
def all_vowel_pairs(words)
pairs = []
(0..words.length).each do |i| # iterate through words
(0..words.length).each do |j| # for every word, iterate through words again
pot_pair = words[i].to_s + words[j] # build string from pair
if check_for_vowels(pot_pair) # throw string to helper-method.
pairs << words[i] + " " + words[j] # if gets back true, concatenate and push to output array "pairs"
end
end
end
pairs
end
# helper-method to check for if a string has all vowels in it
def check_for_vowels(string)
vowels = "aeiou"
founds = []
string.each_char do |char|
if vowels.include?(char) && !founds.include?(char)
founds << char
end
end
if founds.length == 5
return true
end
false
end
The following code is intended to provide an efficient way to construct the desired array when the number of words is large. Note that, unlike the other answers, it does not make use of the method Array#combination.
The first part of the section Explanation (below) provides an overview of the approach taken by the algorithm. The details are then filled in.
Code
require 'set'
VOWELS = ["a", "e", "i", "o", "u"]
VOWELS_SET = VOWELS.to_set
def all_vowel_pairs(words)
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
h.each_with_object([]) do |(k,v),a|
vowels_needed = VOWELS_SET-k
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
end
end
Example
words = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]
all_vowel_pairs(words)
#=> ["action europe", "hear impromptu", "impromptu tear"]
Explanation
For the given example the steps are as follows.
VOWELS_SET = VOWELS.to_set
#=> #<Set: {"a", "e", "i", "o", "u"}>
h = words.each_with_object({}) {|w,h| (h[(w.chars & VOWELS).to_set] ||= []) << w}
#=> {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# #<Set: {"e", "a"}>=>["tear", "hear"],
# #<Set: {"i", "o", "u"}>=>["impromptu"],
# #<Set: {"i", "e"}>=>["tired"],
# #<Set: {"e", "u", "o"}>=>["europe"]}
It is seen that the keys of h are subsets of the five vowels. The values are arrays of the elements of words that contain exactly the vowels given by the key and no others. The values therefore collectively form a partition of words. When the number of words is large one would expect h to have as many as 32 keys (2**5), one of them being the empty set if some words contain none of the five vowels.
We now loop through the key-value pairs of h. For each, with key k and value v, the set of missing vowels (vowels_needed) is determined, then we loop through those key-value pairs [kk, vv] of h for which kk is a superset of vowels_needed. All pairings of elements of v with elements of vv are then added to the array being returned (after an adjustment to avoid double-counting each pair of words).
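For what it's worth, Enumerable#group_by builds the same hash as the each_with_object expression (same keys, same values, same insertion order). A sketch, using a local vowels array so it doesn't clash with the VOWELS constant above:

```ruby
require 'set'

vowels = ["a", "e", "i", "o", "u"]
words  = ["goat", "action", "tear", "impromptu", "tired", "europe", "hear"]

# key: the set of vowels a word contains; value: all words with exactly that set
h = words.group_by { |w| (w.chars & vowels).to_set }

p h.size   # prints 6
```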
Continuing,
enum = h.each_with_object([])
#=> #<Enumerator: {#<Set: {"o", "a"}>=>["goat"],
# #<Set: {"a", "i", "o"}>=>["action"],
# ...
# #<Set: {"e", "u", "o"}>=>["europe"]}:
# each_with_object([])>
The first value is generated by enum and passed to the block, and the block variables are assigned values:
(k,v), a = enum.next
#=> [[#<Set: {"o", "a"}>, ["goat"]], []]
See Enumerator#next.
The individual variables are assigned values by array decomposition:
k #=> #<Set: {"o", "a"}>
v #=> ["goat"]
a #=> []
The block calculations are now performed.
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "i", "u"}>
h.each do |kk,vv|
next unless kk.superset?(vowels_needed)
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
end
The word "goat" (v) has vowels "o" and "a", so it can only be matched with words that contain vowels "e", "i" and "u" (and possibly "o" and/or "a"). The expression
next unless kk.superset?(vowels_needed)
skips those keys of h (kk) that are not supersets of vowels_needed. See Set#superset?.
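A quick illustration of Set#superset? using the vowels_needed value from this step (Set#>= is an alias for the same check):

```ruby
require 'set'

vowels_needed = Set["e", "i", "u"]

p Set["e", "u", "o"].superset?(vowels_needed)        # prints false: no "i"
p Set["e", "i", "o", "u"].superset?(vowels_needed)   # prints true
p Set["a", "e", "i", "o", "u"] >= vowels_needed      # prints true
```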
None of the words in words contain "e", "i" and "u" so the array a is unchanged.
The next element is now generated by enum, passed to the block and the block variables are assigned values:
(k,v), a = enum.next
#=> [[#<Set: {"a", "i", "o"}>, ["action"]], []]
k #=> #<Set: {"a", "i", "o"}>
v #=> ["action"]
a #=> []
The block calculation begins:
vowels_needed = VOWELS_SET-k
#=> #<Set: {"e", "u"}>
We see that h has only one key-value pair for which the key is a superset of vowels_needed:
kk = %w|e u o|.to_set
#=> #<Set: {"e", "u", "o"}>
vv = ["europe"]
We therefore execute:
v.each {|w1| vv.each {|w2| a << "%s %s" % [w1, w2] if w1 < w2}}
which adds one element to a:
a #=> ["action europe"]
The clause if w1 < w2 is to ensure that later in the calculations "europe action" is not added to a.
If v (words whose vowels are 'a', 'i' and 'o') and vv (words whose vowels are 'e', 'u' and 'o') had instead been:
v #=> ["action", "notification"]
vv #=> ["europe", "route"]
we would have added "action europe", "action route" and "notification route" to a. ("europe notification" would be added later, when k #=> #<Set: {"e", "u", "o"}>.)
Benchmark
I benchmarked my method against the others suggested, using @theTinMan's Fruity benchmark code. The only differences were the array of words tested and the addition of my method, which I named cary. For the array of words I selected 600 words at random from a file of English words on my computer:
words = IO.readlines('/usr/share/dict/words', chomp: true).sample(600)
words.first 10
#=> ["posadaship", "explosively", "expensilation", "conservatively", "plaiting",
# "unpillared", "intertwinement", "nonsolidified", "uraemic", "underspend"]
This array was found to contain 46,436 pairs of words containing all five vowels.
The results were as shown below.
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
_cary { cary(words) }
}
Running each test once. Test will take about 44 seconds.
_cary is faster than _ttm3 by 5x ± 0.1
_ttm3 is faster than _viktor by 50.0% ± 1.0%
_viktor is faster than _ttm2 by 30.000000000000004% ± 1.0%
_ttm2 is faster than _ttm1 by 2.4x ± 0.1
I then compared cary with ttm3 for 1,000 randomly selected words. This array was found to contain 125,068 pairs of words containing all five vowels. That result was as follows:
Running each test once. Test will take about 19 seconds.
_cary is faster than _ttm3 by 3x ± 1.0
To get a feel for the variability of the benchmark I ran this last comparison twice more, each with a new random selection of 1,000 words. That gave me the following results:
Running each test once. Test will take about 17 seconds.
_cary is faster than _ttm3 by 5x ± 1.0
Running each test once. Test will take about 18 seconds.
_cary is faster than _ttm3 by 4x ± 1.0
It is seen that there is considerable variation among the samples.
You said pairs, so I assume each result combines two elements. I generated every two-element combination of the array with the #combination method, then #select-ed only those pairs that contain all the vowels once joined. Finally, I joined each remaining pair with a space:
["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
.map{ |w| w.join(' ') }
#=> ["action europe", "tear impromptu"]
The regex is from "What is the regex to match the words containing all the vowels?".
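To see what the regex does on its own: each (?=\w*?v) is a zero-width lookahead that only checks that vowel v appears somewhere later in the word, and the final [a-zA-Z]+ consumes the word once every lookahead has succeeded.

```ruby
all_vowels = /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/

p "actioneurope".match?(all_vowels)    # prints true
p "goataction".match?(all_vowels)      # prints false: no "e" and no "u"
```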
Starting similarly to Viktor's, I'd use a simple test to see what vowels exist in the words and compare to whether they match "aeiou" after stripping duplicates and sorting them:
def ttm1(ary)
ary.combination(2).select { |a|
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
Breaking it down so you can see what's happening.
["goat", "action", "tear", "impromptu", "tired", "europe"] # => ["goat", "action", "tear", "impromptu", "tired", "europe"]
.combination(2)
.select { |a| a # => ["goat", "action"], ["goat", "tear"], ["goat", "impromptu"], ["goat", "tired"], ["goat", "europe"], ["action", "tear"], ["action", "impromptu"], ["action", "tired"], ["action", "europe"], ["tear", "impromptu"], ["tear", "tired"], ["tear", "europe"], ["impromptu", "tired"], ["impromptu", "europe"], ["tired", "europe"]
.join # => "goataction", "goattear", "goatimpromptu", "goattired", "goateurope", "actiontear", "actionimpromptu", "actiontired", "actioneurope", "tearimpromptu", "teartired", "teareurope", "impromptutired", "impromptueurope", "tiredeurope"
.scan(/[aeiou]/) # => ["o", "a", "a", "i", "o"], ["o", "a", "e", "a"], ["o", "a", "i", "o", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u", "o", "e"], ["a", "i", "o", "e", "a"], ["a", "i", "o", "i", "o", "u"], ["a", "i", "o", "i", "e"], ["a", "i", "o", "e", "u", "o", "e"], ["e", "a", "i", "o", "u"], ["e", "a", "i", "e"], ["e", "a", "e", "u", "o", "e"], ["i", "o", "u", "i", "e"], ["i", "o", "u", "e", "u", "o", "e"], ["i", "e", "e", "u", "o", "e"]
.uniq # => ["o", "a", "i"], ["o", "a", "e"], ["o", "a", "i", "u"], ["o", "a", "i", "e"], ["o", "a", "e", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "u"], ["a", "i", "o", "e"], ["a", "i", "o", "e", "u"], ["e", "a", "i", "o", "u"], ["e", "a", "i"], ["e", "a", "u", "o"], ["i", "o", "u", "e"], ["i", "o", "u", "e"], ["i", "e", "u", "o"]
.sort # => ["a", "i", "o"], ["a", "e", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "o", "u"], ["a", "e", "i", "o"], ["a", "i", "o", "u"], ["a", "e", "i", "o"], ["a", "e", "i", "o", "u"], ["a", "e", "i", "o", "u"], ["a", "e", "i"], ["a", "e", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"], ["e", "i", "o", "u"]
.join == 'aeiou' # => false, false, false, false, false, false, false, false, true, true, false, false, false, false, false
} # => [["action", "europe"], ["tear", "impromptu"]]
Looking at the code, it was jumping through hoops to find whether all the vowels exist. Every check had to step through several methods before determining whether all the vowels were found; in other words, it couldn't short-circuit and fail early, which isn't good.
This code does short-circuit:
def ttm2(ary)
ary.combination(2).select { |a|
str = a.join
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
But I don't like using the regular expression engine this way as it's slower than doing a direct lookup, which led to:
def ttm3(ary)
ary.combination(2).select { |a|
str = a.join
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
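One more short-circuiting variant, not part of the benchmark below and not timed here: Array#all? stops at the first vowel that is missing. The method name all_vowels_variant is hypothetical.

```ruby
VOWEL_LIST = %w[a e i o u]

def all_vowels_variant(ary)
  ary.combination(2).select { |pair|
    str = pair.join
    VOWEL_LIST.all? { |v| str.include?(v) }   # stops at the first missing vowel
  }.map { |pair| pair.join(' ') }
end

p all_vowels_variant(["goat", "action", "tear", "impromptu", "tired", "europe"])
```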
Here's the benchmark:
require 'fruity'
words = ["goat", "action", "tear", "impromptu", "tired", "europe"]
def viktor(ary)
ary.combination(2)
.select { |c| c.join('') =~ /\b(?=\w*?a)(?=\w*?e)(?=\w*?i)(?=\w*?o)(?=\w*?u)[a-zA-Z]+\b/ }
.map{ |w| w.join(' ') }
end
viktor(words) # => ["action europe", "tear impromptu"]
def ttm1(ary)
ary.combination(2).select { |a|
a.join.scan(/[aeiou]/).uniq.sort.join == 'aeiou'
}.map { |a| a.join(' ') }
end
ttm1(words) # => ["action europe", "tear impromptu"]
def ttm2(ary)
ary.combination(2).select { |a|
str = a.join
str[/a/] && str[/e/] && str[/i/] && str[/o/] && str[/u/]
}.map { |a| a.join(' ') }
end
ttm2(words) # => ["action europe", "tear impromptu"]
def ttm3(ary)
ary.combination(2).select { |a|
str = a.join
str['a'] && str['e'] && str['i'] && str['o'] && str['u']
}.map { |a| a.join(' ') }
end
ttm3(words) # => ["action europe", "tear impromptu"]
compare {
_viktor { viktor(words) }
_ttm1 { ttm1(words) }
_ttm2 { ttm2(words) }
_ttm3 { ttm3(words) }
}
With the results:
# >> Running each test 256 times. Test will take about 1 second.
# >> _ttm3 is similar to _viktor
# >> _viktor is similar to _ttm2
# >> _ttm2 is faster than _ttm1 by 2x ± 0.1
Now, because this looks so much like a homework assignment, it's important to understand that schools are aware of Stack Overflow, and they look for students asking for help, so you probably don't want to reuse this code, especially not verbatim.
Your code contains two errors, one of which is causing the error message.
(0..words.length) loops from 0 to 6. words[6], however, does not exist (arrays are zero-based), so you get nil. Replacing it with (0..words.length-1) (twice) should take care of that.
You will get every correct result twice, once as "action europe" and once as "europe action". This is caused by looping too much, going over every combination twice. Change the second loop from (0..words.length-1) to (i..words.length-1), or to (i+1..words.length-1) if a word should never be paired with itself.
This cumbersome bookkeeping of indexes is boring and leads to mistakes very often. This is why Ruby programmers often prefer more hassle-free methods (like combination as in other answers), avoiding indexes altogether.
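Applying both fixes to the original loop gives something like the sketch below. It uses exclusive ranges (0...words.length) instead of (0..words.length-1), starts j at i + 1 so each pair is seen once and no word is paired with itself, and replaces the helper with a shorter equivalent; the rewritten helper is mine, not the asker's.

```ruby
def all_vowel_pairs(words)
  pairs = []
  (0...words.length).each do |i|            # 0 up to length - 1, so words[i] is never nil
    ((i + 1)...words.length).each do |j|    # only j > i: each pair once
      if check_for_vowels(words[i] + words[j])
        pairs << words[i] + " " + words[j]
      end
    end
  end
  pairs
end

# true if every one of the five vowels occurs in string
def check_for_vowels(string)
  "aeiou".each_char.all? { |v| string.include?(v) }
end

p all_vowel_pairs(["goat", "action", "tear", "impromptu", "tired", "europe"])
```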

How do I flatten a nested hash, recursively, into an array of arrays with a specific format?

I have a nested hash that looks something like this:
{
'a' => {
'b' => ['c'],
'd' => {
'e' => ['f'],
'g' => ['h', 'i', 'j', 'k']
},
'l' => ['m', 'n', 'o', 'p']
},
'q' => {
'r' => ['s']
}
}
The hash can have even more nesting, but the values of the last level are always arrays.
I would like to "flatten" the hash into an array of arrays, each representing all the keys and values that make up an entire "path/branch" of the nested hash, all the way from the lowest-level value to the top of the hash. So it is kind of like traversing up through the "tree", starting from the bottom, while collecting keys and values on the way.
The output of that for the particular hash should be:
[
['a', 'b', 'c'],
['a', 'd', 'e', 'f'],
['a', 'd', 'g', 'h', 'i', 'j', 'k'],
['a', 'l', 'm', 'n', 'o', 'p'],
['q', 'r', 's']
]
I tried many different things, but nothing worked so far. Again keep in mind that more levels than these might occur, so the solution has to be generic.
Note: the order of the arrays and the order of the elements in them is not important.
I did the following, but it's not really working:
tree_def = {
'a' => {
'b' => ['c'],
'd' => {
'e' => ['f'],
'g' => ['h', 'i', 'j', 'k']
},
'l' => ['m', 'n', 'o', 'p']
},
'q' => {
'r' => ['s']
}
}
branches = [[]]
collect_branches = lambda do |tree, current_branch|
tree.each do |key, hash_or_values|
current_branch.push(key)
if hash_or_values.kind_of?(Hash)
collect_branches.call(hash_or_values, branches.last)
else # Reached lowest level in dependency tree (which is always an array)
# Add a new branch
branches.push(current_branch.clone)
current_branch.push(*hash_or_values)
current_branch = branches.last
end
end
end
collect_branches.call(tree_def, branches[0])
branches #=> wrong result
As hinted at in the comments:
Looks pretty straightforward. Descend into hashes recursively, taking note of keys you visited in this branch. When you see an array, no need to recurse further. Append it to the list of keys and return
Tracking is easy, just pass the temp state down to recursive calls in arguments.
I meant something like this:
def tree_flatten(tree, path = [], &block)
case tree
when Array
block.call(path + tree)
else
tree.each do |key, sub_tree|
tree_flatten(sub_tree, path + [key], &block)
end
end
end
tree_flatten(tree_def) do |path|
p path
end
This code simply prints each flattened path as it gets one, but you can store it in an array too. Or even modify tree_flatten to return you a ready array, instead of yielding elements one by one.
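A sketch of the "return a ready array" variant mentioned above. The name tree_paths is hypothetical; Hash#flat_map (from Enumerable) concatenates the path lists returned for each key, and because path + [key] builds a new array, sibling branches never share state.

```ruby
def tree_paths(tree, path = [])
  case tree
  when Array
    [path + tree]   # leaf: a single finished path
  else
    tree.flat_map { |key, sub_tree| tree_paths(sub_tree, path + [key]) }
  end
end

tree_def = {
  'a' => { 'b' => ['c'], 'd' => { 'e' => ['f'], 'g' => ['h', 'i', 'j', 'k'] }, 'l' => ['m', 'n', 'o', 'p'] },
  'q' => { 'r' => ['s'] }
}
p tree_paths(tree_def)
```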
You can do it like that:
def flat_hash(h)
# Leaf: return a copy of the array so the unshift calls below don't mutate the input hash
return [h.dup] unless h.kind_of?(Hash)
h.map{|k,v| flat_hash(v).map{|e| e.unshift(k)} }.flatten(1)
end
input = {
'a' => {
'b' => ['c'],
'd' => {
'e' => ['f'],
'g' => ['h', 'i', 'j', 'k']
},
'l' => ['m', 'n', 'o', 'p']
},
'q' => {
'r' => ['s']
}
}
p flat_hash(input)
The output will be:
[
["a", "b", "c"],
["a", "d", "e", "f"],
["a", "d", "g", "h", "i", "j", "k"],
["a", "l", "m", "n", "o", "p"],
["q", "r", "s"]
]
This of course calls for a recursive solution. The following method does not mutate the original hash.
Code
def recurse(h)
h.each_with_object([]) do |(k,v),arr|
v.is_a?(Hash) ? recurse(v).each { |a| arr << [k,*a] } : arr << [k,*v]
end
end
Example
h = { 'a'=>{ 'b'=>['c'],
'd'=>{ 'e'=>['f'], 'g' => ['h', 'i', 'j', 'k'] },
'l' => ['m', 'n', 'o', 'p'] },
'q'=>{ 'r'=>['s'] } }
recurse h
#=> [["a", "b", "c"],
# ["a", "d", "e", "f"],
# ["a", "d", "g", "h", "i", "j", "k"],
# ["a", "l", "m", "n", "o", "p"],
# ["q", "r", "s"]]
Explanation
The operations performed by recursive methods are always difficult to explain. In my experience the best way is to salt the code with puts statements. However, that in itself is not enough because when viewing output it is difficult to keep track of the level of recursion at which particular results are obtained and either passed to itself or returned to a version of itself. The solution to that is to indent and un-indent results, which is what I've done below. Note the way I've structured the code and the few helper methods I use are fairly general-purpose, so this approach can be adapted to examine the operations performed by other recursive methods.
INDENT = 8
def indent; @col += INDENT; end
def undent; @col -= INDENT; end
def pu(s); print " "*@col; puts s; end
def puhline; pu('-'*(70-@col)); end
@col = -INDENT
def recurse(h)
begin
indent
puhline
pu "passed h = #{h}"
h.each_with_object([]) do |(k,v),arr|
pu " k = #{k}, v=#{v}, arr=#{arr}"
if v.is_a?(Hash)
pu " calling recurse(#{v})"
ar = recurse(v)
pu " return value=#{ar}"
pu " calculating recurse(v).each { |a| arr << [k,*a] }"
ar.each do |a|
pu " a=#{a}"
pu " [k, *a] = #{[k,*a]}"
arr << [k,*a]
end
else
pu " arr << #{[k,*v]}"
arr << [k,*v]
end
pu "arr = #{arr}"
end.tap { |a| pu "returning=#{a}" }
ensure
puhline
undent
end
end
recurse h
----------------------------------------------------------------------
passed h = {"a"=>{"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
  "l"=>["m", "n", "o", "p"]}, "q"=>{"r"=>["s"]}}
 k = a, v={"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
   "l"=>["m", "n", "o", "p"]}, arr=[]
 calling recurse({"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
   "l"=>["m", "n", "o", "p"]})
        --------------------------------------------------------------
        passed h = {"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]},
          "l"=>["m", "n", "o", "p"]}
         k = b, v=["c"], arr=[]
         arr << ["b", "c"]
        arr = [["b", "c"]]
         k = d, v={"e"=>["f"], "g"=>["h", "i", "j", "k"]}, arr=[["b", "c"]]
         calling recurse({"e"=>["f"], "g"=>["h", "i", "j", "k"]})
                ------------------------------------------------------
                passed h = {"e"=>["f"], "g"=>["h", "i", "j", "k"]}
                 k = e, v=["f"], arr=[]
                 arr << ["e", "f"]
                arr = [["e", "f"]]
                 k = g, v=["h", "i", "j", "k"], arr=[["e", "f"]]
                 arr << ["g", "h", "i", "j", "k"]
                arr = [["e", "f"], ["g", "h", "i", "j", "k"]]
                returning=[["e", "f"], ["g", "h", "i", "j", "k"]]
                ------------------------------------------------------
         return value=[["e", "f"], ["g", "h", "i", "j", "k"]]
         calculating recurse(v).each { |a| arr << [k,*a] }
         a=["e", "f"]
         [k, *a] = ["d", "e", "f"]
         a=["g", "h", "i", "j", "k"]
         [k, *a] = ["d", "g", "h", "i", "j", "k"]
        arr = [["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"]]
         k = l, v=["m", "n", "o", "p"],
           arr=[["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"]]
         arr << ["l", "m", "n", "o", "p"]
        arr = [["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"],
          ["l", "m", "n", "o", "p"]]
        returning=[["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"],
          ["l", "m", "n", "o", "p"]]
        --------------------------------------------------------------
 return value=[["b", "c"], ["d", "e", "f"], ["d", "g", "h", "i", "j", "k"],
   ["l", "m", "n", "o", "p"]]
 calculating recurse(v).each { |a| arr << [k,*a] }
 a=["b", "c"]
 [k, *a] = ["a", "b", "c"]
 a=["d", "e", "f"]
 [k, *a] = ["a", "d", "e", "f"]
 a=["d", "g", "h", "i", "j", "k"]
 [k, *a] = ["a", "d", "g", "h", "i", "j", "k"]
 a=["l", "m", "n", "o", "p"]
 [k, *a] = ["a", "l", "m", "n", "o", "p"]
arr = [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
  ["a", "l", "m", "n", "o", "p"]]
 k = q, v={"r"=>["s"]}, arr=[["a", "b", "c"], ["a", "d", "e", "f"],
   ["a", "d", "g", "h", "i", "j", "k"], ["a", "l", "m", "n", "o", "p"]]
 calling recurse({"r"=>["s"]})
        --------------------------------------------------------------
        passed h = {"r"=>["s"]}
         k = r, v=["s"], arr=[]
         arr << ["r", "s"]
        arr = [["r", "s"]]
        returning=[["r", "s"]]
        --------------------------------------------------------------
 return value=[["r", "s"]]
 calculating recurse(v).each { |a| arr << [k,*a] }
 a=["r", "s"]
 [k, *a] = ["q", "r", "s"]
arr = [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
  ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
returning=[["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
  ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
----------------------------------------------------------------------
#=> [["a", "b", "c"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"],
# ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"]]
This will return an Array with all the paths.
def paths(element, path = [], accu = [])
case element
when Hash
element.each do |key, value|
paths(value, path + [key], accu)
end
when Array
accu << (path + element)
end
accu
end
For nicer printing you can do
paths(tree_def).map { |path| path.join(".") }
The following method keeps calling itself recursively until it reaches the array values.
Because the recursion branches, each branch needs its own copy of op. I used a String here because op+k always creates a new object; with an Array mutated via <<, every branch would share (and modify) the same object, since Ruby passes object references.
hash = {"a"=>{"b"=>["c"], "d"=>{"e"=>["f"], "g"=>["h", "i", "j", "k"]}, "l"=>["m", "n", "o", "p"]}, "q"=>{"r"=>["s"]}}
@output = []
def nested_array(h, op='')
h.map do |k,v|
if Hash === v
nested_array(v, op+k)
else
@output << (op+k+v.join).chars
end
end
end
nested_array(hash)
@output will then hold the desired array:
[
["a", "b", "c"],
["a", "d", "e", "f"],
["a", "d", "g", "h", "i", "j", "k"],
["a", "l", "m", "n", "o", "p"],
["q", "r", "s"]
]
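The point about call by reference can be seen in isolation. Assignment and << work on the same object, while + returns a new one, which is why op+k is safe for branching recursion:

```ruby
path = ["a"]
same = path          # another name for the same array, not a copy
same << "b"
p path               # prints ["a", "b"]: the change is visible through both names

copy = path + ["d"]  # Array#+ returns a new array
p path               # still prints ["a", "b"]
p copy               # prints ["a", "b", "d"]
```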
Update: keys and values can be longer than a single character, and the String version above would split them into single characters, so the following version of nested_array may work better:
def nested_array(h, op=[])
h.map do |k,v|
if Hash === v
nested_array(v, Array.new(op) << k)
else
@output << ((Array.new(op) << k) + v)
end
end
end
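Why the update matters: with the String version, a multi-character key or value is split into single characters by .chars. A small sketch with a hypothetical key "ab" and value ["cd"]:

```ruby
op_str = ""
op_arr = []
k = "ab"
v = ["cd"]

string_version = (op_str + k + v.join).chars   # ["a", "b", "c", "d"]: key and value broken up
array_version  = (Array.new(op_arr) << k) + v  # ["ab", "cd"]: kept whole
```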
All the solutions here are recursive; below is a non-recursive solution.
def flatten(input)
sol = []
while(input.length > 0)
unprocessed_input = []
input.each do |l, r|
if r.is_a?(Array)
sol << l + r
else
r.each { |k, v| unprocessed_input << [l + [k], v] }
end
end
input = unprocessed_input
end
return sol
end
flatten([[[], h]])
Code Explanation:
A hash in array form is [[k1, v1], [k2, v2]].
When input_hash is presented in the form [[], { a: {..} }], partial solutions of the form [[a], {..}] can be generated: index 0 holds the partial solution (the path so far) and index 1 holds the input still to be processed.
Because this format makes it easy to pair each partial solution with its unprocessed input, and to accumulate the unprocessed input, the input hash is first converted to it: [[[], input_hash]]
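One iteration of the while loop can be traced by hand. In this sketch (a smaller hash, and flat_map in place of the explicit loop) both values are hashes, so both go to the unprocessed list; an array value would instead be appended to sol.

```ruby
h = { 'a' => { 'b' => ['c'] }, 'q' => { 'r' => ['s'] } }

input = [[[], h]]   # each element is [partial_solution, unprocessed_input]

# expand every unprocessed hash one level: path + [key] paired with that key's value
next_input = input.flat_map { |path, sub| sub.map { |k, v| [path + [k], v] } }
```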
Solution:
[["a", "b", "c"], ["a", "l", "m", "n", "o", "p"], ["q", "r", "s"], ["a", "d", "e", "f"], ["a", "d", "g", "h", "i", "j", "k"]]

no implicit conversion of String into Integer, simple ruby function not working

When I run this code I get a typeError, but when I do it by hand in the IRB everything seems to be working out okay. I believe the problem lies somewhere in my IF statement but I don't know how to fix it.
numerals = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
def convertToNumbers(string)
arr = string.downcase.split('')
new_array = []
arr.each do |i|
if (arr[i] =~ [a-z])
numValue = numerals.index(arr[i]).to_s
new_array.push(numValue)
end
end
end
You probably meant
arr[i] =~ /[a-z]/
which matches the characters a through z. What you wrote
arr[i] =~ [a-z]
is constructing an array containing the value of a - z (so it only runs if variables a and z are defined) and matching a string against it with =~, which does not perform a regex match.
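A few issues. As Tyler pointed out, inside the loop you are still indexing into arr, but each already gives you the element i itself (a String), not an index; arr[i] with a String argument is what raises "no implicit conversion of String into Integer". The regex issue Max pointed out is valid as well. numerals also has to be defined inside the method, since a def does not see local variables defined outside it. Finally, the method returns arr rather than new_array, because Array#each returns its receiver and the each loop is the last expression evaluated.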
I made a few modifications.
def convertToNumbers(string)
numerals = ["a", "b", "c", "d", "e", "f", "g", "h", "i", "j", "k", "l", "m", "n", "o", "p", "q", "r", "s", "t", "u", "v", "w", "x", "y", "z"]
arr = string.downcase.split('')
new_array = []
arr.each do |i|
if (i =~ /[a-z]/)
numValue = numerals.index(i).to_s
new_array.push(numValue)
end
end
new_array.join
end
puts convertToNumbers('abcd');
which prints out '0123'
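For comparison, a more idiomatic sketch of the same conversion. The snake_case name convert_to_numbers is hypothetical; it computes each letter's position with String#ord instead of looking it up in an array.

```ruby
def convert_to_numbers(string)
  # scan keeps only a-z; 'a'.ord is 97, so 'a' => 0, 'b' => 1, ...
  string.downcase.scan(/[a-z]/).map { |c| (c.ord - 'a'.ord).to_s }.join
end

puts convert_to_numbers('abcd')   # prints 0123
```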

How to match bar, b-a-r, b--a--r etc in a string by Regexp

Given a string, I want to find a word bar, b-a-r, b--a--r etc. where - can be any letter. But interval between letters must be the same.
All letters are lowercase and there are no spaces between them.
For example bar, beayr, qbowarprr, wbxxxxxayyyyyrzzz should match this.
I tried /b[a-z]*a[a-z]*r/ but this matches bxar which is wrong.
I am wondering whether I can achieve this with a regexp.
Here's one way to get all matches.
Code
def all_matches_with_spacers(word, str)
word_size = word.size
word_arr = word.chars
str_arr = str.chars
(0..(str.size - word_size)/(word_size-1)).each_with_object([]) do |n, arr|
regex = Regexp.new(word_arr.join(".{#{n}}"))
str_arr.each_cons(word_size + n * (word_size - 1))
.map(&:join)
.each { |substring| arr << substring if substring =~ regex }
end
end
This requires word.size > 1.
Example
all_matches_with_spacers('bar', 'bar') #=> ["bar"]
all_matches_with_spacers('bar', 'beayr') #=> ["beayr"]
all_matches_with_spacers('bar', 'qbowarprr') #=> ["bowarpr"]
all_matches_with_spacers('bar', 'wbxxxxxayyyyyrzzz') #=> ["bxxxxxayyyyyr"]
all_matches_with_spacers('bobo', 'bobobocbcbocbcobcodbddoddbddobddoddbddob')
#=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
Explanation
Suppose
word = 'bobo'
str = 'bobobocbcbocbcobcodbddoddbddobddoddbddob'
then
word_size = word.size #=> 4
word_arr = word.chars #=> ["b", "o", "b", "o"]
str_arr = str.chars
#=> ["b", "o", "b", "o", "b", "o", "c", "b", "c", "b", "o", "c", "b", "c",
# "o", "b", "c", "o", "d", "b", "d", "d", "o", "d", "d", "b", "d", "d",
# "o", "b", "d", "d", "o", "d", "d", "b", "d", "d", "o", "b"]
If n is the number of spacers between each letter of word, we require
word.size + n * (word.size - 1) <= str.size
Hence (since str.size #=> 40),
n <= (str.size - word_size)/(word_size-1) #=> (40-4)/(4-1) = 12
We will therefore iterate over 0 to 12 spacers:
(0..12).each_with_object([]) do |n, arr| .. end
Enumerable#each_with_object creates an initially empty array, denoted by the block variable arr. The first value passed to the block is 0 (spacers), which is assigned to the block variable n.
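Here is that behaviour in isolation, using a toy example unrelated to the matching problem. The memo array is created once, shared by every pass of the block, and returned at the end. That is why all_matches_with_spacers needs no explicit return.

```ruby
# The [] argument becomes the block variable arr on every pass;
# each_with_object returns that same array after the last pass.
squares = (0..3).each_with_object([]) { |n, arr| arr << n * n }
p squares # => [0, 1, 4, 9]
```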
We then have
regex = Regexp.new(word_arr.join(".{#{0}}")) #=> /b.{0}o.{0}b.{0}o/
which is the same as /bobo/. With n spacers, a matching substring has length
word_size + n * (word_size - 1) #=> 4
To extract all sub-arrays of str_arr with this length, we invoke:
str_arr.each_cons(word_size + n * (word_size - 1))
Here, with n = 0, this is:
enum = str_arr.each_cons(4)
#=> #<Enumerator: ["b", "o", "b", "o", "b", "o",...,"b"]:each_cons(4)>
This enumerator will pass 40 - 4 + 1 = 37 four-element arrays into its block. Only the first 15 are shown here:
enum.to_a
#=> [["b", "o", "b", "o"], ["o", "b", "o", "b"], ["b", "o", "b", "o"],
#    ["o", "b", "o", "c"], ["b", "o", "c", "b"], ["o", "c", "b", "c"],
#    ["c", "b", "c", "b"], ["b", "c", "b", "o"], ["c", "b", "o", "c"],
#    ["b", "o", "c", "b"], ["o", "c", "b", "c"], ["c", "b", "c", "o"],
#    ["b", "c", "o", "b"], ["c", "o", "b", "c"], ["o", "b", "c", "o"],
#    ...]
We next convert these to strings:
ar = enum.map(&:join)
#=> ["bobo", "obob", "bobo", "oboc", "bocb", "ocbc", "cbcb", "bcbo",
#     "cboc", "bocb", "ocbc", "cbco", "bcob", "cobc", "obco", ...]
Each of these strings is assigned to the block variable substring and appended to the array arr if
substring =~ regex
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo"]
Next we increment the number of spacers to n = 1. This has the following effect:
regex = Regexp.new(word_arr.join(".{#{1}}")) #=> /b.{1}o.{1}b.{1}o/
str_arr.each_cons(4 + 1 * (4 - 1)) #=> str_arr.each_cons(7)
so we now examine the strings
ar = str_arr.each_cons(7).map(&:join)
#=> ["boboboc", "obobocb", "bobocbc", "obocbcb", "bocbcbo", "ocbcboc",
# "cbcbocb", "bcbocbc", "cbocbco", "bocbcob", "ocbcobc", "cbcobco",
# "bcobcod", "cobcodb", "obcodbd", "bcodbdd", "codbddo", "odbddod",
# "dbddodd", "bddoddb", "ddoddbd", "doddbdd", "oddbddo", "ddbddob",
# "dbddobd", "bddobdd", "ddobddo", "dobddod", "obddodd", "bddoddb",
# "ddoddbd", "doddbdd", "oddbddo", "ddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
There are no matches with one spacer, so arr remains unchanged:
arr #=> ["bobo", "bobo"]
For n = 2 spacers:
regex = Regexp.new(word_arr.join(".{#{2}}")) #=> /b.{2}o.{2}b.{2}o/
str_arr.each_cons(4 + 2 * (4 - 1)) #=> str_arr.each_cons(10)
ar = str_arr.each_cons(10).map(&:join)
#=> ["bobobocbcb", "obobocbcbo", "bobocbcboc", "obocbcbocb", "bocbcbocbc",
# "ocbcbocbco", "cbcbocbcob", "bcbocbcobc", "cbocbcobco", "bocbcobcod",
# ...
# "ddoddbddob"]
ar.each { |substring| arr << substring if substring =~ regex }
arr #=> ["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
No matches are found for more than two spacers, so the method returns
["bobo", "bobo", "bddoddbddo", "bddoddbddo"]
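The same enumeration can be written more compactly with String#scan and a zero-width lookahead instead of each_cons. This is a different technique from the answer's code, and the name scan_matches_with_spacers is hypothetical. The capture inside (?=...) records each match without consuming it, so scan still finds overlapping matches, one start position at a time.

```ruby
# Hypothetical alternative to all_matches_with_spacers (not the answer's code).
def scan_matches_with_spacers(word, str)
  # Same restriction as the original: word.size - 1 must not be zero.
  raise ArgumentError, 'word must have at least 2 characters' if word.size < 2

  max_spacers = (str.size - word.size) / (word.size - 1)
  (0..max_spacers).flat_map do |n|
    # For word = "bar" and n = 2 this builds the pattern b.{2}a.{2}r
    pattern = word.chars.map { |c| Regexp.escape(c) }.join(".{#{n}}")
    # Zero-width lookahead plus capture: overlapping matches are all reported.
    str.scan(/(?=(#{pattern}))/).flatten
  end
end

p scan_matches_with_spacers('bar', 'qbowarprr')
p scan_matches_with_spacers('bobo', 'bobobocbcbocbcobcodbddoddbddobddoddbddob')
```

As with the original, matches are grouped by number of spacers and, within each group, listed in order of position.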
For reference, there is a beautiful solution to the overall problem that is available in regex flavors that allow a capturing group to refer to itself:
^[^b]*bar|b(?:[^a](?=[^a]*a(\1?+.)))+a\1r
Sadly, Ruby doesn't allow this.
The interesting bit is on the right side of the alternation. After matching the initial b, we define a non-capturing group for the characters between b and a, and repeat it with the +. Between the a and the r, we inject capture group 1 with \1. Group 1 lives inside the lookahead and is rebuilt on every pass of the repeated group. Each time one more character between b and a is consumed, \1?+. overwrites group 1 with its previous contents plus one more character after the a. By the time the a is reached, group 1 therefore holds exactly as many characters as there were between b and a.
See Quantifier Capture, where the solution was demonstrated by CasimiretHippolyte, who calls the idea behind the technique the "qtax trick".

Returning a list of indices where a certain object appears in a nested array

I am trying to figure out how to form an array that collects every index of a particular object (in this case a single letter) where it appears in a nested set of arrays. For instance, using the array set below,
boggle_board = [["P", "P", "X", "A"],
                ["V", "F", "S", "Z"],
                ["O", "P", "W", "N"],
                ["D", "H", "L", "E"]]
I would expect something like boggle_board.include?("P") to return a nested array of indices [[0,0],[0,1],[2,1]]. Any ideas on how to do this?
Nothing super-elegant comes to mind for me right now. This seems to work:
def indices_of(board, letter)
  indices = []
  board.each_with_index do |ar, i|
    ar.each_with_index do |s, j|
      indices.push([i, j]) if s == letter
    end
  end
  indices
end
boggle_board = [["P", "P", "X", "A"],
                ["V", "F", "S", "Z"],
                ["O", "P", "W", "N"],
                ["D", "H", "L", "E"]]
indices_of(boggle_board, "P")
# => [[0, 0], [0, 1], [2, 1]]
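The same nested loop can also be expressed with flat_map, which removes the need for an accumulator array. This is a sketch, and the method name indices_of_letter is hypothetical.

```ruby
# Hypothetical variant of indices_of built from enumerators instead of push.
def indices_of_letter(board, letter)
  board.each_with_index.flat_map do |row, i|
    # Column indices j in this row that hold the letter, paired with row i.
    row.each_index.select { |j| row[j] == letter }.map { |j| [i, j] }
  end
end

boggle_board = [["P", "P", "X", "A"],
                ["V", "F", "S", "Z"],
                ["O", "P", "W", "N"],
                ["D", "H", "L", "E"]]
p indices_of_letter(boggle_board, "P") # => [[0, 0], [0, 1], [2, 1]]
```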
I would use Matrix#each_with_index. The code below is more Rubyish:
require "matrix"
m = Matrix[["P", "P", "X", "A"],
           ["V", "F", "S", "Z"],
           ["O", "P", "W", "N"],
           ["D", "H", "L", "E"]]
ar = []
m.each_with_index {|e, row, col| ar << [row,col] if e == "P"}
ar #=> [[0, 0], [0, 1], [2, 1]]
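Called without a block, Matrix#each_with_index returns an Enumerator that yields (element, row, column). The search can therefore be chained without a separate accumulator array. This is a sketch. Note that since Ruby 3.1, matrix ships as a bundled gem rather than a default library, so under Bundler it may need to be added to the Gemfile.

```ruby
require "matrix"

m = Matrix[["P", "P", "X", "A"],
           ["V", "F", "S", "Z"],
           ["O", "P", "W", "N"],
           ["D", "H", "L", "E"]]

# select keeps the [element, row, col] triples whose element is "P";
# map then drops the element and keeps the coordinates.
positions = m.each_with_index
             .select { |e, _row, _col| e == "P" }
             .map { |_e, row, col| [row, col] }
p positions # => [[0, 0], [0, 1], [2, 1]]
```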

Resources