Anagrams Code Kata, Ruby Solution very slow - ruby

I've been having a play with Ruby recently and I've just completed the Anagrams Code Kata from http://codekata.pragprog.com.
The solution was test-driven and uses the unique prime factorisation theorem, but it runs incredibly slowly: on the 45k-word file it has been running for about 10 minutes so far. Can anyone give me any pointers on improving the performance of my code?
class AnagramFinder
def initialize
@words = self.LoadWordsFromFile("dict45k.txt")
end
def OutputAnagrams
hash = self.CalculatePrimeValueHash
@words.each_index{|i|
word = @words[i]
wordvalue = hash[i]
matches = hash.select{|key,value| value == wordvalue}
if(matches.length > 1)
puts("--------------")
matches.each{|key,value|
puts(@words[key])
}
end
}
end
def CalculatePrimeValueHash
hash = Hash.new
@words.each_index{|i|
word = @words[i]
value = self.CalculatePrimeWordValue(word)
hash[i] = value
}
hash
end
def CalculatePrimeWordValue(word)
total = 1
hash = self.GetPrimeAlphabetHash
word.downcase.each_char {|c|
value = hash[c]
total = total * value
}
total
end
def LoadWordsFromFile(filename)
contentsArray = []
f = File.open(filename)
f.each_line {|line|
line = line.gsub(/[^a-z]/i, '')
contentsArray.push line
}
contentsArray
end
def GetPrimeAlphabetHash
hash = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
end
end

Frederick Cheung has a few good points, but I thought I might provide you with a few descriptive examples.
I think your main problem is that you create your index in a way that forces you to do linear searches in it.
Your word list (@words) seems to look something like this:
[
"ink",
"foo",
"kin"
]
That is, it is just an array of words.
Then you create your hash index with CalculatePrimeValueHash, with hash keys being equal to the word's index in @words.
{
0 => 30659, # 23 * 43 * 31, matching "ink"
1 => 28717, # 13 * 47 * 47, matching "foo"
2 => 30659 # 31 * 23 * 43, matching "kin"
}
I would consider this a good start, but if you keep it like this, you have to iterate through the whole hash to find which hash keys (i.e. indexes in @words) belong together, and then iterate through those to join them. That is, the basic problem here is that your index is too granular.
If you instead were to build this hash with the prime values as hash keys, and have them point to an array of the words with that key, you would get a hash index like this instead:
{
30659 => ["ink", "kin"],
28717 => ["foo"]
}
With this kind of structure, the only thing you have to do to write your output is iterate over the hash values and print them, since they are already grouped.
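A minimal sketch of building that grouped index in a single pass, assuming the same letter-to-prime mapping as in the question. The names LETTER_PRIMES, signature and group_anagrams are made up for this illustration, and the word list is hard-coded rather than read from a file.

```ruby
# Hypothetical helper names; the prime table mirrors the one in the question.
PRIMES = [2, 3, 5, 7, 11, 13, 17, 19, 23, 29, 31, 37, 41, 43, 47, 53,
          59, 61, 67, 71, 73, 79, 83, 89, 97, 101].freeze
LETTER_PRIMES = ("a".."z").zip(PRIMES).to_h.freeze

# Product of the primes for each letter; anagrams share the same product.
def signature(word)
  word.downcase.each_char.reduce(1) { |total, c| total * LETTER_PRIMES.fetch(c) }
end

# Build { signature => [words...] } with one pass over the word list.
def group_anagrams(words)
  index = Hash.new { |hash, key| hash[key] = [] }
  words.each { |word| index[signature(word)] << word }
  index
end

index = group_anagrams(%w[ink foo kin])
# Only groups with more than one word are anagram sets.
index.each_value { |group| puts group.join(", ") if group.length > 1 }
# prints "ink, kin"
```

Each word is visited once and each lookup is a hash access, so the work grows roughly linearly with the number of words instead of quadratically.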
Another thing with your code is that it seems to generate a whole bunch of throwaway objects, which will keep your garbage collector busy, and that is generally quite a big choke point in Ruby.
It might also be a good idea to find a benchmark tool and/or a profiler to analyze your code and see where it could be improved upon.

Fundamentally your code is slow because for each of the 45k words you iterate over the entire hash (45k entries) looking for words with the same signature, so you're doing 45k * 45k comparisons. Another way of phrasing that is to say that your complexity is n^2 in the number of words.
The code below implements your basic idea but runs in a few seconds on the 236k word file I happen to have lying around. It could definitely be faster: the second pass over the data to find the groups with more than one item could be eliminated, but the code would be less readable.
It's also a lot shorter than your code, around a third, while staying readable, largely because I used more standard library functions and idiomatic ruby.
For example, the load_words method uses collect to turn one array into another, rather than iterating over one array and adding things to a second one. Similarly the signature function uses inject rather than iterating over the characters. Lastly I've used group_by to do the actual grouping. All of these methods happen to be in Enumerable - it's well worth becoming very familiar with these.
signature_for_word could become even pithier with
word.each_char.map {|c| CHAR_MAP[c.downcase]}.reduce(:*)
This takes the word, splits it into characters and then maps each one of those to the right number. reduce(:*) (reduce is an alias for inject) then multiplies them all together.
class AnagramFinder
CHAR_MAP ={ "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
def initialize
@words = load_words("/usr/share/dict/words")
end
def find_anagrams
words_by_signature = @words.group_by {|word| signature_for_word word}
words_by_signature.each do |signature, words|
if words.length > 1
puts '----'
puts words.join('; ')
end
end
end
def signature_for_word(word)
word.downcase.each_char.inject(1) {| total, c| total * CHAR_MAP[c]}
end
def load_words(filename)
File.readlines(filename).collect {|line| line.gsub(/[^a-z]/i, '')}
end
end

You can start tracking down the slowness by using the Benchmark tool. Some examples here:
http://www.skorks.com/2010/03/timing-ruby-code-it-is-easy-with-benchmark/
First of all it would be interesting to see how long it takes to run self.calculate_prime_value_hash and after that the calculate_prime_word_value.
Quite often the slowness boils down to the number of times the inner loops are run, so you can also log how many times they are run.
One very quick improvement you can make is to define the prime alphabet hash as a constant, because it never changes:
PRIME_ALPHABET_HASH = { "a" => 2, "b" => 3, "c" => 5, "d" => 7, "e" => 11, "f" => 13, "g" =>17, "h" =>19, "i" => 23, "j" => 29, "k" => 31, "l" => 37, "m" => 41, "n" =>43, "o" =>47, "p" => 53, "q" =>59, "r" => 61, "s" => 67, "t" => 71, "u" => 73, "v" => 79, "w" => 83, "x" => 89, "y" => 97, "z" => 101 }
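As a rough sketch of the timing suggestion, Benchmark.realtime from the standard library can wrap any block. The count_equal_pairs method and its input are stand-ins invented for this example; they only mimic the shape of comparing every value against every other value.

```ruby
require "benchmark"

# Stand-in workload (an assumption for this sketch): a quadratic scan similar
# in shape to comparing each word's value against every other value.
def count_equal_pairs(values)
  values.sum { |a| values.count { |b| a == b } }
end

# 500 values, each of the numbers 0..49 repeated ten times.
values = Array.new(500) { |i| i % 50 }

# Benchmark.realtime returns the elapsed wall-clock time in seconds as a Float.
pairs = nil
elapsed = Benchmark.realtime { pairs = count_equal_pairs(values) }
puts format("count_equal_pairs took %.4f seconds", elapsed)
```

Timing the index-building step and the output step separately in this way should show which of the two dominates.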

Related

Ruby - Hash function with unknown number of arguments

I'm trying to make a calorie counter for the hash menu below. In this example I've passed 3 arguments. What would the function need to look like if the number of parameters/arguments is unknown?
@menu = {
"hamburger" => 250,
"Cheese burger" => 350,
"cola" => 35,
"salad" => 120,
"dessert" => 350
}
def order(a, b, c)
return @menu[a] + @menu[b] + @menu[c]
end
puts order("hamburger", "Cheese burger", "cola")
I tried:
def order(**a)
total = 0
total += @menu[**a]
end
I know (*a) works for arrays.
I'd like to be able to get results for
puts order("hamburger")
and equally for
puts order("Cheese burger", "salad"), for example
In Ruby, it is often possible to write the code exactly the same way you would describe the solution in English. In this case, you need to get the values at specific keys of the hash and then compute the sum of the values.
You can use the Hash#values_at method to get the values at the specific keys and you can use the Array#sum method to compute the sum of the values:
def order(*items)
@menu.values_at(*items).sum
end
Note that it is strange to use an instance variable of the top-level main object. It would make much more sense to use a constant:
MENU = {
'hamburger' => 250,
'Cheese burger' => 350,
'cola' => 35,
'salad' => 120,
'dessert' => 350,
}
def order(*items)
MENU.values_at(*items).sum
end
It would also make sense to freeze the hash:
MENU = {
'hamburger' => 250,
'Cheese burger' => 350,
'cola' => 35,
'salad' => 120,
'dessert' => 350,
}.freeze
And last but not least, I find the name of the order method somewhat misleading. It is also ambiguous: is order meant to be a noun and this is meant to be a getter method that retrieves an order? Or is it meant to be a verb and it is meant to be a command method which tells the object to execute an order?
Either way, it does not seem that the method is doing either of those two things, rather it seems to compute a total. So, the name should probably reflect that.
I would do:
MENU = {
"hamburger" => 250,
"Cheese burger" => 350,
"cola" => 35,
"salad" => 120,
"dessert" => 350
}
def order(*args)
MENU.values_at(*args).sum
end
order("hamburger", "Cheese burger", "cola")
#=> 635
Read about the Ruby Splat Operator, Hash#values_at and Array#sum.
If you really want to use each (which I would not recommend), as mentioned in the comment, you can implement it like this:
def order(*args)
total = 0
args.each { |name| total += MENU[name] }
total
end
or
def order(*args)
total = 0
MENU.values_at(*args).each { |value| total += value }
total
end
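One caveat with the values_at versions: for a name that is not on the menu, values_at returns nil, and sum then raises a TypeError. A hedged sketch using Hash#fetch_values, which raises a KeyError naming the missing key instead; the method name order_total is made up for this example.

```ruby
MENU = {
  "hamburger" => 250,
  "Cheese burger" => 350,
  "cola" => 35,
  "salad" => 120,
  "dessert" => 350
}.freeze

# Hypothetical name; accepts any number of item names.
def order_total(*items)
  # fetch_values raises KeyError for an unknown item instead of returning nil.
  MENU.fetch_values(*items).sum
end

puts order_total("hamburger")              # prints 250
puts order_total("Cheese burger", "salad") # prints 470

begin
  order_total("pizza")
rescue KeyError => e
  puts "Unknown item: #{e.message}"
end
```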

How do you access two methods from within another method?

I am creating a caesar cipher for The Odin Project's Ruby Programming course, I have my code to the point where I have one method that can take a single word and a shift value and returns the ciphered word using corresponding hash keys and values. And I have another method that takes a sentence and splits it into an array containing each separated word. What I would like to do is combine these two methods so that when you input a sentence, the words are split up into an array, then each part of the array is ciphered using the shift value, then the ciphered words from the array are printed back into sentence form.
Here is my code so far:
@cipher = {
"a" => 1,
"b" => 2,
"c" => 3,
"d" => 4,
"e" => 5,
"f" => 6,
"g" => 7,
"h" => 8,
"i" => 9,
"j" => 10,
"k" => 11,
"l" => 12,
"m" => 13,
"n" => 14,
"o" => 15,
"p" => 16,
"q" => 17,
"r" => 18,
"s" => 19,
"t" => 20,
"u" => 21,
"v" => 22,
"w" => 23,
"x" => 24,
"y" => 25,
"z" => 26,
}
#multi letter caesar_cipher
def word_cipher(word, shift)
word.split(//).each {|letter| print @cipher.key(@cipher[letter]+ shift)}
end
> word_cipher("kittens", 2)
=> mkvvgpu
#split sentence string into an array of words
def sentence_array(sentence)
sentence.split.each { |word| print word.split }
end
>sentence_array("Look at all of these kittens")
=>["Look"]["at"]["all"]["of"]["these"]["kittens"]
And here is what I have for the solution so far:
def caesar_cipher(input, shift)
sentence_array(input) = words
words.split(//).each {|letter| print @cipher.key(@cipher[letter]+ shift)}
end
caesar_cipher("I love kittens", 2)
This is my first time posting on here so I'm sorry if I did a bad job explaining anything but any help would be appreciated!!
Thanks!
You have to slightly modify the methods:
#multi letter caesar_cipher
def word_cipher(word, shift)
output = ''
word.split(//).each {|letter| output << @cipher.key(@cipher[letter]+ shift)}
output
end
def sentence_array(sentence)
sentence.split
end
#multi letter caesar_cipher
def caesar_cipher(input, shift)
output = ""
words = sentence_array(input)
words.each do |word|
output << word_cipher(word.downcase, shift)
output << " " unless word == words.last
end
output.capitalize
end
puts caesar_cipher("I love kittens", 2)
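Note that the lookup-table version has no entry past 26, so a letter such as "y" with a shift of 2 makes @cipher.key return nil and the append fails. A sketch of one way to wrap around, using String#ord and modulo arithmetic instead of the hash (a swapped-in technique, not the original approach); the method names here are made up for the example.

```ruby
# Shift one lowercase letter, wrapping from "z" back to "a".
def shift_letter(letter, shift)
  base = "a".ord
  ((letter.ord - base + shift) % 26 + base).chr
end

# Hypothetical variant of caesar_cipher: letters are shifted, anything else
# (spaces, punctuation) is copied through unchanged, and case is preserved.
def caesar_cipher_wrapped(input, shift)
  input.each_char.map do |char|
    if char.match?(/[a-z]/)
      shift_letter(char, shift)
    elsif char.match?(/[A-Z]/)
      shift_letter(char.downcase, shift).upcase
    else
      char
    end
  end.join
end

puts caesar_cipher_wrapped("I love kittens", 2) # prints "K nqxg mkvvgpu"
puts caesar_cipher_wrapped("xyz", 2)            # prints "zab"
```

Because spaces pass through unchanged, the sentence does not need to be split into words and re-joined.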

Ruby: Scanning strings for matching adjacent vowel groups

I am building a script to randomly generate words that sound like English. I have broken down a large number of English words into VCV groups.
...where the V's represent ALL the adjacent vowels in a word and the C represents ALL the adjacent consonants. For example, the English word "miniature" would become
"-mi", "inia", "iatu", and "ure". "school" would become "-schoo" and "ool".
These groups will be assembled together with other groups from other words with
the rule being that the complete set of adjacent ending vowels must match the
complete set of starting vowels for the attached group.
I have constructed a hash in the following structure:
pieces = {
:starters => { "-sma" => 243, "-roa" => 77, "-si" => 984, ...},
:middles => { "iatu" => 109, "inia" => 863, "aci" => 229, ...},
:enders => { "ar-" => 19, "ouid-" => 6, "ude" => 443, ...}
}
In order to construct generated words, a "starter" string would need to end with the same vowel grouping as the "middle" string. The same applies when connecting the "middle" string with the "ender" string. One possible result using the examples above would be "-sma" + "aba" + "ar-" to give "smabar". Another would be "-si" + "inia" + "iatu" + "ude" to give "siniatude".
My problem is that when I sample any two pieces, I don't know how to ensure that the ending V group of the first piece exactly matches the beginning V group of the second piece. For example, "utua" + "uailo" won't work together because "ua" is not the same as "uai". However, a successful pair would be "utua" + "uado" because "ua" = "ua".
def match(first, second)
end_of_first = first[/[aeiou]+$|[^aeiou]+$/]
start_of_second = second[/^[aeiou]+|^[^aeiou]+/]
end_of_first == start_of_second
end
match("utua", "uailo")
# => false
match("inia", "iatu")
# => true
EDIT: I apparently can't read, I thought you just want to match the group (whether vowel or consonant). If you restrict to vowel groups, it's simpler:
end_of_first = first[/[aeiou]+$/]
start_of_second = second[/^[aeiou]+/]
Since you're already pre-processing the dictionary, I suggest doing a little more preprocessing to make generation simpler. I have two suggestions. First, for the starters and middles, separate each into a tuple (for which, in Ruby, we just use a two-element array) of the form (VC, V), so e.g. "inia" becomes ["in", "ia"]:
starters = [
[ "-sm", "a" ],
[ "-r", "oa" ],
[ "-s", "i" ],
# ...
]
We store the starters in an array since we just need to choose one at random, which we can do with Array#sample:
starter, middle1_key = starters.sample
puts starter # => "-sm"
puts middle1_key # => "a"
We want to be able to look up middles by their initial V groups, so we put those tuples in a Hash instead, with their initial V groups as keys:
middles = {
"ia" => [
[ "iat", "u" ],
[ "iabl", "e" ],
],
"i" => [
[ "in", "ia" ],
# ...
],
"a" => [
[ "ac", "i" ],
# ...
],
# ...
}
Since we stored the starter's final V group in middle1_key above, we can now use that as a key to get the array of middle tuples whose initial V group matches, and choose one at random as we did above:
possible_middles1 = middles[middle1_key]
middle1, middle2_key = possible_middles1.sample
puts middle1 # => "ac"
puts middle2_key # => "i"
Just for kicks, let's pick a second middle:
middle2, ender_key = middles[middle2_key].sample
puts middle2 # => "in"
puts ender_key # => "ia"
Our enders we don't need to store in tuples, since we won't be using any part of them to look anything up like we did with middles. We can just put them in a hash whose keys are the initial V groups and whose values are arrays of all of the enders with that initial V group:
enders = {
"a" => [ "ar-", ... ],
"oui" => [ "ouid-", ... ],
"u" => [ "ude-", ... ],
"ia" => [ "ial-", "iar-", ... ]
# ...
}
We stored the second middle's final V group in ender_key above, which we can use to get the array of matching enders:
possible_enders = enders[ender_key]
ender = possible_enders.sample
puts ender # => "iar-"
Now that we have four parts, we just put them together to form our word:
puts starter + middle1 + middle2 + ender
# => -smaciniar-
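The (VC, V) tuples and the middles lookup used above could be derived from the raw pieces with a small regex. This is only a sketch: split_piece and index_middles are hypothetical names, and it assumes vowels are just a, e, i, o, u as in the rest of the thread.

```ruby
# Lazily capture everything before the final run of vowels, then that run.
VOWEL_TAIL = /\A(.*?)([aeiou]+)\z/

# "inia" -> ["in", "ia"]; returns nil when the piece does not end in a vowel.
def split_piece(piece)
  match = VOWEL_TAIL.match(piece)
  match ? [match[1], match[2]] : nil
end

# Group middle tuples under their leading vowel group.
def index_middles(pieces)
  index = Hash.new { |hash, key| hash[key] = [] }
  pieces.each do |piece|
    tuple = split_piece(piece)
    start = piece[/\A[aeiou]+/]
    next unless tuple && start
    index[start] << tuple
  end
  index
end

p split_piece("-sma")                # prints ["-sm", "a"]
p index_middles(%w[iatu inia aci])
```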
Edit
The data structures above omit the relative frequencies (I wrote the above before I had a chance to read your answer to my question about the numbers). Obviously it's trivial to also store the relative frequencies alongside the parts, but I don't know off the top of my head a fast way to then choose parts in a weighted fashion. Hopefully my answer is of some use to you regardless.
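For the weighted choice, one simple (not necessarily the fastest) approach is to draw a number below the total weight and walk the entries, subtracting each weight until the draw falls inside one. A sketch assuming the counts are positive integers; weighted_sample is a made-up name, and the random source is injectable only so the behaviour can be checked deterministically.

```ruby
# Pick a key from a { piece => count } hash with probability proportional
# to its count. Assumes every count is a positive integer.
def weighted_sample(weights, random: Random.new)
  return nil if weights.empty?

  total = weights.values.sum
  target = random.rand(total) # integer in 0...total
  weights.each do |piece, weight|
    return piece if target < weight
    target -= weight
  end
  nil # not reached when all counts are positive
end

starters = { "-sma" => 243, "-roa" => 77, "-si" => 984 }
puts weighted_sample(starters)
```

Each draw is linear in the number of pieces; for large tables, precomputing cumulative sums and binary-searching them would likely be faster.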
You can do that using the methods Enumerable#flat_map, String#partition, Enumerable#chunk and a few more familiar ones:
def combine(arr)
arr.flat_map { |s| s.partition /[^aeiou-]+/ }.
chunk { |s| s }.
map { |_, a| a.first }.
join.delete('-')
end
combine ["-sma", "aba", "ar-"] #=> "smabar"
combine ["-si", "inia", "iatu", "ude"] #=> "siniatude"
combine ["utua", "uailo", "orsua", "uav-"] #=> "utuauailorsuav"
To see how this works, let's look at the last example:
arr = ["utua", "uailo", "orsua", "uav-"]
a = arr.flat_map { |s| s.partition /[^aeiou-]+/ }
#=> ["u", "t", "ua", "uai", "l", "o", "o", "rs", "ua", "ua", "v", "-"]
enum = a.chunk { |s| s }
#=> #<Enumerator: #<Enumerator::Generator:0x007fdd14963888>:each>
We can see the elements of this enumerator by converting it to an array:
enum.to_a
#=> [["u", ["u"]], ["t", ["t"]], ["ua", ["ua"]], ["uai", ["uai"]],
# ["l", ["l"]], ["o", ["o", "o"]], ["rs", ["rs"]], ["ua", ["ua", "ua"]],
# ["v", ["v"]], ["-", ["-"]]]
b = enum.map { |_, a| a.first }
#=> ["u", "t", "ua", "uai", "l", "o", "rs", "ua", "v", "-"]
s = b.join
#=> "utuauailorsuav-"
s.delete('-')
#=> "utuauailorsuav"

counting length of number and character inside string using regex in ruby

How can I count the digits and the other characters in a string using a regex in Ruby?
For example, given this string:
abc = "12345678a"
After counting with a regex, I want to get a result like this:
number = 8
char = 1
How can I do that?
Try the following:
abc = "12345678a"
abc.scan(/\d/).length
# => 8
abc.scan(/\D/).length
# => 1
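One thing to keep in mind with the second pattern: \D matches every non-digit, so spaces and punctuation are counted as well. A small sketch, using a made-up sample string, of how the counts differ when only letters are wanted:

```ruby
sample = "12 34-a"

digits     = sample.scan(/\d/).length          # 4
non_digits = sample.scan(/\D/).length          # 3: the space, the hyphen and "a"
letters    = sample.scan(/[[:alpha:]]/).length # 1: only "a"

puts "digits=#{digits} non_digits=#{non_digits} letters=#{letters}"
```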
No regex:
abc = "12345678a"
p abc.count("0-9") # => 8
p abc.count("a-zA-Z") # => 1
This is another option, but I still think regex is better.
irb(main):051:0> number, char = abc.bytes.to_a.partition { |e| e >= 48 and e <= 57}
=> [[49, 50, 51, 52, 53, 54, 55, 56], [97]]
irb(main):053:0> number.count
=> 8
irb(main):054:0> char.count
=> 1
partition: Returns two arrays, the first containing the elements of enum for which the block evaluates to true, the second containing the rest.

Ruby assignment hash [closed]

Closed. This question needs details or clarity. It is not currently accepting answers.
Closed 8 years ago.
I am working on a program to calculate grades and am using a hash of values to help with the letter assignments. My hash looks like this
LETTERS = {
"A+" => 98, "A" => 95, "A-" => 92,
"B+" => 88, "B" => 85, "B-" => 82,
"C+" => 78, "C" => 75, "C-" => 72,
"D+" => 68, "D" => 65, "D-" => 62,
"F+" => 55, "F" => 40, "F-" => 25,
}
My question is how would I be able to assign, say, a 71 to a grade even though it is not an explicit value in the hash?
Firstly, in ruby we call it a hash - not a dictionary. You might do what you want with:
def grade(points)
LETTERS.find {|_, v| v <= points}.first
end
Note: find depends on the order of the hash, so the function above will not work correctly if the hash is not ordered (descending) by values. Also, you didn't say what should happen if points are, say, 20 (below every threshold). Currently find returns nil in that case, so calling first on it raises NoMethodError.
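A sketch of one way to handle the below-every-threshold case, using the safe navigation operator so the method returns nil instead of raising. Returning nil is an assumption about the desired behaviour, and grade_or_nil is a name made up for this example.

```ruby
LETTERS = {
  "A+" => 98, "A" => 95, "A-" => 92,
  "B+" => 88, "B" => 85, "B-" => 82,
  "C+" => 78, "C" => 75, "C-" => 72,
  "D+" => 68, "D" => 65, "D-" => 62,
  "F+" => 55, "F" => 40, "F-" => 25
}.freeze

# Relies on LETTERS being ordered from the highest threshold to the lowest.
def grade_or_nil(points)
  # find returns the first [letter, threshold] pair at or below the score;
  # &. skips the call to first when nothing matched.
  LETTERS.find { |_, threshold| threshold <= points }&.first
end

p grade_or_nil(71) # prints "D+"
p grade_or_nil(20) # prints nil
```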
I don't see the reason for using a hash here. In fact, the keys and the values in the OP's hash are the opposite, and useless.
[
[98, "A+"],
[95, "A"],
[92, "A-"],
[88, "B+"],
[85, "B"],
[82, "B-"],
[78, "C+"],
[75, "C"],
[72, "C-"],
[68, "D+"],
[65, "D"],
[62, "D-"],
[55, "F+"],
[40, "F"],
[25, "F-"],
]
.bsearch{|x, _| x <= 89}.to_a.last
# => "B+"
This turned out to be almost the same as BroiSatse's answer.
Not an exact answer, but:
You could instead use a function that returns the grade depending on the value
def get_grade(points)
return "A+" if points >= 98
return "A" if points < 98 and points >= 95
... # etc
end
That way you don't have to assign each value a grade.
Alternatively, you could assign each grade an array of points.
Another possibility is to encode the logic into a method using case. I'm adding this option because @blueygh2's answer was burning my eyes :)
def grade(score)
case score
when 98..100
"A+"
when 95...98
"A"
when 92...95
"A-"
# and so forth
end
end
Using the non-inclusive ranges (with three dots) makes it work with fractional scores, but that may not be a requirement.

Resources