Simply put, I want to have an input of letters, and output all possible combinations for a set length range.
for example:
length range 1 - 2
input a, b, c
...
output a, b, c, aa, ab, ac, bb, ba, bc, cc, ca, cb
I am trying to make an anagram/spell check solver so that I can 'automate' the NYT's Spelling Bee game. So, I want to input the letters given into my program, get an array of all possible combinations for specific lengths (they have a min word length of 4) and then check that array against an array of all English words. What I have so far is:
letters = ["m","o","r"]
words = []
# Puts all the words into an array
File.open('en_words.txt') do |word|
word.each_line.each do |line|
words << line.strip
end
end
class String
def permutation(&block)
arr = split(//)
arr.permutation { |i| yield i.join }
end
end
letters.join.permutation do |i|
p "#{i}" if words.include?(i)
end
=>"mor"
=>"rom"
My issue with the above code is that it stops at the number of letters I have given it. For example, it will not repeat letters to return "room" or "moor". So, what I am trying to do is get a more complete list of combinations, and then check those against my word list.
Thank you for your help.
How about going the other way? Checking every word to make sure it only uses the allowed letters?
I tried this with the 3000 most common words and it worked plenty fast.
words = [..]
letters = [ "m", "o", "r" ]
words.each do |word|
all_letters_valid = true
word.chars.each do |char|
unless letters.include?(char)
all_letters_valid = false
break
end
end
if all_letters_valid
puts word
end
end
If letters can repeat there isn't a finite number of permutations so that approach doesn't make sense.
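The same filter can be written more compactly with a Set and all?. This is only a sketch: the word list below is a tiny made-up stand-in for a real dictionary file, and the minimum length of 4 is taken from the Spelling Bee rule mentioned in the question.

```ruby
require 'set'

# Hypothetical stand-in for a real word list loaded from a file.
words   = %w[moor room mom or motor memo]
letters = Set.new(%w[m o r])
min_len = 4 # minimum word length from the question

# Keep words that are long enough and use only the allowed letters.
valid = words.select do |word|
  word.length >= min_len && word.each_char.all? { |c| letters.include?(c) }
end

p valid
# => ["moor", "room"]
```

Because each word is checked once, the cost depends on the size of the dictionary rather than on the number of letter arrangements.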
Assumption: English ascii characters only
If the goal is not to reimplement combinations as a learning exercise:
In the Ruby standard library, the Array class has a combination method.
Here is an example:
letters = ["m","o","r"]
letters.combination(2).to_a
# => [["m", "o"], ["m", "r"], ["o", "r"]]
There is also a permutation method:
letters.permutation(3).to_a
# => [["m", "o", "r"], ["m", "r", "o"], ["o", "m", "r"], ["o", "r", "m"], ["r", "m", "o"], ["r", "o", "m"]]
If the goal is to reimplement these methods, you can use the built-in ones for validation, for example by comparing the number of elements your method returns with the number returned by the standard library method.
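Since the question allows repeated letters ("aa", "room"), Array#repeated_permutation may be a closer fit than permutation. A minimal sketch, assuming the letters and length range from the question's example:

```ruby
letters = %w[a b c]
lengths = 1..2

# For each length, build every ordered selection with repetition allowed.
candidates = lengths.flat_map do |n|
  letters.repeated_permutation(n).map(&:join)
end

p candidates
# => ["a", "b", "c", "aa", "ab", "ac", "ba", "bb", "bc", "ca", "cb", "cc"]
```

The number of candidates of length n is letters.size ** n, so this grows quickly with longer words and more letters.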
I am a beginner in Ruby and I need directions on how to get the program to return a list containing "inlets".
Question: Given a word and a list of possible anagrams, select the correct sublist.
Given "listen" and a list of candidates like "enlists" "google" "inlets" "banana" the program should return a list containing "inlets".
This is what i have been able to do
puts 'Enter word'
word_input = gets.chomp
puts 'Enter anagram_list'
potential_anagrams = gets.chomp
potential_anagrams.each do |anagram|
end
This is how the program should behave, assuming my word_input was "hello", but I do not know how to get this working.
is_anagram? word: 'hello', anagrams: ['helo', 'elloh', 'heelo', 'llohe']
# => 'correct anagrams are: elloh, llohe'
Would really appreciate ideas.
As hinted in comments, a string can be easily converted to an array of its characters.
irb(main):005:0> "hello".chars
=> ["h", "e", "l", "l", "o"]
irb(main):006:0> "lleho".chars
=> ["l", "l", "e", "h", "o"]
Arrays can be easily sorted.
irb(main):007:0> ["h", "e", "l", "l", "o"].sort
=> ["e", "h", "l", "l", "o"]
irb(main):008:0> ["l", "l", "e", "h", "o"].sort
=> ["e", "h", "l", "l", "o"]
And arrays can be compared.
irb(main):009:0> ["e", "h", "l", "l", "o"] == ["e", "h", "l", "l", "o"]
=> true
Put all of this together and you should be able to determine if one word is an anagram of another. You can then pair this with #select to find the anagrams in an array. Something like:
def is_anagram?(word, words)
words.select do |w|
...
end
end
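One possible way to fill in the block, assuming the sorted-character comparison described above (a sketch, not the only option):

```ruby
# Returns the entries of `words` that are anagrams of `word`.
def is_anagram?(word, words)
  target = word.chars.sort
  words.select do |w|
    # Two words are anagrams when their sorted characters are identical.
    w.chars.sort == target
  end
end

p is_anagram?('hello', ['helo', 'elloh', 'heelo', 'llohe'])
# => ["elloh", "llohe"]
p is_anagram?('listen', ['enlists', 'google', 'inlets', 'banana'])
# => ["inlets"]
```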
I took the tack of creating a class in pursuit of re-usability. This is overkill for a one-off usage, but allows you to build up a set of known words and then poll it as many times as you like for anagrams of multiple candidate words.
This solution is built on hashes and sets, using the sorted characters of a word as the index to a set of words sharing the same letters. Hashing is O(1), Sets are O(1), and if we view words as having a bounded length the calculation of the key is also O(1), yielding an overall complexity of a constant time per word.
I've commented the code, but if anything is unclear feel free to ask.
require 'set'
class AnagramChecker
def initialize
# A hash whose default value is an empty set object
@known_words = Hash.new { |h, k| h[k] = Set.new }
end
# Create a key value from a string by breaking it into individual
# characters, sorting, and rejoining, so all strings which are
# anagrams of each other will have the same key.
def key(word)
word.chars.sort.join
end
# Add individual words to the class by generating their key value
# and adding the word to the set. Using a set guarantees no
# duplicates of the words, since set contents are unique.
def add_word(word)
word_key = key(word)
@known_words[word_key] << word
# return the word's key to avoid duplicate work in find_anagrams
word_key
end
def find_anagrams(word)
word_key = add_word(word) # add the target word to the known_words
@known_words[word_key].to_a # return all anagrammatic words as an array
end
def inspect
p @known_words
end
end
Producing a library of known words looks like this:
ac = AnagramChecker.new
["enlists", "google", "inlets", "banana"].each { |word| ac.add_word word }
ac.inspect # {"eilnsst"=>#<Set: {"enlists"}>, "eggloo"=>#<Set: {"google"}>, "eilnst"=>#<Set: {"inlets"}>, "aaabnn"=>#<Set: {"banana"}>}
Using it looks like:
p ac.find_anagrams("listen") # ["inlets", "listen"]
p ac.find_anagrams("google") # ["google"]
If you don't want the target word to be included in the output, adjust find_anagrams accordingly.
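As a rough sketch of that adjustment, here is a compact variant of the class. The names AnagramIndex and lookup are hypothetical, chosen only for this illustration; the lookup neither stores nor returns the target word.

```ruby
require 'set'

# Compact variant of the class above; AnagramIndex and lookup are hypothetical names.
class AnagramIndex
  def initialize(words = [])
    @known_words = Hash.new { |h, k| h[k] = Set.new }
    words.each { |w| @known_words[key(w)] << w }
  end

  # Return known anagrams of `word` without storing or returning `word` itself.
  def lookup(word)
    # fetch bypasses the default block, so a miss does not add an empty set
    @known_words.fetch(key(word), Set.new).to_a - [word]
  end

  private

  # Sorted characters serve as the shared key for all anagrams of a word.
  def key(word)
    word.chars.sort.join
  end
end

index = AnagramIndex.new(%w[enlists google inlets banana])
p index.lookup("listen") # => ["inlets"]
p index.lookup("google") # => []
```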
Here's how I would do it.
Method
def anagrams(list, word)
ltr_freq = word.each_char.tally
list.select { |w| w.size == word.size && w.each_char.tally == ltr_freq }
end
Example
list = ['helo', 'elloh', 'heelo', 'llohe']
anagrams(list, 'hello')
#=> ["elloh", "llohe"]
Computational complexity
For practical purposes, the computational complexity of computing w.each_char.tally for any word w of length n can be regarded as O(n). That's because hash key lookups are nearly constant-time. It follows that the computational complexity of determining whether a word of length n is an anagram of another word of the same length can be regarded as O(n).
This compares with methods that sort the letters of a word, which have a computational complexity of O(n*log(n)), n being the word length.
Explanation
See Enumerable#tally. Note that w.each_char.tally is not computed when w.size == word.size is false.
Now let's add some puts statements to see what is happening.
def anagrams(list, word)
ltr_freq = word.each_char.tally
puts "ltr_freq = #{ltr_freq}"
list.select do |w|
puts "\nw = '#{w}'"
if w.size != word.size
puts "words differ in length"
false
else
puts "w.each_char.tally = #{w.each_char.tally}"
if w.each_char.tally == ltr_freq
puts "character frequencies are the same"
true
else
puts "character frequencies differ"
false
end
end
end
end
anagrams(list, 'hello')
ltr_freq = {"h"=>1, "e"=>1, "l"=>2, "o"=>1}
w = 'helo'
words differ in length
w = 'elloh'
w.each_char.tally = {"e"=>1, "l"=>2, "o"=>1, "h"=>1}
character frequencies are the same
w = 'heelo'
w.each_char.tally = {"h"=>1, "e"=>2, "l"=>1, "o"=>1}
character frequencies differ
w = 'llohe'
w.each_char.tally = {"l"=>2, "o"=>1, "h"=>1, "e"=>1}
character frequencies are the same
#=>["elloh", "llohe"]
Possible improvement
A potential weakness of the expression
w.each_char.tally == ltr_freq
is that the frequencies of all unique letters in the word w must be determined before a conclusion is reached, even if, for example, the first letter of w does not appear in the word word. We can remedy that as follows.
def anagrams(list, word)
ltr_freq = word.each_char.tally
list.select do |w|
next false unless w.size == word.size
ltr_freqs_match?(w, ltr_freq.dup)
end
end
def ltr_freqs_match?(w, ltr_freq)
w.each_char do |c|
return false unless ltr_freq.key?(c)
ltr_freq[c] -= 1
ltr_freq.delete(c) if ltr_freq[c].zero?
end
true
end
One would have to test this variant against the original version of anagrams above to determine which tends to be faster. This variant has the advantage that it terminates (short-circuits) the comparison as soon as it is found that the cumulative count of a given character in the word w is greater than the total count of the same letter in word. At the same time, tally is written in C, so it may still be faster.
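A rough way to run that comparison is sketched below. The method names anagrams_tally, anagrams_short_circuit and elapsed are made up for this example, the workload is artificial, and the timings will vary by machine and Ruby version, so treat the numbers only as a starting point.

```ruby
# Version 1: compare full tallies.
def anagrams_tally(list, word)
  ltr_freq = word.each_char.tally
  list.select { |w| w.size == word.size && w.each_char.tally == ltr_freq }
end

# Version 2: stop at the first character that cannot match.
def anagrams_short_circuit(list, word)
  ltr_freq = word.each_char.tally
  list.select do |w|
    next false unless w.size == word.size
    remaining = ltr_freq.dup
    w.each_char.all? do |c|
      next false unless remaining.key?(c)
      remaining[c] -= 1
      remaining.delete(c) if remaining[c].zero?
      true
    end
  end
end

# Time a block with the monotonic clock; returns elapsed seconds.
def elapsed
  start = Process.clock_gettime(Process::CLOCK_MONOTONIC)
  yield
  Process.clock_gettime(Process::CLOCK_MONOTONIC) - start
end

# Artificial workload: the sample list repeated many times.
list = %w[helo elloh heelo llohe] * 10_000

t1 = elapsed { anagrams_tally(list, 'hello') }
t2 = elapsed { anagrams_short_circuit(list, 'hello') }
puts format('tally: %.4fs  short-circuit: %.4fs', t1, t2)
```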
So I have an array of characters and I'd like to display all permutations of a given size meeting a certain condition. For instance, say my array contains 'L', 'E' and 'A' and I choose to display all permutations of size 3 that end with 'L'. There are two possibilities, ["A", "E", "L"] and ["E", "A", "L"].
My problem is: how can I count the number of possibilities and print all the possibilities within the same each? Here's what I have so far:
count = 0
combination_array.select do |item|
count += 1 if item.last == 'L'
puts "#{item} " if item.last == 'L'
end
It works fine, but I have to write the condition twice, and I can't print the count before displaying all the possibilities. I've created a method:
def count_occurrences(arr)
counter = 0
arr.each do |item|
counter += 1 if item.last == 'L'
end
counter
end
but I still have to repeat my condition (item.last == 'L'). It doesn't seem very efficient to me.
You could use each_cons (docs) to iterate through each set of 3 items, and count (docs) in block form to have Ruby count for you without constructing a new array:
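matches = [["E", "A", "L"], ["A", "E", "L"]]
# data is assumed to be the array of characters being scanned
match_count = data.each_cons(3).count do |set|
  if matches.include?(set)
    puts set.to_s
    true # the block's value decides whether this set is counted
  end
end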
If you really dislike the conditional block, you could technically simplify to a one-liner:
stuff_from_above.count do |set|
matches.include?(set) && !(puts set.to_s)
end
This takes advantage of the fact that puts always evaluates to nil.
And if you're feeling extra lazy, you can also write ["A", "E", "L"] as %w[A E L] or "AEL".chars.
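Another option, sketched here under the assumption that the candidates come from Array#permutation as in the question, is to state the condition once in a select and reuse the result for both the count and the output:

```ruby
letters = %w[L E A]

# State the condition once and keep the matching permutations.
ending_in_l = letters.permutation(3).select { |perm| perm.last == 'L' }

# The count is now available before anything is displayed.
puts "#{ending_in_l.size} possibilities:"
ending_in_l.each { |perm| p perm }
```

This builds an intermediate array, which is usually acceptable for small inputs like this one.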
If you specifically want to display and count permutations that end in "L", and the array arr is known to contain exactly one "L", the most efficient method is to simply generate permutations of the array with "L" removed and then tack "L" onto each permutation:
arr = ['B', 'L', 'E', 'A']
str_at_end = 'L'
ar = arr - [str_at_end]
#=> ["B", "E", "A"]
ar.permutation(2).reduce(0) do |count,a|
p a + [str_at_end]
count += 1
end
#=> 6
displaying:
["B", "E", "L"]
["B", "A", "L"]
["E", "B", "L"]
["E", "A", "L"]
["A", "B", "L"]
["A", "E", "L"]
If you want to do something else as well you need to state specifically what that is.
Note that the number of permutations of the elements of an array of size n is simply n! (n factorial), so if you only need the number of permutations with L at the end you could compute that as factorial(arr.size-1), where factorial is a simple method you would need to write.
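A minimal sketch of such a factorial method, with a brute-force check against Array#permutation (the array is the one from the example above):

```ruby
# Factorial via reduce; returns 1 for n == 0.
def factorial(n)
  (1..n).reduce(1, :*)
end

arr = %w[B L E A]

# Full-length permutations ending in 'L': the other 3 letters fill the first 3 slots.
p factorial(arr.size - 1)                     # => 6
p arr.permutation.count { |a| a.last == 'L' } # => 6, brute-force check
```

For permutations of a shorter size k, the corresponding count is (n-1)! / (n-k)!, which for n = 4 and k = 3 is also 6.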
I was trying to find the difference in letters between two strings.
For example, given ATTGCC and GTTGAC, the difference would be 2, since A and G (first position) and C and A (fifth position) are not the same characters.
class DNA
def initialize(nucleotide)
@nucleotide = nucleotide
end
def length
@nucleotide.length
end
def hamming_distance(other)
self.nucleotide.chars.zip(other.nucleotide) { |a,b| a == b }.count
end
protected
attr_reader :nucleotide
end
dna1 = DNA.new("ATTGCC")
dna2 = DNA.new("GTTGAC")
puts dna1.hamming_distance(dna2)
The hamming_distance method doesn't work: it raises wrong argument type String (must respond to :each) (TypeError).
Assuming the strings are of the same length, you can split them, zip them, and find how many pairs match:
string1 = "RATTY"
string2 = "CATTI"
string1.chars.zip(string2.chars).select { |a,b| a == b }.count
The .chars produces an array of the characters in the string ("RATTY" => ["R", "A", "T", "T", "Y"])
The .zip call merges the two arrays together into an array of pairs, ["R", "A", "T"].zip(["C", "A", "T"]) => [ ["R", "C"], ["A", "A"], ["T", "T"]]
The select filters out the pairs where the values aren't equal
The count returns the number of pairs matched by select
You can find the number of non-matching pairs by negating the selection
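Put together for the original question, a negated version might look like the sketch below. It uses count with a block instead of select plus count, and the length check is an added assumption, since the approach only makes sense for strings of equal length.

```ruby
# Count the positions where two equal-length strings differ.
def hamming_distance(a, b)
  raise ArgumentError, 'strings must be the same length' unless a.length == b.length

  a.chars.zip(b.chars).count { |x, y| x != y }
end

p hamming_distance("ATTGCC", "GTTGAC") # => 2
p hamming_distance("RATTY", "CATTI")   # => 2
```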
If I have a string with no spaces in it, just a concatenation like "hellocarworld", I want to get back an array of the largest dictionary words. So I would get ['hello','car','world']. I would not get back words such as 'a' because that belongs in 'car'.
The dictionary words can come from anywhere such as the dictionary on unix:
words = File.readlines("/usr/share/dict/words").collect{|x| x.strip}
string= "thishasmanywords"
How would you go about doing this?
I would suggest the following.
Code
For a given string and dictionary, dict:
string_arr = string.chars
string_arr.size.downto(1).with_object([]) { |n,arr|
string_arr.each_cons(n) { |a|
word = a.join
arr << word if (dict.include?(word) && !arr.any? {|w| w.include?(word) })}}
Examples
dict = File.readlines("/usr/share/dict/words").collect{|x| x.strip}
string = "hellocarworld"
#=> ["hello", "world", "loca", "car"]
string= "thishasmanywords"
#=> ["this", "hish", "many", "word", "sha", "sma", "as"]
"loca" is the plural of "locus". I'd never heard of "hish", "sha" or "sma". They all appear to be slang words, as I could only find them in something called the "Urban Dictonary".
Explanation
string_arr = "hellocarworld".chars
#=> ["h", "e", "l", "l", "o", "c", "a", "r", "w", "o", "r", "l", "d"]
string_arr.size
#=> 13
so for this string we have:
13.downto(1).with_object([]) { |n,arr|...
where arr is an initially-empty array that will be computed and returned. For n => 13,
enum = string_arr.each_cons(13)
#<Enumerator: ["h","e","l","l","o","c","a","r","w","o","r","l","d"]:each_cons(13)>
which enumerates over an array consisting of the single array string_arr:
enum.size #=> 1
enum.first == string_arr #=> true
That single array is assigned to the block variable a, so we obtain:
word = enum.first.join
#=> "hellocarworld"
We find
dict.include?(word) #=> false
so this word is not added to the array arr. If it were in the dictionary, we would check to make sure it was not a substring of any word already in arr, which are all of the same size or larger (longer words).
Next we compute:
enum = string_arr.each_cons(12)
#<Enumerator: ["h","e","l","l","o","c","a","r","w","o","r","l","d"]:each_cons(12)>
which we can see enumerates two arrays:
enum = string_arr.each_cons(12).to_a
#=> [["h", "e", "l", "l", "o", "c", "a", "r", "w", "o", "r", "l"],
# ["e", "l", "l", "o", "c", "a", "r", "w", "o", "r", "l", "d"]]
corresponding to the words:
enum.first.join #=> "hellocarworl"
enum.last.join #=> "ellocarworld"
neither of which are in the dictionary. We continue in this fashion, until we reach n => 1:
string_arr.each_cons(1).to_a
#=> [["h"], ["e"], ["l"], ["l"], ["o"], ["c"],
# ["a"], ["r"], ["w"], ["o"], ["r"], ["l"], ["d"]]
We find only "a" in the dictionary, but as it is a substring of "loca" or "car", which are already elements of the array arr, we do not add it.
This can be a bit tricky if you're not familiar with the technique. I often lean heavily on regular expressions for this:
words = File.readlines("/usr/share/dict/words").collect(&:strip).reject(&:empty?)
regexp = Regexp.new(words.sort_by(&:length).reverse.join('|'))
phrase = "hellocarworld"
equiv = [ ]
while (m = phrase.match(regexp)) do
phrase.gsub!(m[0]) do
equiv << m[0]
'*'
end
end
equiv
# => ["hello", "car", "world"]
Update: Strip out blank strings which would cause the while loop to run forever.
Starting at the beginning of the input string, find the longest word in the dictionary. Chop that word off the beginning of the input string and repeat.
Once the input string is empty, you are done. If the string is not empty but no word was found, remove the first character and continue the process.
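The greedy procedure just described could be sketched as follows. The method name split_words and the tiny dictionary are assumptions made for this illustration; a real run would load a full word list instead.

```ruby
require 'set'

# Greedy longest-prefix split; skips one character when no word matches.
def split_words(string, dict)
  max_len = dict.map(&:length).max || 0
  result = []
  i = 0
  while i < string.length
    # Try the longest candidate first, shrinking until a word is found.
    len = [max_len, string.length - i].min
    len -= 1 while len > 0 && !dict.include?(string[i, len])
    if len > 0
      result << string[i, len]
      i += len
    else
      i += 1 # no word starts here: drop this character
    end
  end
  result
end

# Made-up dictionary for the example.
dict = Set.new(%w[a car hello hell low world])
p split_words("hellocarworld", dict) # => ["hello", "car", "world"]
```

Being greedy, this commits to the longest word at each position and never backtracks, so it is not guaranteed to find the best split for every input.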
This isn't homework, just an interview question I found on the web that looks interesting.
So I took a look at this first: Telephone Words problem -- but it seems to be poorly worded/created some controversy. My question is pretty much the same, except my question is more about the time complexity behind it.
You want to list all the possible words when given a 10-digit phone number as your input. So here is what I have done:
def main(telephone_string)
# 0 and 1 are wrapped in arrays so that .each works on them as well
hsh = {1 => ["1"], 2 => ["a","b","c"], 3 => ["d","e","f"], 4 => ["g","h","i"],
5 => ["j","k","l"], 6 => ["m","n","o"], 7 => ["p","q","r","s"],
8 => ["t","u","v"], 9 => ["w","x","y","z"], 0 => ["0"] }
telephone_array = telephone_string.split("-")
three_number_string = telephone_array[1]
four_number_string = telephone_array[2]
string = ""
result_array = []
hsh[three_number_string[0].to_i].each do |letter|
hsh[three_number_string[1].to_i].each do |second_letter|
string = letter + second_letter
hsh[three_number_string[2].to_i].each do |third_letter|
new_string = string + third_letter
result_array << new_string
end
end
end
second_string = ""
second_result = []
hsh[four_number_string[0].to_i].each do |letter|
hsh[four_number_string[1].to_i].each do |second_letter|
second_string = letter + second_letter
hsh[four_number_string[2].to_i].each do |third_letter|
new_string = second_string + third_letter
hsh[four_number_string[3].to_i].each do |fourth_letter|
last_string = new_string + fourth_letter
second_result << last_string
end
end
end
end
puts result_array.inspect
puts second_result.inspect
end
First off, this is what I hacked together in a few minutes time, no refactoring has been done. So I apologize for the messy code, I just started learning Ruby 6 weeks ago, so please bear with me!
So finally to my question: I was wondering what the time complexity of this method would be. My guess is that it would be O(n^4) because the second loop (for the four letter words) is nested four times. I'm not really positive though. So I would like to know whether that is correct, and if there is a better way to do this problem.
This is actually a constant time algorithm, so O(1) (or to be more explicit, O(4^3 + 4^4))
The reason this is a constant time algorithm is that for each digit in the telephone number, you're iterating through a fixed number (at most 4) of possible letters, that's known beforehand (which is why you can put hsh statically into your method).
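To make the bound concrete, the nested loops can be expressed as a Cartesian product with Array#product. This is a sketch assuming a non-empty input containing only the digits 2 through 9; letter_combinations is a hypothetical name.

```ruby
# All letter strings for a non-empty string of digits 2-9, via a Cartesian product.
def letter_combinations(digits)
  keys = {
    "2" => %w[a b c], "3" => %w[d e f], "4" => %w[g h i], "5" => %w[j k l],
    "6" => %w[m n o], "7" => %w[p q r s], "8" => %w[t u v], "9" => %w[w x y z]
  }
  # One array of candidate letters per digit; fetch raises on unsupported digits.
  groups = digits.chars.map { |d| keys.fetch(d) }
  first, *rest = groups
  first.product(*rest).map(&:join)
end

p letter_combinations("234").size  # => 27 (3 * 3 * 3)
p letter_combinations("7979").size # => 256 (4 ** 4, the worst case for four digits)
```

The output size is the product of the group sizes, which is why it is bounded by a constant once the number of digits is fixed.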
One possible optimization would be to stop searching when you know there are no words with the current prefix. For example, if the 3-digit number is "234", you can ignore all strings that start with "bd" (there are some bd- words, like "bdellid", but none that are 3-letters, at least in my /usr/share/dict/words).
From the original phrasing, I would assume that it is requesting all of the possibilities as output, rather than the number of possibilities.
Unfortunately, if you need to return every combination, there is no way to lower the complexity below that determined by the specified keys.
If it were simply the number, it could be in constant time. However, to print them all out, the end result depends highly on assumptions:
1) Assuming that all of the words you are checking for are composed solely of letters, you only need to check against the eight keys from 2 to 9. If this is incorrect, just sub out 8 in the function below.
2) Assuming the layout of all keys is exactly as set up here (no octothorpes or asterisks), with the contents of the empty arrays taking up no space in the final word.
{
1 => [],
2 => ["a", "b", "c"],
3 => ["d", "e", "f"],
4 => ["g", "h", "i"],
5 => ["j", "k", "l"],
6 => ["m", "n", "o"],
7 => ["p", "q", "r", "s"],
8 => ["t", "u", "v"],
9 => ["w", "x", "y", "z"],
0 => []
}
At each stage, you would simply check the number of possibilities for the next step, and append each possible choice to the end of a string. If you were to do so, the minimum time would be (essentially) constant (0, if the number consisted of all ones and zeros). However, the function would be O(4^n), where n reaches a maximum of 10. The largest possible number of combinations would be 4^10, if every digit were a 7 or a 9.
As for your code, I would recommend a single loop, with a few basic nested loops. Here is the code, in Ruby, although I haven't run it, so there may be syntax errors.
def get_words(number_string)
  hsh = {"2" => ["a", "b", "c"],
         "3" => ["d", "e", "f"],
         "4" => ["g", "h", "i"],
         "5" => ["j", "k", "l"],
         "6" => ["m", "n", "o"],
         "7" => ["p", "q", "r", "s"],
         "8" => ["t", "u", "v"],
         "9" => ["w", "x", "y", "z"]}
  # Keep only the digits that map to letters (2 through 9).
  number_array = number_string.chars.select { |x| hsh.key?(x) }
  return [] if number_array.empty?
  # Start from one empty combination and extend it one digit at a time.
  array = [""]
  number_array.each do |digit|
    new_array = []
    array.each do |combo|
      hsh[digit].each do |letter|
        new_array << combo + letter
      end
    end
    array = new_array
  end
  array
end