I am currently using this function, and the code works exactly as it should.
self.chars.permutation.map(&:join).uniq.group_by(&:chr)
However, once the string is more than 10 characters, it takes a lot of time to generate all permutations. How could I generate permutations quicker?
Rather than computing all permutations of each word, a better approach is to first create a hash from the dictionary, whose keys are strings sorted by character and whose values are arrays containing all words in the dictionary that are anagrams of the key. An array holds a single word when that word has no anagrams in the dictionary other than itself.
words = %w| god act bat tar a lion stop |
#=> ["god", "act", "bat", "tar", "a", "lion", "stop"]
dictionary = %w| cat dog a fowl bat god act lion pig donkey loin post pots
spot stop tops|
#=> ["cat", "dog", "a", "fowl", "bat", "god", "act", "lion", "pig",
# "donkey", "loin", "post", "pots", "spot", "stop", "tops"]
h = dictionary.each_with_object(Hash.new { |h,k| h[k] = [] }) do |w,h|
h[w.each_char.sort.join] << w
end
#=> {"act"=>["cat", "act"], "dgo"=>["dog", "god"], "a"=>["a"], "flow"=>["fowl"],
# "abt"=>["bat"], "ilno"=>["lion", "loin"], "gip"=>["pig"], "deknoy"=>["donkey"],
# "opst"=>["post", "pots", "spot", "stop", "tops"]}
We can then obtain all the anagrams of each word in words by sorting the word on its characters and seeing whether that is a key in the hash.
words.each_with_object({}) do |w,g|
key = w.downcase.chars.sort.join
values = h.key?(key) ? (h[key]-[w]) : []
g[w] = values
end
#=> {"god"=>["dog"], "act"=>["cat"], "bat"=>[], "tar"=>[], "a"=>[],
# "lion"=>["loin"], "stop"=>["post", "pots", "spot", "tops"]}
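The same index can also be built with group_by. This is only a sketch of an alternative spelling, reusing the sample dictionary from above; the names index and anagrams_of are made up for illustration.

```ruby
dictionary = %w[cat dog a fowl bat god act lion pig donkey loin post pots spot stop tops]

# Group the dictionary by each word's sorted letters (its "signature").
index = dictionary.group_by { |w| w.chars.sort.join }

# Hypothetical helper: anagrams of +word+ found in the index, excluding the word itself.
def anagrams_of(word, index)
  index.fetch(word.downcase.chars.sort.join, []) - [word]
end

p anagrams_of("stop", index)
p anagrams_of("tar", index)
```

Unlike a hash with a default block, fetch with a fallback does not add a new key when the lookup misses.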
I am currently using this function, and the code works exactly as it should.
self.chars.permutation.map(&:join).uniq.group_by(&:chr)
However, once the string is more than 10 characters, it takes a lot of time to generate all permutations. How could I generate permutations quicker?
You can't. Well, maybe there are ways of speeding it up a little, but there is really no point: the number of permutations is much too large. For just 25 characters, even if we assume that you can generate one permutation for every CPU cycle, even if we assume that you have a 5GHz CPU, even if we assume that your CPU has 100 cores, even if we assume that the work can be perfectly distributed among those cores, it will still take close to one million years to generate. There's just that many of them.
In short: there is no point in even trying to speed up your algorithm. You need to get away from generating permutations altogether.
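The back-of-the-envelope estimate can be checked with integer arithmetic. The figures (5 GHz, 100 cores, one permutation per cycle, perfect parallelism) are the generous assumptions stated above; the variable names are just for illustration.

```ruby
permutations = (1..25).reduce(:*)        # 25!
per_second   = 5_000_000_000 * 100       # 5 GHz x 100 cores, one permutation per cycle
seconds      = permutations / per_second
years        = seconds / (365 * 24 * 60 * 60)

puts years   # a little under one million
```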
Theory
No need for permutations:
Sort the letters in your string
Sort the letters in every word in the dictionary
Look for same sorted letters
Done!
Implementation
class String
def sorted_letters
downcase.chars.sort.join
end
end
class AnagramFinder
  @dict = '/usr/share/dict/american-english'

  class << self
    def get_anagrams(word)
      sorted_dict[word.sorted_letters]
    end

    def all
      sorted_dict.values.select { |anagrams| anagrams.size > 1 }
    end

    def sorted_dict
      # Build the index once and cache it in a class-level instance variable.
      @sorted_dict ||= generate_sorted_dict
    end

    private

    def generate_sorted_dict
      File.foreach(@dict).with_object(Hash.new { |h, k| h[k] = [] }) do |word, sorted_dict|
        word.chomp!
        sorted_dict[word.sorted_letters] << word
      end
    end
  end
end
p AnagramFinder.get_anagrams('impressiveness')
#=> ["impressiveness", "permissiveness"]
p AnagramFinder.get_anagrams('castor')
#=> ["Castor", "Castro", "Croats", "actors", "castor", "costar", "scrota"]
p AnagramFinder.all.last(5)
#=> [["wist", "wits"], ["withers", "writhes"], ["woodworm", "wormwood"], ["wriest", "writes"], ["wrist", "writs"]]
p AnagramFinder.all.max_by(&:length)
#=> ["Stael", "Tesla", "least", "slate", "stale", "steal", "tales", "teals"]
This example needed 0.5s on my slowish server, and most of it was spent building the sorted dictionary. Once it is done, the lookup is almost instantaneous.
"impressiveness" has 14 characters, so generating all of its permutations (14! = 87,178,291,200) would take a very long time.
Perhaps lazy might be an option. It doesn't need as much memory as generating all permutations before checking for a special condition.
Something like:
'my_string'.chars.permutation.lazy.map(&:join).each do |permutation|
puts permutation if dictionary.include?(permutation)
end
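A variation on the same idea, as a sketch only: if the dictionary is a Set (an assumption here, the word list is made up), each membership test is cheap, and first(n) on the lazy chain stops generating permutations as soon as enough matches are found.

```ruby
require 'set'

# Hypothetical word list; a Set gives fast membership tests.
dictionary = Set.new(%w[cat act dog])

matches = 'tca'.chars.permutation.lazy
               .map(&:join)
               .select { |candidate| dictionary.include?(candidate) }
               .first(2) # stops generating permutations after two hits

p matches
```

This saves memory and allows early exit, but it does not reduce the number of permutations that must be tried when matches are rare.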
For a word of eleven letters with no letter repeated, the number of permutations is 11! = 39,916,800. For MISSISSIPPI, however, it is 11! / (1! * 4! * 4! * 2!) = 34,650. The first is going to take a long time however you do it, but if you can reduce the search space by exploiting repeated characters, it might become more manageable. The standard permutation method does not remove repeats.
Searching for "ruby permutations without repetition" might turn up some algorithms.
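As a rough sketch of what such an algorithm could look like (the method names are invented for illustration, and this is one simple approach rather than a reference implementation): count the letters, then build strings by choosing among the letters that still have a count remaining, so each distinct arrangement is produced exactly once.

```ruby
# Number of distinct permutations: n! divided by the factorial of each letter's count.
def distinct_permutation_count(str)
  factorial = ->(n) { (1..n).reduce(1, :*) }
  counts = str.each_char.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
  counts.values.reduce(factorial.(str.size)) { |total, c| total / factorial.(c) }
end

# Yields each distinct permutation of +str+ once, without generating duplicates.
def each_distinct_permutation(str)
  counts = str.each_char.each_with_object(Hash.new(0)) { |c, h| h[c] += 1 }
  build = lambda do |prefix|
    if prefix.size == str.size
      yield prefix
    else
      counts.each_key do |ch|
        next if counts[ch].zero?
        counts[ch] -= 1          # use one copy of this letter
        build.(prefix + ch)
        counts[ch] += 1          # put it back before trying the next letter
      end
    end
  end
  build.("")
end

puts distinct_permutation_count("MISSISSIPPI")
each_distinct_permutation("aab") { |s| puts s }
```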
I have been reading this:
https://docs.ruby-lang.org/en/2.4.0/Enumerator.html
I am trying to understand why someone would use .to_enum, I mean how is that different than just an array? I see :scan was passed into it, but what other arguments can you pass into it?
Why not just use .scan in the case below? Any advice on how to understand .to_enum better?
"Hello, world!".scan(/\w+/) #=> ["Hello", "world"]
"Hello, world!".to_enum(:scan, /\w+/).to_a #=> ["Hello", "world"]
"Hello, world!".to_enum(:scan).each(/\w+/).to_a #=> ["Hello", "world"]
Arrays are, necessarily, constructs that are in memory. An array with a lot of entries takes up a lot of memory.
To put this in context, here's an example, finding all the "palindromic" numbers between 1 and 1,000,000:
# Create a large array of the numbers to search through
numbers = (1..1000000).to_a
# Filter to find palindromes
numbers.select do |i|
is = i.to_s
is == is.reverse
end
Even though there are only 1,998 such numbers, the entire array of a million needs to be created, then sifted through, then kept around until garbage collected.
An enumerator doesn't necessarily take up any memory at all, not in a consequential way. This is way more efficient:
# Uses an enumerator instead
numbers = (1..1000000).to_enum
# Filtering code looks identical, but behaves differently
numbers.select do |i|
is = i.to_s
is == is.reverse
end
You can even take this a step further by making a custom Enumerator:
palindromes = Enumerator.new do |y|
  1.upto(1_000_000) do |i|
    is = i.to_s
    # Emit the number itself when it reads the same in both directions.
    y << i if is == is.reverse
  end
end
This one doesn't even bother with filtering, it just emits only palindromic numbers.
Enumerators can also do other things like be infinite in length, whereas arrays are necessarily finite. An infinite enumerator can be useful when you want to filter and take the first N matching entries, like in this case:
# Open-ended range, new in Ruby 2.6. Don't call .to_a on this!
numbers = (1..).to_enum
numbers.lazy.select do |i|
is = i.to_s
is == is.reverse
end.take(1000).to_a
Using .lazy here means it does the select, then filters through take with each entry until the take method is happy. If you remove the lazy it will try and evaluate each stage of this to completion, which on an infinite enumerator never happens.
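Coming back to to_enum(:scan, ...) from the question: one reason to prefer it over plain scan is that the result is an Enumerator, so Enumerable methods can be chained onto it and iteration can stop early. A small sketch, with sample text and variable names made up for illustration:

```ruby
text = "Hello, world! Goodbye, world!"

# Wrap String#scan in an Enumerator; nothing is scanned until it is iterated.
words = text.to_enum(:scan, /\w+/)

first_two = words.first(2)                                  # stops after two matches
numbered  = words.with_index(1).map { |w, i| "#{i}:#{w}" }  # pairs each match with a position

p first_two
p numbered
```

The first argument to to_enum is the name of the iterating method, and any further arguments are passed along to that method when the enumerator runs.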
I am trying to build a method in Ruby that will take in a string that has been split into an array of letters and then iterate through the array, swapping the element at index n with that at index n+1. The method will then join the new array into a string and push it to another array.
Here is an example of what I am looking to do:
string = "teh"
def some_method(string)
  # some ruby magic here
  array << new_string
end
Expected output:
["eth", "the"]
This is for a spell checker program I am writing for school. The method will check if letters in a misspelled word are swapped by checking to see if the output array elements are in the dictionary. If they are, it will return the word that is most likely the correct word. I haven't had any luck finding articles or documentation on how to build such a method in Ruby or on an existing method to do this. I've been tinkering with building this method for a while now but my code isn't behaving anything like what I need. Thanks in advance!
As @Sergio advised, you want to use parallel assignment for this:
def reverse_em(str)
(0...str.size-1).map do |i|
s = str.dup
s[i], s[i+1] = s[i+1], s[i]
s
end
end
candidates = reverse_em "alogrithm"
#=> ["laogrithm", "aolgrithm", "algorithm", "alorgithm",
# "alogirthm", "alogrtihm", "alogrihtm", "alogritmh"]
dictionary_check(candidates)
#=> algorithm
# al·go·rithm
# noun \ˈal-gə-ˌri-thəm\
# a set of steps that are followed in order to solve a
# mathematical problem or to complete a computer process
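dictionary_check is not defined in the answer above. A minimal sketch of what such a check might do, assuming the dictionary is a Set of lowercase words (the word list and the names adjacent_swaps and swap_corrections are invented for illustration):

```ruby
require 'set'

# Hypothetical word list standing in for a real dictionary file.
DICTIONARY = Set.new(%w[the algorithm cat])

# Every string obtained by swapping one pair of adjacent characters.
def adjacent_swaps(str)
  (0...str.size - 1).map do |i|
    s = str.dup
    s[i], s[i + 1] = s[i + 1], s[i]
    s
  end
end

# Keep only the swapped candidates that are real words.
def swap_corrections(word, dictionary = DICTIONARY)
  adjacent_swaps(word).select { |candidate| dictionary.include?(candidate) }
end

p swap_corrections("teh")
p swap_corrections("alogrithm")
```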
Without splitting it into arrays then joining to new arrays (because that doesn't seem necessary):
def some_method(string)
swapped_strings = []
(0...string.size-1).each do |i|
temp_string = string.dup
temp_string[i], temp_string[i+1] = temp_string[i+1], temp_string[i]
swapped_strings << temp_string
end
swapped_strings
end
For an assignment I am working on, I'm trying to sort words in a piece of text by frequency of words in the text. I have a function that almost accomplishes what I'd like to do but not quite. Below is my code:
require 'pry'
def top_words(words)
word_count = Hash.new(0)
words = words.split(" ")
words.each { |word| word_count[word] += 1 }
word_count = word_count.sort_by do |words, frequencies|
frequencies
end
binding.pry
word_count.reverse!
word_count.each { |word, frequencies| puts word + " " + frequencies.to_s }
end
words = "1st RULE: You do not talk about FIGHT CLUB.
2nd RULE: You DO NOT talk about FIGHT CLUB.
3rd RULE: If someone says 'stop' or goes limp, taps out the fight is over.
4th RULE: Only two guys to a fight.
5th RULE: One fight at a time.
6th RULE: No shirts, no shoes.
7th RULE: Fights will go on as long as they have to.
8th RULE: If this is your first night at FIGHT CLUB, you HAVE to fight."
For some reason, the sort_by method above my binding.pry is changing the structure of my Hash into an array of an array. Why?
What I'd like to do is to sort the words within a hash and then grab the top three words from the Hash. I've yet to figure out how to do this but I'm pretty sure I can do this once I've sorted the array of an array problem.
Now, I suppose I could grab them using .each and array[0].each { |stuff| puts stuff[0] + stuff[1] } but I don't think that is the most efficient way. Any suggestions?
For some reason, the sort_by method above my binding.pry is changing the structure of my Hash into an array of an array. Why?
The explanation is below:
sort_by { |obj| block } → array: this method always returns an array.
The current implementation of sort_by generates an array of tuples containing the original collection element and the mapped value. This makes sort_by fairly expensive when the keysets are simple.
In your case word_count is a Hash, so sort_by gives you an array of [key, value] pairs such as [[key1, val1], [key2, val2], ...]. That is why you are getting an array of arrays.
What I'd like to do is to sort the words within a hash and then grab the top three words from the Hash. I've yet to figure out how to do this but I'm pretty sure I can do this once I've sorted the array of an array problem.
Yes, that is possible.
sorted_array_of_array = word_count.sort_by { |word, frequency| frequency }
top_3_hash = Hash[ sorted_array_of_array.last(3) ]
I would write the code as below :
def top_words(words)
# splitting the string words on single white space to create word array.
words = words.split(" ")
# creating a hash, which will have key as word and value is the number of times,
# that word occurred in a sentence.
word_count = words.each_with_object(Hash.new(0)) { |word,hash| hash[word] += 1 }
# sorting the hash by frequency gives an array of [word, frequency] pairs in ascending order
sorted_array_of_array = word_count.sort_by { |words, frequencies| frequencies }
# top 3 word/frequency is taken from the sorted list. Now reading them from last
# to show the output as first top,second top and so on..
sorted_array_of_array.last(3).reverse_each do |word, frequencies|
puts "#{word} has #{frequencies}"
end
end
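If only the top few entries are needed, a full sort can be avoided. A sketch using max_by with a count (available since Ruby 2.2); the optional parameter n and the choice to return a Hash are my own additions for illustration:

```ruby
def top_words(text, n = 3)
  counts = text.split.each_with_object(Hash.new(0)) { |word, h| h[word] += 1 }
  # max_by(n) returns the n pairs with the highest counts, largest first.
  counts.max_by(n) { |_word, count| count }.to_h
end

p top_words("a a a b b c d d d d", 2)
```

Note that counting is case-sensitive here, so "FIGHT" and "fight" are tallied separately unless the text is downcased first.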
I'm writing a radix sort implementation in Ruby as a self-teaching exercise, and something very odd is happening here. letters ought to be a 2D array of buckets, one for each letter of the alphabet (0 for space/nil, 1-26 for letters). For some reason, this code is not inserting my word at the one index of letters, however. It's inserting it at every index. There also seems to be some sort of infinite loop which prevents it from terminating, which is also odd.
What am I doing wrong? Here is my code.
def radix_sort(words)
letters = Array.new(27, [])
offset = 'a'.ord
max_length = words.max_by { |word| word.length }.length
(max_length-1).downto(0) do |i|
words.each do |word|
if word[i] != nil then
index = word[i].downcase.ord - offset
letters[index + 1] << word
else
letters[0] << word
end
end
words = letters.flatten
letters = letters.map { |bucket| bucket.clear }
end
words
end
w = ["cat", "dog", "boar", "Fish", "antelope", "moose"]
p radix_sort(w)
First of all, there are no 2D arrays, just arrays-of-arrays. Secondly, this:
default_array = [ ]
letters = Array.new(27, default_array)
simply copies the default_array reference to every one of the 27 automatically created values so letters looks like this:
letters = [
default_array,
default_array,
...
]
Have a look at letters[0].object_id and letters[1].object_id and you'll see that they're exactly the same. The fine manual even says as much:
new(size=0, obj=nil)
new(array)
new(size) {|index| block }
Returns a new array.
[...] When a size and an optional obj are sent, an array is created with size copies of obj. Take notice that all elements will reference the same object obj.
Emphasis mine in the last sentence.
However, if you say:
letters = Array.new(27) { [ ] }
then the block will be executed once for each of the 27 initial values and each execution of the block will create a brand new array. Check letters[0].object_id and letters[1].object_id and you'll see that they really are different.
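The difference is easy to see with a small array. This is just an illustrative sketch; the variable names shared and separate are made up.

```ruby
shared = Array.new(3, [])       # one array object referenced three times
shared[0] << "cat"
p shared                        # every slot shows the same contents

separate = Array.new(3) { [] }  # the block runs once per slot
separate[0] << "cat"
p separate                      # only the first slot changed
```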
So I need to get all possible permutations of a string.
What I have now is this:
def uniq_permutations string
string.split(//).permutation.map(&:join).uniq
end
Ok, now what is my problem: This method works fine for small strings but I want to be able to use it with strings with something like size of 15 or maybe even 20. And with this method it uses a lot of memory (>1gb) and my question is what could I change not to use that much memory?
Is there a better way to generate permutation? Should I persist them at the filesystem and retrieve when I need them (I hope not because this might make my method slow)?
What can I do?
Update:
I actually don't need to save the result anywhere I just need to lookup for each in a table to see if it exists.
Just to reiterate what Sawa said. You do understand the scope? The number of permutations for any n elements is n!. It's about the most aggressive mathematical progression operation you can get. The results for n between 1-20 are:
[1, 2, 6, 24, 120, 720, 5040, 40320, 362880, 3628800, 39916800, 479001600,
6227020800, 87178291200, 1307674368000, 20922789888000, 355687428096000,
6402373705728000, 121645100408832000, 2432902008176640000]
Where the last number is approximately 2 quintillion, which is 2 billion billion.
Even at a single byte per permutation, that is roughly 2,265,820,000 gigabytes.
You can save the results to disk all day long - unless you own all the Google datacenters in the world you're going to be pretty much out of luck here :)
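The table and the storage estimate can be reproduced in a couple of lines. The one-byte-per-permutation figure is a deliberately generous assumption; a real 20-character string needs at least 20 bytes.

```ruby
# n! for n from 1 to 20
factorials = (1..20).map { |n| (1..n).reduce(:*) }

# Storage for 20! permutations at one byte each, in gigabytes (2**30 bytes).
gigabytes = factorials.last / 2**30

p factorials.first(5)
puts gigabytes
```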
Your call to map(&:join) is what is creating the array in memory, as map in effect turns an Enumerator into an array. Depending on what you want to do, you could avoid creating the array with something like this:
def each_permutation(string)
string.split(//).permutation do |permutation|
yield permutation.join
end
end
Then use this method like this:
each_permutation(my_string) do |s|
lookup_string(s) #or whatever you need to do for each string here
end
This doesn’t check for duplicates (no call to uniq), but avoids creating the array. This will still likely take quite a long time for large strings.
However I suspect in your case there is a better way of solving your problem.
I actually don't need to save the result anywhere I just need to lookup for each in a table to see if it exists.
It looks like you’re looking for possible anagrams of a string in an existing word list. If you take any two anagrams and sort the characters in them, the resulting two strings will be the same. Could you perhaps change your data structures so that you have a hash, with the keys being the sorted strings and the values being lists of the words that are anagrams of each key? Then instead of checking all permutations of a new string against a list, you just need to sort the characters in the string, and use that as the key to look up the list of all strings that are permutations of that string.
Perhaps you don't need to generate all elements of the set, but rather only a random or constrained subset. I have written an algorithm to generate the m-th permutation in O(n) time.
First convert the key to its list representation in the factorial number system. Then iteratively pull out of the old list the item at each index specified by the new list.
module Factorial
def factorial num; (2..num).inject(:*) || 1; end
def factorial_floor num
tmp_1 = 0
1.upto(1.0/0.0) do |counter|
break [tmp_1, counter - 1] if (tmp_2 = factorial counter) > num
tmp_1 = tmp_2 #####
end # #
end # #
end # returns [factorial, integer that generates it]
# for the factorial closest to without going over num
class Array; include Factorial
def generate_swap_list key
swap_list = []
key -= (swap_list << (factorial_floor key)).last[0] while key > 0
swap_list
end
def reduce_swap_list swap_list
swap_list = swap_list.map { |x| x[1] }
((length - 1).downto 0).map { |element| swap_list.count element }
end
def keyed_permute key
apply_swaps reduce_swap_list generate_swap_list key
end
def apply_swaps swap_list
swap_list.map { |index| delete_at index }
end
end
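For comparison, here is a more conventional way to write the same idea. This is a textbook Lehmer-code formulation swapped in for illustration, not a rewrite of the module above, and factoradic and nth_permutation are hypothetical names. Because Array#delete_at shifts the remaining elements, this sketch is quadratic in the worst case rather than O(n).

```ruby
# Digits of m in the factorial number system, most significant first,
# padded to n digits (the last digit is always 0).
def factoradic(m, n)
  digits = []
  1.upto(n) do |radix|
    digits.unshift(m % radix)
    m /= radix
  end
  digits
end

# The m-th permutation (0-based, in lexicographic order of positions) of items.
def nth_permutation(items, m)
  pool = items.dup
  factoradic(m, pool.size).map { |d| pool.delete_at(d) }
end

p nth_permutation(%w[a b c], 3)
```

Any m from 0 up to n! - 1 selects a different permutation, so random or constrained subsets can be sampled by picking values of m.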
Now, if you want to randomly sample some permutations, Ruby comes with Array#shuffle!, but this approach lets you copy and save permutations or iterate through the permutohedral space. Or maybe there's a way to constrain the permutation space for your purposes.
constrained_generator_thing do |val|
Array.new(sample_size) {array_to_permute.keyed_permute val}
end
Perhaps I am missing the obvious, but why not do
['a','a','b'].permutation.to_a.uniq